Journey with Rust Part 4: First boss fight – fat pointer

Human asked: how can raw pointer be 16 bytes – that makes no sense. It should be just a normal pointer no?
Toaster thought for 20s and replied: Yeah, this is one of those “Rust is doing what?!” moments…

Intro

For a C++ programmer, learning Rust is as much fun as learning to ride a bicycle* – once you understand that assignment means move, everything starts rolling smoothly. Until one day when you encounter a Box inside a Box:

let inner: Box<dyn Debug> = Box::new(42);
let outer: Box<Box<dyn Debug>> = Box::new(inner);

You might think that winning a few battles against the compiler made you understand the language. Well, it didn’t. This is the moment you realize how much you don’t know, and that skipping all those pages of the user manual may not have been the best idea after all.

Congratulations: you’ve reached the point where the Rust journey starts to be really interesting… and dangerous. Now, let’s climb back inside the box.

The Simple View: Box as Dynamic Allocation

A Box is described as a way to store data on the heap – and for a long time that’s exactly how I treated it: memory allocation (new in C++) and a unique pointer rolled into a single concept. Meaning this:

let boxed_int = Box::new(42);

Is equivalent to this:

auto ptr = std::make_unique<int32_t>(42);

In both cases, you create an object that owns a pointer to a heap-allocated integer.

Simple.

But there is more in a Box…

Because Box can do more than simply allocate memory. What it stores depends on the type you put inside it. To keep things simple, let’s focus on one use case: dynamic polymorphism, aka trait objects in Rust.

We all know how this works in C++. Everyone has heard of the vtable (and if not, here’s a good explanation: vtable-and-vptr). Whenever a class uses virtual functions, the compiler generates a table of function pointers and places it somewhere in the binary. Each instance carries a hidden vptr pointing to that table. All invisible thanks to compiler magic.

Rust takes a slightly different approach. The vtable still exists, but the pointer to it does not live inside the object itself. Rust follows the “don’t pay for what you don’t use” principle: plain data stays plain and carries no hidden fields. As a result, when we use dynamic dispatch, Rust builds a special kind of pointer – a fat pointer – that contains both the data address and the vtable pointer. You can see this clearly if you inspect one:
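Here is a minimal sketch of such an inspection. The helper name pointer_sizes is made up for this post, and the sizes are compared against usize rather than hard-coded, so the only claim is that a fat pointer is twice as wide as a thin one:

```rust
use std::fmt::Debug;
use std::mem::size_of;

// Hypothetical helper: sizes in bytes of a thin Box, a fat Box and a fat reference.
fn pointer_sizes() -> (usize, usize, usize) {
    (
        size_of::<Box<i32>>(),       // data pointer only
        size_of::<Box<dyn Debug>>(), // data pointer + vtable pointer
        size_of::<&dyn Debug>(),     // references to trait objects are fat too
    )
}

fn main() {
    let (thin, fat_box, fat_ref) = pointer_sizes();
    println!("Box<i32>:       {} bytes", thin);
    println!("Box<dyn Debug>: {} bytes", fat_box);
    println!("&dyn Debug:     {} bytes", fat_ref);
}
```

On a 64-bit machine the first line should report one machine word and the other two should report two.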


And that explains why we sometimes end up with a Box inside a Box.

Because a Box<dyn Trait> is itself a fat pointer. When we need something that looks like a single thin pointer (for example to hand it over to C code), we put the fat pointer itself on the heap by wrapping it in a second Box, and that outer Box is thin. The inner Box owns the data and carries the vtable pointer; the outer Box holds nothing but the address of the inner one.
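A sketch of that round trip, under the assumption that the C side only stores the pointer and hands it back untouched. The function names into_thin and from_thin are invented for this example, and Debug stands in for whatever trait is actually used:

```rust
use std::ffi::c_void;
use std::fmt::Debug;

// Wrap the fat pointer in a second Box so the raw pointer we hand out is thin.
fn into_thin(obj: Box<dyn Debug>) -> *mut c_void {
    let outer: Box<Box<dyn Debug>> = Box::new(obj);
    Box::into_raw(outer) as *mut c_void
}

// Safety: `ptr` must come from `into_thin` and must be passed here exactly once.
unsafe fn from_thin(ptr: *mut c_void) -> Box<dyn Debug> {
    // The cast target is the OUTER box type, which is thin, so no metadata is invented.
    let outer: Box<Box<dyn Debug>> = unsafe { Box::from_raw(ptr as *mut Box<dyn Debug>) };
    *outer // move the inner fat Box out; the outer allocation is freed
}

fn main() {
    let thin = into_thin(Box::new(42));
    println!("thin pointer: {} bytes", std::mem::size_of_val(&thin));

    let back = unsafe { from_thin(thin) };
    println!("after the round trip: {:?}", back);
}
```

The vtable pointer never leaves Rust-managed memory here: it sits inside the inner Box on the heap, and only the address of that Box travels through the void pointer.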

And that leads us straight to the next topic.

Fat pointers can be dangerous

Why? Because it’s very easy to accidentally destroy the metadata that makes them work.

Consider this code:

// Create trait object
let trait_object: Box<dyn Drinkable> = Box::new(Beer::new("IPC", 4.5));
println!("Size of trait_object: {}", std::mem::size_of_val(&trait_object));

// So far so good - we can drink our beer
trait_object.drink();

// Convert trait object to raw pointer
let beer_ptr = Box::into_raw(trait_object);
println!("Size of beer_ptr: {}", std::mem::size_of_val(&beer_ptr));

// Store the raw pointer as a void pointer (not good)
let c_ptr = beer_ptr as *mut ::std::os::raw::c_void;
println!("Size of c_ptr: {}", std::mem::size_of_val(&c_ptr));

// ... part below might sit megabytes of code away

// Cast the void pointer back to a trait object pointer (function expects thin pointer)
let bad_beer = unsafe { Box::from_raw(c_ptr as *mut Box<dyn Drinkable>) };
println!("Size of bad_beer: {}", std::mem::size_of_val(&bad_beer));

bad_beer.drink();

Not good. Drinking the last beer crashes the whole universe.

A Box<dyn Drinkable> is represented as a fat pointer (16 bytes on a 64-bit machine) that holds both a data pointer and a vtable pointer. When we call Box::into_raw, we get a raw pointer of type *mut dyn Drinkable, which is still fat (16 bytes) and not just a single memory address as one might expect.

The moment we cast it to *mut c_void, we throw away half of that information: the vtable pointer is gone, and only the data address remains. The compiler and Clippy are both fine with this – the cast is legal – but there is no magic that keeps the vtable ptr alive somewhere.
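The loss is easy to measure. The sketch below uses Debug instead of Drinkable so that it stays self-contained, and the function name is made up for this post:

```rust
use std::ffi::c_void;
use std::fmt::Debug;
use std::mem::size_of_val;

// Returns (size of the fat raw pointer, size of the void pointer) in bytes.
fn sizes_before_and_after_cast() -> (usize, usize) {
    let obj: Box<dyn Debug> = Box::new(7u8);
    let fat: *mut dyn Debug = Box::into_raw(obj);

    // Perfectly legal cast: the data address survives, the vtable pointer does not.
    let thin = fat as *mut c_void;
    let sizes = (size_of_val(&fat), size_of_val(&thin));

    // Rebuild the Box from the ORIGINAL fat pointer so the value is freed properly.
    drop(unsafe { Box::from_raw(fat) });
    sizes
}

fn main() {
    let (fat, thin) = sizes_before_and_after_cast();
    println!("*mut dyn Debug: {} bytes", fat);
    println!("*mut c_void:    {} bytes", thin);
}
```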

And when we later try to use that thin pointer as if it were still a fat one, very bad things happen.

Happy ending

There sits a big, fat lie in the example above. When we cast the C pointer back to a Box, we do this as if the original fat pointer had been wrapped inside a thin one – that’s why we cast to *mut Box.

The good news is that Rust will not let us cast directly to *mut dyn Drinkable. The compiler knows you can’t magically recreate a fat pointer out of 8 bytes (ask your toaster for std::mem::transmute if you want to see the proper way to do this). In other words: Rust refuses to fabricate the missing vtable pointer. So we are partially saved.

Partially – because once everything “looks fine”, someone might decide that a Box inside a Box is one Box too many (“raw pointers are just pointers, right?”). One box removed, one universe destroyed.

The happy part? In 99% of real-world Rust code, nobody deals with these problems.
And if someone does… well, they knew what they signed up for.

Toaster’s last words

“Rust will protect you from yourself…
until you insist otherwise.
After that, it politely steps aside and lets physics handle the rest.”

Now that we’ve learned the secret art of shooting ourselves in the foot, we can ‘safely’ move on with our Rust adventure. The journey continues…

* OK – it’s like pedaling uphill on a bumpy road with ducks wandering in front of you every 10 seconds. No one ever said riding a bike was pure pleasure.

Journey with Rust Part 3: small dive into attribute macros

Important note: If you are looking for a comprehensive guide to Rust macros, please keep on searching – this one is just a quick glimpse at what sits under the hood of the #[] syntax. The one who wrote it has no real experience or knowledge. All he has is his keyboard, the Google search engine and his faith that one day he will reach the zen state of coding.

The goal

Today’s goal: to create a macro that will reverse the name of any function (yes it is possible!) and inject some extra code into its body. In short: make the following code compile.

#[reverse_name(test)]
fn rust_is_fun() {
    println!("Called by function");
}

fn main() {
    nuf_si_tsur();
}

The solution

The code presented below does exactly what we need. The whole project can be found here.

use syn;
use proc_macro::TokenStream;
use proc_macro2::Span;
use quote::quote;


#[proc_macro_attribute]
pub fn reverse_name(attr: TokenStream, item: TokenStream) -> TokenStream {

    // turn TokenStream into a syntax tree
    let func = syn::parse_macro_input!(item as syn::ItemFn);

    // extract fields out of the item
    let syn::ItemFn {
        attrs,
        vis,
        mut sig,    // mutable as we are going to change the signature
        block,
    } = func;

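    // reverse the function name, e.g. rust_is_fun -> nuf_si_tsur
    let name = sig.ident.to_string().chars().rev().collect::<String>();
    sig.ident = syn::Ident::new(&name, Span::call_site());

    // the attribute argument, e.g. `test` in #[reverse_name(test)]
    let attr_str = attr.to_string();

    let output = quote! {
        #(#attrs)*
        #vis #sig {
            println!("Injected: {}", #attr_str);
            #block
            println!("Again injected: {}", #attr_str);
        }
    };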

    // See the body of our new function (printed during build)
    println!("New function:\n{}", output.to_string());

    // Convert the output from a `proc_macro2::TokenStream` to a `proc_macro::TokenStream`
    TokenStream::from(output)
}

Only a few copy-paste actions, some glue code here and there, and done. But what exactly have I done?

WTH have I done?

Not knowing why the code does not work is a bad thing, but not knowing why the code does work is even worse. Let’s try to figure out what exactly happened above.

We added an extra println! to the macro itself, so its output shows up while the project is being built:

New function:
fn nuf_si_tsur()
{
    println! ("Injected: {}", "test") ; { println! ("Called by function") ; }
    println! ("Again injected: {}", "test") ;
}

So the compiler took the source code, found the part marked with the reverse_name attribute, and fed it into our function, replacing the original code with its output. In theory, we can manipulate the code in any crazy way we want (although I guess that black magic macros in Rust are just as bad as in C).
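The renaming step itself is plain string handling and can be tried without any macro machinery. A tiny sketch (the helper name reverse_ident is made up here):

```rust
// Hypothetical helper mirroring the string handling done inside the macro.
fn reverse_ident(name: &str) -> String {
    name.chars().rev().collect()
}

fn main() {
    println!("{}", reverse_ident("rust_is_fun"));
}
```

Everything else in the macro is about getting that string out of, and back into, the token stream.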

Q&A

Some questions arose when writing the code so it’s time to search for the answers.

1. Why do we need a separate proc-macro crate for macros?

As we saw, our macro code was used to manipulate the source while the build was running. This means the macro functions need to be available to the compiler before it starts working on our code. And since they are written in Rust, they must already exist as binaries, so we need to compile them in a separate crate. Also, note that when cross-compiling (e.g. for an ARM microcontroller) the macro code always has to be built for your development machine, not for the target. Another reason to keep it separate.
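In practice, the separate crate is marked as a macro crate in its manifest. A sketch of what that could look like; the version numbers are assumptions and not taken from the original project:

```toml
# Cargo.toml of the macro crate (sketch, versions are assumptions)
[lib]
proc-macro = true

[dependencies]
syn = { version = "2", features = ["full"] }  # "full" is needed for syn::ItemFn
quote = "1"
proc-macro2 = "1"
```

The crate that uses #[reverse_name] then simply lists the macro crate as a regular dependency.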

2. Why proc_macro and proc_macro2?

The proc_macro crate is the library that makes all the macro magic work. Proc_macro2 is “A wrapper around the procedural macro API of the compiler’s proc_macro crate.” This part is confusing, but the short version is that the proc_macro types only work inside a procedural macro that the compiler is running, so code outside of that context (unit tests, for example) cannot use them. Proc_macro2 redefines the same types (like Ident or Span) in a form that works everywhere, and that is what syn and quote are built on. Something that might change in the future I guess, but for now we need both.

3. What are syn and quote?

Functions inside the syn crate translate a TokenStream into a syntax tree that can represent any construct present in the Rust language. In our example, the ItemFn structure holds all the parts that can be present in a free-standing function (parameters, name, body, etc.). Quote does the opposite – it translates a syntax tree back into a token stream. Its most interesting feature is the quote! macro, which lets us write a template that looks almost like ordinary code, with #name placeholders for the pieces we want to splice in. Makes things more readable.

4. Can I debug a macro-translated function?

No. At least not without some extra effort. In theory, you could print (as we did in our example), copy-paste, and debug any function created by the macro engine. Another option would be to use a tool like cargo-expand that recursively expands all the macros used in the code.

Summary

Rust macros are a very powerful, and yet easy-to-use feature. I was using Python to generate C++ and C code for a long time but Rust sets new standards when it comes to code generation.

Journey with Rust Part 2: Unit testing

I won my first battles with the Rust compiler, and now I have code that is failproof. But having software that does not crash is not enough. I need to make sure that my application does what it is supposed to do. Even more important, I need a guarantee that its behavior won’t change in the future, after all the refactoring, bug fixing, and adding new features. Also, I need a magic force to drive my architecture (so I can add yet another cool abbreviation to my CV). What I need is Unit Testing.

Time for another adventure. Let the Google search engine guide us on our journey.

Simple test

Our first goal is to test this piece of code (full source can be found here).

pub fn find_marmot(depth: u32) -> bool {
    depth > 20
}

A very nice surprise is that Rust already comes with support for unit testing, so there is no need to install any external crate. Testing is as simple as adding a few extra lines of code to the source file…

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn when_search_not_deep_then_no_marmot_found() {
        assert_eq!(find_marmot(19), false);
    }
}

…and running the test target.

$ cargo test
   Compiling marmot_test v0.1.0
    Finished test
     Running unittests

running 1 test
test marmot_hole::test::when_search_not_deep_then_no_marmot_found ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Super easy but also super primitive*. It is not something that you could compare with a proper unit test framework like Google Test or Catch2. It looks more like a small addon that allows you to run multiple small programs and count the number of panics. No extra features like test fixtures, mocks or parameterized tests.

Less simple test

This is the next function to test:

pub fn chance_to_find_marmot(day_of_week: u8) -> f32 {
    match day_of_week {
        1 => 1.0,
        2 | 3 | 4 => 1.0 / 2.0,
        5 | 6 | 7 => 1.0 / 3.0,
        _ => panic!("Not a day of week")
    }
}

While I can test for a panic with the should_panic attribute, there is no built-in assertion for floating point results that tolerates floating point inaccuracy (see EXPECT_FLOAT_EQ from GTest). To add this functionality, we need to install the assert_approx_eq crate and use it like this:

#[test]
#[should_panic]
fn when_invalid_week_day_should_panic() {
    chance_to_find_marmot(9);
}

#[test]
fn when_friday_then_0_33_chance_of_finding_marmot() {
    let res = chance_to_find_marmot(5);
    assert_approx_eq!(res, 0.333, 0.01);
}
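
To see why a tolerance is needed at all, here is a std-only sketch of the idea. The approx_eq helper is hypothetical and is not how the assert_approx_eq crate is implemented; it only shows the principle:

```rust
// Hypothetical helper: true when a and b differ by less than eps.
fn approx_eq(a: f32, b: f32, eps: f32) -> bool {
    (a - b).abs() < eps
}

fn main() {
    let third = 1.0_f32 / 3.0;
    // Exact comparison fails: 1/3 has no finite decimal (or binary) representation.
    println!("exact:  {}", third == 0.333);
    // Comparison with a tolerance does what the test author meant.
    println!("approx: {}", approx_eq(third, 0.333, 0.01));
}
```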

Parameterized tests

Searching for a way to do parameterized tests, I came across the rstest crate. So it looks like I will have to install yet another external framework. With rstest installed I can write my test like this:

#[rstest]
#[case(1, 1.0)]
#[case(2, 0.5)]
#[case(3, 0.5)]
#[case(4, 0.5)]
#[case(5, 0.33)]
#[case(6, 0.33)]
#[case(7, 0.33)]
fn check_all_days(#[case] input: u8, #[case] expected: f32) {
    assert_approx_eq!(expected, chance_to_find_marmot(input), 0.01);
}

It also supports test fixtures but does not offer anything more than that.
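
For comparison, the same cases can be written without any extra crate as a table checked in a loop. This is only a sketch of the alternative: the helper first_failing_case is made up here, and unlike rstest it reports just the first broken case instead of one test result per case:

```rust
pub fn chance_to_find_marmot(day_of_week: u8) -> f32 {
    match day_of_week {
        1 => 1.0,
        2 | 3 | 4 => 1.0 / 2.0,
        5 | 6 | 7 => 1.0 / 3.0,
        _ => panic!("Not a day of week"),
    }
}

// Hypothetical helper: returns the first failing input, or None when every case passes.
fn first_failing_case(cases: &[(u8, f32)]) -> Option<u8> {
    cases
        .iter()
        .find(|(day, expected)| (chance_to_find_marmot(*day) - expected).abs() >= 0.01)
        .map(|(day, _)| *day)
}

fn main() {
    let cases: [(u8, f32); 4] = [(1, 1.0), (2, 0.5), (5, 0.33), (7, 0.33)];
    match first_failing_case(&cases) {
        None => println!("all cases pass"),
        Some(day) => println!("case for day {} failed", day),
    }
}
```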

Mocking framework

Now we start moving even more uphill. There are so many different mocking frameworks for Rust that it is really hard to judge which one is the best. Luckily for us, someone has walked this path before and left this useful overview. Mockall has the greatest number of features and quick research reveals that it is the only framework that is still actively developed. So the choice is not so hard after all.

Let’s add a simple trait and see how we can mock it.

pub trait HidingPlace {
    fn has_marmot(&self) -> bool;
}

pub fn find_marmot_in(hiding_place: &dyn HidingPlace) -> bool {
    hiding_place.has_marmot()
}

Following the documentation I just do this:

use mockall::*;
use mockall::predicate::*;

#[automock]
pub trait HidingPlace {
    fn has_marmot(&self) -> bool;
}

And it works. I can now use the MockHidingPlace structure in my tests. However, I don’t feel comfortable polluting my code with test-specific statements, so I will try to move it into a dedicated module. More doc reading and I found a way to do this:

#[cfg(test)]
mod test {
    use super::*;
    use mockall::*;
    use mockall::predicate::*;

    mock! {
        pub Hole {}
        impl HidingPlace for Hole{
            fn has_marmot(&self) -> bool;
        }
    }

    #[test]
    fn when_search_deep_then_marmot_found() {
        let mut mock = MockHole::new();
        mock.expect_has_marmot()
            .times(1)
            .returning(||true);
        assert_eq!(find_marmot_in(&mock), true);
    }
}
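
To get a feeling for what the mock does for us, here is a hand-written test double with no external crate. It is only a sketch: FakeHole is an invented name, and it covers just the two things the test above relies on (a canned return value and a call count):

```rust
use std::cell::Cell;

pub trait HidingPlace {
    fn has_marmot(&self) -> bool;
}

pub fn find_marmot_in(hiding_place: &dyn HidingPlace) -> bool {
    hiding_place.has_marmot()
}

// Hypothetical hand-written test double.
struct FakeHole {
    answer: bool,     // value returned from has_marmot
    calls: Cell<u32>, // Cell lets us count calls through &self
}

impl HidingPlace for FakeHole {
    fn has_marmot(&self) -> bool {
        self.calls.set(self.calls.get() + 1);
        self.answer
    }
}

fn main() {
    let hole = FakeHole { answer: true, calls: Cell::new(0) };
    println!("found: {}", find_marmot_in(&hole));
    println!("calls: {}", hole.calls.get());
}
```

It works, but every new trait method means more boilerplate, which is exactly the part a mocking framework generates.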

Summary

Setting up a unit test framework in Rust means adding specialized crates to your project. Each one brings a single piece of functionality, like mocking or float asserts, and together they create a full testing environment. Not as polished as the solutions we know from C or C++, but good enough to test our code and move to the next chapter of our journey.



* In this blog post, Mozilla guys suggest using the standard test tool from Rust so maybe I just underestimate its potential

Journey with Rust Part 1: Errors everywhere

Working with Rust is an exciting adventure. One that starts quite grim: an evil compiler tries to cut your head off every time you write a single line of code. But as you move forward and obey a few simple rules, things become better and less painful. Eventually, you master the language and reach a secret world of software perfection and beauty. And once you are there, you never want to go back…

That is what the Internet once told me, so I took my keyboard and my screen and started the journey on my own. First miles of code and I get attacked by errors from every possible direction.

Compilation errors

There is nothing more frustrating than being told what to do: teachers, parents, my wife, my boss… almost everyone. And now my compiler as well. “Consider removing this semicolon”. So I do. “Consider borrowing here”. So I do. Line after line just applying fixes until it finally shuts up and lets my program run.

Try to do this:

let mut array: [i32; 2] = [1, 2];
array[2] = 3;

and you get this:

error: this operation will panic at runtime

array[2] = 3;
^^^^^^^^ index out of bounds: the length is 2 but the index is 2

Try this:

let mut ARRAY: [i32; 2] = [1, 2];
ARRAY[1] = 3;

and you get this:

warning: variable `ARRAY` should have a snake case name

let mut ARRAY: [i32; 2] = [1, 2];
        ^^^^^ help: convert the identifier to snake case: `array`

Working with the Rust compiler is an exceptional experience. You quickly start to understand that the little gnome behind the screen* does not trust you at all. It checks every step you take, making sure you won’t go off the beaten track. Very frustrating if you have ever worked with C++, but this behavior was not added just to make your life miserable. There are some great benefits of working by the rules.

Memory safety (more compilation errors)

The biggest advantage of working with Rust is memory safety. Forget about nullptr dereferencing, uninitialized variables, data races, and all those things that make software engineering such an “interesting” discipline (deadlocks, sadly, are still on the menu). No more UFO bug tickets starting with “seen only once”. No more week-long investigations, no more running the same code for the 1000th time, hoping that it will finally crash. Guess what will happen if I try to do this:

let i : i32;
println!("Here is some garbage: {}", i);

Exactly. The compiler gets angry and throws red errors at me. And it won’t stop until I agree to start coding as it wants (see point 1) and initialize variables before use. Let’s do something even more crazy and try to spin up a thread:

use std::thread;

fn main() {

    static mut NOT_PROTECTED: i32 = 0;

    thread::spawn(|| {
        for _ in 0..10 {
            NOT_PROTECTED += 1;
        }
    }).join().unwrap();

    println!("Val: {}", NOT_PROTECTED);
}

Now the real battle begins. For every single fix there are two new errors. After one hour of pasting random stuff from Stack Overflow the compiler finally shuts up.

use std::thread;
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::Arc;

fn main() {

    let protected = Arc::new(AtomicI32::new(0));
    let result = Arc::clone(&protected);
    thread::spawn(move || {
        for _ in 0..10 {
            // one atomic read-modify-write; a separate load and store could lose updates
            protected.fetch_add(1, Ordering::Relaxed);
        }
    }).join().unwrap();

    println!("Val: {}", result.load(Ordering::Relaxed));
}

Our gnome may be grumpy, but it just saved us from some concurrency issues and, more importantly, from possible future problems that would definitely arise as the software gets older and bigger.
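
The shared counter becomes more interesting once there is more than one thread. A sketch, with a made-up helper name count_with_threads, in which every thread bumps the same atomic counter:

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical helper: spawns `threads` threads, each adding 1 `increments` times.
fn count_with_threads(threads: usize, increments: i32) -> i32 {
    let counter = Arc::new(AtomicI32::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter); // each thread gets its own handle
            thread::spawn(move || {
                for _ in 0..increments {
                    // atomic increment: no update can be lost between threads
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}

fn main() {
    println!("Val: {}", count_with_threads(4, 1000));
}
```

With fetch_add each increment is a single atomic operation, so the total does not depend on how the threads happen to interleave.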

Some magic

This is a good place to introduce a powerful magic spell: the unsafe keyword, which tells the compiler: “trust me, I know what I am doing”. It enables the shoot-yourself-in-the-foot mode which we know so well from C++. No limits, no errors. This will compile just fine.

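use std::thread;

fn main() {

    unsafe {
        static mut NOT_PROTECTED: i32 = 0;

        thread::spawn(|| {
            for _ in 0..10 {
                NOT_PROTECTED += 1;
            }
        }).join().unwrap();

        // copy the value out first: newer editions reject references to a `static mut`
        let val = NOT_PROTECTED;
        println!("Val: {}", val);
    }
}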

Very tempting, but writing your program inside a big unsafe clause is not considered the best Rust practice. The keyword should be reserved for external, well-tested libraries – things that we know won’t break and won’t introduce undefined behavior. In short – code that is not written by us.

Error handling (back to compilation errors)

Do you want to be famous? Of course you do. And there is no better way to become famous than to deadlock a space orbiter or blow up a nuclear power plant.

We already know that Rust won’t allow us to inject any nasty data race, but how about reading from a non-existent file?

use std::fs::File;
use std::io::prelude::*;

fn main() {
    let mut new_command = String::new();
    let mut file = File::open("important_space_orbiter_instructions.txt");
    file.read_to_string(&mut new_command);
    println!("Space station please do: {}", new_command);
}

This won’t fly. And I mean the code, not the space station. The return type of File::open is a Result. A Result can be something (a file handle in our case) or it can be an error (if the operation failed). And the best part is: you can’t ignore the error condition (guess who will complain if you do). The simplest way to make our code compile is to panic in case of failure. The expect method below will make our program terminate if the open does not succeed.

fn main() {
    let mut new_command = String::new();
    let mut file = File::open("important_space_orbiter_instructions.txt").expect("Can't read file");
    if file.read_to_string(&mut new_command).is_ok() {
        println!("Space station please do: {}", new_command)
    }
}

But the real power of Result type comes with the “?” operator, which simply means – in case of error return error.

use std::fs::File;
use std::io::prelude::*;

fn read_command() -> std::io::Result<String> {
    let mut new_command = String::new();
    let mut file = File::open("important_space_orbiter_instructions.txt")?;
    file.read_to_string(&mut new_command)?;
    Ok(new_command)
}

fn main() {
    let command = read_command().expect("Can't get command");
    println!("Space station please do: {}", command);
}

This way the error handling logic does not pollute the normal execution path. Everything is nice, clean and safe. Thank you little gnome.
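
For the curious, this is roughly what the “?” saves us from writing by hand. It is a simplified sketch: the real operator also converts the error type when needed, and the path parameter and file name are made up for this example:

```rust
use std::fs::File;
use std::io::Read;

// The same logic as read_command, with every `?` spelled out as a match.
fn read_command(path: &str) -> std::io::Result<String> {
    let mut file = match File::open(path) {
        Ok(file) => file,
        Err(e) => return Err(e), // this is the early return hidden behind `?`
    };

    let mut text = String::new();
    match file.read_to_string(&mut text) {
        Ok(_) => Ok(text),
        Err(e) => Err(e),
    }
}

fn main() {
    // Handle the error instead of panicking with expect.
    match read_command("no_such_file_for_this_sketch.txt") {
        Ok(command) => println!("Space station please do: {}", command),
        Err(e) => println!("No command today: {}", e),
    }
}
```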

Summary

When working with Rust, you can get the feeling that the compiler is working against you and you need to fight it every time you want to get something done. But think about it this way: when going to war, who would you prefer to have on your side? A grumpy guy who points out every little mistake you make, or a silent fellow who says nothing even when you hold your rifle by the wrong end?

Last word

One important note to know when you start programming with Rust: assignment is a move operation.

*Compiler Gnomes – small creatures living On The Hardware Side. With their tiny binary axes they chop human readable text into pieces that are used to prepare a binary soup**

**Binary soup – soup that tells your computer what to do. Tastes like oranges