Google Summer of Code 2017

It's shaping up to be a busy year for me. I'm in my final year of study (Bachelor of Software Engineering), I have a few personal projects, and I also need to start looking for work or creating a job for myself.

GSoC

Oh, and I also made it into the Google Summer of Code to work with the excellent GNOME project, which I am incredibly happy about. My GSoC project is this:

The research and implementation of the Rust language in GJS to help reduce or eliminate memory leaks and increase memory safety.

Philip Chimento is the wonderful bloke who maintains GJS, and will be mentoring me throughout my journey. I can't show my appreciation to Philip enough for this opportunity.

GJS is a JavaScript binding for use within GNOME projects (it uses Mozilla's SpiderMonkey as the JS engine), and quite a few projects use it: gnome-documents, gnome-maps, gnome-shell, polari, and sushi, to name a few. Hence it is a critical part of the infrastructure and must be a dependable and reliable tool. As with many C/C++ projects, it has suffered in the past from the seemingly inevitable pointer misuse, such as failing to free memory after use (resulting in memory leaks).

And that is where I'm hoping the benefits of Rust will shine. I'll try to summarise Rust, which is not as easy a task as it seems.

The Rust ownership system and borrowing

A quick outline of some of the rules that Rust enforces on us via the compiler:

  • variables are immutable by default
  • there can be any number of immutable references (analogous to C pointers) to a variable, as long as
    • there are no mutable references to it
    • it is in scope
  • a variable may only have one mutable reference at a time, and no other references alongside it
  • variables are dropped when they go out of scope
  • null does not exist, variables must be initialised to a value on creation
  • when passing a variable to a function, the function takes ownership
    • Unless you pass the function a reference instead (think, C pointer)
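
To make the last two rules a little more concrete, here is a minimal sketch of a function that takes ownership next to functions that only borrow. The function names (consume, sum, double_all) are made up for illustration and are not from GJS or any library.

```rust
// Takes ownership of the vector: the caller can no longer use it afterwards.
fn consume(values: Vec<i32>) -> usize {
    values.len()
} // `values` is dropped (freed) here

// Borrows the data immutably: the caller keeps ownership.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

// Borrows the data mutably: the caller keeps ownership, but no other
// reference may exist while this borrow is alive.
fn double_all(values: &mut [i32]) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}

fn main() {
    let mut numbers = vec![1, 2, 3];

    double_all(&mut numbers); // mutable borrow, ends when the call returns
    println!("sum = {}", sum(&numbers)); // prints "sum = 12"

    let count = consume(numbers); // ownership moves into `consume`
    println!("count = {}", count); // prints "count = 3"

    // numbers.len(); // would not compile: `numbers` has been moved
}
```

Uncommenting the last line should produce a "use of moved value" style error, since numbers no longer belongs to main at that point.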

These rules are enforced at compile time. For example, if you create a mutable variable, take out a mutable reference to it, and then take an immutable reference (or any new reference), such as:

let mut two = 2;
let mut ref_two_1 = &mut two; // mutable reference
let ref_two_2 = &two; // can’t take a reference as the above ref is mutable

Rust will complain with the following (output from the compiler at the time of writing):

error[E0502]: cannot borrow `two` as immutable because it is also borrowed as mutable
 --> <anon>:4:18
  |
3 | let mut ref_two_1 = &mut two; // mutable reference
  |                          --- mutable borrow occurs here
4 | let ref_two_2 = &two; // can’t take a reference as the above ref is mutable
  |                  ^^^ immutable borrow occurs here
5 | }
  | - mutable borrow ends here

Basically you can't mix mutable and immutable references within the same scope. This very nicely prevents many bugs such as data races, where a variable is modified while another pointer is pointing at it. I should also explain that in Rust, referencing a variable is called "borrowing": you aren't moving the variable or copying it, you are simply borrowing a reference to it. You can of course scope the mutable reference so that it is dropped before the next reference is made:

let mut two = 2;
{
    let mut ref_two_1 = &mut two; // mutable reference
} // The mutable ref is dropped here
let ref_two_2 = &two; // we can now make a new ref

A little more about scope and memory management

In Rust, when you create and initialise a variable, the memory required to hold it is allocated and the value is stored in that memory in one step (at the time of writing, Rust uses jemalloc by default for heap allocations); in C this would be two operations. Then, when the variable goes out of scope, it is dropped, much like manually calling free() in C.

Note: a borrow can't outlive the variable it refers to, and Rust won't let you borrow (reference) a variable unless it is in scope. The compiler checks this at compile time and you will be moaned at if you try. This prevents use-after-free and dangling pointers.
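
As a rough sketch of "dropped when it goes out of scope", the example below uses a made-up type (Tracked, purely hypothetical) that writes its name to a shared log when it is dropped. The Drop trait is the standard hook Rust calls at that moment; everything else here is just scaffolding so the order of events can be seen.

```rust
use std::cell::RefCell;

// Hypothetical helper type: records its name in a shared log when dropped.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl<'a> Drop for Tracked<'a> {
    // Called automatically by Rust when the value goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn run_scopes(log: &RefCell<Vec<&'static str>>) {
    let _outer = Tracked { name: "outer", log };
    {
        let _inner = Tracked { name: "inner", log };
        // `_inner` is valid until the closing brace below
    } // `_inner` is dropped here, no explicit free() needed
    log.borrow_mut().push("after inner scope");
} // `_outer` is dropped here

fn main() {
    let log = RefCell::new(Vec::new());
    run_scopes(&log);
    println!("{:?}", log.borrow()); // prints ["inner", "after inner scope", "outer"]
}
```

The inner value is cleaned up at the end of its block, before the function carries on, and the outer value is cleaned up when the function returns.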

Copy vs Move

There is another subtlety with Rust: primitive types are Copy by default, and other types are 'moved'. This means that types such as i32, u64, char, u8, and bool all implement the Copy trait, so when you create a new binding to an existing variable, the value is copied, not moved:

let mut a = 7;
let b = a; // a is copied
a = a * 2;
println!("{:?}, {:?}", a, b); // prints "14, 7"

Compare this with a type such as a Vector, which does not implement the Copy trait. The equivalent code, starting with let mut a = vec!(7); and then doubling a[0] after the let b = a; line, gives the following error:

error[E0382]: use of moved value: `a`
 --> <anon>:4:5
  |
3 |     let b = a;
  |         - value moved here
4 |     a[0] = a[0]*2;
  |     ^ value used here after move
  |
  = note: move occurs because `a` has type `std::vec::Vec<i32>`, which does not implement the `Copy` trait
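
For completeness, here is a small sketch of two common ways around that error: cloning the vector so each binding owns its own data, or lending it out through a mutable borrow instead of moving it. The helper double_first is a made-up name for illustration only.

```rust
// Doubles the first element in place through a mutable borrow.
// The caller keeps ownership of the data.
fn double_first(values: &mut [i32]) {
    if let Some(first) = values.first_mut() {
        *first *= 2;
    }
}

fn main() {
    let mut a = vec![7];

    // Option 1: clone, so `b` owns a separate copy and `a` stays usable.
    let b = a.clone();
    a[0] = a[0] * 2;
    println!("{:?}, {:?}", a, b); // prints "[14], [7]"

    // Option 2: borrow mutably for the duration of a call instead of moving.
    double_first(&mut a);
    println!("{:?}", a); // prints "[28]"
}
```

Cloning costs an extra allocation and copy, which is why Rust makes you ask for it explicitly rather than doing it behind your back.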

Copy semantics, combined with Rust placing most values on the stack by default, mean that many standard operations are very fast. It also means that variables are typically freed in the reverse order of their declaration. Rust can also allocate on the heap via std::boxed::Box; I'll leave the details of this for a later post.

I'm going to cut this introduction to Rust and how it works a bit short, as it would be very easy to spin it into a more detailed tutorial. Hopefully I've included enough detail for the reader to get a feel for how Rust operates, without becoming overwhelmed.

The reason I've tried to write this section is to shed some light on why I think Rust is worth investigating in terms of memory leaks, memory safety, and other (standard) C mistakes such as dangling pointers.

The take-away here should be that the Rust compiler enforces rules that help to prevent those issues. It makes it virtually impossible to introduce the usual bugs, and should help the management of memory by taking that responsibility out of the programmer's hands somewhat. That's not to say that Rust is a magic bullet, far from it.

The Goal?

Primarily, to see what benefits can be gained from using Rust to replace sections of C code that are prone to memory management bugs (usually introduced by us humans).

It's a lofty goal, and will have many interesting challenges along the way. Some of these challenges will be:

  • Using the Rust compiler with an autotools toolchain
  • Collection of metrics and which metrics to use
  • Translating C/C++ code to Rust
  • Resolving any possible problems with Rust and GObject, if code is written that requires use of GObject.
  • Linking C/C++ and Rust generated code

Fortunately for me, some others have already trialled, and are now using, Rust within some GNOME projects, or are working on improving how Rust and GObject work together.

So, many thanks to the people above for their efforts. Your posts will be invaluable.