Autotools and Rust

One of the first tasks for my project (GSoC, Rustify GJS) was simply to get Rust building alongside the C++ code using autotools. To do so I had to learn some of the autotools suite, and how to write the configuration and makefile input.

I can tell you honestly that I'm not a fan of autotools after this. Sure, it does the job, but the sheer number of macros used for setup, configuration and so on is mind-bending.

Rust and compilation

There are a few ways to compile Rust, and each has pros and cons depending on your end goal. Example use cases for Rust are:

  • Embedded controllers
  • Application development
  • Libraries
  • Embedding in other languages

There are many more use cases than the above of course, but these cover the examples I want to show here. I'll start with the simplest use case, that of compiling Rust on its own for an application.

Compiling Rust Code

This is dead simple, but! There are two ways to do so: with cargo, or with rustc directly.

Cargo is the standard way to create and build Rust software. It performs quite a lot of functions: creating new projects, compilation, testing, benchmarking, documentation generation, publishing projects as crates, and a few more.

A Quick Binary

Let's create a new Rust project. Run cargo new --bin hello_rust; this creates a new cargo project for a binary named hello_rust, in a sub directory of the current directory with the same name. The directory structure is:

.
├── Cargo.toml
└── src
    └── main.rs

Cargo has also helpfully created an fn main() which prints "Hello, world!". So let's compile it with cargo build. Cargo by default builds a debug version of everything since this is the most commonly requested mode. To build a release version run cargo build --release.

You can also compile with rustc main.rs. However, if you use rustc on its own you will need to do a lot of extra work manually, such as adding the compiler flags that cargo build --release adds if you want an equivalent release build; this is generally rustc -C opt-level=3 -C debuginfo=0. Using rustc on its own gets pretty harsh once you start to include external crates, link other libs and so on, so for the rest of this post I will focus on cargo since it handles a lot of this for us in the background. Where it may be instructive I will include the equivalent rustc commands.

Building a Library

Rust libraries, and the integration of a Rust lib in to C++ (or any other language), are the focus of my project, so let's get started!

The project I'm going to use as an example will use autotools to control compilation, and use both C++ and Rust, with C++ having the main call point, and both languages calling functions plus passing variables to each other.

Start by creating a directory to store the project in, and in that, create a src directory.

In src/ create a main.c with the following content;

#include <stdio.h>

extern void hello_world(); // declare the Rust function
int main(void)
{
    hello_world();
}

This is fairly standard for C. What we're interested in though is the declaration of the Rust function: extern void hello_world(). The extern here tells the compiler that what follows is a declaration only, and not to allocate memory for it as it will be found elsewhere at link time. In other words, this is declared but not defined - it is defined somewhere else. In our case, it will be defined in the Rust source, which will then export the symbol (the compiled function definition) at compile time so that it can be linked.

Change in to src and create a new Rust project using cargo new --lib rs_hello --name rs_hello. This creates our project under rs_hello with a Cargo.toml and src/lib.rs, and names it. The source file contains only a simple test to run, and no functions or other code. You can erase the test code or leave it there; it won't affect anything being done in this post, but it is good to learn how Rust tests are built and run.

In src/rs_hello/src/lib.rs add the function that we declared in the C source;

#[no_mangle]
pub extern "C" fn hello_world() {
    println!("hello world!");
}

It's as simple as that, but there are two things to note;

  • #[no_mangle] - this tells the Rust compiler not to mangle the function name (normally it decorates the symbol with the crate path and a hash), so the exported symbol is exactly hello_world and the C linker can find it.
  • pub extern "C" - here we're declaring that the function is publicly accessible (pub), and that it uses the C calling convention.

You can compile and run this right now if you want to. In the base directory run;

rustc --crate-type staticlib -o librs_hello.a src/rs_hello/src/lib.rs &&
gcc -o hello src/main.c librs_hello.a -ldl -lrt -lpthread -lgcc_s -lc -lm -lrt -lutil

Running cargo build on a new library project will by default produce a Rust library (.rlib), which cannot be linked from non-Rust code. To change that, open src/rs_hello/Cargo.toml and append to the end;

[lib]
crate-type = ["staticlib"]

Using cargo build in src/rs_hello will produce the static library in src/rs_hello/target/debug by default. To link it with main.c, just prepend that path to librs_hello.a in the gcc command above.

Note: libraries built with cargo will have lib prepended to their name.
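The hello_world function takes no arguments, while the stated aim is for both languages to pass variables to each other. Below is a minimal sketch of what that might look like on the Rust side. The names rs_add and rs_name_len are made up for illustration and are not part of GJS. The fn main() is only there so the snippet runs on its own; in the library it would be dropped. Note that on the 2024 edition the attribute has to be written #[unsafe(no_mangle)].

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_int};

/// Adds two C ints. Hypothetical example function.
/// C declaration: extern int rs_add(int a, int b);
#[no_mangle]
pub extern "C" fn rs_add(a: c_int, b: c_int) -> c_int {
    // Wrap on overflow rather than panic; a panic should not escape in to C.
    a.wrapping_add(b)
}

/// Returns the length in bytes of a NUL-terminated C string, or 0 for NULL.
/// C declaration: extern size_t rs_name_len(const char *name);
#[no_mangle]
pub extern "C" fn rs_name_len(name: *const c_char) -> usize {
    if name.is_null() {
        return 0;
    }
    // Safety: the caller promises that `name` points to a valid,
    // NUL-terminated string that stays alive for the duration of the call.
    let c_str = unsafe { CStr::from_ptr(name) };
    c_str.to_bytes().len()
}

fn main() {
    // Stand-in for the C caller, so the sketch can be run directly.
    let name = CString::new("gjs").expect("no interior NUL byte");
    println!("{} {}", rs_add(2, 3), rs_name_len(name.as_ptr()));
}
```

Plain integers map directly on to C types, while strings cross the boundary as raw pointers, which is why the second function needs an unsafe block and a NULL check.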

Building with autotools

Now, on to autotools!

We will need two files in the base directory: configure.ac and Makefile.am. The content of configure.ac is;

AC_PREREQ([2.60])

AC_INIT([rust_hello], [0.1])
AM_INIT_AUTOMAKE([1.6 foreign subdir-objects])
m4_ifdef([AM_SILENT_RULES], [
    AM_SILENT_RULES([yes])
])

AC_CANONICAL_HOST

AC_PROG_CC_C99
AM_PROG_CC_C_O

AC_PATH_PROG([CARGO], [cargo], [notfound])
AS_IF([test "$CARGO" = "notfound"], [AC_MSG_ERROR([cargo is required])])

AC_PATH_PROG([RUSTC], [rustc], [notfound])
AS_IF([test "$RUSTC" = "notfound"], [AC_MSG_ERROR([rustc is required])])

LT_INIT

AC_CONFIG_MACRO_DIRS([m4])

AC_CONFIG_FILES([
  Makefile
])

AC_OUTPUT

As far as I can tell (and I'm absolutely not an autotools expert here) this is fairly standard for an ultra basic configure.ac. We're only going to be focusing on the relevant rust bits however, as that is what makes our build tick.

AC_PATH_PROG([CARGO], [cargo], [notfound]) is a macro (AC_PATH_PROG) that searches PATH for a program (cargo) and stores its absolute path in the variable CARGO. If the program doesn't exist it stores notfound in the variable instead.

AS_IF([test "$CARGO" = "notfound"], [AC_MSG_ERROR([cargo is required])]) tests whether the content of the variable CARGO matches "notfound". If it does, it calls the error macro AC_MSG_ERROR, which prints the message and stops configure.

The content of Makefile.am is;

ACLOCAL_AMFLAGS = -I m4

RSHELLO_DIR = src/rs_hello
RSHELLO_TARGET = $(RSHELLO_DIR)/target/release

bin_PROGRAMS = hello_rust
hello_rust_SOURCES = src/main.c
hello_rust_LDADD = $(RSHELLO_TARGET)/librs_hello.a
hello_rust_LDFLAGS = -lrt -ldl -lpthread -lgcc_s -lpthread -lc -lm -lrt -lutil

$(RSHELLO_TARGET)/librs_hello.a:
	cd $(srcdir)/$(RSHELLO_DIR); \
	$(CARGO) rustc --release -- \
	-C lto --emit dep-info,link=$(abs_builddir)/$@

clean-local:
	cd $(srcdir)/$(RSHELLO_DIR); $(CARGO) clean

Again, a fairly standard layout. bin_PROGRAMS declares the name of our program, and the lines beginning with hello_rust_ declare much of the same stuff that we used for the gcc command above. We haven't included the rust source on the SOURCES line however since autotools is geared towards compilation of C/C++.

How does it build the rust source then? It looks at

hello_rust_LDADD = $(RSHELLO_TARGET)/librs_hello.a

and sees that it needs librs_hello.a in the src/rs_hello/target/release directory, then looks for a rule to build it if it doesn't exist. That's where $(RSHELLO_TARGET)/librs_hello.a: comes in to play. This is an explicit make rule whose target is that file, and it basically says "to make librs_hello.a in directory src/rs_hello/target/release, perform the following operations":

  • cd in to $(srcdir)/$(RSHELLO_DIR) - srcdir is a variable that automake sets to the directory holding the sources for this Makefile (it is . when you build inside the source tree), and RSHELLO_DIR is the variable we set near the top of the file.
  • run cargo, whose path is held in the variable CARGO, with the following arguments;
    • rustc --release - instructs cargo to use its rustc subcommand, which allows us to pass arguments to rustc, and uses the "release" profile.
    • -- arguments to rustc begin.
    • -C lto - this is not a default option in --release mode. lto is "link-time optimization".
    • --emit dep-info,link=$(abs_builddir)/$@ breaks down to;
      • --emit output the following,
      • dep-info, a Makefile-style .d file listing the source files the crate was built from,
      • link, the final linked artifact for the crate, which here is the static library,
      • =$(abs_builddir)/$@ write the link output to this path. abs_builddir is the absolute path of the build directory (the same as the source directory unless you build out of tree), and $@ is an automatic variable that make sets to the name of the target, the part before the : - here $(RSHELLO_TARGET)/librs_hello.a

The last block, clean-local:, is run along with the usual clean when you run make clean. Since Rust and cargo place files in different locations to what autotools expects, we need to clean up manually. This rule cds in to the cargo project and runs cargo clean.

With those two files done, you now need to run autoreconf -si to generate all the files needed. Then run ./configure followed by make.

Congratulations! You've built a Rust library used by a C program, using autotools. So with that groundwork out of the way, let's dive a little deeper.

Types of Libraries

You'll recall that above we had to pass the staticlib option to rustc, and add it to the Cargo.toml for use with cargo. This is because Rust builds Rust libraries (.rlib) by default, which are native to Rust only. The format of these is still unstable afaik, and may change between Rust versions. They also include extra metadata for Rust, and don't require the use of unsafe blocks when you want to use functions/data from them. An rlib cannot be used with other languages.

For this reason we need the staticlib option. This produces a static library which contains all of the Rust project's generated code and its upstream dependencies. As such it will not have external dependencies on Rust libraries.

There are other options too!

dylib produces a dynamic, Rust-only library. This can be used with other languages at the moment but is intended to eventually be for Rust only. The file extension is *.so on Linux. You should probably avoid using this altogether and use either lib for Rust libraries, or one of the below for external use cases.

cdylib is a dynamic library, a newer output format introduced in Rust v1.10 specifically for embedding in other languages. It exports public Rust symbols as a C API using C calling conventions. It is meant to be linked at run time by the binaries that use it, typically through the system's dynamic linker. The file extension is *.so.

staticlib is meant to be compiled and linked in to other projects statically - this means it is copied in to the binary that uses it, at compile time. Suitable for embedding in other languages. File extension is *.a.

lib is default, and will be whatever Rust needs it to be to produce a compiler recommended Rust library.

rlib is a static Rust library.

A small note: if you were to produce a library for use with other Rust projects, you should use the default lib. If you use cdylib or staticlib, Rust projects can only reach it through its C API, and will need to use unsafe blocks to do so.
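To show what those unsafe blocks look like, here is a minimal sketch of Rust calling a function through a C interface. It uses abs() from the system libc as a stand-in for a function exported by a cdylib or staticlib, since libc is already linked in to every std program; the wrapper name c_abs is made up for illustration. On the 2024 edition the block has to be written unsafe extern "C".

```rust
use std::os::raw::c_int;

// Declaration only: the definition lives in libc and is found at link time,
// the same way hello_world() was declared in main.c.
extern "C" {
    fn abs(value: c_int) -> c_int;
}

/// Safe wrapper (hypothetical name) around the foreign function.
fn c_abs(value: c_int) -> c_int {
    // Safety: abs() only reads its argument. The one input C leaves
    // undefined is INT_MIN, which this sketch does not guard against.
    unsafe { abs(value) }
}

fn main() {
    println!("{}", c_abs(-7));
}
```

The compiler cannot check anything on the far side of an extern "C" declaration, so every call is unsafe and it is up to the wrapper to uphold the contract. With the default lib crate type none of this is needed.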

Static vs Dynamic Linking

Linking on Linux is typically done using ld, and is the last step of compilation. If you run man ld to view the man page for it, the first sentence of the description states;

ld combines a number of object and archive files, relocates their data and ties up symbol references.

This gives a pretty good idea of what linking is. When building a typical C/C++ program, the compiler will compile each source file to an object file, then as the last step it will invoke ld to tie them all together.

Each declared function or data structure in one source file that is meant to be public to another source file (as in our example above, pub extern "C" fn foo()) is exported and exposed as a symbol. When another source file references this function, the linker looks for the related symbol and links them together.

The way linking is done for static vs dynamic is different.

  • static linking copies the code for every external symbol a compiled object references in to the final binary when it is built
  • dynamic linking will instead put a reference to the library being linked to in the compiled binary/library, and will not link to it until runtime. A dynamic library can be shared between many programs.

Rust by default static links all Rust dependencies including the Rust std library, as in, it copies in parts of the libraries where it is used.

If you create a library using dylib or cdylib, that library is dynamically linkable to other projects, and also static links the Rust std library. Whereas if you create a staticlib, that library is copied in to other projects that use it (along with the Rust library parts it contains).

Rust will, however, dynamically link system libraries such as libc and pthreads. You can statically link system libraries if you use an alternative libc such as musl. Read more here

Rust and Objects

In Types of Libraries we outlined a few types of libraries that Rust can build - we can also output object files much like C/C++ compilation does. This can complicate things a bit though and I won't go in to much detail here except to outline it. If you did want to output objects for linking, then you will be losing the benefit of cargo handling linking for you - this means you need to manually link any Rust libraries you depend on.

When you're dealing with library names such as /usr/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-912d6e6c7cbc93f3.so, well, I don't recommend forgoing cargo unless you really need to. Unusual filename, right? The hash will change with each distribution that compiles the Rust compiler from scratch, so it is a bad idea to hardcode the name in to build scripts. Hence sticking to cargo is a good idea.

Another use-case is using bare metal rust on an embedded controller. Bare metal Rust is Rust code only, no standard lib, no libraries that may require external dependencies - this makes it much easier to deal with linking.

Rust is not ABI stable

Rust does not have a stable ABI as of yet, and may not for some time. What this means for us in terms of linking is that a project that dynamically links the Rust std library will only work with the exact library it was compiled against. This shouldn't be an issue with Linux distribution supplied Rust, but if you switch between Rustup and distro supplied Rust, it likely won't work with one of them.
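One practical consequence is that anything crossing the language boundary should be expressed in terms of the C ABI, which is stable. Functions get extern "C" as shown earlier; for data, the usual tool is #[repr(C)], which asks the compiler to lay a struct out the way a C compiler would. The sketch below is illustrative only: Point and point_translate are made-up names, and on the 2024 edition the attribute is written #[unsafe(no_mangle)].

```rust
use std::mem::size_of;
use std::os::raw::c_int;

/// Laid out like `struct Point { int x; int y; };` in C.
/// Without #[repr(C)] the field order and padding are unspecified.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: c_int,
    pub y: c_int,
}

/// C declaration: extern struct Point point_translate(struct Point p, int dx, int dy);
#[no_mangle]
pub extern "C" fn point_translate(p: Point, dx: c_int, dy: c_int) -> Point {
    Point {
        x: p.x.wrapping_add(dx),
        y: p.y.wrapping_add(dy),
    }
}

fn main() {
    // Stand-in for the C caller, so the sketch can be run directly.
    let moved = point_translate(Point { x: 1, y: 2 }, 10, 20);
    println!("{:?} occupies {} bytes", moved, size_of::<Point>());
}
```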

Optimization

Now that we have covered what types of libraries there are, let's have a look at ways to optimize Rust.

In all honesty, there isn't that much you need to do - default --release produces very fast binaries with the following defaults;

  • opt-level = 3
  • debug = false
  • lto = false
  • panic = 'unwind'

But there are some things you can do such as reducing size via link-time optimization, using a different allocator, and a few other tricks.

Covering how LTO works in detail is well beyond my abilities, but I may be able to adequately simplify it; at compile time the objects produced consist of everything that may be used, eg, all of a library. For C, the first pass of a linker may find that function foo() is not actually used, and so it is removed from the object, a second pass may find some condition is always false and so bar() is never called, on a third pass since fizz() was being called by bar(), and bar() was removed, fizz() is no-longer called and so is removed too.

Using LTO with Rust works in a similar way: it finds the functions etc that are never called and removes them, which results in a very nice size drop. One of the differences between Rust and C here is that the Rust compiler itself will tell you when a block of code isn't reachable (it flags unreachable arms in a pattern match too) and implores you to remove it.
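A small illustration of the compiler-side check, as a sketch with made-up function names. This is the diagnostic rustc gives within a single crate; LTO does the equivalent pruning later, at link time and across crates.

```rust
/// Called from main(), so it is kept.
fn parity(n: u32) -> &'static str {
    match n % 2 {
        0 => "even",
        _ => "odd",
        // Any further arm placed here could never match, and rustc
        // would report it as an unreachable pattern.
    }
}

/// Never called from anywhere. Remove the `allow` attribute and rustc
/// reports that the function is never used.
#[allow(dead_code)]
fn never_called() -> u32 {
    42
}

fn main() {
    println!("{}", parity(7));
}
```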

So how do you use LTO? Two ways:

  • pass -C lto to rustc as an arg, or via cargo rustc --release -- -C lto if using cargo
  • or, (also for cargo) add a section in the Cargo.toml as follows;

[profile.release]
lto = true

[profile.dev]
lto = true

Currently, for the small amount of Rust I have in GJS so far, using LTO reduces the size of libgjs.so from 12 MB to 7.7 MB - quite a decent saving.

Another way to reduce the final size is with the use of strip - a tool used to remove symbols from a binary/object. Handy also for making reverse-engineering harder (you'll probably never stop Matthew Garrett though).

Running strip on libgjs.so with my Rust code compiled in without LTO reduces the size down to 1.9 MB. Using LTO and strip reduces it to 912 KB.

The usual way to use strip is to remove only the debug symbols, via strip --strip-debug. Running this on libgjs.so along with LTO reduced the size from 7.7 MB to 926 KB.

This step is typically performed by Linux distributions as part of their packaging process - they strip the debug symbols out to one or more separate files and package these alongside the stripped binary/library. The end user doesn't normally require them.

You can pass a strip argument to rustc with

rustc -C link-args=-s

If you are using cargo this would be

cargo rustc --release -- -C link-args=-s

The last thing we can try is changing how panics are handled. The default handling for a panic is to include code to unwind the stack to help debugging. We can remove the code for unwinding, and just abort by passing an arg to rustc;

rustc -C panic=abort

or with cargo, add panic = "abort" to the relevant profile section. The saving here isn't all that much though, ~100K, but this may be useful for embedded devices etc.
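To make the trade-off a little more concrete, here is a minimal sketch of what the unwinding code buys you. With the default panic = 'unwind', std::panic::catch_unwind can stop a panic and turn it in to an ordinary error value; with panic=abort the same panic terminates the process on the spot and the Err branch is never reached. The function name guarded_div is made up for illustration.

```rust
use std::panic;

/// Divides two numbers, converting a panic in to an Err.
/// Only meaningful when the crate is built with panic = 'unwind'.
fn guarded_div(dividend: i32, divisor: i32) -> Result<i32, String> {
    panic::catch_unwind(move || {
        if divisor == 0 {
            panic!("division by zero");
        }
        dividend / divisor
    })
    .map_err(|_| String::from("panic caught by catch_unwind"))
}

fn main() {
    println!("{:?}", guarded_div(100, 4));
    // The panic message is still printed to stderr before it is caught.
    println!("{:?}", guarded_div(1, 0));
}
```

This matters for a library embedded in C or C++ as well: a panic should not be left to unwind through foreign stack frames, so the choice is between catching it at the boundary like this or aborting.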

Finally

In light of all the testing and getting to grips with autotools and how various bits of the Rust compiler work, I've decided for the "Rustify GJS" project to use basically what is covered in the examples.

  • the default args for the --release are quite adequate
  • to reduce final size I have used lto
  • stripping is to be left to distributions
  • static linking the rust code in will be best to keep libgjs.so whole.

And one last thing: you can pass global args via the RUSTFLAGS environment variable, such as RUSTFLAGS="-C lto -C panic=abort" cargo build. I will likely switch to this method at some point. Setting RUSTFLAGS also means that Rust crate dependencies are built with these flags, whereas without it they use the Rust defaults.

Please email me if you see anything factually inaccurate that needs correction, or even just better explanations.

Note: Makefiles require the use of actual tabs, not spaces.

TODO: Parallel build fails due to Rust not finishing build before C++ linking.