Benchmarking Mike Bland's OpenSSL Makefiles

Mike Bland has been hard at work refactoring the build system for OpenSSL. I wasn’t involved in developing these changes, but I do care about OpenSSL and the way decisions get made there. I want its developers to get the most out of their time, so that they can focus on the important issues.

As my small contribution to this effort, I ran benchmarks and did statistical analysis of Mike’s builds to see if they really are faster than the old ones — and if so, to see just where these speedups manifest.

The results are quite good.

Motivation

The main reference point in the literature for Mike’s refactoring is the 1997 paper “Recursive Make Considered Harmful”. That article makes all sorts of good points, and offers some solutions as well.

To summarize, the two problems with recursive Make that concern us here are:

  • The inaccuracy of the dependencies, or the simple lack of dependencies, can result in a product which is incapable of building cleanly, requiring the build process to be carefully watched by a human.
  • Related to the above, some projects are incapable of taking advantage of various parallel make implementations, because the build does patently silly things.

I won’t go further into the details of why these refactors are helpful here. You can read about that in Mike’s writeup, [openssl-dev], or the [openssl-testing] thread if you’re interested. What I’m going to cover is the benchmarks that I ran and which of my results are statistically significant.

Why? Because I don’t like to see speed statistics without the data analysis to back them up. When someone comes around saying “hey, this way of doing it is faster,” I want to see some sort of proof: repeated trials and a significance test.

R analysis

You can find the full dataset created from my benchmarking in one easy-to-use CSV file.

I ran builds with ccache and without it, so we’ll start by subsetting the data based on that.

Then we can subset the data into the different types of builds.
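As a rough illustration of these two subsetting steps, here is a minimal Python sketch (it is not the R code used for the analysis). The column names branch, build, ccache and real, along with the sample rows, are assumptions made up for the example; the real CSV may be laid out differently.

```python
import csv
import io

# Made-up placeholder rows, NOT real measurements.
# Column names are assumptions about how such a CSV could be laid out.
SAMPLE_CSV = """branch,build,ccache,real
master,sequential,no,1.0
master,parallel,no,2.0
mbland-makefiles-00,parallel,no,3.0
master,sequential,yes,4.0
mbland-makefiles-00,sequential,yes,5.0
"""


def load_rows(text):
    """Parse CSV text into dicts, converting the timing column to float."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["real"] = float(row["real"])
    return rows


def subset(rows, **criteria):
    """Keep only the rows whose columns match every given value."""
    return [r for r in rows if all(r[k] == v for k, v in criteria.items())]


rows = load_rows(SAMPLE_CSV)

# First split on ccache usage...
no_ccache = subset(rows, ccache="no")
with_ccache = subset(rows, ccache="yes")

# ...then narrow down to one branch and build type.
master_parallel = subset(no_ccache, branch="master", build="parallel")
```

Each of the sections below then corresponds to one such subset per branch, compared against the other.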

Full sequential build

These builds are just the regular make clean && /usr/bin/time -p make. Since the single-makefile approach generally favors parallel more than sequential builds, it’s not surprising that we don’t see much improvement here.
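For repeated trials, a loop around that command is all that is needed. The following is a minimal Python sketch of such a harness, not the script actually used here: the function name time_command and its parameters are hypothetical, and it records wall-clock time, which corresponds to the "real" line that /usr/bin/time -p prints. The cleanup step runs before the timer starts, so only the build itself is measured.

```python
import subprocess
import sys
import time


def time_command(command, setup=None, runs=3):
    """Run `command` `runs` times; return wall-clock seconds for each run.

    `setup`, if given, is run before every trial and is not timed
    (for a build benchmark this would be the cleanup step).
    """
    timings = []
    for _ in range(runs):
        if setup is not None:
            subprocess.run(setup, check=True,
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)
        start = time.perf_counter()
        subprocess.run(command, check=True,
                       stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


# Harmless stand-in command so the sketch runs anywhere.
# Inside a source checkout, the call might instead look like:
#   time_command(["make"], setup=["make", "clean"], runs=20)
demo_timings = time_command([sys.executable, "-c", "pass"], runs=3)
```

The number of runs is a judgment call: more trials give the significance test more to work with, at the cost of a longer benchmarking session.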

master

Not exactly a normal distribution, but close enough for comfort

mbland-makefiles-00

Comparison

Not surprisingly, the t-test is inconclusive. If you don’t have a multicore machine, odds are you won’t see a significant speedup from these patches.
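To make "inconclusive" concrete: a two-sample t-test weighs the difference in mean build time against the run-to-run noise. Below is a minimal Python sketch of Welch's variant (the default in R's t.test), computing only the t statistic and the degrees of freedom. The sample timings are invented for illustration and are not taken from the dataset. Turning t and the degrees of freedom into a p-value requires the t distribution, which R's t.test or a statistics library would supply.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's t-test."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # Squared standard error of each mean: sample variance / sample size.
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Invented timings in seconds, for illustration only.
hypothetical_old = [20.0, 22.0, 24.0]
hypothetical_new = [10.0, 12.0, 14.0]
t_stat, dof = welch_t(hypothetical_old, hypothetical_new)
```

A t statistic near zero means the two sets of timings overlap too much to tell apart; the further it is from zero for a given number of degrees of freedom, the smaller the p-value.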

Full parallel build

Parallel builds are where you expect a single-makefile build system to really shine. When running these, it certainly felt faster.

master

This could actually pass for a normal distribution

mbland-makefiles-00

Comparison

Single-makefile structure performs excellently in parallel builds

The t-test yields favorable results: build time drops by roughly 40%. This is the best result on this whole page, since fast parallel builds after just this initial work can open doors for even better parallelization in the future.

Full sequential build with ccache

While doing build benchmarks with ccache might seem a bit odd,[1] it actually makes sense: we should have data on how people might be building things in practice. If developers choose to use ccache when working, then they would want to know how this impacts them.

master

mbland-makefiles-00

Comparison

We see noticeable (and statistically significant) improvements even in the sequential build with ccache. However, considering we’re talking about a difference of 13 vs. 16 seconds here, this may or may not matter to you.

If you’re concerned with how ccache impacts actual performance during development, there’s some useful info on their site’s performance page:

It should also be noted that if the expected hit rate is low, there may be a net performance loss when using ccache because of the overhead of cache misses (typically 5%-20%). Also, if the build machine is short on memory compared to the amount of memory used by the build tools (compiler, linker, etc), usage of ccache could decrease performance due the fact that ccache’s cached files may flush other files from the OS’s disk cache. See this mailing list post by Christopher Tate for a good write-up on this issue. So to sum it up: it is probably wise to perform some measurements with and without ccache for your typical use case before enabling it!

Full parallel build with ccache

master

mbland-makefiles-00

Comparison

Even more clearly, the single-makefile parallel build with ccache is faster. These kinds of results are really encouraging: ccache can do its job better, work in parallel, and cut build time in half on an already quick build.

Specs

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.9.1/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /build/gcc/src/gcc-4.9.1/configure --prefix=/usr
--libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/
--enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared
--enable-threads=posix --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch
--disable-libssp --enable-gnu-unique-object --enable-linker-build-id
--enable-cloog-backend=isl --disable-isl-version-check
--disable-cloog-version-check --enable-lto --enable-plugin
--enable-install-libiberty --with-linker-hash-style=gnu --disable-multilib
--disable-werror --enable-checking=release
Thread model: posix
gcc version 4.9.1 (GCC) 
$ ccache -V | head -n1
ccache version 3.1.9
$ uname -a
Linux dionysus 3.16.1-1-ARCH #1 SMP PREEMPT Thu Aug 14 07:40:19 CEST 2014 x86_64 GNU/Linux
$ cat /proc/meminfo | head -n1
MemTotal:        7952408 kB
$ cat /proc/cpuinfo | egrep "(model name|cache size|cpu cores)" | head -n3
model name	: Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz
cache size	: 4096 KB
cpu cores	: 2

Complications

Conclusion

As far as I can tell, Mike’s makefile refactoring achieves a ~40% reduction in build time for parallel builds of OpenSSL, even without ccache. That’s a build time of about 81 seconds down to about 48 seconds.
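The arithmetic behind that figure is worth spelling out, because "speedup" can be read two ways. A small Python sketch using the two build times quoted above (the helper names are just for illustration):

```python
def percent_reduction(before, after):
    """Percentage of the original time that was saved."""
    return (before - after) / before * 100.0


def speedup_factor(before, after):
    """How many times faster the new build is."""
    return before / after


# Build times in seconds, as quoted in the text.
reduction = percent_reduction(81.0, 48.0)  # roughly 40.7
factor = speedup_factor(81.0, 48.0)        # 1.6875
```

So the ~40% refers to time saved per build; put the other way around, the refactored build runs about 1.7 times as fast.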

As one StackOverflow member writes:[2]

I believe Caper Jones discussed some studies in which anything greater than a 1 second delay broke people out of the zone (my gist), and that the time / productivity lost due to that small delay was really an order of magnitude larger than it appeared.

If torial is correct in saying that, then build times really do have a significant impact on developers. Which makes this a big deal.

Mike has a pull request for these changes pending on GitHub.


  1. When I originally reported my results on openssl-dev, I just provided the ccache data without even realizing I had been using it. Aside from slight embarrassment on my part, that shouldn’t take away from the results.

  2. I’ve quoted that build time/dev performance statistic in other contexts as well. It really rings true to me.