Benchmarking Mike Bland's OpenSSL Makefiles
Mike Bland has been hard at work refactoring the build system for OpenSSL. I wasn’t involved in developing these changes, but I do care about OpenSSL and the way the decision process happens there. I care about making sure its developers get the best
bang for their buck out of their time, so that they can focus on the important issues.
As my small contribution to this effort, I ran benchmarks and did statistical analysis of Mike’s builds to see if they really are faster than the old ones — and if so, to see just where these speedups manifest.
The results are quite good.
The main reference point in the literature for Mike’s refactoring is the 1997 paper Recursive Make Considered Harmful. That article makes all sorts of good points, and offers some solutions as well.
But to summarize it, the two problems with recursive Make we’re concerned with are:
- The inaccuracy of the dependencies, or the simple lack of dependencies, can result in a product which is incapable of building cleanly, requiring the build process to be carefully watched by a human.
- Related to the above, some projects are incapable of taking advantage of various parallel make implementations, because the build does patently silly things.
I won’t go further into the details of why these refactors are helpful here. You can read about that in Mike’s writeup,
[openssl-dev], or the
[openssl-testing] thread if you’re interested. What I’m going to cover is the benchmarks that I ran and which of my results are statistically significant.
Why? Because I don’t like to see speed statistics without the data analysis to back them up. When someone comes around saying
hey, this way of doing it is faster I want to see some sort of proof, repeated trials, and a significance test of some sort.
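For concreteness, here’s a minimal sketch (with made-up timing data, not my measurements) of the kind of significance test used throughout this post. R’s t.test runs Welch’s two-sample t-test by default, which is exactly what appears in the output below:

```r
# Synthetic wall-clock times (seconds) for two hypothetical build setups;
# the real analysis uses the measurements from all_builds.csv.
old <- c(120.1, 118.9, 121.4, 119.7, 122.0)
new <- c(117.6, 118.3, 116.9, 118.8, 117.2)

# t.test defaults to Welch's test (var.equal = FALSE), which does not
# assume the two groups have equal variance.
result <- t.test(old, new)
result$p.value  # below 0.05 here, so the difference is unlikely to be chance
```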
You can find the full dataset created from my benchmarking in one easy-to-use CSV file.
> ssl <- read.csv('all_builds.csv')
> head(ssl)
  branch shortref command   real   user  sys ccache
1 master  a46149c    make 112.30  99.15 8.07  FALSE
2 master  a46149c    make 118.95 105.31 8.58  FALSE
3 master  a46149c    make 118.12 104.59 8.52  FALSE
4 master  a46149c    make 117.28 103.85 8.22  FALSE
5 master  a46149c    make 124.16 109.49 9.21  FALSE
6 master  a46149c    make 121.17 106.99 8.71  FALSE
I ran builds with ccache and without it, so we’ll start by subsetting the data based on that.
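That subsetting step isn’t shown here; assuming ccache is a logical column (as the head() output suggests), it would look something like this. The small data frame is a toy stand-in so the snippet runs on its own:

```r
# Toy stand-in for the benchmark frame; the real one comes from
# read.csv('all_builds.csv') and has the same columns.
ssl <- data.frame(
  branch  = c('master', 'master', 'mbland-makefiles-00'),
  command = c('make', 'make -j4', 'make -j4'),
  real    = c(112.30, 81.24, 48.36),
  ccache  = c(FALSE, FALSE, TRUE)
)

# Split on the logical ccache column, matching the subsets used below
normal <- ssl[which(ssl$ccache == FALSE),]
ccache <- ssl[which(ssl$ccache == TRUE),]
```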
Then we can subset the data into the different types of builds:
> fullseq <- normal[which(normal$command == 'make'),]
> fullpar <- normal[which(normal$command == 'make -j4'),]
> fulltest <- normal[which(normal$command ==
+     'touch ssl/d1_both.c && /usr/bin/time make'),]
> fullseqcc <- ccache[which(ccache$command == 'make'),]
> fullparcc <- ccache[which(ccache$command == 'make -j4'),]
> fulltestcc <- ccache[which(ccache$command == 'make test'),]
Full sequential build
These builds are just the regular
make clean && /usr/bin/time -p make. Since the single-makefile approach generally favors parallel more than sequential builds, it’s not surprising that we don’t see much improvement here.
> t.test(real ~ branch, data=fullseq)

        Welch Two Sample t-test

data:  real by branch
t = 1.0478, df = 15.906, p-value = 0.3104
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.688901  7.940012
sample estimates:
              mean in group master mean in group mbland-makefiles-00
                          120.6433                           118.0178
Not surprisingly, the T-test is inconclusive. If you don’t have a multicore machine, odds are you won’t see a significant speedup from these patches.
Full parallel build
Parallel builds are where you expect a single-makefile build system to really shine. When running these, it certainly felt faster.
> t.test(real ~ branch, data=fullpar)

        Welch Two Sample t-test

data:  real by branch
t = 31.1981, df = 13.113, p-value = 1.084e-13
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 30.60205 35.15128
sample estimates:
              mean in group master mean in group mbland-makefiles-00
                          81.24000                           48.36333
T-test yields favorable results. Looks like a 40% speedup. This is the best result on this whole page, since fast parallel builds after just this initial work can open doors for even better parallelization in the future.
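As a quick sanity check on that figure, the relative speedup falls straight out of the group means reported by the t-test above:

```r
# Group means copied from the t-test output for the parallel builds
means <- c(master = 81.24000, mbland = 48.36333)

# Relative reduction in wall-clock time
speedup <- (means[['master']] - means[['mbland']]) / means[['master']]
round(100 * speedup, 1)  # about 40.5% faster
```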
Full sequential build with ccache
While doing build benchmarks with ccache might seem a bit odd,1 it actually makes sense: we should have data on how people might be building things in practice. If developers choose to use ccache when working, then they would want to know how this impacts them.
> t.test(real ~ branch, data=fullseqcc)

        Welch Two Sample t-test

data:  real by branch
t = 4.3274, df = 5.265, p-value = 0.006681
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 1.221215 4.666119
sample estimates:
              mean in group master mean in group mbland-makefiles-00
                          16.17067                           13.22700
We see noticeable (and statistically significant) improvements in even the sequential build with ccache. However, considering we’re talking a difference of 13 vs. 16 seconds here, this may or may not matter to you.
If you’re concerned with how ccache impacts actual performance during development, there’s some useful info on their site’s performance page:
It should also be noted that if the expected hit rate is low, there may be a net performance loss when using ccache because of the overhead of cache misses (typically 5%-20%). Also, if the build machine is short on memory compared to the amount of memory used by the build tools (compiler, linker, etc), usage of ccache could decrease performance due the fact that ccache’s cached files may flush other files from the OS’s disk cache. See this mailing list post by Christopher Tate for a good write-up on this issue. So to sum it up: it is probably wise to perform some measurements with and without ccache for your typical use case before enabling it!
Full parallel build with ccache
> t.test(real ~ branch, data=fullparcc)

        Welch Two Sample t-test

data:  real by branch
t = 57.7511, df = 9.992, p-value = 6.001e-14
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 5.326394 5.753940
sample estimates:
              mean in group master mean in group mbland-makefiles-00
                         11.090500                            5.550333
Even more clearly, the single-makefile parallel build with ccache is faster. These kinds of results are really encouraging: ccache can do its job better, work in parallel, and cut the build time in half on an already quick build.
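The same back-of-the-envelope arithmetic, using the group means from the t-test above, confirms the “half” claim:

```r
# Group means copied from the ccache parallel t-test output
means <- c(master = 11.090500, mbland = 5.550333)

# Relative reduction in wall-clock time
speedup <- (means[['master']] - means[['mbland']]) / means[['master']]
round(100 * speedup, 1)  # right at 50%, i.e. build time cut in half
```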
~$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.9.1/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /build/gcc/src/gcc-4.9.1/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-cloog-backend=isl --disable-isl-version-check --disable-cloog-version-check --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --disable-multilib --disable-werror --enable-checking=release
Thread model: posix
gcc version 4.9.1 (GCC)

$ ccache -V | head -n1
ccache version 3.1.9

$ uname -a
Linux dionysus 3.16.1-1-ARCH #1 SMP PREEMPT Thu Aug 14 07:40:19 CEST 2014 x86_64 GNU/Linux

$ cat /proc/meminfo | head -n1
MemTotal:        7952408 kB

$ cat /proc/cpuinfo | egrep "(model name|cache size|cpu cores)" | head -n3
model name      : Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz
cache size      : 4096 KB
cpu cores       : 2
As far as I can tell, Mike’s makefile refactoring has definitely achieved a ~40% speedup for parallel builds of OpenSSL, even without ccache. That’s an 81 second build time versus a 48 second build time.
I believe Capers Jones discussed some studies in which anything greater than a one-second delay broke people out of the zone (my gist), and that the time and productivity lost to that small delay was really an order of magnitude larger than it appeared.
If torial is right about that, then build times really do have a significant impact on developers, which makes this a big deal.
Mike has a pull request for these changes pending on GitHub.
When I originally reported my results on openssl-dev, I just provided the ccache data without even realizing I had been using it. Aside from slight embarrassment on my part, that shouldn’t take away from the results.↩