Testing jessie for reproducibility or "here, have some 8 cores and 25 GB RAM extra..."

Three weeks ago it occurred to me that it would be useful to test the reproducibility of Jessie too (until then we had only tested unstable and experimental). This gives us a nice data point to compare future developments against, and we wanted to test "testing" in the future anyway, so we might as well start while "testing" is still jessie... (That way we will have less to test once stretch development has started, as in the beginning stretch will be jessie anyway.)

Quite quickly I then realized that building all packages twice on the current, quite heavily loaded jenkins.debian.net machine would take 4-6 weeks, and that we might well release Jessie before this had finished. (And I do want a Jessie release sooner rather than later!)

So I asked Profitbricks, who have been sponsoring the jenkins.debian.net setup since October 2012, for some more resources temporarily. Once again they quickly helped out, and the next day I could add eight cores and 25 GB of RAM to the existing VM, for a total of 23 cores and 75 GB RAM.

So currently jenkins.d.n runs eight reproducible build jobs simultaneously, each of them pbuilding packages in tmpfs in RAM. Thus these builds are really fast: for small packages without additional build dependencies, build times as low as 40 seconds can be seen. That doesn't sound impressive at first, but it includes downloading the source, building twice, untar'ing the base.tgz twice and running debbindiff on the resulting binary packages. (And all this happens while quite a few other jobs continue to hammer the machine as well.)
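
The check at the heart of these jobs can be sketched in a few lines of Python. This is only an illustration and not the code jenkins.d.n runs: it assumes both builds have already written their binary packages into two directories, and it merely compares SHA-256 checksums, whereas debbindiff also explains what differs. The function name and the directory layout are made up for this example.

```python
import hashlib
from pathlib import Path


def compare_builds(first_dir, second_dir):
    """Return the sorted names of files that differ between two build results.

    Hypothetical helper: a file counts as differing if it is missing from
    one directory or if its SHA-256 checksum is not the same in both.
    """
    def checksums(directory):
        # Map each regular file name to the checksum of its content.
        return {
            path.name: hashlib.sha256(path.read_bytes()).hexdigest()
            for path in Path(directory).iterdir()
            if path.is_file()
        }

    first = checksums(first_dir)
    second = checksums(second_dir)
    return sorted(
        name
        for name in first.keys() | second.keys()
        if first.get(name) != second.get(name)
    )
```

In this sketch a package would count as reproducible when the function returns an empty list for the results of its first and second build.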

(BTW: I doubt using LVM snapshots is faster than running this in tmpfs, but if you think LVM is faster I'd be curious to see some numbers.)

Interestingly this rebuild also produced a rather unexpected result: we found an RC bug in jessie which caused build failures for >20 packages: "#781076 cdbs: perlmodule-vars.mk LDDLFLAGS quoting problem".

And then there were some results which were unexpected at first but really to be expected: testing turned out to "only" give reproducible results for 79.4% of all packages in main, while unstable has 82.6% reproducible packages. This was unexpected, as the (few) fixed packages in our repo are also used when testing testing, and as testing is smaller than sid (it has 1000 packages fewer), I at first believed the reproducible percentage should be higher.

So why was the lower percentage to be expected? This is due to two factors. First, since FOSDEM we have introduced a number of new variations for the second build: we now build with a.) a different domainname, b.) a different umask, c.) a different timezone and d.) a different kernel version (using 'linux64 --uname-2.6'). Secondly, most packages in sid haven't been rebuilt since FOSDEM. Thus we'll now reschedule all packages in testing which are unreproducible in unstable, which should decrease the current reproducibility of sid a bit...
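
To make the four variations a bit more concrete, here is a rough Python sketch of how a second build could be described. It is not our actual job script: the function name and the concrete domainname, umask and timezone values are invented for illustration, and only the 'linux64 --uname-2.6' wrapper is taken from the list above. Nothing is executed, the function just returns a plan.

```python
import os

# Illustrative values only; the real values used on jenkins.d.n are not
# stated in this post.
SECOND_BUILD = {
    "domainname": "example.invalid",               # a.) hypothetical domainname
    "umask": 0o002,                                # b.) hypothetical umask
    "timezone": "Etc/GMT-14",                      # c.) hypothetical timezone
    "kernel_wrapper": ["linux64", "--uname-2.6"],  # d.) as mentioned above
}


def plan_second_build(build_cmd, variations=SECOND_BUILD):
    """Describe how the second build would be started; nothing is run here."""
    # The timezone is varied through the TZ environment variable.
    env = dict(os.environ, TZ=variations["timezone"])
    return {
        # The wrapper makes the build see a different kernel version.
        "argv": list(variations["kernel_wrapper"]) + list(build_cmd),
        "env": env,
        # The caller would apply these before starting the build, e.g. the
        # umask via os.umask(); changing the domainname needs root.
        "umask": variations["umask"],
        "domainname": variations["domainname"],
    }
```

Here build_cmd stands for whatever command starts the build; the first build would simply run it unchanged.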

For those wondering about the 622 packages failing to build in testing: at least 453 are due to our test setup, categorized into (currently) three issues: timestamps_from_cpp_macros, ftbfs_werror_equals and ocaml_configure_not_as_root. The ones in this third issue category are rather trivial to fix, so this leaves 622-453=169 minus those fixed by #781076, so roughly 150 packages which fail to build in our setup and need to be investigated...
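
For the record, the back-of-the-envelope calculation looks like this in Python. The only assumption is that #781076 accounts for exactly 20 packages; the text only says ">20", so the real remainder is a bit smaller.

```python
failing_in_testing = 622   # packages failing to build in testing
due_to_test_setup = 453    # lower bound ("at least 453")
fixed_by_781076 = 20       # assumption: lower bound taken from ">20 packages"

remaining_before_fix = failing_in_testing - due_to_test_setup
remaining = remaining_before_fix - fixed_by_781076

print(remaining_before_fix)  # 169
print(remaining)             # 149, i.e. roughly 150
```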

And for those wondering about other missing variations, there is at least one big one missing: changing the build date. The current guesstimate is that this will make 1000-2000 packages unreproducible again, but we'll only know for sure once we have tested them.

So I would like to once again thank Profitbricks for supporting jenkins.d.n over the last 2.5 years and for making reproducible.d.n possible so smoothly. Being able to painlessly add more resources when we needed them was incredibly useful and I hope we can count on their support in the future. (I have some more ideas on how to burn resources usefully in the future. Stay tuned and btw, if you know how to put more hours in the day, please do tell me. ;)

And, of course: thanks for your work on Debian, no matter whether you've been working on Jessie, reproducibility or something totally different!

To finish this post I'd like to remind everyone that currently all this is just about the prospects of reproducible builds for Debian. Debian is not 80% reproducible yet - but it easily could be! And I certainly hope it will be "soon", and hopefully "soon" will only mean a few months. We will rebuild and see.