Deterministic builds

When I build safe_vault, I get a different binary file from the one maidsafe builds.

This is not really ideal; the functionality is identical but the file itself is not.

So this topic is about exploring the path toward deterministic builds for at least the vault but hopefully the other SAFE software also.

Bitcoin does deterministic builds with gitian (for windows and mac) and is looking to move to guix (which is currently used for the linux build). You can see from the massive commit “Add deterministic Guix builds” that this is not a trivial thing to do.

My initial investigations have given rise to some basic questions:

  • Why is the maidsafe linux build of safe_vault done with x86_64-unknown-linux-musl rather than x86_64-unknown-linux-gnu (see the *-musl suffix on the release page)? For me the default rust toolchain installed by rustup is gnu, and Travis CI also uses gnu (see L435 of the travis build log for vault 0.20.1), so why is maidsafe using musl for their build? I had an error with the openssl package when trying to build safe_vault with the musl toolchain. Which toolchain would be preferred if we try moving toward deterministic builds?

  • I looked at the vault binaries (using xxd safe_vault) and searched for the text maidsafe and home and didn’t see anything that immediately stood out as being specific to the build environment. An introduction to deterministic builds seems like a good starting place to get an idea of how complex it is to manage sources of variation (although not specifically about rust builds).

  • What is the value of deterministic builds, are they important, and should they be attempted or worked towards? Can / should they be used for all maidsafe products (eg browser, frontend etc) or only the vault? Can we do fine without them?
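For the second point, here is a minimal sketch of a more targeted search than eyeballing xxd output: count how often a given string occurs in the binary. The helper name scan_binary, the demo file and the search strings are made up for illustration. In practice the file would be the safe_vault binary and the strings would be things like a home directory or username from the build machine.

```shell
# Sketch: count occurrences of build-environment strings inside a binary.
# scan_binary is a hypothetical helper; the needles below are examples only.
scan_binary() {
    file=$1
    shift
    for needle in "$@"; do
        # -a treats the binary as text, -o prints each match on its own line,
        # -F matches the needle literally rather than as a regex.
        count=$(grep -a -o -F -- "$needle" "$file" | wc -l | tr -d ' ')
        printf '%s: %s\n' "$needle" "$count"
    done
}

# Demo on a throwaway file standing in for the real safe_vault binary.
demo=$(mktemp)
printf 'abc\000/home/alice/.cargo/registry\000xyz\000/home/alice/src\n' > "$demo"
scan_binary "$demo" "/home/alice" "maidsafe"
rm -f "$demo"
```

A count of zero does not prove the binary is free of environment-specific data (timestamps or path hashes would not show up this way), so this is only a first pass.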

This topic doesn’t seem to have been discussed too much on the forum from what I can see… @bluebird discussed it a couple of times (here is one such time) and @sfultong uses NixOS which is known for deterministic builds. Anyone else got experience or opinions about this topic?

And for those that just want a good read, try Reflections on Trusting Trust.

21 Likes

Building with musl allows the binary to run on all flavours of linux, so the 64 bit build will run on all 64 bit machines. If we don’t, then we need to provide a load of builds for differing versions of glibc.

It is also a step towards deterministic builds. On release we should be using a fixed Cargo.lock to aid this as well. We can go further though and vendor such apps.
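A rough sketch of what a fixed Cargo.lock could look like in a release script, assuming the standard cargo flags. The helper name release_build is made up, and this is not the actual maidsafe release process.

```shell
# Sketch: refuse to build a release unless Cargo.lock is present, then build
# with --locked so cargo errors out instead of silently updating the lock file.
# release_build is a hypothetical helper name.
release_build() {
    dir=$1
    if [ ! -f "$dir/Cargo.lock" ]; then
        echo "no Cargo.lock in $dir, dependency versions are not pinned" >&2
        return 1
    fi
    (
        cd "$dir" || exit 1
        # Optional further step: "cargo vendor" copies all dependency sources
        # into ./vendor so the build no longer depends on the registry.
        # It prints the config needed to make cargo use the vendored copies.
        # cargo vendor
        cargo build --release --locked --target x86_64-unknown-linux-musl
    )
}
```

Usage would be something like release_build ~/safe_vault from a checkout where Cargo.lock is committed.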

21 Likes

If people are interested, I could try setting up a nix build script for SAFE binaries.

11 Likes

Musl target is also more secure. A quote from yourself one year ago:

8 Likes

I agree that Deterministic Builds is a good goal.

I’m not sure if this is fully achievable with rust toolchain yet. Eg, see https://github.com/rust-lang/rust/issues/34902

Related, there is Codechain for signing.

In code we trust: Secure multiparty code reviews with signatures and hash chains.

The most common signing mechanism for open-source software is using GPG signatures. For example, GPG is used to sign Git commits and Debian packages. There is no built-in mechanism for key rotation and key compromise. And if forced to, a single developer can subvert all machines which trust the corresponding GPG key.

That’s where the Codechain tool comes in. It establishes code trust via multi-party reviews recorded in unmodifiable hash chains.

Codechain allows to only publish code that has been reviewed by a preconfigured set of reviewers. The signing keys can be rotated and the reviewer set flexibly changed.

Every published code state is uniquely identified by a deterministic source tree hash stored in the hash chain, signed by a single responsible developer.

Codechain uses files to store the hash chain, not a distributed “blockchain”.

4 Likes

Is there a way to also specify / force the version of rust to use? My searching didn’t give any clues.
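One possible approach, offered as a sketch: rustup looks for a rust-toolchain file in the project directory and uses the toolchain named there for cargo and rustc commands run inside it. The helper name pin_toolchain and the temp directory are made up for illustration, and 1.39.0 is just an example version.

```shell
# Sketch: pin the compiler for a checkout by writing a rust-toolchain file.
# pin_toolchain is a hypothetical helper; 1.39.0 is only an example version.
pin_toolchain() {
    dir=$1
    version=$2
    printf '%s\n' "$version" > "$dir/rust-toolchain"
}

project=$(mktemp -d)   # stands in for ~/safe_vault
pin_toolchain "$project" 1.39.0
cat "$project/rust-toolchain"
```

Alternatives that avoid a file in the repo are rustup override set 1.39.0 (per directory) or cargo +1.39.0 build (per command), both of which need that toolchain installed through rustup.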

@sfultong, do you have any opinions regarding nix vs guix?


Some more notes on my dabbling toward maybe some progress:

One of the tough things I faced in generating a musl version of vault was the dependency on openssl.
cargo build --target x86_64-unknown-linux-musl

Looking through the build script for vault (see fleming/Dockerfile.build) there’s a fair bit of screwing around to build openssl with musl (L25-L40). This particular part is failing when I try to build vault with musl on my machine (ie not using a docker container).

OpenSSL is not actually needed directly by the vault. The dependency tree (reduced just to show the openssl dependency) stems from self_update:

├── self_update v0.5.1
│   ├── reqwest v0.9.20
│   │   ├── hyper-tls v0.3.2
│   │   │   ├── native-tls v0.2.3
│   │   │   │   ├── openssl v0.10.24
│   │   │   │   │   └── openssl-sys v0.9.49
│   │   │   │   ├── openssl-probe v0.1.2
│   │   │   │   └── openssl-sys v0.9.49 (*)

When I removed the self_update feature, the vault compiled without needing to point to the specific musl version of openssl.

The -DOPENSSL_NO_SECURE_MEMORY flag used in the build of openssl also comes with a reasonable warning (see this github issue comment): “Obviously it’s not recommended to compile things this way (its clearly better to have the secure memory feature).”

It seems like possibly a good first step to remove the openssl dependency, what do you think? I know self_update is cool but is it worth dragging in the complexity of openssl for it? Just poking around for opinions, not trying to step on any toes here so apologies if I’ve misunderstood some aspect!

5 Likes

We have done a ton of work to remove that actually. Rust_sodium needed it and we deprecated that for exactly that reason. My feeling is we build 100% in rust and then lock it down as well as audit it.

9 Likes

Removing self_update has another benefit, the vault binary goes from 15.6 MB to 11.7 MB, ie that feature adds about 4 MB.

4 Likes

@sfultong, do you have any opinions regarding nix vs guix?

Not strong ones. I’m more familiar with nix, but nix is a weirder language. Guix is scheme-based, so it’s more friendly to people who know scheme. I think nix has more adoption, probably mainly because it came first.

1 Like

I managed to reproduce the same build on two different machines.

  • Remove openssl dependency from safe_vault (see this commit) only because I couldn’t get it to build with musl. Maybe it’s possible to do deterministic builds still with openssl, but I chose to remove it.
  • musl-gcc 9.2.1 20191008
  • rustc 1.39.0 4560ea788 2019-11-04
  • safe_vault 0.20.1 8dee8601 2019-12-04
  • cargo build --release --target x86_64-unknown-linux-musl

This builds on two different machines to both give the same sha256:

$ sha256sum target/x86_64-unknown-linux-musl/release/safe_vault
eaeed686b7183314f26c85f2461963e95041dd6e483906d7371be1a4e4b1245a

This doesn’t necessarily mean it’s deterministic, just that it’s reproducible!

So it seems like a possible step toward the goal of deterministic builds.

On the topic of openssl, it would be a pretty big headline (for the geeks anyhow) for the new internet to work without any openssl dependency. That particular library is so critical to the existing internet but often seen as a bit of a weakness. I mean, just look at the popular topics on hacker news about openssl. It’s not pretty!

I have some thoughts on self_update too but maybe getting too far off topic.

11 Likes

This is a test to see if building safe_vault on different linux distros is deterministic or not.

The builds are not identical, which is not what I hoped but at least we know.
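When two builds differ, a quick first step is to see how much they differ. A minimal sketch using cmp follows. The helper name diff_summary and the two demo files are made up; the real inputs would be the two safe_vault binaries.

```shell
# Sketch: summarise how two builds of the same source differ.
# diff_summary is a hypothetical helper name.
diff_summary() {
    a=$1
    b=$2
    printf 'size a: %s bytes\n' "$(wc -c < "$a" | tr -d ' ')"
    printf 'size b: %s bytes\n' "$(wc -c < "$b" | tr -d ' ')"
    # cmp -l prints one line per differing byte (offset and both values),
    # so counting its lines gives the number of differing bytes.
    printf 'differing bytes: %s\n' "$(cmp -l "$a" "$b" 2>/dev/null | wc -l | tr -d ' ')"
}

# Demo on two throwaway files standing in for the two binaries.
one=$(mktemp)
two=$(mktemp)
printf 'safe_vault built with gcc 7.4.0\n' > "$one"
printf 'safe_vault built with gcc 7.3.1\n' > "$two"
diff_summary "$one" "$two"
rm -f "$one" "$two"
```

If the sizes differ, the byte count only covers the common prefix and is less meaningful, since a single inserted byte shifts everything after it.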

The C compiler is the variable that isn’t controlled well enough. Unfortunately this is also an extremely complex component so going into the details of my exploration in this post is not going to be helpful! (Some leftover notes about it appear at the end of the post).

Good news is no more openssl dependency so that simplifies things quite a bit.

Also good news is the build is repeatable, ie repeating these steps produces the same result every time.
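Repeatability on a single machine can be checked mechanically. This sketch runs a build command twice and compares the sha256 of the output. The helper name is_repeatable and the fake build are made up for illustration.

```shell
# Sketch: run a build command twice and compare the hashes of its output.
# is_repeatable is a hypothetical helper; it assumes the build command
# recreates the artifact after it has been deleted.
is_repeatable() {
    artifact=$1
    shift
    "$@" || return 2
    first=$(sha256sum "$artifact" | cut -d ' ' -f 1)
    rm -f "$artifact"
    "$@" || return 2
    second=$(sha256sum "$artifact" | cut -d ' ' -f 1)
    [ "$first" = "$second" ]
}

# Demo with a stand-in build that always writes the same bytes.
workdir=$(mktemp -d)
fake_build() { printf 'same bytes every time\n' > "$workdir/out.bin"; }
if is_repeatable "$workdir/out.bin" fake_build; then
    echo "repeatable"
else
    echo "not repeatable"
fi
```

For the vault the command would be the cargo build line from this post. Note that deleting only the final binary makes cargo relink rather than recompile, so a stricter check would run cargo clean between the two builds.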

Gotta say, xorurls are going to be a damn blessing. Truly universal and consistent management for remote dependencies is going to be pretty epic (as guix and nix are beginning to show).


Build on AWS Ubuntu AMI

# install rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.profile
rustup target add x86_64-unknown-linux-musl

# install gcc so musl-gcc can be built from source
# this is the variable that needs better control
# but building repeatable copies of gcc from source is no easy task
# This installs gcc
# gcc --version
# gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
sudo apt-get update
sudo apt-get install build-essential

# install musl-gcc
wget https://musl.libc.org/releases/musl-1.2.0.tar.gz
tar -xvzf musl-1.2.0.tar.gz
cd musl-1.2.0/
./configure
sudo make install
export PATH=$PATH:/usr/local/musl/bin

# install and build safe_vault
cd ~
git clone https://github.com/maidsafe/safe_vault.git --depth=1
cd ~/safe_vault/
cargo build --release --target x86_64-unknown-linux-musl
sha256sum target/x86_64-unknown-linux-musl/release/safe_vault
28b09022a93054541ce4fc889ef14370f0e566bcada79ad297a4f08a8854b2d4

Build on AWS Linux 2 AMI

# install rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.profile
rustup target add x86_64-unknown-linux-musl

# install gcc so musl-gcc can be built from source
# this is the variable that needs better control
# but building repeatable copies of gcc from source is no easy task
# This installs gcc
# gcc --version
# gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6)
sudo yum groupinstall "Development Tools"

# install musl-gcc
wget https://musl.libc.org/releases/musl-1.2.0.tar.gz
tar -xzf musl-1.2.0.tar.gz
cd musl-1.2.0/
./configure
sudo make install
export PATH=$PATH:/usr/local/musl/bin

# install and build safe_vault
cd ~
sudo yum install git
git clone https://github.com/maidsafe/safe_vault.git
cd ~/safe_vault/
cargo build --release --target x86_64-unknown-linux-musl
sha256sum target/x86_64-unknown-linux-musl/release/safe_vault
d5507ccc901df0ec4e5c8f4d23322750625e2fd4e9aaf78d564978ca6a19aebb
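Since the two recipes differ mainly in the tools the distro provides, it may help to record tool versions next to each hash. A small sketch follows. The helper name tool_version is made up, and the list of tools is just the ones used in the steps above.

```shell
# Sketch: print the versions of the tools that feed into the build, so two
# machines can compare their environments before comparing binaries.
# tool_version is a hypothetical helper name.
tool_version() {
    if command -v "$1" >/dev/null 2>&1; then
        # Most tools print their version on the first line of --version.
        printf '%s: %s\n' "$1" "$("$1" --version 2>&1 | head -n 1)"
    else
        printf '%s: not installed\n' "$1"
    fi
}

for tool in gcc musl-gcc rustc cargo; do
    tool_version "$tool"
done
```

Saving this output on each machine and diffing the two files shows at a glance which inputs were not the same.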

Notes on installing gcc from source (which depends on having an existing gcc installation)

I had hoped to build gcc-7.5.0 from any existing version of gcc, then use gcc-7.5.0 to build musl-gcc to give a consistent copy of musl-gcc, but that did not work. The original prebuilt gcc matters.

# instructions for AWS Linux 2 AMI
sudo yum install gcc gcc-c++
wget https://ftp.gnu.org/gnu/gcc/gcc-7.5.0/gcc-7.5.0.tar.gz
tar xzf gcc-7.5.0.tar.gz
cd gcc-7.5.0
./contrib/download_prerequisites
cd ..
mkdir objdir
cd objdir
$PWD/../gcc-7.5.0/configure --prefix=$HOME/GCC-7.5.0 --enable-languages=c,c++ --disable-multilib
make -j 2
make install
# gcc --version will still show the prebuilt version
export PATH=$HOME/GCC-7.5.0/bin:$PATH
# gcc --version should now show 7.5.0

8 Likes

You need to build with musl to test this: cargo build --target=x86_64-unknown-linux-musl

[EDIT Ah I see you did, but it looks like musl-tools built against different glibc versions produces different binaries. Very interesting indeed. So the musl build gives us a binary that works across distros, but the binary itself differs depending on the distro it was compiled on. Great work again @mav ]
[edit II Maybe a mix of musl and docker or similar is the answer?]

6 Likes

This would work.

It depends a bit on the intended purpose of deterministic builds.

With Docker it is hard to audit and verify all the components. But it works well as a ‘recipe’ which everyone can follow and verify at a recipe level rather than a source code level.

Guix and nix are pushing hard in this arena because they try to verify all source from the ground up, sorta like mathematicians have been doing with foundational mathematics (eg Principia Mathematica).

For now a bash script or docker image is probably good enough, but I think later it will really need guix (or similar) to ensure secure builds. Bitcoin uses guix fwiw.

My main interest is being able to ensure that if I compile a vault which has exactly the same source code as my friend, those two binaries should be the same. If they’re not the same then we have to trust that the features in each different binary really are the exact same features. eg if maidsafe tells me to download binary 9e3...a11 and I can’t compile the same binary myself from source, I can’t say which source they really used for that release so I can’t really know what features are in that binary.
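The check described here comes down to comparing the hash of a locally compiled binary with the published one. A minimal sketch follows. The helper name verify_build is made up, and in the demo the "published" hash is computed from a throwaway file rather than taken from a real release.

```shell
# Sketch: compare a locally built binary against a published sha256.
# verify_build is a hypothetical helper name.
verify_build() {
    expected=$1
    file=$2
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "match: $file is the published build"
    else
        echo "MISMATCH: expected $expected, got $actual" >&2
        return 1
    fi
}

# Demo: the sample file stands in for the local build, and its own hash
# stands in for the value from the release page.
sample=$(mktemp)
printf 'pretend this is safe_vault\n' > "$sample"
published=$(sha256sum "$sample" | cut -d ' ' -f 1)
verify_build "$published" "$sample"
rm -f "$sample"
```

A mismatch does not say which side is wrong, only that the two builds cannot be treated as the same binary.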

Having a recipe helps a lot. Having a recipe with ‘fully verifiable ingredients’ (ie guix) is even better.

11 Likes

This may be of interest.

1 Like

This won’t be true if you have different target processors. Any “arch=native” like compiler optimizations will change this. Intel vs AMD etc.
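A build script could guard against this by checking for native CPU flags before building. This is only a sketch: the helper name uses_native_cpu is made up, and it looks at just the RUSTFLAGS and CFLAGS environment variables, not at flags set in cargo config files.

```shell
# Sketch: detect CPU-specific optimisation flags in the environment.
# uses_native_cpu is a hypothetical helper; it only inspects RUSTFLAGS and
# CFLAGS, so flags set elsewhere (eg in a cargo config file) are missed.
uses_native_cpu() {
    case " ${RUSTFLAGS:-} ${CFLAGS:-} " in
        *target-cpu=native*|*-march=native*) return 0 ;;
        *) return 1 ;;
    esac
}

if uses_native_cpu; then
    echo "warning: native CPU flags set, output will depend on the build machine"
else
    echo "no native CPU flags found in RUSTFLAGS or CFLAGS"
fi
```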

1 Like