I wonder if I might ask experienced Docker users here for help.
I’ve been using Docker for a week now and I have created images that build safe_vault for two architectures: amd64 and armv7 (hf).
That’s all good, except that the images are a bit over 1GB in size, which is several times the size they should be as compared to similarly complex images on the Docker hub. It makes them impractical to upload to the Docker repo or distribute to anyone else.
The Dockerfiles that I wrote follow the strategy of git-cloning/downloading each necessary component, compiling and then installing it: base debian image, system tools, gcc, libsodium, rustup, sodiumoxide, safe_vault
I have tried the following:
- Follow the recommended practices when writing the Dockerfile: use a single "RUN" command to clone, compile, install and then delete the source code for each component.
- Run docker-squash (GitHub - goldmann/docker-squash: Docker image squashing tool)
- Run docker-companion (GitHub - mudler/docker-companion: squash and unpack Docker images, in Golang)
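For anyone curious what the single-RUN pattern looks like in practice, here is a minimal sketch for one component; the package set, libsodium version and URL are illustrative, not the exact ones from my Dockerfiles:

```dockerfile
FROM debian:jessie

# Fetch, build, install and delete one component (libsodium here)
# inside a single RUN, so the source tree and the apt cache are
# already gone when the layer gets committed.
RUN apt-get update \
 && apt-get install -y --no-install-recommends build-essential wget ca-certificates \
 && wget -O /tmp/libsodium.tar.gz \
      https://download.libsodium.org/libsodium/releases/libsodium-1.0.18.tar.gz \
 && tar -xzf /tmp/libsodium.tar.gz -C /tmp \
 && cd /tmp/libsodium-1.0.18 \
 && ./configure && make && make install \
 && rm -rf /tmp/libsodium* /var/lib/apt/lists/*
```

Splitting the same steps across several RUN commands would keep the downloaded tarball and build tree in earlier layers even after a later "rm".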
What do people think? I’d prefer to avoid radical departures such as using a non-debian starting image.
Not an expert here… but have you tried uploading them?
I am thinking the reported size might be very different from what is sent / received because of the way Docker works (essentially using clever diffs).
Just a guess but worth a try.
I became aware of the problem when I went to upload them.
The size (1 and 1.2 GB for the two files) as reported with the command “docker images” is accurate, because when I do “docker save” to a tar file, the tar file is the same size. Gzipping the tar file gives a gz file of half a GB but that only mitigates uploading it to my cloud server. At the other end it still has to be imported (with “docker load”) as a 1 or 1.2 GB image.
The way Docker works is that it keeps everything ever added to the image as "layers": one layer per RUN command in the Dockerfile (the script that builds an image). That is why you are supposed to build and tear down each component within the same RUN command, before it gets committed as a layer. It's a bit like git, where you can roll back to any point in the repo's history.
I’m not sure what all those layers are doing or why I can’t get rid of them. ("docker history <image>" at least lists each layer and the size it adds, which shows where the space is going.)
Could you provide your Dockerfile somewhere? It’s always easier to give advice with an in-depth understanding of what is actually being done.
In general, I’d recommend running "rm -rf /var/lib/apt/lists/*" after any install in a Dockerfile (preferably in the same RUN, chained with "&&") to clean up the apt cache rather than keeping it around. If you aren’t doing that, that’s a starting point.
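In Dockerfile terms that cleanup looks something like this (curl is just a stand-in package):

```dockerfile
# Install and clean up in the same RUN, so the apt package lists
# never become part of a committed layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/*
```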
To keep the layer count down you could put all the commands you execute into one bash file and execute that, making the entire build a single RUN layer. Furthermore, make sure you are cleaning up all pulled/cached data, for example by putting everything in /tmp and purging those directories after a successful install.
There are some more recommendations on ctl.io, and it is hard to give more specific answers without seeing the Dockerfile.
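The one-script idea could look roughly like this ("build.sh" is a hypothetical name covering the whole clone/compile/install/clean sequence):

```dockerfile
# Everything build.sh installs, builds and then deletes collapses
# into the single layer produced by the RUN below; only the COPY
# adds one small extra layer for the script itself.
COPY build.sh /tmp/build.sh
RUN /tmp/build.sh && rm /tmp/build.sh
```

The trade-off, as noted later in the thread, is that a monolithic script is harder to read, comment and maintain than separate, self-documenting RUN steps.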
Just to give some idea: the Rustup musl images using best practices (like the "rm" I mentioned before) with a fresh OpenSSL already come to over 500MB, too. So while there’s some leverage, I doubt you can make it smaller than that without replacing the distro…
It isn’t for deploying/running safe_vault but for building it in a portable environment.
I found that the presence of the armhf architecture on Debian Jessie was messing with the apt update system, with a conflict going on. I had a script for building and tearing down the architecture for each build session, and I felt that it was getting overly complicated with just two architectures, and that it wouldn’t scale very well to the half dozen or so I’m aiming for. So putting each compile environment in its own Docker image is a robust solution… once I get the image bloat problem sorted.
The compile step near the end would be commented out once I was satisfied; I include it for testing. The image is run by a script on the host that runs the internal script (“/compile”) and then loops, alternately sleeping for a minute and checking for the appearance of the “done123” file in the running container. Once it sees that, it copies the binary out.
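The waiting loop in that host script can be sketched as below; "wait_for_marker", the interval and the try limit are my own illustrative names and defaults, and in the real script the marker would be checked inside the running container (e.g. via "docker exec") before copying the binary out with "docker cp":

```shell
#!/bin/sh
# Poll for a "done" marker file, sleeping between checks; returns
# non-zero if the marker never appears within max_tries attempts.
wait_for_marker() {
    marker=$1
    interval=${2:-60}     # seconds between checks (the post says one minute)
    max_tries=${3:-120}
    tries=0
    while [ ! -e "$marker" ]; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            return 1      # gave up waiting
        fi
        sleep "$interval"
    done
    return 0              # marker appeared
}
```

A host script would then do something like `wait_for_marker /path/to/done123 && docker cp "$container":/safe_vault .` to retrieve the binary.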
EDIT: As I mentioned, it works beautifully: it runs at 11pm each night and rsyncs the binaries up to the server. But it is too big to get off the ground itself. That’s the only issue.
Some months ago I was playing around with creating the smallest possible containers with Alpine Linux, but finally came across this article, which was a great resource on using scratch and ldd to follow dependencies…
If images get too complex I think you’ll have a bunch of work, but for simple things it works great…
The principle is starting from a scratch image and building its directory structure: finding all dependencies and copying them to the appropriate destinations.
The image size is then only program + dependencies + directory structure.
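A minimal sketch of that scratch approach, assuming you have copied out the binary and hand-picked the libraries that "ldd safe_vault" reported (the library paths here are illustrative amd64/glibc ones):

```dockerfile
# Start from the empty scratch image and copy in only the binary
# plus the shared libraries ldd said it needs.
FROM scratch
COPY safe_vault /safe_vault
COPY lib/x86_64-linux-gnu/libc.so.6 /lib/x86_64-linux-gnu/libc.so.6
COPY lib64/ld-linux-x86-64.so.2 /lib64/ld-linux-x86-64.so.2
ENTRYPOINT ["/safe_vault"]
```

This works best for deploy images; a build environment like the one discussed here needs far more of the distro, which is why the article’s approach is harder to apply directly.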
Thank you, I’ll have a look at that.
While still using a Debian base, and without absurdly stringing everything into one RUN command (which would make readability, commenting and therefore maintenance, especially by a third party, much more difficult), I have managed to get one image down to 747MB (from 1027MB): a reduction of 280MB!
I’ll upload that as version 1.0. I’ll revisit this matter later by trying a scratch file and adding only what is essential.
Something I finally confronted: sodiumoxide doesn’t actually do anything except build an rlib file, and unless you use that in another program (via Cargo.toml, say) it has no purpose. Rather obvious in hindsight, but hey, I was a complete beginner when I first put it in. It appears, therefore, that the sodiumoxide crate provides Rust bindings for the libsodium C library (which answers another question).
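For anyone following along, pulling sodiumoxide into a Rust project is just a dependency entry in the consuming program’s manifest (the version number here is illustrative):

```toml
# Cargo.toml of the program that actually uses the bindings
[dependencies]
sodiumoxide = "0.2"
```

Cargo then builds the rlib and links it in; there is nothing to "run" in the sodiumoxide checkout itself.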
I’m new to the repository docker.io, but it seems to be taking a long time to update and put the tags in place so that people can pull the image, and I’m short on patience today. GitHub is brilliantly efficient, while my experience of Docker’s infrastructure is… not so much (their keyserver also takes forever).
In the meantime, the current dockerfiles (still have to add armv7) are at GitHub - Bluebird45/dockerfiles along with the external script and usage note.
I left out the section about rsyncing the binaries to the server: outside the scope of the discussion.
How big are the files for maidsafe?
Let’s see if we can’t get that to within 5MB of what you are running. Aha, there are the Dockerfiles!
How big are my Docker images? 739MB and 874MB for AMD64 and ARMV7 respectively.
I haven’t looked at what Maidsafe are doing with Docker. My projects serve both as self-training and as useful tools in themselves. I think Maidsafe’s Docker setup is for deployment of vaults, but I might be inaccurate on that. My Docker images are templates for containers that are basically binary factories: you run a script that loads the containers and after a while a binary pops out of each container, for each computer architecture that might be desired: AMD64, ARMV7 (the two done so far), i686, Windows, OSX, and maybe several others for which there are Rust toolchains/targets.
I looked over your Dockerfiles yesterday, and I approve. You might get them somewhat smaller with Arch Linux as a base, but they’re quite good overall. How long have you been Dockering for?
You mentioned some official project Dockerfiles; can you point me in the right direction?
About a week.
This is the answer to your second question, unless there are other Docker projects: https://github.com/maidsafe/safe_vault/tree/master/installer/docker