Some of you may have noticed that over the last 2 weeks we have not been running testnet1 as much as you would expect. This was planned and expected; however, it has highlighted something important to me.
The last few weeks have been a mix of hard, hard (too hard) work: testnet focussed, plus interviewing and recruiting. This is all great, but last week I put my foot down and stopped the progress. This is probably the closest to a heart attack I have been; trying to stop a maidsafe dev is a nightmare. The reason is that we are now growing from a small group of devs who communicate every few hours in person to a larger community. This is true of our own in-house dev community, which is now supplemented by devs in at least 4 different countries. These people cannot chat every few hours as we do. Fortunately the first 2 remote devs, Nial and Bjorn, are pretty amazing, self-managed and motivated people, but we cannot depend on that. It would be wrong.
So we have taken a brave and required step this last week: stop new features and fix 100% of what we have in place. We have failing tests that we work around to get testnets up etc. This is OK if everyone knows what they are, but it’s at best managing a moving target. It bit us with the testnet demo for SF.
Now you will see the dashboard tests all going green, coverage falling slightly and critical tasks being put in place for next week’s sprint.
So what steps have we introduced and why?
Workflow management using Atlassian Jira (https://maidsafe.atlassian.net) - this allows us to manage tasks from design, through dev, then on to QA and Code Review, prior to any commit to the mainline codebase. This important step introduces controls and makes use of Pull Requests (more later).
In this process core devs are no longer able to push to mainline code (write capability on mainline has been removed). That sounds like a slowdown, but it’s the opposite. A dev just needs to do development, and QA and review will handle the rest. It is a much faster process for development and works whether a dev is local or remote.
This also provides in-process roadmaps based on workload and makes transparency of the workflow very simple and fact based. No more hunches. We added a roadmap module to allow visibility of a roadmap in real time.
Huge thanks to Atlassian for a 2000 seat Open Source license, an amazing company.
Disable or fix existing tests
This week’s sprint has been to get the dashboard green and allow continuous integration machines to check each code commit. Any disabled test has become a critical task for this coming week’s sprint. It’s not over yet; in usual style we will work over the weekend. Weekend days off are not a maidsafe bonus just yet.
Introduce Continuous Integration framework
This is a mechanism where every pull request and code commit is tested across many machines on many systems. We have our in-house Jenkins setup that covers all OSes; this can be up to 80 builds and tests per push to mainline. We also have drone.io and codeship.io, where a further 40 builds are tested in Release and Debug modes of each library. Feel free to add your own: ping us and we will supply the scripts and add deploy keys where necessary.
This essentially means the code is analysed from all angles and on all OSes before any code review is done (code review should not be about “will this work”, it needs to be about efficiency).
This was all happening in a much less automated manner till now. This process makes it impossible to push untested (or less covered) code.
Focus on coverage, tests and test times
QA now have their job automated to an extent. The CI tests check code with static analysis and dynamic analysis (TSan, ASan, MSan, UBSan etc.); this is some serious checking. The QA team will focus on automating coverage analysis to ensure no commit ever reduces code coverage. They will also QA the test code, i.e. what is being tested, why and how. This will increase quality per commit to mainline.
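To make the "no commit ever reduces coverage" rule concrete, here is a minimal sketch of such a gate. It assumes coverage is reported as a percentage per library; the function name, the library names and the figures are all made up for illustration and are not our actual tooling.

```python
def coverage_regressions(baseline, candidate, tolerance=0.0):
    """Return {library: (old, new)} for every library whose coverage fell.

    `baseline` and `candidate` map a library name to a coverage percentage.
    All names and numbers used with this function are hypothetical.
    """
    regressions = {}
    for library, old in baseline.items():
        # A library missing from the new report is treated as 0% covered.
        new = candidate.get(library, 0.0)
        if new < old - tolerance:
            regressions[library] = (old, new)
    return regressions


# Hypothetical figures: mainline coverage versus coverage with a pull request applied.
baseline = {"lib_a": 91.2, "lib_b": 84.0, "lib_c": 77.5}
candidate = {"lib_a": 91.2, "lib_b": 85.1, "lib_c": 76.9}

failed = coverage_regressions(baseline, candidate)
print(failed)  # a CI job would reject the pull request if this is non-empty
```

A non-zero `tolerance` would allow for small measurement noise, at the cost of letting coverage creep down a little at a time.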
Test timings are also automatically checked to ensure there is no out-of-range speed-up or slowdown in any test. This is an important indicator of possible error (did somebody comment out a part of a test, or ‘fix’ something like the way Apple ‘fixed’ TLS/SSL checks?).
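A timing check of this kind could look roughly like the sketch below. It assumes we keep a few past durations per test and flag any run whose time falls outside a band around the historical mean. The 0.5x and 2x limits, the test names and the durations are illustrative assumptions only.

```python
from statistics import mean


def out_of_range_timings(history, latest, low=0.5, high=2.0):
    """Return {test: ratio} for tests that ran much faster or slower than usual.

    `history` maps a test name to a list of past durations in seconds and
    `latest` maps a test name to the newest duration. The ratio is the latest
    duration divided by the historical mean.
    """
    flagged = {}
    for test, seconds in latest.items():
        past = history.get(test)
        if not past:
            continue  # no history yet, nothing to compare against
        usual = mean(past)
        if usual <= 0:
            continue  # avoid dividing by zero on degenerate history
        ratio = seconds / usual
        if ratio < low or ratio > high:
            flagged[test] = round(ratio, 2)
    return flagged


# Hypothetical test names and durations.
history = {
    "test_store": [4.0, 4.5, 3.5],
    "test_fetch": [1.0, 1.5, 0.5],
    "test_join": [10.0, 9.0, 11.0],
}
latest = {"test_store": 4.25, "test_fetch": 0.25, "test_join": 25.0}

print(out_of_range_timings(history, latest))
```

Note that a sudden speed-up is flagged just like a slowdown, since a test that suddenly runs four times faster may have had part of its body commented out.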
In addition, code hotspot analysis will be included to identify any code that’s worked on too often (a sign of a hidden bug or an overly complex bit of code).
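As a rough sketch of what hotspot analysis means here: count how many commits touch each file and list the ones above a threshold. The input is assumed to be a list of commits, each given as the list of files it changed (something like `git log --name-only` could supply that). The file names and the threshold are hypothetical.

```python
from collections import Counter


def hotspots(commits, threshold):
    """Return [(path, touches)] for files changed in at least `threshold` commits.

    `commits` is a list of commits, each a list of the file paths it touched.
    Results are ordered by touch count (highest first), then by path.
    """
    # set() makes sure a file is counted once per commit, even if listed twice.
    touches = Counter(path for files in commits for path in set(files))
    ranked = sorted(touches.items(), key=lambda item: (-item[1], item[0]))
    return [(path, count) for path, count in ranked if count >= threshold]


# Hypothetical history: four commits and the files each one changed.
commits = [
    ["a.cc", "b.cc"],
    ["a.cc"],
    ["a.cc", "c.cc"],
    ["b.cc"],
]

print(hotspots(commits, threshold=2))
```

A raw count is a blunt measure; a real setup would probably weight recent commits more heavily or combine the count with a complexity metric.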
Installer based development environments
To make app developers’ jobs much simpler, we have created some installer-based packages. These are in process just now and should be available soon. The main one is a maidsafe-dev package. This will be a single-click install and provide all required headers (147Mb of them) and a single monolithic static library. No more git/cmake/compiler stuff unless you want it.
So what does this all mean really?
OK, no slowdown in release, that’s key. This stop in progress will allow an accelerated, more transparent route to market.
We will probably have one of the most capable workflow and testing setups around. This will be grown as we move on. It’s already pretty massive.
This was all happening previously, but ad hoc; it’s now baked into the process.
We can now manage devs regardless of their proximity and not in any ad hoc manner.
Devs now have a much simpler task in getting to launch.
This setup is pretty significant and our intention is that we can achieve many things with it now. Some of these will include (and be encouraged):
1: Allow App devs access to the whole process, from workflow through continuous integration and regression proof testing (give us a few weeks to bed in though).
2: Allow 3rd party code checks. We have already offered Boost and Bitcoin this capability to ensure their code bases are further checked.
So this is a drive not only to make core dev very complete, but also to aid the community out there. We will make efforts for easy integration into this whole platform and hopefully help 3rd party devs ensure correct coding practices.
A very important issue for us is testing and this is where the community can help as well. We have many types of tests and much of our code base has way more test code than production code (significantly so in many libraries). So correct code is great, speed checks are fantastic etc., but there is a huge area of our test code that is critical: security testing.
An example: when the Intel revelations about PRNG came out, we went a bit further than many and disabled all Intel acceleration (SSE etc.) until we could confirm, with tests baked into our test suite, that we get exactly the same output with CPU enhancements as we do without them (the latter being slower, though). This kind of thing is where backdoors etc. can happen, unless we are that careful.
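The shape of that kind of test is simple: run a plain reference implementation and the accelerated one over the same inputs and insist the outputs are identical. The sketch below is only a stand-in for the idea. It uses the Adler-32 checksum, comparing a pure Python loop against the C implementation behind `zlib.adler32`; it is not the crypto code or the CPU feature switches we actually tested.

```python
import random
import zlib

MOD_ADLER = 65521  # largest prime below 2**16, as defined for Adler-32


def adler32_reference(data):
    """Plain, unaccelerated Adler-32 over a bytes object."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER
        b = (b + a) % MOD_ADLER
    return (b << 16) | a


def outputs_match(samples):
    """True only if the slow reference and the fast path agree on every sample."""
    return all(adler32_reference(s) == zlib.adler32(s) for s in samples)


# Fixed seed so that a failure is reproducible on any machine.
rng = random.Random(1234)
samples = [
    bytes(rng.randrange(256) for _ in range(rng.randrange(0, 512)))
    for _ in range(200)
]

print(outputs_match(samples))
```

Random inputs with a fixed seed give broad coverage while keeping any mismatch repeatable; known-answer vectors for the algorithm in question would normally sit alongside them.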
So we have a lot of this kind of code security check in place. These tests are open and anyone can look and confirm them. We can also add many more, but why just us? Anyone can.
So tests like: are there any ‘ping home’ messages from any node, or perhaps a switch that can remove encryption? (Do our tests cover that? Well, the net design does, but you know what I mean.)
So this is a wide open, comprehensive setup now. We should be able to confirm each pull request (anyone can, they are public on GitHub), comment on them and confirm they introduce no clever backdoor code. We will reject any overly complex code anyway; it is a sign of a bad programmer. We also reject masses of code: improvements come from reducing a code base, not increasing it, so the ‘I have done 70,000 lines of code’ dev in maidsafe would probably lose their job. We want very efficient and complete code, not lots of raw loops, magic numbers and if statements everywhere.
In summary, we have switched on a huge amount of automation and done so in a way that is even more open, transparent and auditable by the community (pull requests should/will cover a single task that is well documented, no magic). This is a very important step forward in terms of both code efficiency and security through transparency. I think it is worth the 2 weeks of feature freeze; even though it has initially slowed down this release, I feel we will make the time up very quickly.
I hope this step will help everyone involved and further enhance security of the project. We really want folks to be able to tie their process into this and also tie their own CI checks on the core code base in as well. The more the merrier, and as the pods take up core dev tasks, the community will retain this visibility. It will be hard to catch us napping and slip any back doors or flaws into the code if we make the code accessible and each task/commit documented (each commit will have a Jira task ID, see the whole thing there).
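For anyone wanting to audit the "each commit will have a Jira task ID" rule themselves, a check could be as small as the sketch below. It assumes task IDs follow the usual Jira shape of a project key, a hyphen and a number. The project key `MAID` and the commit messages are invented for illustration and are not our real key or history.

```python
import re


def task_ids(message, project="MAID"):
    """Return all task IDs for `project` found in a commit message.

    The default project key "MAID" is a hypothetical placeholder.
    """
    pattern = re.compile(rf"\b{re.escape(project)}-\d+\b")
    return pattern.findall(message)


def commits_missing_task_id(messages, project="MAID"):
    """Return the commit messages that reference no task at all."""
    return [m for m in messages if not task_ids(m, project)]


# Hypothetical commit messages.
messages = [
    "MAID-123 Fix flaky timing in a store test",
    "Tidy up includes (MAID-77, MAID-78)",
    "quick fix",
]

print(commits_missing_task_id(messages))
```

Pinning the check to one project key avoids false matches on strings such as `UTF-8`, which a looser "letters, hyphen, digits" pattern would accept.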