Update January 13, 2022

It was fantastic and gratifying to see the latest community testnet performing so well :tada:. Huge thanks to everyone who participated :pray: :pray: :pray:. Now that the network is becoming more stable it's time to move on to the finer points of data storage and retrieval, and our current plans are outlined below. As ever, we are trying to come up with basic mechanisms that are light and flexible and which can be easily extended to support other functions such as payments and farming. Most importantly, they do not involve any assumptions of network synchronisation.

General progress

@yogesh has managed to fix a logic problem when promoting new elders, reducing the number of AE rounds, and consequently message exchanges, from 150 to 15. This is an impressive 10X optimisation, but he's sure there is even more that can be done here at a later stage. He and @anselme are now looking at the data handling processes described in the main section below, with @qi_ma debugging the acknowledgement and error handling process at the client.

@Joshuef has introduced Bors, an automation system that integrates multiple PRs at once, so it's a real time saver when it works - which, after a fair bit of fiddling, is most of the time now, happily. He and @oetyng have also been working on moving registers to adults, simplifying register puts, and relaxing the current requirement to send requests to all seven elders (see below).

@bochaco is considering the routing flow membership consensus and handling nodes that have gone offline, and how that will be integrated with the DKG work @davidrusu and @danda have been pushing ahead with.

@chriso has been updating and tidying up the CLI documentation, and @lionel.faber has fixed some end-to-end tests which were not passing, updating the testnet tool in the process.

Data handling

The heart of the Safe network is its ability to store data securely, reliably and permanently. Here's an overview of our current thinking about data handling. It touches on other issues too, such as the UI/UX and liveness checks for the adults to see if they're doing what they should, and also a mechanism for clients paying for storage.

Valid data

Data stored on the network must be valid from the network's point of view. Once a piece of data is valid it can potentially be stored by anyone.

Each data item is made up of a name, the content, a signature and a section key.

The name must be signed by any valid (old or current) section key, and the section key will be from the section where it is being stored.
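As a rough illustration of the validity rule above, here is a minimal sketch in Python. The `DataItem` fields mirror the four parts listed in the text, but the class name, the `is_valid` helper and the caller-supplied `verify` function are all assumptions made for this example. The real signature scheme is not modelled.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class DataItem:
    name: bytes         # the item's network name
    content: bytes      # the stored payload
    signature: bytes    # signature over the name
    section_key: bytes  # public key of the section that signed the name


def is_valid(
    item: DataItem,
    known_section_keys: Iterable[bytes],
    verify: Callable[[bytes, bytes, bytes], bool],
) -> bool:
    """Return True if the name is signed by an old or current section key.

    `verify(key, message, signature)` is a hypothetical callback supplied by
    the caller; it stands in for the network's real signature check.
    """
    if item.section_key not in set(known_section_keys):
        return False
    return verify(item.section_key, item.name, item.signature)
```

Passing the verifier in as a function keeps the sketch independent of any particular cryptographic library.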

As a reminder, a section is made up of seven decision-making elders and many more (60 to 100) adults which store data and offer it up when instructed to by the elders.

Data is stored by the four adults closest (in XOR terms) to its name.
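To make "closest in XOR terms" concrete, here is a small Python sketch. It treats names as equal-length byte strings and picks the four adults with the smallest XOR distance to the data name. The function names are invented for this example and are not taken from the Safe codebase.

```python
def xor_distance(a: bytes, b: bytes) -> int:
    """XOR distance between two equal-length names, as an integer."""
    if len(a) != len(b):
        raise ValueError("names must have the same length")
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def closest_adults(data_name: bytes, adult_names: list[bytes], count: int = 4) -> list[bytes]:
    """Return the `count` adults whose names are XOR-closest to `data_name`."""
    return sorted(adult_names, key=lambda adult: xor_distance(adult, data_name))[:count]
```

Because every node computes the same distances from the same names, each elder can work out the same set of four holders independently.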

Storage capacity

Stepping back a bit, each adult is actually someone's computer. It may be a cloud VM, a home PC or a Raspberry Pi - or even a smartphone provided it has enough storage. But how much storage is enough? This is a little tricky because requirements will likely grow in time.

If an adult runs out of space it will stop responding properly and will be penalised (lose node age). If the machine is also used for other things, such as work, music or storing photos, being full of Safe chunks would affect those too. For both reasons it's important that the owner is given ample warning when capacity is running out, and that the network is aware too.

There are a few options here. First, we set no limit for Safe storage, simply measuring space left on the disk and warning when it's nearly full. This has the advantage of simplicity, but since storage is a background process, full capacity could creep up on the user and give them a nasty surprise.
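The "measure and warn" option could look something like the Python sketch below. The 90% warning level and both function names are assumptions for illustration only; the text does not say when the warning should fire.

```python
import shutil


def is_nearly_full(total_bytes: int, free_bytes: int, warn_fraction: float = 0.9) -> bool:
    """True once the used share of the disk reaches `warn_fraction` (assumed value)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    used_fraction = (total_bytes - free_bytes) / total_bytes
    return used_fraction >= warn_fraction


def check_disk(path: str = ".", warn_fraction: float = 0.9) -> bool:
    """Measure the disk holding `path` and report whether to warn the owner."""
    usage = shutil.disk_usage(path)  # named tuple with total, used and free bytes
    return is_nearly_full(usage.total, usage.free, warn_fraction)
```

A node could run `check_disk` periodically in the background and surface the result in the UI.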

Another option would be to have the user pick a fixed value for the storage volume, with suggested amounts based on the available space at the time of starting the node, nudging folk toward a useful amount for the network, perhaps by highlighting a middle value.

This gives the user more control, but the downside is we, let alone they, don't really know what the ideal value is and how it might change over time.

It may be possible to create a dedicated expandable partition only for Safe Network data, but that could be complex to do, given the range of platforms and operating systems.

So, this one's still under discussion.

Checking adults are behaving

When the client wants to get some data, it makes a request to the elders in the section closest to the data's name. Each elder then computes which four adults should be holding the chunk and keeps a record of the operation ID that those adults need to fulfil. When an elder receives a response from an adult with the data chunk, it disassociates the operation ID from that adult's name, as the operation has now been fulfilled.

At this stage, the elders will check for unresponsive adults. An adult is unresponsive if it performs significantly worse than its neighbours (the exact tolerance will be worked out experimentally). Unresponsive adults will have their node age halved and may be relocated.
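One way to picture this bookkeeping is the Python sketch below. An elder records pending operation IDs per adult, clears them on response, and flags adults whose backlog is well above the median of their neighbours. The class, the median comparison and the `tolerance` value are assumptions for the example, since the real tolerance is still to be worked out experimentally.

```python
from collections import defaultdict
from statistics import median


class PendingOps:
    """Tracks which operation IDs each adult still has to fulfil."""

    def __init__(self) -> None:
        self._pending = defaultdict(set)  # adult name -> set of open operation IDs

    def request_sent(self, adult: str, op_id: str) -> None:
        self._pending[adult].add(op_id)

    def response_received(self, adult: str, op_id: str) -> None:
        # The operation is fulfilled, so drop it from the adult's record.
        self._pending[adult].discard(op_id)

    def unresponsive(self, tolerance: int = 3) -> list:
        """Adults whose backlog exceeds the median backlog by more than `tolerance`."""
        if not self._pending:
            return []
        typical = median(len(ops) for ops in self._pending.values())
        return sorted(
            adult for adult, ops in self._pending.items()
            if len(ops) - typical > tolerance
        )
```

Comparing against neighbours rather than a fixed timeout means the check scales with how busy the section is.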

Elder caching

For some time now we've been thinking about the best way to deploy caching. For many operations we think caching on elder nodes will help with performance and managing data as nodes go offline, as both a safety measure against data loss and a way to speed up the redistribution of chunks.

In this scheme elders will store any data put or retrieved from the section in an LRU (least recently used) cache. The capacity of the cache will be capped, with elders dropping less recently used data as necessary.
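For anyone unfamiliar with LRU caches, here is a minimal Python sketch of the idea using the standard library's `OrderedDict`. The `ChunkCache` class and its capacity, counted in chunks, are assumptions for illustration and do not describe the actual elder implementation.

```python
from collections import OrderedDict
from typing import Optional


class ChunkCache:
    """A capped cache that evicts the least recently used chunk first."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._chunks = OrderedDict()  # oldest entry first, newest last

    def put(self, name: bytes, content: bytes) -> None:
        self._chunks[name] = content
        self._chunks.move_to_end(name)  # mark as most recently used
        while len(self._chunks) > self.capacity:
            self._chunks.popitem(last=False)  # drop the least recently used

    def get(self, name: bytes) -> Optional[bytes]:
        if name not in self._chunks:
            return None
        self._chunks.move_to_end(name)  # a read also counts as a use
        return self._chunks[name]
```

Both puts and gets refresh an entry, which matches the text's "any data put or retrieved".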

What happens when we promote a node?

When we promote an adult to an elder, the adult first publishes its data to other adults and the elders record the chunks in their LRU cache, removing at random any above their size limit if necessary.

What happens on restart?

When an adult restarts or relocates, it sends each of its chunks to the three elders closest to that chunk's name, and they store as much as possible in their caches. The elders in turn store each chunk to four adults.

Elders can drop data from their cache, but adults cannot drop data. Adults continuously report their level to elders, and once they are 90% full no more data gets sent their way.
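The 90% rule can be sketched as a filter applied before choosing where a chunk goes. In the Python below, names are plain integers for brevity. Combining the filter with "four XOR-closest adults" is an assumption about how the two rules interact, as is the function name.

```python
FULL_THRESHOLD = 0.9  # from the text: adults at 90% capacity receive no new data


def storage_targets(data_name: int, fill_levels: dict[int, float], copies: int = 4) -> list[int]:
    """Pick the `copies` XOR-closest adults that are still below the threshold.

    `fill_levels` maps each adult's name to the fill level it last reported
    (0.0 = empty, 1.0 = full).
    """
    eligible = [adult for adult, level in fill_levels.items() if level < FULL_THRESHOLD]
    return sorted(eligible, key=lambda adult: adult ^ data_name)[:copies]
```

Here an adult that reports 95% full is skipped and the next closest adult takes its place.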

Storing data as a client

When a client stores data, it sends it to three elders to sign. Why three? Because there is guaranteed to be at least one honest node among them, since we assume there are no more than two faulty elders in a section of seven. With one honest elder, so long as the data is valid, the client will eventually get a supermajority of signature shares (5) from the honest section elders, meaning that it can be stored. As soon as one node has returned an acknowledgement with a network signature, that chunk can be considered stored.
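The "why three?" argument is easy to check by brute force. The Python sketch below tries every way two of seven elders could be faulty and every group of three a client might contact. The supermajority formula (strictly more than two thirds) is an assumption, though it does give the 5 of 7 quoted above.

```python
from itertools import combinations

ELDERS = 7
MAX_FAULTY = 2
CONTACTED = 3


def every_group_has_honest_elder(elders: int, max_faulty: int, contacted: int) -> bool:
    """Exhaustively check every fault pattern and every contacted group."""
    ids = range(elders)
    for faulty in combinations(ids, max_faulty):
        for group in combinations(ids, contacted):
            if set(group) <= set(faulty):
                return False  # this group is made up entirely of faulty elders
    return True


def supermajority(elders: int) -> int:
    """Smallest count strictly greater than two thirds of the elders (assumed rule)."""
    return (2 * elders) // 3 + 1
```

With seven elders and at most two faulty, the five honest elders are exactly enough to reach the supermajority on their own.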

Getting data as a client

Since chunks are signed and self validating, in the case of immutable data the client only needs one chunk. It doesn't care if it's network signed or not, because it's immutable.

Mutable (CRDT) data is a bit more complex. In this case the container is section-signed, but the contents are only signed by the client (the data owner). In this way the data is self-validating and hard to corrupt, but a malicious or faulty node could refuse to deliver the content or give the client old content.

So the client wants to make sure it gets as much of the data as possible, meaning it should ask at least a superminority of elders (three) for the data. The more copies it has, the quicker it can merge those copies to recreate the latest version of the data.
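To show why more copies help, here is a deliberately simplified Python sketch. It models the mutable data as a grow-only set of entries, where merging replicas is just a set union. This is a stand-in chosen for brevity. The network's actual register CRDT is more involved, and `merge_copies` is a made-up name.

```python
def merge_copies(copies: list[frozenset[str]]) -> frozenset[str]:
    """Merge replicas of a grow-only set CRDT by taking their union.

    Union is commutative, associative and idempotent, so the order in which
    elders reply, and any duplicate replies, do not change the result.
    """
    merged = frozenset()
    for copy in copies:
        merged |= copy
    return merged
```

If one of the three elders returns stale content, the entries it is missing are filled in from the other two replies.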

Paying for storage

This model feeds nicely into using DBCs to pay for storage upfront. When a client requests that 100 chunks be stored, the elders each come back with a price for signing the names of those chunks.
The elders' quotes should be the same. Any elder quote that is wildly different would suggest a faulty elder, and the client could flag that fact back to the section so it can be dealt with.

The client then pays the quoted amount before storing its data.
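A client-side sanity check on the quotes might look like the Python sketch below. Using the median as the agreed price and a 10% tolerance for "wildly different" are both assumptions for the example, as is the `check_quotes` name.

```python
from statistics import median


def check_quotes(quotes: dict[str, int], tolerance: float = 0.1) -> tuple[int, list[str]]:
    """Return the agreed price and the elders whose quotes deviate from it.

    The agreed price is taken to be the median quote. An elder is flagged
    when its quote differs from the median by more than `tolerance`,
    expressed as a fraction of the median.
    """
    agreed = int(median(quotes.values()))
    suspects = sorted(
        elder for elder, quote in quotes.items()
        if abs(quote - agreed) > tolerance * agreed
    )
    return agreed, suspects
```

The median is a convenient choice because up to two faulty elders out of seven cannot move it away from the honest quotes.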


Useful Links

Feel free to reply below with links to translations of this dev update and moderators will add them here:

:russia: Russian; :germany: German; :spain: Spanish; :france: French; :bulgaria: Bulgarian

As an open source project, we're always looking for feedback, comments and community contributions - so don't be shy, join in and let's create the Safe Network together!

First !!!

second darn it

great! step by step it feels like the team is solid!

Is disconnect considered as restart? When power or network connection is lost.

Pays to who?

If it breaks responsiveness then yes. Responsiveness is relative to workload and not time. So a busy network will mean you can be off for shorter periods before relocation kicks in.

Each Elder. E.g. they say that storing these X names costs Y coins. You pay the Y and they sign all X names, and the client aggregates the sigs to form a SectionSig for each data name. Then the data is network data and anyone at any time can store it or restore it (archive nodes) and so on.

Does the data always go via elders when stored and retrieved? Is there a risk of a bottleneck there? Has there been any modeling or back of the envelope calculations about it?

Why not let it be ok for a node to fill 100%? After they are full then they would just operate in read only mode.

when storage fills there are OS performance issues

At the moment, yes, they all go via the Elders. But there is some "thinkering" going on about having clients contact Adults directly to write/get data. Signed data allows us to do this :slight_smile:

'Thinkering' :grin: I'm stealing that!

But then adults' IPs are known to clients? I was thinking elders would scrub those.

Aye, Adult Shielding is legit and might probably keep it too. We could maybe have this configurable, if Adults are OK with sharing their IPs, we can use them as GETTER nodes to increase throughput and keep rotating them w/every churn, etc. Another option is to scale Elder sizes as the number of sections grows. Multiple possibilities there, though every one of them needs to be scrutinized with security and speed in mind :slight_smile:

Thanks so much to the entire Maidsafe team for all of your hard work! Keep the magic going! :racehorse:

I think this sounds reasonable. I suppose there are enough people OK with sharing their IPs as an elder, so the GETTERs could also be a new age class before elders, 'wannabe elders'. (Or actually it would be 'rank', as one could be older but not willing to become an 'elder' if you are not OK with sharing your IP.) I think it is also good to have an option to be a 'shielded adult', so people can also earn in regions where running a node publicly is too big a risk. GETTER nodes sound like a great way to increase throughput of the network!

When the client wants to get some data, it makes a request to the elders in the section closest to the data's name.

What happens if there are millions of requests for the data every second? Also, how does the client know enough about the topology of the network to make that request directly?

It will be nearest in Xor terms, so the request will be automatically routed to the right section.

During Get if a cache miss or via put from non-section adults i.e. not during relocate.

Great update, I didn't even know about the community test net, but WOW! Good job! Onward...
