Feature Name: Naming of Immutable Data Types
Related components: Routing and SAFE Vault
Start Date: 26-03-2016
A sample implementation including the benchmark tests is available here.
For security and integrity of data, the design of the SAFE network requires that every chunk of ImmutableData be stored at three separate locations. These locations are derived deterministically from the name of the chunk. The current implementation uses repeated hashing of the chunk’s name to derive the secondary names, which could result in all three locations overlapping. This RFC proposes a solution to resolve this issue, which also has secondary benefits.
As discussed in the summary, an ImmutableData chunk is stored in three locations on the network. We manipulate the name of the chunk to derive the locations for the copies. The “Normal” name is the SHA512 hash digest of the contents of the chunk, and this is the primary location. The secondary locations are defined by the “Backup” name which is the SHA512 digest of the Normal name and by the “Sacrificial” name which is the SHA512 digest of the Backup name.
By using hashing to derive the Backup and Sacrificial names, there is a chance that two or all of the locations will overlap. In other words, a Vault could end up managing and storing the Normal, Backup and Sacrificial copies of a single chunk. In an ideal scenario, these locations would have no common Vaults, i.e. no Vault would be responsible for more than one copy of a given chunk.
The proposed solution involves generating the Backup and Sacrificial names by XORing the Normal name with known values. As the “distance” between two addresses on the SAFE network is calculated by XORing them, and since XOR is a commutative operation, the values chosen to XOR the Normal name with will effectively be the distances from the Normal name to the secondary names.
Since we want the distances between the groups to maximal, we can take the “maximum address” (where this is (2^512) - 1 (i.e. 111…111)) divided by two as the target distance between groups. This means we can define the distance between Normal and Backup as 2^511 (i.e. 100…000) and between Backup and Sacrificial as (2^511) - 1 (i.e. 011…111). For any network comprising at least three times “group size” Vaults, there should be no Vault responsible for more than a single type of chunk for any given chunk.
The proposed solution has a useful side-effect: for a given name type, both of the other names can be deduced. This cannot be done for hashing; given a name derived from a hash digest, we cannot calculate the original name, i.e. the hash cannot be reversed.
This is useful since a Vault which is managing a Backup or Sacrificial copy of a chunk would now be able to contact the Normal managers without requiring an actual copy of the chunk.
A further benefit is that XOR is a faster operation than SHA512 hash; in benchmarks, using the proposed solution was between 15% and 20% faster than using the existing implementation.
The proposed change would define the following constants and function:
These would be used in ImmutableData’s implementation as follows:
It would also allow us to define the following functions to convert between name types: