Naming non-immutable data

Names for non-immutable data (i.e. sequences / maps; do we have a catch-all name for these?) are currently set randomly by the client (see sn_api/safe_client L249 and L427), which means clients can pick any name they want.

This seems potentially dangerous and unnecessary, since we can end up with a lot of data permanently stored in a very tight region of the network, which would be much more difficult / expensive to achieve using immutable data (see data density).

It seems like we have a pretty simple way to set the name safely and automatically:

Use the hash of a random bls publickey+signature (eg add a locationProof field here, and the address is the hash of that field). This means the name is unpredictable, verifiable, and evenly distributed. The name can come from any random publickey, but it must have a matching signature for some constant predefined message (eg [1u8; 32]). A name can be chosen without any network interaction (just like immutable data), yet the name ends up well spread out across the network.
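In code the idea would be roughly this (a sketch only, assuming blsttc-style BLS types and the sha3 crate; LocationProof and its layout are illustrative, not the actual sn_api types):

```rust
// Sketch only: assumes the `blsttc` crate for BLS keys/signatures and the
// `sha3` crate for hashing. Not the actual sn_api implementation.
use blsttc::SecretKey;
use sha3::{Digest, Sha3_256};

/// The constant predefined message every locationProof signs, eg [1u8; 32].
const NAME_PROOF_MSG: [u8; 32] = [1u8; 32];

/// Proof carried alongside the data; the data's name is the hash of it.
struct LocationProof {
    public_key: [u8; 48],
    signature: [u8; 96],
}

fn new_name() -> (LocationProof, [u8; 32]) {
    // Any random key works; the signature is what makes the name verifiable.
    let sk = SecretKey::random();
    let proof = LocationProof {
        public_key: sk.public_key().to_bytes(),
        signature: sk.sign(NAME_PROOF_MSG).to_bytes(),
    };
    // xorname = sha3_256(publickey + signature): unpredictable, evenly spread.
    let mut hasher = Sha3_256::new();
    hasher.update(proof.public_key);
    hasher.update(proof.signature);
    let name: [u8; 32] = hasher.finalize().into();
    (proof, name)
}
```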

To create dense data an attacker would have to iterate over many public keys + signatures to find results that all end up close to each other. The similar technique for immutable data would be to iterate over many random blocks of data to find results that end up close to each other. Generating a name using a bls signature takes about the same time as generating a name by hashing 1 MiB of immutable data (see stats in bls performance).

The downside is it adds extra data to the object (144 bytes = 96 byte signature + 48 byte public key, the const message data can be excluded).

Should we use something like this to name non-immutable data or keep random names? Any issues with doing it this particular way or is there a better way to approach it?

14 Likes

Main issue I'd see is removing the ability for apps to deterministically generate an xorname for data.

NRS in its current form relies upon knowing that the xorname can be derived from the human readable address string.

3 Likes

This makes sense.

And so does this. I wonder though if only NRS needs this and it can be limited? We are using (v soon) multimap for this one. It's an interesting one though, if all other types were self-naming now (esp as we moved from changing the owner in place to 'change of owner means a new data location').

1 Like

Indexing where the client chooses the name based on the content, like semantic hashing, might depend on this though. Or is there some other way to do that?

2 Likes

I think @david-beinn 's search concepts might rely on this also.


I do agree self naming would be nice. But I’m not sure how you’d achieve this otherwise… If it’s NRS only, are we limiting that to a specific type-tag on data? If we’re doing that… why not a specific range?

I guess in the end you could pay more for naming your own data, eg. so the option exists where it's useful… That is all more complexity though.

3 Likes

There’s an xorname derivation that could replace sha256(nrs_name)

Third step is the important one

  1. nrs name: mav

  2. nrs name as bytes: [109,97,118]

  3. nrs name left-padded to make it a bls private key:
    [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,109,97,118]
    as hex 00000000000000000000000000000000000000000000000000000000006d6176

  4. nrs name public key:
    [139,244,172,42,45,74,93,215,10,103,67,195,161,233,115,211,149,38,124,104,131,101,226,66,140,18,242,0,147,32,161,115,108,248,5,39,81,22,33,4,66,106,251,236,99,115,163,221]
    as hex
    8bf4ac2a2d4a5dd70a6743c3a1e973d395267c688365e2428c12f2009320a1736cf8052751162104426afbec6373a3dd

  5. message (const): [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]

  6. signature: [128,244,8,212,115,161,238,125,197,14,55,187,82,1,210,36,162,44,59,211,21,204,91,252,141,23,210,221,55,2,208,248,37,249,220,239,227,11,31,135,128,240,128,232,147,140,30,88,17,57,187,154,164,46,34,172,149,235,85,119,231,51,72,17,104,58,18,53,54,126,240,59,118,168,185,234,122,146,196,14,147,4,254,41,109,157,205,34,4,252,25,5,212,34,224,241]

  7. publickey+signature: [139,244,172,42,45,74,93,215,10,103,67,195,161,233,115,211,149,38,124,104,131,101,226,66,140,18,242,0,147,32,161,115,108,248,5,39,81,22,33,4,66,106,251,236,99,115,163,221,128,244,8,212,115,161,238,125,197,14,55,187,82,1,210,36,162,44,59,211,21,204,91,252,141,23,210,221,55,2,208,248,37,249,220,239,227,11,31,135,128,240,128,232,147,140,30,88,17,57,187,154,164,46,34,172,149,235,85,119,231,51,72,17,104,58,18,53,54,126,240,59,118,168,185,234,122,146,196,14,147,4,254,41,109,157,205,34,4,252,25,5,212,34,224,241]

  8. xorname is sha3_256(publickey+signature): 10a07627d741413051179fb96e33cb718da210beae940516e9866ebca00d952e

Want to register safe://mav? Create the map object at xorname(10a07627…)

Want to read data from safe://mav? Read the map object at xorname(10a07627…)
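Those steps in code would look roughly like this (a sketch, assuming a BLS library that can build a SecretKey from raw 32 bytes, eg blsttc's SecretKey::from_bytes, plus the sha3 crate; purely illustrative):

```rust
// Sketch of the derivation above, assuming `blsttc::SecretKey::from_bytes`
// and the `sha3` crate. Illustrative only.
use blsttc::SecretKey;
use sha3::{Digest, Sha3_256};

const NAME_PROOF_MSG: [u8; 32] = [1u8; 32];

fn nrs_xorname(nrs_name: &str) -> [u8; 32] {
    // Steps 2-3: nrs name as bytes, left-padded to 32 bytes to form a bls private key.
    // (Names longer than 32 bytes would need separate handling.)
    let bytes = nrs_name.as_bytes();
    let mut sk_bytes = [0u8; 32];
    sk_bytes[32 - bytes.len()..].copy_from_slice(bytes);
    let sk = SecretKey::from_bytes(sk_bytes).expect("not a valid bls private key");

    // Steps 4-6: public key plus signature over the constant message.
    let pk = sk.public_key().to_bytes(); // 48 bytes
    let sig = sk.sign(NAME_PROOF_MSG).to_bytes(); // 96 bytes

    // Steps 7-8: xorname = sha3_256(publickey + signature).
    let mut hasher = Sha3_256::new();
    hasher.update(pk);
    hasher.update(sig);
    hasher.finalize().into()
}

// Want to register or read safe://mav? Use nrs_xorname("mav").
```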

1 Like

Ah right, you had me there. So everyone looking for safe://mav will do these 8 steps to figure out xorname(10a07627…) ?

I am missing what this achieves though beyond just sha(mav) ? (Monday morning :smiley: )

The existing problem with allowing any random xorname for mutable data is that I'm able to dump TBs of data at any specific location.

If I fill xornames 000...000 and 000...001 and 000...002 and so on with the maximum 1 MiB of mutable data content, nodes in the 000000... section will need to store potentially TB of data that cannot be easily spread across other nodes because the data naming is too tight.

This proposal a) makes that degree of data density much harder to achieve by not allowing names to be so specific, and b) still allows nrs-style deterministic name derivation where needed.

5 Likes

You mean only for NRS? As for other data (atm), folk can pick any name and fill nodes?

I mean nrs is derived using the bls-private-key trick a few posts above.

Search terms etc with deterministic locations can be derived using some similar-but-different bls-private-key-trick so it doesn’t conflict with nrs locations.

All other mutable data types can be at any location, so long as it is a ‘proven random’ location by coming from a hash(pubkey+signature) value. Attackers can still try to get the name in a specific location by iterating, but there’s a fairly limited possible resolution for the name.

Framing it from the point of view of a node - if my node starts to receive data at name 000...000 then 000...001 then 000...002 I can see I’ll have a problem pretty soon and I can’t do anything to stop it. But if this new proposal is implemented, I would never expect to see data at name 000...000. All mutable data is named ‘verifiably randomly’ rather than just ‘random according to the client’.

edit:

For ‘normal’ mutable data, instead of deriving a private key from an nrs name, any random private key is used. But because the naming process must then go through the extra steps of signing+hashing, all names are ‘verifiably random’, even those using non-random initial private key data.
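Presumably a node receiving data would then only accept the name if the proof checks out, something like this (hypothetical node-side check, again assuming blsttc-style types; not existing node code):

```rust
// Hypothetical node-side check, assuming blsttc types. A name is accepted
// only if it is the hash of a pubkey+signature pair that verifies against
// the constant predefined message.
use blsttc::{PublicKey, Signature};
use sha3::{Digest, Sha3_256};

const NAME_PROOF_MSG: [u8; 32] = [1u8; 32];

fn name_is_verifiably_random(name: &[u8; 32], pk: &PublicKey, sig: &Signature) -> bool {
    // The signature must be valid for the constant message.
    if !pk.verify(sig, NAME_PROOF_MSG) {
        return false;
    }
    // The name must equal sha3_256(publickey + signature).
    let mut hasher = Sha3_256::new();
    hasher.update(pk.to_bytes());
    hasher.update(sig.to_bytes());
    let expected: [u8; 32] = hasher.finalize().into();
    &expected == name
}
```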

3 Likes

Yes, I agree and feel we must do this.

Cool, so NRS-specific. What I'm not seeing though is how this is better than hash(name): unless we are picking weird names to get a close hash, they do the same thing of enforcing randomness. Or do you mean that even repeatedly hashing data to get a close hash is easy, but your proposal adds work to that?

2 Likes

Ah I think I see what you mean here.

But just to make sure, I should clarify, do you mean client picks any random name, then nodes do hash(name) to get the final destination?

So it would mean an attacker chooses name 000...000 but the attack is avoided by nodes saying ‘we will store this data at hash(000…000)’. Is this what you mean?

To repeat the same question for nrs, a client resolves the nrsname to a derived_xorname by hash(nrsname), but then nodes will say ‘I will fetch that data from hash(derived_xorname)’? Clients only ever deal with the current naming paradigm, but nodes add one extra layer of hashing to the final location? Do I read your idea correctly?

If so, then yeah what you suggest is simpler and works equally well to solve the problem.
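For comparison, that simpler version is basically just one extra hash at the node (a hypothetical sketch; sha3 used only to match the derivation above):

```rust
// Sketch of the simpler alternative: whatever name the client asks for,
// nodes store/fetch the data at hash(name), so even tightly clustered
// client-chosen names end up evenly spread across the network.
use sha3::{Digest, Sha3_256};

fn final_location(client_chosen_name: &[u8; 32]) -> [u8; 32] {
    Sha3_256::digest(client_chosen_name).into()
}
```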

The reason I went more complex is I’m always trying to find ways to jam bls into the potential path of an attacker so we have the best possible chance of heavily optimized bls. If we only use hash for naming then there’s one less reason to focus on improving bls ops. Not great logic I admit! When I see ‘attack vector’ I usually think ‘add bls’. Same with my hope of replacing proof-of-work-by-hash with proof-of-work-by-signature, it’s a way to get bls as fast as possible (and therefore routing and node performance should also hopefully be very fast too).

3 Likes

My search concepts rely on whatever is available when the network launches!

I definitely would have liked to be able to quickly get to addresses by knowing they were derived from something (if that could have been done securely), but I was misunderstanding a few things about the nature of mutable data and the NRS system.

(Currently my plan is to start as simple as possible using the NRS system for index/data storage, and focus on experimenting with a genuinely decentralised architecture of the type you were advocating, by building an app(s) that make it easy for people to be part of a network of connected indexes. Might fail for pure search, but hopefully will produce some interesting multipurpose benefits anyway.)

Don’t want to derail the topic tho…

4 Likes

Yes that’s the idea

Yea, I think this works.

I am very keen on this; in fact I found https://www.researchgate.net/publication/221010535_Threshold_Signatures_Multisignatures_and_Blind_Signatures_Based_on_the_Gap-Diffie-Hellman-Group_Signature_Scheme/link/55c4f94708aebc967df38448/download just recently.

1 Like

If the bls private key is 32 bytes, doesn’t the bls method limit the length of NRS names etc to 32 characters?

Maybe regular mutable data gets a random name from the network, but mutable data within a certain typetag range, say 1000-2000 or whatever, has an extra string parameter that will be hashed to get the name? Then typetag 1000 could for example be used for NRS, 1001 for one particular algorithm for semantic hashing etc.

1 Like

There's always some way to derive an xorname even from long names. Maybe for long names we can xor the first 32 bytes with the next 32 bytes, then the next 32 bytes, and so on until no more bytes remain in the name. NRS names do have some length limit, but this problem may also affect, say, search terms. The main thing is there's always some (maybe convoluted) way to derive an xorname.
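For example, a long name could be folded down to 32 bytes (which would then be used as the private key bytes) with something like this (just one possible convention, not a spec):

```rust
// One possible way to fold an arbitrarily long name into 32 bytes by
// xoring successive 32-byte chunks together. Just an illustration.
fn fold_name_to_32_bytes(name: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for chunk in name.chunks(32) {
        for (i, byte) in chunk.iter().enumerate() {
            out[i] ^= *byte;
        }
    }
    out
}
```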

But I think as dirvine says, the extra complexity is not needed.

2 Likes

One related area @mav is the possibility of having data names be sha512 again. Not node names, as 256 is plenty there, but for data names themselves I have a feeling 512 is perhaps better. Then no tag types etc., just an immense space for names to exist in. Whether NRS limits to 256, though, is probably a good debate, as 256 there may be plenty.

I have been toying with this, as opposed to type fields, to allow a few things including tiny chunks (a bigger name sounds counter-intuitive, but it's a much larger namespace). Be good to get thoughts on this one (previously everything was 512 but it caused hassle with pub key mapping for node names etc).

9 Likes

Hurray! Yes, please go back to 512 for data chunks. Iirc that was your original plan. It was a good plan. :+1::+1:

Can you explain a little more about what the problem was?

5 Likes

Nothing major, just that here name == public key, so using a 32 byte address meant a direct mapping: we can see X sent a message and X signed it. Making that 512 is easy but messy.

1 Like

I’ll have a look. Are you interested in this mainly from the perspective of improved performance compared with bls12-381? Or is it more from the perspective of an improved feature set?

Why is 256 not enough? 512 is possible, but why? 256 is plenty and probably will be for a long long time.

My understanding is the need for tag types is not to do with the bitsize of xornames, it’s to do with a simple way to disambiguate the same human-meaningful identifiers across various different uses (eg mav the nrs name vs mav the jams artist vs mav as immutable data). I don’t see how 256 vs 512 changes this.

I'm not sure how 512 improves the ability to include tiny chunks. This is probably just my misunderstanding. I thought the limitation for tiny chunks is a practical one: since the network routing overhead is fixed for both a tiny chunk and a full chunk, full chunks get the best efficiency, which nodes and clients will probably both want. Changing to 512 won't affect that efficiency.

I can’t see any way we would run out of 256 bit names. But if I’ve misunderstood this I’d love to hear the motive for moving to 512.

To go on a bit of a brainstorm tangent, I have often felt it's good to think about what happens if we have less than 256 bit names, say 128 bit names, or 64 bit names. Then we would definitely end up having different data at the same address, and this raises interesting problems around how to manage data collisions. If we allowed data collisions we would need some network-level way to resolve which data was really being requested, and this could feed into areas such as search where there are multiple results for any one search term. I'm definitely not advocating for this, but I've found it to be a useful thought experiment. The idea maybe has some overlap with having 256 and 512 bit namespaces on the same network, since perhaps every 256 bit name could have an extra 256-bit namespace within it (in theory, but probably not in practice).

4 Likes