Safe API - Registers

The idea is: pay once for the register and add keys at will afterwards. This saves processing and hassle for both users and the network.

The register is an important part of the network: it is the part that allows information to change while all the history is maintained in a manner that cannot be removed, i.e. perpetual data.


With the register being append-only, does it have a limit to its size (like 1 MB at the moment), or does it grow by adding more storage when needed?

It has a limit of 1 MB. The missing part is to make these extendable: things like the last entry being a link to another register, and so on. The issue there is that the current data would be at the end, which would mean traversing many registers to get to it. However, it's not as bad as it sounds. I will try to explain:

  • We have files/directories/websites etc.
  • They point at a register and entry for each “file”
  • If a register fills, a new register is created
  • New updates point to the new register

Then the only time traversal of many registers would happen is when going way back in history, not when reading current data.
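
A rough sketch of how that chaining could look (every type and name here is hypothetical, not the current API; it only illustrates the bullet points above):

type RegisterAddress = [u8; 32]; // hypothetical address type for the sketch

struct Register {
    address: RegisterAddress,
    bytes_used: usize,
    prev: Option<RegisterAddress>, // back-link to the filled-up register
}

const MAX_REGISTER_BYTES: usize = 1024 * 1024; // the 1 MB limit mentioned above

// If the entry no longer fits, chain a fresh register onto the full one and
// point new updates (and the file/directory metadata) at it. Old history
// stays reachable via `prev`, but current reads never walk the chain.
fn register_for_write(current: Register, entry_len: usize) -> Register {
    if current.bytes_used + entry_len <= MAX_REGISTER_BYTES {
        current
    } else {
        Register {
            address: fresh_address(), // hypothetical helper
            bytes_used: 0,
            prev: Some(current.address),
        }
    }
}

fn fresh_address() -> RegisterAddress {
    [0u8; 32] // stub so the sketch compiles
}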


So you pay for 3 PUTs when you use self-encryption, and 1 PUT if you encrypt by yourself or don't encrypt at all? Let's wait and see who's going to use SE :sweat_smile:

Does it mean each register’s history functionality is not used?

We don't and cannot force folk to use that. But if they want secure encryption and deduplication of their information, etc., then it's simplest to use it. If they want to roll their own or have no protection, then they are free to do so.

It can be used whenever you wish to see history. So things like file browsers with time-machine functionality would use it, for example. It also makes data perpetual. So if you have a website or document set that points to files, they will always work. As files change they get a new register entry, but the old files are always there. No more broken links.


Isn't deduplication based on content addressing? So, if I understand correctly, people uploading two identical files end up with the same address even if the content is not encrypted?

Only if they use the same encoding mechanism. So rolling your own, etc., would take this function away.
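
For illustration, content addressing amounts to something like this minimal sketch (using the sha3 crate here; the network's actual hashing and chunk encoding may differ in detail):

// Content addressing: a chunk's address is a hash of its bytes, so identical
// bytes always map to the same address and are stored only once.
use sha3::{Digest, Sha3_256};

fn chunk_address(content: &[u8]) -> [u8; 32] {
    let digest = Sha3_256::digest(content);
    let mut addr = [0u8; 32];
    addr.copy_from_slice(&digest);
    addr
}

fn main() {
    assert_eq!(chunk_address(b"same bytes"), chunk_address(b"same bytes"));
    // Self-encryption and a roll-your-own cipher produce different bytes from
    // the same file, hence different addresses, and deduplication is lost.
    assert_ne!(chunk_address(b"self-encrypted"), chunk_address(b"home-rolled"));
}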


That could be a strategy to save a little for files under 1.5 MB. Larger than that, it would not make much sense, as the largest allowed chunk is 0.5 MB and the file would have more than three chunks in any case.

It is only helpful cost-wise if your file can fit in 1 MB or 2 MB. That is becoming less likely as time progresses and files become larger due to higher resolutions, better experiences, and so on.

The content here is the final encrypted (or not) chunk. So if I upload a file with self-encryption and then upload the same file without encryption, they will be different content as far as the network is concerned.

It's been 1 MB max for a long time. When did it change? I realise that after compression there may be some space saved.

I believe the 0.5 MB figure comes from people averaging out the amount of space taken up by chunks, with the small files uploaded reducing the average.

I’m not sure I follow the question.

Here we had two ~4-byte entries in a register, and the register size is 715 bytes. That is the overhead of serializing the register for storage, i.e. the real storage space used (715 bytes minus ~8 bytes of payload is ~707 bytes of structure, or ~350 bytes per entry).

Indeed, folk seem to be hung up on 32 bytes (I understand why, mind). Our NetworkAddress type seems to be larger than this already, so restricting to 32 bytes doesn't make sense if we're talking pointers, I think…

Which is what's outlined in the napkin math w/r/t the register/CRDT overhead there. It seems to be ~350 bytes per entry.

There could well be a scratchpad type of data, for example. But that is another question, and off-topic for registers I think?

Uff, that's not wildly generous, honestly. I'd encourage all folk here to come at this with a view of "good intentions", rather than presupposing a decision has been made and we're just trying to defend it.

I've mentioned above, and caveated many times, that this is not decided. But I'd be keen to figure it out. If folk don't want to read those messages, or decide that's not my intent for some reason… I don't know what to do there :person_shrugging:


I’m in here asking for feedback and laying out what I’m seeing, and then asking questions:

  • If we want to keep registers as a generic array, what is the correct size and why?

That is the main question I don't see answered. My napkin math was one attempt at this, not to prove anyone wrong but to get somewhere concrete, as the current size/ratio is way off versus the workload of chunks and pricing…


CRDT-related work is not actually done on the nodes but on the client, so that's fine, I think. There are more validations w/r/t permissions that the node must do, so that could be factored in.


Some time ago, when looking at some connection woes and package size optimisations. That could be revisited, as I'm not sure how applicable it will be.


I just assumed that if we have 1024 B registers, then regardless of the size of the actual data, the entire kilobyte is written. I'm happy I was wrong :slight_smile:
So, we are getting somewhere. Our maths meet here: 500 B − ~350 B = ~150 B ≈ 138 B :slight_smile: And that is with the assumption that we keep 1024 points of history.
Do you think it would be possible to keep the max register size constant, and let devs put in larger amounts of data in exchange for history capacity?
Do you think reducing the ~350 B overhead is possible? It's especially worrying if we want to store ~32 B pointers there; that's 10x overhead…

You're right, sorry for that. I should perhaps say "core team" rather than "core devs". I did not presuppose anything, just talked about what I read in this thread. Perhaps that was also a result of me wanting to reply to the entire thread in one post.
I think everything is OK; I read the messages and understood your position that nothing is decided yet. Although I feel like David's take on this is the opposite, and that was mainly what I wanted to refer to.


Could be. But that adds more complexity to any/all calculations, so it's ideally avoided.

Also could be… but we're looking at serialisation with msgpack here (that is not optimal, but it is cross-platform… perhaps it should not be). But there's a lot of structure to this data to get the CRDT benefits. I would not assume we can shrink this a whole lot.

I appreciate that :bowing_man: . Yeah, @dirvine has a clear preference here, I think for the simplicity it engenders. It's been working very well for us in general, and neatly sidesteps the "how big is right" question entirely.

I am unsure about 32-byte pointers, which work for chunks but not for registers (they don't fit; the address is larger due to the xorname plus pub key being used there). So what would we do there? Put it in a chunk? (Could be!)

There is a knock-on effect, but also neat simplicity and a shifting of data parsing to the app…

And there are concerns of devs here, which I’m trying to understand.

For example, I do get "just let us be creative" and "give us the flexibility" conceptually…

But I'd like to know why an API which is the same as it is now (pub fn write(&mut self, entry: &[u8]) -> Result<()>) is more restrictive (because under the hood… chunks)? In what way are folk imagining things playing out here that I'm not seeing?
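
To make my question concrete, here is a hedged sketch of what "under the hood… chunks" could mean while the signature stays identical (everything except the signature is invented for illustration):

// Hypothetical sketch: small entries are written directly; larger payloads
// are stored as a chunk, and only the chunk's address goes into the register.
// The public signature is unchanged either way.
const MAX_INLINE_ENTRY: usize = 138; // the per-entry budget discussed above

#[derive(Debug)]
struct Error;

struct Register;

impl Register {
    pub fn write(&mut self, entry: &[u8]) -> Result<(), Error> {
        if entry.len() <= MAX_INLINE_ENTRY {
            self.write_raw(entry) // fits in the register as-is
        } else {
            let addr = store_chunk(entry)?; // hypothetical: PUT a chunk, pay for it
            self.write_raw(&addr) // the register holds only the pointer
        }
    }

    fn write_raw(&mut self, _entry: &[u8]) -> Result<(), Error> {
        Ok(()) // stub standing in for the existing low-level write
    }
}

fn store_chunk(_bytes: &[u8]) -> Result<[u8; 32], Error> {
    Ok([0u8; 32]) // stub standing in for chunk upload
}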

GET concerns (which have been both put forward as a reason not to do this and dismissed elsewhere) come into play if we must fetch each entry… But must we? (And the extent to which it is a concern depends on the overall number of entries in a register.)

The data size argument feels off… if we wanted 1024-byte entries (and why that?), well, our register is going to be about 10x the cost of a chunk just from data storage alone. Is that reasonable? Is that restrictive?

Is keeping the register the same price good for devs? (It is simpler for the network, and a good deal, especially vs a variable-length/size register…) But then it becomes unusable beyond chunk xorname storage (or other 130-byte endeavours)… (which shifts cost per transaction onto the rest of the network, which is also easier).

Perhaps we could start from an economic point of view, and ask what price is OK for a register with 1024 versions (again: why that? :slight_smile: )? I think it shouldn't be much over the price of a chunk, so perhaps 2x a chunk's price and size (assuming a chunk size of 500 KB and 1024 versions)? That gives 1 MB / 1024 versions = 1024 B per version, so (1024 − 350) ≈ 674 B for entry data, which looks more usable than 138 B.

This method signature seems just right to me also. But it's a high-level one, abstracting out all implementation details (will the history capability be abstracted too, for example by assigning new registers under the hood as the history grows beyond 1024 versions?). Perhaps besides this default implementation there could also be a low-level API available, where a dev could decide on the details and develop their own implementation?

The only things that should be decided are at the protocol level, like the maximum size of a register or its price.

I think devs could be interested in developing their own implementations to gain some performance by avoiding indirection, and they can sacrifice some version storage or the development time it takes to deal with the complexity of their own design.

What calculation are you talking about? I think all calculations should be left to the app developer, if they want to use low-level APIs. All the network should do is reject updates that would make the register exceed the allowed size. And in the current default high-level APIs the calculations are already done, right?

Yes, I agree. I would change my 32-byte suggestion to NetworkAddress, or whatever means we use to point to a network address, whether it's a node, register, or chunk.

For interest, I have a register_history example working and will be submitting a PR for this once I clean things up.

If you remember, the existing register example simulates a simple chat, with two or more users all updating the same register. With this additional example you can get the current state and history of that (or any other) register. Here's a sample of the output, including a history of all Nodes (Entries) in the register after two users have exchanged a few lines (three each - their messages are numbered for clarity):

Current total number of items in Register: 6
Latest value (more than one if concurrent writes were made):
--------------
[alice]: this is alice 3
[bob]: this is bob 3
--------------

Enter a blank line to print the latest register history (or 'Q' <Enter> to quit)

Syncing with SAFE in 2s...
synced!
======================
Root (Latest) Node(s):
[ 0] Node("4eadd9"..) Entry("[alice]: this is alice 3")
[ 3] Node("f05112"..) Entry("[bob]: this is bob 3")
======================
Register History:
[ 0] Node("4eadd9"..) Entry("[alice]: this is alice 3")
  [ 1] Node("f5afb2"..) Entry("[alice]: this is alice 2")
    [ 2] Node("7693eb"..) Entry("[alice]: hello this is alice")
[ 3] Node("f05112"..) Entry("[bob]: this is bob 3")
  [ 4] Node("8c3cce"..) Entry("[bob]: this is bob 2")
    [ 5] Node("c7f9fc"..) Entry("[bob]: this is bob 1")
    [ 1] Node("f5afb2"..) Entry("[alice]: this is alice 2")
      [ 2] Node("7693eb"..) Entry("[alice]: hello this is alice")
======================

Current total number of items in Register: 6
Latest value (more than one if concurrent writes were made):
--------------
[alice]: this is alice 3
[bob]: this is bob 3
--------------

Enter a blank line to print the latest register history (or 'Q' <Enter> to quit)

The number in brackets before each Node is an order derived by traversing the history of all Nodes. It doesn't exist in the Register, but shows the order in which each entry was applied (in reverse, so 0 is the most recent node).
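
For anyone curious, the traversal itself is roughly this (a sketch against the crdts crate's merkle_reg module as I understand it; check MerkleReg::node() and the Node fields against the crate docs before relying on them):

use crdts::merkle_reg::{Hash, MerkleReg};

// Depth-first walk from a node back through its parents, printing the most
// recent entries first, much like the indented history output above.
fn print_history(reg: &MerkleReg<String>, hash: Hash, depth: usize) {
    if let Some(node) = reg.node(hash) {
        println!(
            "{}Node({:02x?}..) Entry({:?})",
            "  ".repeat(depth),
            &hash[..3],
            node.value
        );
        for parent in node.parents.iter() {
            print_history(reg, *parent, depth + 1);
        }
    }
}

// Start from the current (root) nodes returned by read().
fn print_register_history(reg: &MerkleReg<String>) {
    for hash in reg.read().hashes() {
        print_history(reg, hash, 0);
    }
}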


It’s amazing what you can achieve when you don’t hang out on the forum so much! :laughing:

Working on this has given me a better understanding of Registers on Safe Network. Regardless of my views on what an Entry should hold (key and/or data), Registers are a very powerful and flexible data type, but because of their power they're a bit tricky to understand and use.

What Power?

  • Local-first storage means local-first software.
  • Synchronising over Safe Network with multiple replicas, using the underlying MerkleReg CRDT from the Rust crdts crate (see the sketch after this list). There are a lot more data types in there too.
  • A history of all previous states of the Register and its entries.
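
Here is a minimal sketch of that underlying CRDT in isolation, as I understand the crdts crate's merkle_reg API (worth verifying against the crate docs):

use crdts::merkle_reg::MerkleReg;
use crdts::CmRDT;
use std::collections::BTreeSet;

fn main() {
    let mut reg: MerkleReg<String> = MerkleReg::new();

    // Two users write concurrently: neither entry lists the other as a parent.
    let a = reg.write("[alice]: hello this is alice".to_string(), BTreeSet::new());
    let b = reg.write("[bob]: this is bob 1".to_string(), BTreeSet::new());
    let (ha, hb) = (a.hash(), b.hash());
    reg.apply(a);
    reg.apply(b);

    // Until something supersedes them, both values are current, exactly like
    // the two "Latest value" lines in the chat example above.
    assert_eq!(reg.read().values().count(), 2);

    // A write naming both hashes as parents merges the branches again.
    let merge = reg.write(
        "[alice]: this is alice 2".to_string(),
        [ha, hb].into_iter().collect(),
    );
    reg.apply(merge);
    assert_eq!(reg.read().values().count(), 1);
}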

Tricky Bits

To get that power, this is an unusual kind of data structure, and it is not instantly clear how to use it. In time there may be higher-level APIs to simplify this by presenting implementations of more familiar data structures on top of the Register, but for now you need to understand how to write entries and how to access entries in a way that makes sense for your application. None of that is straightforward, because it is unfamiliar and not well documented yet.

Writing Entries, though, is easy! Getting particular data back in a form useful to your app is not so easy. For example, you don't have an array, or even a map of variable names to values. What you have is a set of one or more current values, plus access to the full history, or to any particular point in the history if you know the hash of the Node you want.

So using it like an array or a collection of values is complicated at the moment, but if you look at the history above you can see the makings of how to do that.

For applications that just want the latest value (e.g. of a file or document), you could use one Register per document and track its history through the sequence of entries.

But where you have multiple related values - where you want to know the state of a set of values at a particular time, as in a filesystem time machine - you need to manage those multiple entries in some way. I'm still scratching my head about the ways you might do that, but at least we now have a tool to inspect the history of entries in a register.

But all budding geniuses will nevertheless be able to store arbitrary data (probably by reference rather than in the Register) and create an endless history, accessible by traversing the Entries starting from the most recent.

This is incredibly powerful. I don’t know if there’s any other system able to do this.

Hats off to MaidSafe for this innovation… hang on, what’s this…

:laughing:


Phenomenal

If you imagine the node is a chunk, then it comes together more easily. That chunk can be a data_map (encrypted or not) plus metadata. It can be tagged data, SOLID data, and more. If you leave all the type info to the chunk and the app parsing it, with the register as the glue we need to tie it all together, then it becomes clearer.
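
A hypothetical sketch of that separation (all names invented; the point is that the register never interprets these bytes, only the app does):

// The register entry is just a pointer; the chunk it points at carries the
// data_map (or raw data) plus whatever typed metadata the app understands.
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
enum ChunkPayload {
    DataMap { map: Vec<u8>, encrypted: bool }, // serialized data_map
    Tagged { tags: Vec<String>, data: Vec<u8> },
    Solid { rdf: String }, // e.g. SOLID/RDF documents
}

#[derive(Serialize, Deserialize)]
struct ChunkContent {
    payload: ChunkPayload,
    metadata: Vec<(String, String)>, // app-defined key/value metadata
}

// The register only ever appends small pointers to such chunks, so it can act
// as the CRDT glue (history, branching) for any data type the app defines.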

This is how I think of it: Entries have the ability to hold the history of anything, even branching data. They don't know what they point to, and should not know. However, they guarantee CRDT growable lists/trees, and as you say it's all offline-first and publish. Even in network outages you can remain local and sync between many different data types, as long as you use registers to link the data types. So this link/pointer, or whatever it is, is vitally important.

It's brilliant to be at this point, actually getting to the nuts and bolts of registers and chunks. I feel we can achieve an awful lot. People focus on self-encryption and that's great, but it is only a way to handle and secure a ton of bytes. Registers are what makes those bytes files, maps, media, or any kind of info we want.

What will bring out the biggest benefit will be linking data: not whole data items but parts of data, so assembling complex reports and views of disparate data sources in ways we never thought possible. This takes us right back round the RDF/SOLID route, where links via tags become sensible. If we can autotag (should I say LLM/AI tagging), where we already know we have the technology to accurately tag data, then we can have an RDF-like system that is not reliant on human labelling at all.

Then …


I’d been pondering what I might build using this once I figure out a design pattern, and implementing LDP containers seems a good place to start. Still wondering though.

Other suggestions welcome.

One limitation for RDF-related stuff is the lack of Rust libraries - there really aren't any - but I'm hoping that compiling JavaScript to WASM will solve that.


I agree, RDF etc. is a long-range use case. It's worth some back and forth on app ideas here. My interest is not replicating Web 2.0, or whatever it's called, but doing things not previously possible. So I suspect some apps may look very different to today's.

If we can link everything in ways HTML could not (history being obvious, never-dying links, and more), and perhaps then look at how to make such links as simple as possible, then I think we are at a good starting point for apps or data handling that is new and valuable.

Random ideas

  • Live-forever blogs/sites/notes/tweets etc.
  • Idea evolution (papers linked by more than a bibliography, all with history, fixes, and rewrites, in graph form where we can see which papers emanate from which research (other papers))
  • Personal data time machine (this one is quite easy if we have an FS based on registers)
  • Content publishing with automatic payments (PtP) - Pay-for-GET for data is still an option; we cannot enforce it, but we can help for sure. Others will always be able to copy the data and remove the payment, but if your brand is strong then honour and stickiness will keep folk wanting to get the data from you.

I am sure there are tons of them that I have zero chance of seeing today, but with us all searching we will find those crazy new and unique ideas.


Well done, even though I didn’t understand much of it. :clap:

Did writing that do anything to your views about entry size?
