EPHEMERALITY and data-persistence?

I thought this was only an alpha network limitation?

Nope. From the RFC:

Maximum size for a serialised MutableData structure must be 1MiB;

(A better source would be the source code, since the RFC is out of date.)

Other things have changed (the entry count was raised from 100 to 1000), but the size limit is still fixed for MDs.
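
A minimal sketch of what those two limits mean for a client doing a pre-flight check, assuming the 1 MiB / 1000-entry figures above. The constant names here are made up for illustration, not taken from safe_client_libs; check the source for the authoritative values:

```rust
// Pre-flight check against the (assumed) MutableData limits:
// 1 MiB serialised size and 1000 entries.
const MAX_MD_SIZE: usize = 1024 * 1024; // assumed 1 MiB limit
const MAX_MD_ENTRIES: usize = 1000;     // assumed entry-count limit

fn fits_in_one_md(serialised: &[u8], entry_count: usize) -> bool {
    serialised.len() <= MAX_MD_SIZE && entry_count <= MAX_MD_ENTRIES
}

fn main() {
    let blob = vec![0u8; 512 * 1024]; // a 512 KiB payload
    println!("fits: {}", fits_in_one_md(&blob, 10)); // fits: true
}
```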

1 Like

That is up to 1 MiB, not a fixed size of 1 MiB.

1 Like

Yes, that's what I meant: a fixed upper limit.
Sloppy formulation on my part.

1 Like

This is tied to the chunk size that the network uses, right? I.e. in order for it to be easily mutable, it must stay in one chunk, which is controlled by a single section.

2 Likes

I was thinking about the case of custom classes with any number of properties, whose types can themselves be such objects. So the nesting can be arbitrarily deep (even though at some point the code smell becomes noticeable).

Writing such an object to MDs, where you need to unnest it and both try to fill each MD and stay within the size limit, will be OK with small objects, but the number of operations quickly grows.
Then, if some part of the object is a single piece of data larger than 1 MiB, you need to devise a way to split it.

EDIT: This is what I mean by it not scaling so well.
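
To make that splitting step concrete, here is a minimal sketch of the client-side arithmetic, assuming the 1 MiB limit discussed above. It uses no SAFE APIs, just plain Rust, so it only shows the work you would have to do before writing the MDs:

```rust
// Cut one serialised object into segments that each stay within
// the (assumed) 1 MiB MutableData limit.
const MAX_SEGMENT: usize = 1024 * 1024; // assumed 1 MiB limit

fn split_into_segments(serialised: &[u8]) -> Vec<&[u8]> {
    // Each slice would become the payload of one MD in a chain.
    serialised.chunks(MAX_SEGMENT).collect()
}

fn main() {
    let object = vec![0u8; 3 * 1024 * 1024 + 123]; // 3 MiB and change
    let segments = split_into_segments(&object);
    println!("{} segments needed", segments.len()); // 4 segments needed
}
```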

It just seems overly complicated to me (as does reading from such a designed data source), when immutable data solves it.
The downside, of course, is that you have no choice but to commit that data to the network eternally, regardless of your needs or intentions. That lack of freedom is there, and a bit more freedom in this regard would be nice.
I mean, it's not that it would hurt me to commit it eternally, but if I am producing loads of data that I need secure, replicated and globally available for a while, but that afterwards is pure junk, then I have no option but to burden the network with the junk. I would personally like a way to not burden the network, even if it came without any cost savings for me.

1 Like

Same thoughts here with respect to scientific computing. Neo has a good idea for temp data attributes…

1 Like

Yes, exactly.
It might produce very large datasets, and it could be that only afterwards it is known what would be valuable to keep, if anything. In the meanwhile, massive storage is needed; replication, because the data is potentially very valuable; global availability, because of worldwide collaboration over the data; and security and anonymity, because the data could be highly sensitive.

All of this says: Use SAFENetwork.
Except that thing about only a tiny fraction being valuable to keep in the end.

Would it affect the network? Some estimates could surely be made with some digging.
But this kind of behaviour would be wasteful regardless, that is for sure (assuming we use immutable data for storage).

But then again, using purely MDs for it would also be possible; it just needs some more preprocessing client-side. I haven't looked closely at the ImD implementation and flow; maybe the same kind of operations are needed there, since everything is based on chunks, just managed in the network instead of client-side…? Or maybe the MD abstraction imposes a significant cost?
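
For comparison, a conceptual sketch of the ImD side, under the assumption that immutable data also ends up as roughly 1 MiB chunks located via a map (self-encryption in the real implementation). `DefaultHasher` below is only a stand-in for the real content hashing, chosen so the sketch runs with std alone:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const CHUNK: usize = 1024 * 1024; // assumed chunk size

/// A stand-in for a data map: one identifier per chunk,
/// derived from the chunk's content.
fn data_map(content: &[u8]) -> Vec<u64> {
    content
        .chunks(CHUNK)
        .map(|c| {
            let mut h = DefaultHasher::new();
            c.hash(&mut h);
            h.finish() // chunk "address" derived from content
        })
        .collect()
}

fn main() {
    let content = vec![7u8; 5 * CHUNK / 2]; // 2.5 MiB
    println!("data map has {} entries", data_map(&content).len()); // 3
}
```

Either way the content gets cut into chunks; the open question in this thread is whether it matters that the bookkeeping happens in the network (ImD) or in the client (MDs).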

One way this used to be done was chaining, and it is easy enough to apply generically to MDs. Methods for working within storage limits abounded in the days when main memory was measured in KWords, cache was a drum of a couple of MWords, and each disk held <40 MWords.

So your data object, which could be of variable size since you want to allow for data types that contain blobs of no fixed size, would be a series of MDs.

The first MD of the data object would have an address (probably stored in an index scheme), then the address of the second is stored in a field of the first MD. The 2nd and subsequent MDs do not need a set address or type tag, since each is linked to from the one before.

In fact, you could have the 1st MD also contain links to (addresses of) all the chained MDs, and if there are too many, chain that too.

Thus the first MD of the data object contains fields, and if more than one MD is needed then the 2nd and subsequent ones are chained. The data object starts in the 1st field after the links (addresses). You could even have the elements of the data object as fields too, and include an index of where each element is, placed after the links (addresses) but before the 1st part of the data object.

1st MD of a single data object requiring "n" MDs with "m" elements:

  • field 1: address of the next MD
  • fields 2 → n: (n-1) addresses of MDs 2 → n
  • fields (n+1) → (n+m): addresses of the MDs containing the start of the corresponding element
  • field (n+m+1): 1st element of the data object.

And when there are not enough fields and/or not enough space, continue the fields in the next MD. Each MD has as its 1st field the link (address) to the next MD in the data object chain.

This way the indexing of your data objects only points to one MD per data object, and then the data object is its own series of MDs with self-contained indexing.
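
A minimal in-memory mock of that layout, to make the traversal concrete. Nothing here touches the network: `Addr`, `MockMd` and the `HashMap` store are all stand-ins invented for illustration:

```rust
use std::collections::HashMap;

type Addr = u64; // fake network address

struct MockMd {
    next: Option<Addr>,        // field 1: address of the next MD in the chain
    chain: Vec<Addr>,          // addresses of MDs 2..n, for random access
    element_starts: Vec<Addr>, // which MD each element begins in
    data: Vec<u8>,             // the data fields themselves
}

/// Walk the chain from the first MD, concatenating the data fields.
fn read_object(store: &HashMap<Addr, MockMd>, first: Addr) -> Vec<u8> {
    let mut out = Vec::new();
    let mut cursor = Some(first);
    while let Some(addr) = cursor {
        let md = &store[&addr]; // one "fetch" per MD in the chain
        out.extend_from_slice(&md.data);
        cursor = md.next;
    }
    out
}

fn main() {
    let mut store = HashMap::new();
    // MDs 2 and 3: continuation MDs, just a next-link plus data fields.
    store.insert(3, MockMd { next: None, chain: vec![], element_starts: vec![], data: vec![3; 4] });
    store.insert(2, MockMd { next: Some(3), chain: vec![], element_starts: vec![], data: vec![2; 4] });
    // MD 1: links the whole chain and notes where each element starts.
    store.insert(1, MockMd {
        next: Some(2),
        chain: vec![2, 3],          // addresses of MDs 2..n
        element_starts: vec![1, 3], // element 1 starts in MD 1, element 2 in MD 3
        data: vec![1; 4],
    });
    let first = &store[&1];
    println!("chain index: {:?}, element starts: {:?}", first.chain, first.element_starts);
    println!("object bytes: {:?}", read_object(&store, 1));
}
```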

2 Likes

Definitely time you got coding @neo 🙂

5 Likes

It's great to read about how it used to be done (like your other post here before, good stuff!). The technical limitations really demanded great skill.

About this MD stuff, I'm aware of the possibility of doing it like this; I use a similar scheme in my SAFE.Datastructures for some parts. But there the chaining is for stacks and queues, where it's inherently expected to work like that, and you don't access all the MDs in a chain in one request (as you would if they were fragments of a single object), but rather one at a time, as the next object is needed. The other case, with a tree structure, is faster for large objects, but there I use immutable data in the end anyway, since it is a key-value store and it's simpler not to have to analyze the value to be stored, split it up, etc.
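
To illustrate that access-pattern difference: with a chained queue you only ever need the next node per operation, one fetch at a time, rather than the whole chain up front. Again a mock with invented types and no SAFE calls:

```rust
use std::collections::HashMap;

type Addr = u64; // fake network address

struct Node {
    value: String,
    next: Option<Addr>,
}

/// Consume the queue one node at a time: one "network fetch" per item,
/// on demand, instead of fetching the whole chain up front.
fn dequeue_all(store: &HashMap<Addr, Node>, head: Addr) {
    let mut cursor = Some(head);
    while let Some(addr) = cursor {
        let node = &store[&addr]; // one fetch per dequeue
        println!("dequeued {}", node.value);
        cursor = node.next;
    }
}

fn main() {
    let mut store = HashMap::new();
    store.insert(2, Node { value: "second".into(), next: None });
    store.insert(1, Node { value: "first".into(), next: Some(2) });
    dequeue_all(&store, 1);
}
```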

My concern is the performance compared to using immutable data. (The additional complexity is OK; someone will write a wrapper in every language.) But I don't know yet if my concern is unfounded.

1 Like