Appendable Data discussion

I just looked and there is append only in it. But that comes after all the language that does not suggest append only. So it’s contradictory language, and that is the reason why so many missed it.

2 Likes

Alternative idea to append only data type.

Keep the current commonly understood idea of MD - mutable data type

  • Allow a copy of previous versions to be kept as the default
  • allow MDs to be created with version copies set off
    • Once the MD is set to version copy keeping on, it cannot be turned off unless no mutations have been done since it was turned on
    • For MDs with version keeping off, an optional version-keep flag can be set to allow a version to be kept on a case by case basis.
  • allows applications to access the data without the need to reconstruct the said data.
    • this allows collections of data records (ie database of any sort) to run at maximum speed without the increasing slowdowns caused by reconstruction of data and the associated lag time when reading additional MDs caused in time by append only data types
  • applications including the browser know the status of the version keeping flag in the MD and can reject such MDs if desired or flag them to the user as temporary data.
  • applications using collections of records (ie any sort of database) can either use version-keeping or not depending on the type of application the records serve
    • eg a private database of one’s music collection does not need to have versions kept; that is up to the person keeping the collection records to decide.
    • eg a health record database would definitely have version keeping turned on.
    • in both cases neither is slowed down by having to trace through previous changes to reconstruct the current data.
  • allows the concept of private temp files that can be deleted and the MDs reused without multiplying the data stored on the network
  • Coin MDs never keep version copies, and it’s optional.
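
To make the rules in the list above concrete, here is a minimal sketch of an MD with a version-keeping flag. The class and method names (`MutableData`, `set_keep_versions`, `keep_this_version`) are hypothetical and only for illustration; this is not the network's API, just one reading of the proposal.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MutableData:
    """Hypothetical MD with an optional version-keeping flag."""
    value: bytes
    keep_versions: bool = True  # keeping copies is the default in the proposal
    versions: List[bytes] = field(default_factory=list)
    mutations_since_enabled: int = 0

    def mutate(self, new_value: bytes, keep_this_version: bool = False) -> None:
        # Keep the old value if keeping is on, or if asked case by case.
        if self.keep_versions or keep_this_version:
            self.versions.append(self.value)
        if self.keep_versions:
            self.mutations_since_enabled += 1
        self.value = new_value

    def set_keep_versions(self, on: bool) -> None:
        if on and not self.keep_versions:
            self.keep_versions = True
            self.mutations_since_enabled = 0
        elif not on and self.keep_versions:
            # Cannot be turned off once a mutation happened while it was on.
            if self.mutations_since_enabled > 0:
                raise PermissionError("version keeping can no longer be turned off")
            self.keep_versions = False

    def read(self) -> bytes:
        # The current value is read directly, with no reconstruction.
        return self.value
```

The point of the sketch is that `read` never walks the history, whether versions are kept or not.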

Immutable files (chunks) are a separate data type and not covered by the above

  • They will be used for the main file storage and fulfil the perpetual data requirement for that data type
  • Many web pages will be stored as immutable files and thus kept anyhow

10 Likes

@neo Love the proposal. What about storage cost (and cost implementation/mechanism) for MD? Any thoughts on those problems? It seems like it would be complicated. EDIT: but maybe the current farming mechanism works for both immutable and MD?

I wonder how much of global data would be MD versus immutable.

Mods - can we have a thread for all of this?

1 Like

From a philosophical perspective this is really interesting. It’s about preserving truth, radical honesty, and accountability. In so many ways this would enhance society but yet it can be used as a tool against us and our freedom. I can’t help but wonder if this perspective is derived from seeing any objective naturally occurring event as a natural design pattern. Where I think it may miss a little is the human element which is also a natural design. We want to have the freedom to say and do as we choose, grow, change our minds, preserve our public personas (out of ego), without having it used against us. This is especially applicable to young folk, IMO. Maybe the implementation won’t change that and I do believe people live under the illusion that once they delete something it’s gone but then again most aren’t considered important enough for the lesser amount of sophisticated people capable of exposing some small mistake they made, etc. to do so.

Since it’s people who have to adopt the network I think the implementation and marketing of these kinds of changes should reflect that as much as possible. Again, I thought this was already possible with MD versioning. From a societal impact POV, I just feel silly it was under my nose in the fundamentals all this time and I haven’t thought of it in this way up until now.

7 Likes

That’s a question that will be discussed when safecoin is being implemented. There was a suggestion above that the costs will be different between writing to an existing MD and creating one. Also the suggestion was that there might be a difference between writing a small piece of data and writing 1MB of data.

I am getting the impression that this drive for appendable data comes from looking at the storage of PUBLIC files/data such as web data/sites, and from the desire to prevent the public being disadvantaged by deletion/changing of data they previously saw. Plus the fundamental of perpetual data. That is all well and good and I support it for public data/files. But the issue around private data is ignored above, and perpetual data is being equated with appendable data as if appending is the only solution.

Unfortunately it seems that the debate is being reduced to this view of SAFE’s data and valid concerns around other areas are being trivialised. Now maybe this is a focus for alpha 4 (Maxwell) and a way to expedite that. Unfortunately if appendable data is the only way then it is going to cause a lot of problems for a lot of applications and for speed, and for the adoption of SAFE by 90% of businesses, which is where the money is when they start storing the massive amounts of data they keep in data collections/warehouses.

4 Likes

Once people start storing their private and public videos on SAFE then immutable files might exceed appendable data. It’s no longer mutable data (by definition)

1 Like

Agree that it’s odd this discussion should start here! Certainly seems that whatever is decided it would be strange if a network that advertised the immutability of data had something called mutable data!

By instinct and after reading the various arguments in this thread I would lean towards the side of keeping at least private data as deletable, technical issues notwithstanding. If all data is immutable I think the network will still have a huge role to play in the world, but I think it might be a different and perhaps more limited one than many people are hoping (though that may not be a bad thing.)

One point I would like to put forward though is that I think it is misguided to think of the immutability of data helping to preserve ‘truth.’ The 1984 argument convinced me initially, but I think the way we decide what ‘truth’ is is much more nuanced. What we agree to be true is and always will be a consensus based on triangulating many points of information, many of which need to come from outside the digital realm.

If we come to trust that, for example, the SAFE network holds a record of truth, then that could be a very dangerous place to be. Not sure if this fairly trivial one is a good example, (it’s just the first that springs to mind,) but if I publish 1000 documents saying that Obama was born in Africa, refuting the 1 that says he was born in the USA, then how will history know that the single 1 contains the truth?

To look at it a different way, we are living in an age where science, technology and education have given more people in the world than ever access to information that is as verifiably ‘true’ as it is possible to be, yet we still seem to be living in an age where truth is in crisis. I would argue that there is too much truth in the world, rather than too little. As individuals we just can’t handle the sheer amount!

I do however completely agree with the slightly more trivial argument that it should be made clear that as soon as we publish something on a network, we no longer have the power to delete it. Also, even private things, for example in my Google drive, are not currently deletable, even though we imagine them to be. Perhaps if private data was genuinely deletable, this would be another good way to distinguish the SAFE Network from the clearnet.

11 Likes

I think perspective is important as well. From my perspective, we have the network fundamentals and these should be agreed on fully. Yes, that makes some things able to be reduced to “some ideological” stance, but that is exactly what the fundamentals are, the ideology of the network/proposal.

Can they change/evolve, well I would hope so, given enough evidence they should.

Appendable Data

This work at the moment is to allow people to have the safety features they have with the current alpha II network. That is appendable data, where the list of entries grows, but the delete call nullified an entry, so the entry was still there, but empty.

This allows folk to build apps that seem exciting, so great.

Now we are saying we will not nullify the entry, but we will add in multisig capabilities and give the apps the same capabilities they had, except to say an entry did not exist or deny what was in it.

That is pretty much it.
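
A rough sketch of the difference described above, under the assumption that an entry list can be modelled as a plain Python list. `PseudoAppendable` and `Appendable` are hypothetical names for illustration, and multisig is left out entirely.

```python
class PseudoAppendable:
    """Alpha 2 style as described: delete nullifies an entry but keeps the slot."""

    def __init__(self):
        self.entries = []

    def append(self, data):
        self.entries.append(data)
        return len(self.entries) - 1  # index of the new entry

    def delete(self, index):
        # The entry still exists, but what was in it is gone.
        self.entries[index] = None


class Appendable:
    """Actual appendable data: entries are only ever added, never nullified."""

    def __init__(self):
        self._entries = []

    def append(self, data):
        self._entries.append(data)
        return len(self._entries) - 1

    def current(self):
        # The latest entry is the current state.
        return self._entries[-1] if self._entries else None

    def history(self):
        # All changes stay visible.
        return tuple(self._entries)
```

In the first type an observer can see that something was at index 0 but not what it was; in the second, every change stays readable.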

Public or Private Data

This is another miscommunication I think, they are both the same to the network. Vaults do not know the difference.

That is actually good: the larger the pool of possible stuff, the harder it is to find the stuff that you think is valuable or belongs to somebody you know. This is not listed as a fundamental, but it is one; it reverses security practices.

So move from an attackable silo like a server, surrounded by firewalls, in a nuclear bomb shelter being the “most secure” to here is the world’s data, all of it, encrypted, chunked and obfuscated. Which bit belongs to which file, well we don’t know.

So switch from hoarding stuff in a seemingly secure place to putting it all public.

This is what we have always meant by secure the data, not the servers.

Conclusions

It seems there is confusion where folk think we are taking something away. We are not taking away what people think, but adding features here. The only thing we will take away is the ability to make an entry null (but still exist). So instead of saying that never happened and we won’t tell you what “that” was, this means all changes are there to be seen.

Plus we will add in multi-sig proper etc.

tl;dr what we currently have is pseudo appendable data without multisig, what we will provide is actual appendable data with multisig. This should mean all apps, still work and the apps should continue to appear and be more powerful.

For actual removal of network data or temporary data then it can be done, we can edit in place with PARSEC as consensus to ensure those edits/deletes are handled efficiently. There are even some CRDT patterns like add/remove sets, orswat etc. and moves to allow some of these in byzantine settings, but all of that is a future consideration. Right now we can launch a network with vaults from home, safecoin and with the security we need. Then that can evolve as it should.
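
For anyone unfamiliar with the add/remove set idea, here is a minimal sketch of an observed-remove set, a textbook CRDT. It is the simple variant that keeps tombstones, not the optimised one without tombstones, and it says nothing about byzantine settings or about how the network would implement any of this. The class name `ORSet` is just for illustration.

```python
import uuid


class ORSet:
    """Observed-remove set: each add gets a unique tag, and a remove
    only cancels the tags that this replica has actually seen."""

    def __init__(self):
        self.adds = set()     # (element, tag) pairs
        self.removes = set()  # tombstones: (element, tag) pairs

    def add(self, element):
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element):
        self.removes |= {(e, t) for (e, t) in self.adds if e == element}

    def merge(self, other):
        # Merging is a set union, so it is commutative and idempotent.
        self.adds |= other.adds
        self.removes |= other.removes

    def elements(self):
        return {e for (e, t) in self.adds - self.removes}


a, b = ORSet(), ORSet()
a.add("x")
b.merge(a)
b.remove("x")  # b removes the add it has seen
a.add("x")     # a concurrently adds "x" again under a fresh tag
a.merge(b)
b.merge(a)
# Both replicas converge, and the unseen concurrent add survives the remove.
```

Replicas that merge in any order end up with the same contents, which is why these patterns do not need consensus-driven ordering.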

23 Likes

I think I follow. My main concerns are like what neo mentioned.

  • Will it affect application speed (for the database type applications he mentioned or otherwise)?
  • Could you traverse a safecoin’s history (even if using a one time throw away ID)?

5 Likes

Wow, this has been quite a passionate thread! I suppose I might as well give some IMO/0.02$.

Implementation details aside, last time I read over the evolution of proposed data structures for the network I felt like maidsafe had ideally/perfectly distilled things down to two fundamental building blocks (Mutable and Immutable data). I also really like how they align with Rust paradigms. From an HPC perspective, and looking forward to a perspective of safe as a general world computer (which may be getting ahead of ourselves a bit and off track), the lack of a simple unversioned mutable datatype would be rather detrimental. Analogous to how you would program in Rust with no mut?

Appendable data also offers some nice features and is something I see good use cases for. It might also help safe get better industry adoption and certified for different compliance needs. However, wouldn’t it be better left for the App layer like ntp time stamps rather than network core? It seems like a good design principle would be to construct it from a combination of mutable and immutable datatypes with the multi-sig features and whatever else added in to complete the appendable datastructure and/or other future datatypes.

I think some of the community frustration found in this thread comes from everyone having a different set of expectations/views/wishlists/hopes/dreams on what the functionality of each datatype offers best. I’ll readily admit that my conceptual view of how it all fits together is rather limited.

It might be fun to brainstorm a list of what everyone sees as their understanding/preconceptions about MD, ImD, and ApD. Then dirvine or a core dev can tell us how unrealistic we are being, or maybe we’ll give them some ideas to chew on.

5 Likes

I generally like the idea of having an Appendable data type. But I don’t understand why we can’t have both: One data type which is Immutable and one data type which allows modifications / deletions. Looking forward there will then be services which harness immutable data and others will use mutable ones. Let the people decide which services they want to use. E.g.: I can imagine that there will be two versions of a video sharing platform. One allows real deletion of content and the other not. You as a user then have the choice which one you want to use for your use-case.

Am I missing something here?

2 Likes

I think the SAFE network at launch will not be anything like a SQL database. It is a huge discussion: to replicate SQL on a server is unlikely, to replicate the function provided by SQL on a server is likely. SQL on a server is generally faster as you do not worry about security and can use data locality etc. The cost of that is security and scalability. So then you look at Amazon etc. They do not use SQL servers, but decentralised systems like dynamo. It is more CRDT like as opposed to consensus driven ordering (PARSEC), but works at huge scale and is secured behind a firewall. We do not need the firewall as SAFE has secured data. So yes this can be done and done at scale with the security of SAFE, but it will not be SQL, though it can provide the same end user results SQL does.

No, this will not be possible. A safecoin is a data element with an owner. The owner changes when the coin changes hands. No history. Early versions had the last owner so that we could have receipts, but it is simpler to have a single owner. That is metadata, i.e. not perpetual, so no tracking.
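
As a toy illustration of "a data element with an owner" and no history, something like the following. The `Coin` class and its fields are hypothetical, and the owner check is a plain string comparison where a real design would verify a signature; safecoin is not finalised, so treat this only as a sketch of the idea.

```python
from dataclasses import dataclass


@dataclass
class Coin:
    """Hypothetical coin: a data element with a single owner and no history."""
    coin_id: int
    owner: str

    def transfer(self, current_owner: str, new_owner: str) -> None:
        # Stand-in for a signature check by the current owner.
        if current_owner != self.owner:
            raise PermissionError("only the current owner can transfer the coin")
        # The owner field is overwritten in place; the previous owner is not stored.
        self.owner = new_owner
```

After a transfer the coin holds only its id and its new owner, so there is nothing to traverse.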

I hesitate with safecoin though as it is not finalised, but I “feel” it is possible to exist purely in client accounts, backed by PARSEC. So not even a data element at all, that means a very fast transfer of millions of coins and very simple divisibility, but there you go the cat is out the bag of my thinking there.

I think yes discussion is great it helps us all.

I think the RFC process is good for this. We need to be aware every data type is a data subset, so easier to guess or a smaller catch group for data. Then if you make some chunks read only and some mutable, the network needs to identify those and apps will need to be able to read and say what of all the content of a thing is mutable and not mutable. That makes apps harder and user experience harder, say a video has a billion chunks and 1 is mutable? There are many more edge cases. What if you can mutate stuff, would you want history, if so appendable data does exactly that. So what we are looking for/asking for here is 2 things as far as I can see.

  1. Mutable data that scrubs history
  2. Deleteable data

I suspect they are 2 very distinct types for different purposes and both with side effects on the network. So RFCs are good. I worry Devs will be taken from launch though to work on all of these parts, whereas we have alpha 2, apps got created, it expands the API and more apps happen, it’s moving to RDF/SOLID integration and all with pseudo appendable data. All of that still happens, but more efficiently with appendable data. If you see what I mean anyway?

11 Likes

Phew. I honestly figured but hearing the details is always relieving.

I won’t hold your feet to the fire but :exploding_head: mind blown. That would be next level for sure and a helluva way to show off the power of a little ABFT consensus protocol called PARSEC.

Interesting about Amazon. I thought they had some distributed redundancy but did not know that. Also reassuring to me personally.

Just an aside if you don’t mind. How far along is the integration of threshold crypto into PARSEC? I noticed you and @anon86652309 forked it quite a while ago but don’t see it in the Maidsafe repo. :yum:

7 Likes

It is all happening, some tests already working :wink:

10 Likes

So, if all data is essentially immutable (appendable or otherwise), does this mean that caching is similar and as effective in all cases?

5 Likes

I know that you and the team don’t leave things up to chance and that there is a well thought out master plan behind it all. I just hope that we will have the coolest network that the world has ever seen and that it gives people as much freedom as possible, and that people will be in control of their data, as much as technology allows within a fair development time.

I hope that ideology never compromises the functions of the Network, or how cool it will be, or giving people control over their data. To give the world security and privacy, and to let people own their data, is what the network should do. To end dictatorship in countries, or other things, I hope will be an effect of people using the network, but that should never be what it was built for if it compromises functionality. Just promise to give us the coolest network the world has ever seen, that is all I wish and hope for. Facebook was never written to overthrow dictators, but the ability for people to connect and start groups allowed for dictators to be overthrown.

If it is possible for people to choose whether their data should be forgotten or not, I believe very strongly that would be a good thing. If it is possible, don’t only let people own their data, let them also be in full control over their data, if it doesn’t compromise functionality or security or other things of higher importance to the network.

3 Likes

I don’t agree with or even really understand this “once public - up forever” -principle. It sounds very unforgiving and cruel to me - and I think the world needs a network that allows forgiveness and kindness.

Sure, I would like to hold bad actors accountable, but I would also like to give them the opportunity to limit the scope of their bad acts if they come to regret their actions. Or to give people a chance to try to limit the consequences of the mistakes they make when young, drunk, or stupid - or all these at the same time.

I think it is important that you can publish stuff anonymously and not be forced to take it down, thinking about whistleblowers here. But I really don’t see a reason to be prevented from taking down what you want - thinking about ex-schoolbullies, young girls seeking attention etc. here.

Of course once you publish something, there is a chance that it will be public forever, because someone else can copy and republish it. And you just have to live with that. But that is not necessarily the case, and it actually should be less the case in SAFE Network, because no one else is owning the platform where you publish your stupid stuff.

I also expect that there would be people or organizations working as watchdogs, keeping book of the stuff the powerful and influential people say and do.

I know that there are some smart people that think that people should not be protected from their own stupidity, but - they are smart. It’s like powerful people saying that the weak should not be protected. And accidents can happen to anyone. Anyone can accidentally publish something that was meant to stay private. Why not give us a chance to correct our mistakes?

Ok, just the existence of public immutable data is something that I see as a risk, but I’m willing to accept that. But I’m not willing to accept that all the public data should be undeletable. Now I’m uncertain of the technical details, but if it is the case that the datamap must be public and thus public data becomes undeletable, would it be possible to make a public site so that there is a public data map pointing to another “map” (or something like that) that I actually can retract? So that if the basic layer of public data is permanent, there might be another layer of doors that you can point to from the permanent layer, but I can choose to lose the keys?
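
One possible reading of the "layer of doors" question, sketched as a toy: the permanent layer only ever holds scrambled bytes, and a separate retractable layer holds the key. Everything here is hypothetical (`TwoLayerStore`, `publish`, `retract`), the XOR pad is a stand-in for real encryption, and whether the network could support a retractable layer like this is exactly the open question.

```python
import secrets


def xor(data: bytes, key: bytes) -> bytes:
    # One-time-pad style XOR; the key must be as long as the data.
    return bytes(d ^ k for d, k in zip(data, key))


class TwoLayerStore:
    """Toy model: a permanent layer plus a retractable layer of keys."""

    def __init__(self):
        self.permanent = {}    # never deleted
        self.retractable = {}  # the publisher can drop entries here

    def publish(self, name: str, content: bytes) -> None:
        key = secrets.token_bytes(len(content))
        self.permanent[name] = xor(content, key)
        self.retractable[name] = key

    def retract(self, name: str) -> None:
        # "Losing the keys": the permanent bytes stay, but become unreadable.
        self.retractable.pop(name, None)

    def read(self, name: str):
        key = self.retractable.get(name)
        if key is None:
            return None
        return xor(self.permanent[name], key)
```

Note that this only limits future readers. Anyone who already copied the key or the readable content keeps it.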

3 Likes

If it is public - owned by everyone - what gives anyone the right to unilaterally delete it?

5 Likes

Hmm… if there is a public data in a forest, but no-one has seen it, is it really public? :wink: I mean I can publish something by accident, but that doesn’t mean it is yet public, if no-one has seen it, and I think I should have a possibility to try to correct my mistake.

And on the other hand, if I publish something and someone else sees it, why should it be anyone else’s - or the network’s - responsibility to keep a copy of it for them?

4 Likes

I am struggling to understand how deleting something once it has been made public changes that copies have most likely been made if it is even of the slightest interest to another person.

2 Likes