Appendable Data discussion

appendable-data
immutable-data
mutable-data

#81

From a philosophical perspective this is really interesting. It’s about preserving truth, radical honesty, and accountability. In so many ways this would enhance society, yet it can be used as a tool against us and our freedom. I can’t help but wonder if this perspective is derived from seeing any objective naturally occurring event as a natural design pattern. Where I think it may miss a little is the human element, which is also a natural design. We want to have the freedom to say and do as we choose, grow, change our minds, preserve our public personas (out of ego), without having it used against us. This is especially applicable to young folk, IMO. Maybe the implementation won’t change that. I do believe people live under the illusion that once they delete something it’s gone, but then again most people aren’t considered important enough for the small number of sophisticated people capable of exposing some small mistake they made to bother doing so.

Since it’s people who have to adopt the network I think the implementation and marketing of these kinds of changes should reflect that as much as possible. Again, I thought this was already possible with MD versioning. From a societal impact POV, I just feel silly it was under my nose in the fundamentals all this time and I haven’t thought of it in this way up until now.


#82

That’s a question that will be discussed when safecoin is being implemented. There was a suggestion above that the costs will be different between writing to an existing MD and creating one. The suggestion was also that there might be a difference between writing a small piece of data and writing 1MB of data.

I am getting the impression that this drive for appendable data is focused on the storage of PUBLIC files/data such as web data/sites, and on the desire to prevent the public being disadvantaged by deletion/changing of data they previously saw, plus the fundamental of perpetual data. That is all well and good and I support it for public data/files. But the issue around private data is ignored above, and perpetual data is being equated with appendable data as if appending is the only solution.

Unfortunately it seems that the debate is being reduced to this view of SAFE’s data and valid concerns around other areas are being trivialised. Now maybe this is a focus for alpha 4 (Maxwell) and a way to expedite that. Unfortunately if appendable data is the only way then it is going to cause a lot of problems for a lot of applications and for speed, and for the adoption of SAFE by 90% of businesses, which is where the money is when they start storing the massive amounts of data they keep in data collections/warehouses.


#83

Once people start storing their private and public videos on SAFE then immutable files might exceed appendable data. It’s no longer mutable data (by definition).


#84

Agree that it’s odd this discussion should start here! Certainly seems that whatever is decided it would be strange if a network that advertised the immutability of data had something called mutable data!

By instinct and after reading the various arguments in this thread I would lean towards the side of keeping at least private data as deletable, technical issues notwithstanding. If all data is immutable I think the network will still have a huge role to play in the world, but I think it might be a different and perhaps more limited one than many people are hoping (though that may not be a bad thing.)

One point I would like to put forward though is that I think it is misguided to think of the immutability of data helping to preserve ‘truth.’ The 1984 argument convinced me initially, but I think the way we decide what ‘truth’ is is much more nuanced. What we agree to be true is and always will be a consensus based on triangulating many points of information, many of which need to come from outside the digital realm.

If we come to trust that, for example, the SAFE network holds a record of truth, then that could be a very dangerous place to be. Not sure if this fairly trivial one is a good example (it’s just the first that springs to mind), but if I publish 1000 documents saying that Obama was born in Africa, refuting the 1 that says he was born in the USA, then how will history know that the single 1 contains the truth?

To look at it a different way, we are living in an age where science, technology and education have given more people in the world than ever access to information that is as verifiably ‘true’ as it is possible to be, yet we still seem to be living in an age where truth is in crisis. I would argue that there is too much truth in the world, rather than too little. As individuals we just can’t handle the sheer amount!

I do however completely agree with the slightly more trivial argument that it should be made clear that as soon as we publish something on a network, we no longer have the power to delete it. Also, even private things, for example in my Google drive, are not currently deletable, even though we imagine them to be. Perhaps if private data was genuinely deletable, this would be another good way to distinguish the SAFE Network from the clearnet.


#85

I think perspective is important as well. From my perspective, we have the network fundamentals and these should be agreed on fully. Yes, that makes some things able to be reduced to “some ideological” stance, but that is exactly what the fundamentals are, the ideology of the network/proposal.

Can they change/evolve? Well, I would hope so; given enough evidence they should.

Appendable Data

This work at the moment is to allow people to have the safety features they have with the current alpha II network. That is appendable data, where the list of entries grows, but the delete call nullified an entry, so the entry was still there, but empty.

This allows folk to build apps that seem exciting, so great.

Now we are saying we will not nullify the entry, but we will add in multisig capabilities and give the apps the same capabilities they had, except to say an entry did not exist or deny what was in it.

That is pretty much it.
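
To make the difference described above concrete, here is a minimal sketch in Python. The class and method names (`PseudoAppendable`, `Appendable`, `append`, `delete`, `latest`) are made up for illustration and are not the network's API, and multisig is left out entirely.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PseudoAppendable:
    """Hypothetical model of the alpha 2 behaviour: delete blanks an entry."""
    entries: List[Optional[bytes]] = field(default_factory=list)

    def append(self, value: bytes) -> int:
        self.entries.append(value)
        return len(self.entries) - 1

    def delete(self, index: int) -> None:
        # The slot stays in the list, but its content is gone.
        self.entries[index] = None


@dataclass
class Appendable:
    """Hypothetical model of the proposal: entries are only ever added."""
    entries: List[bytes] = field(default_factory=list)

    def append(self, value: bytes) -> int:
        self.entries.append(value)
        return len(self.entries) - 1

    def latest(self) -> bytes:
        # A "change" is a new entry; older entries stay readable.
        return self.entries[-1]
```

In the first model a reader can see that something was there but not what it was; in the second every change stays visible.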

Public or Private Data

This is another miscommunication I think, they are both the same to the network. Vaults do not know the difference.

That is actually good: the larger the pool of possible stuff, the harder it is to find the stuff that you think is valuable or belongs to somebody you know. This is not listed as a fundamental, but it is one; it reverses security practices.

So we move from an attackable silo like a server, surrounded by firewalls, in a nuclear bomb shelter being the “most secure”, to: here is the world’s data, all of it, encrypted, chunked and obfuscated. Which bit belongs to which file? Well, we don’t know.

So switch from hoarding stuff in a seemingly secure place to putting it all in public.

This is what we have always meant by secure the data, not the servers.

Conclusions

It seems there is confusion where folk think we are taking something away. We are not taking away what people think, but adding features here. The only thing we will take away is the ability to make an entry null (but still exist). So instead of saying that never happened and we won’t tell you what “that” was, this means all changes are there to be seen.

Plus we will add in multi-sig proper etc.

tl;dr what we currently have is pseudo appendable data without multisig, what we will provide is actual appendable data with multisig. This should mean all apps still work, and the apps should continue to appear and be more powerful.

Actual removal of network data, or temporary data, can be done: we can edit in place with PARSEC as consensus to ensure those edits/deletes are handled efficiently. There are even some CRDT patterns like add/remove sets, ORSWOT etc., and moves to allow some of these in Byzantine settings, but all of that is a future consideration. Right now we can launch a network with vaults from home, safecoin, and the security we need. Then that can evolve as it should.
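
As an illustration of the add/remove set pattern mentioned above, here is a minimal two-phase set (2P-Set) sketch in Python. It is a textbook CRDT, not anything taken from the network code; the class name and methods are assumptions for the example, and it does nothing about Byzantine behaviour.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class TwoPhaseSet:
    """Two-phase set: one grow-only set of adds, one grow-only set of removes."""
    added: Set[str] = field(default_factory=set)
    removed: Set[str] = field(default_factory=set)

    def add(self, item: str) -> None:
        self.added.add(item)

    def remove(self, item: str) -> None:
        # Only items that were added can be removed; removal is permanent.
        if item in self.added:
            self.removed.add(item)

    def contains(self, item: str) -> bool:
        return item in self.added and item not in self.removed

    def merge(self, other: "TwoPhaseSet") -> "TwoPhaseSet":
        # Union of both sets, so replicas converge regardless of merge order.
        return TwoPhaseSet(self.added | other.added,
                           self.removed | other.removed)
```

Note that even here the removed items are remembered as tombstones, which is the overhead that patterns such as ORSWOT try to avoid.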


#86

I think I follow. My main concerns are like what neo mentioned.

  • Will it affect application speed (for the database type applications he mentioned or otherwise)?
  • Could you traverse a safecoin’s history (even if using one-time throwaway IDs)?

#87

Wow, this has been quite a passionate thread! I suppose I might as well give some IMO/0.02$.

Implementation details aside, last time I read over the evolution of proposed data structures for the network I felt like maidsafe had ideally/perfectly distilled things down to two fundamental building blocks (Mutable and Immutable data). I also really like how they align with Rust paradigms. From an HPC perspective, and looking forward to safe as a general world computer (which may be getting a bit ahead of ourselves and off track), the lack of a simple unversioned mutable datatype would be rather detrimental. Analogous to how you would program in Rust with no mut?

Appendable data also offers some nice features and is something I see good use cases for. It might also help safe get better industry adoption and certified for different compliance needs. However, wouldn’t it be better left for the App layer like ntp time stamps rather than network core? It seems like a good design principle would be to construct it from a combination of mutable and immutable datatypes with the multi-sig features and whatever else added in to complete the appendable datastructure and/or other future datatypes.

I think some of the community frustration found in this thread comes from everyone having a different set of expectations/views/wishlists/hopes/dreams on what the functionality of each datatype offers best. I’ll readily admit that my conceptual view of how it all fits together is rather limited.

It might be fun to brainstorm a list of what everyone sees as their understanding/preconceptions about MD, ImD, and ApD. Then dirvine or a core dev can tell us how unrealistic we are being, or maybe we’ll give them some ideas to chew on.


#88

I generally like the idea of having an Appendable data type. But I don’t understand why we can’t have both: One data type which is Immutable and one data type which allows modifications / deletions. Looking forward there will then be services which harness immutable data and others will use mutable ones. Let the people decide which services they want to use. E.g.: I can imagine that there will be two versions of a video sharing platform. One allows real deletion of content and the other not. You as a user then have the choice which one you want to use for your use-case.

Am I missing something here?


#89

I think the SAFE network at launch will not be anything like a SQL database. It is a huge discussion: replicating SQL on a server is unlikely, but replicating the function provided by SQL on a server is likely. SQL on a server is generally faster, as you do not worry about security and can use data locality etc. The cost of that is security and scalability. So then you look at Amazon etc.: they do not use SQL servers, but decentralised systems like Dynamo. It is more CRDT-like as opposed to consensus-driven ordering (PARSEC), but works at huge scale and is secured behind a firewall. We do not need the firewall as SAFE has secured data. So yes, this can be done, and done at scale with the security of SAFE. It will not be SQL, but it can provide the same end user results SQL does.

No, this will not be possible. A safecoin is a data element with an owner. The owner changes when the coin changes hands. There is no history. Early versions kept the last owner so that we could have receipts, but it is simpler to have a single owner. That is metadata, i.e. not perpetual, so no tracking.
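
A minimal sketch of the single-owner idea in Python, assuming a hypothetical `Coin` type and `transfer` method. A real network would check a signature rather than compare names, and as noted safecoin is not finalised, so this only shows why there is no history to traverse.

```python
from dataclasses import dataclass


@dataclass
class Coin:
    """Hypothetical coin: a data element with exactly one owner field."""
    coin_id: int
    owner: str

    def transfer(self, current_owner: str, new_owner: str) -> None:
        # Stand-in for a signature check by the current owner.
        if current_owner != self.owner:
            raise PermissionError("only the current owner can transfer")
        # The owner is overwritten in place; the previous owner is not kept.
        self.owner = new_owner
```

After `transfer("alice", "bob")` the element holds only `bob`; nothing in it records that `alice` ever owned the coin.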

I hesitate with safecoin though as it is not finalised, but I “feel” it is possible to exist purely in client accounts, backed by PARSEC. So not even a data element at all, that means a very fast transfer of millions of coins and very simple divisibility, but there you go the cat is out the bag of my thinking there.

I think yes discussion is great it helps us all.

I think the RFC process is good for this. We need to be aware that every data type is a data subset, so it is easier to guess, or a smaller catch group for data. Then if you make some chunks read only and some mutable, the network needs to identify those, and apps will need to be able to read and say which parts of a thing’s content are mutable and which are not. That makes apps harder and user experience harder: say a video has a billion chunks and 1 is mutable? There are many more edge cases. What if you can mutate stuff, would you want history? If so, appendable data does exactly that. So what we are looking for/asking for here is 2 things as far as I can see.

  1. Mutable data that scrubs history
  2. Deleteable data

I suspect they are 2 very distinct types for different purposes and both with side effects on the network, so RFCs are good. I worry devs will be taken from launch though to work on all of these parts, whereas we have alpha 2: apps got created, it expands the API and more apps happen, it’s moving to RDF/SOLID integration, and all with pseudo appendable data. All of that still happens, but more efficiently with appendable data. If you see what I mean anyway?


#90

Phew. I honestly figured but hearing the details is always relieving.

I won’t hold your feet to the fire but :exploding_head: mind blown. That would be next level for sure and a helluva way to show off the power of a little ABFT consensus protocol called PARSEC.

Interesting about Amazon. I thought they had some distributed redundancy but did not know that. Also reassuring to me personally.

Just an aside if you don’t mind. How far along is the integration of threshold crypto into PARSEC? I noticed you and @Fraser forked it quite awhile ago but don’t see it in the Maidsafe repo. :yum:


#91

It is all happening, some tests already working :wink:


#92

So, if all data is essentially immutable (appendable or otherwise), does this mean that caching is similar and as effective in all cases?


#93

I know that you and the team don’t leave things up to chance and that there is a well thought out master plan behind it all. I just hope that we will have the coolest network that the world has ever seen, that it gives people as much freedom as possible, and that people will be in control of their data, as much as technology allows within a reasonable development time.

I hope that ideology never compromises the functions of the Network, how cool it will be, or its ability to give people control over their data. Giving the world security and privacy, and letting people own their data, is what the network should do.

Ending dictatorship in countries, or other things like that, I hope will be an effect of people using the network, but not what it was built for, if that compromises functionality. Just promise to give us the coolest network the world has ever seen; that is all I wish and hope for. Facebook was never written to overthrow dictators, but the ability for people to connect and start groups allowed dictators to be overthrown.

If it is possible for people to choose whether their data should be forgotten or not, I believe very strongly that would be a good thing. If it is possible, don’t only let people own their data, let them also be in full control of it, as long as that doesn’t compromise functionality, security, or other things of higher importance to the network.


#94

I don’t agree with or even really understand this “once public - up forever” principle. It sounds very unforgiving and cruel to me - and I think the world needs a network that allows forgiveness and kindness.

Sure, I would like to hold bad actors accountable, but I would also like to give them the opportunity to limit the scope of their bad acts if they come to regret their actions. Or to give people a chance to try to limit the consequences of the mistakes they make when young, drunk, or stupid - or all these at the same time.

I think it is important that you can publish stuff anonymously and not be forced to take it down, thinking about whistleblowers here. But I really don’t see a reason to be prevented from taking down what you want, thinking about ex-schoolbullies, young girls seeking attention etc. here.

Of course once you publish something, there is a chance that it will be public forever, because someone else can copy and republish it. And you just have to live with that. But that is not necessarily the case, and it actually should be less the case in SAFE Network, because no one else is owning the platform where you publish your stupid stuff.

I also expect that there would be people or organizations working as watchdogs, keeping book of the stuff the powerful and influential people say and do.

I know that there are some smart people who think that people should not be protected from their own stupidity, but - they are smart. It’s like powerful people saying that the weak should not be protected. And accidents can happen to anyone. Anyone can accidentally publish something that was meant to stay private. Why not give us a chance to correct our mistakes?

Ok, just the existence of public immutable data is something that I see as a risk, but I’m willing to accept that. But I’m not willing to accept that all public data should be undeletable. Now I’m uncertain of the technical details, but if it is the case that the datamap must be public and thus public data becomes undeletable, would it be possible to make a public site so that there is a public data map pointing to another “map” (or something like that) that I actually can retract? So that if the basic layer of public data is permanent, there might be another layer of doors that you can point to from the permanent layer, but for which I can choose to lose the keys?
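
The indirection being asked about could look roughly like the Python sketch below. It is purely illustrative of the question, not a claim about what the network supports; the two dictionaries and the function names (`publish`, `retract`, `read`) are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical permanent layer: pointers that can never be deleted.
permanent_layer: Dict[str, str] = {}
# Hypothetical second layer: the "doors" the publisher can still remove.
retractable_layer: Dict[str, bytes] = {}


def publish(name: str, content: bytes) -> None:
    door_key = "door:" + name
    permanent_layer[name] = door_key       # permanent pointer
    retractable_layer[door_key] = content  # retractable target


def retract(name: str) -> None:
    # The pointer stays forever, but what it points to is removed.
    retractable_layer.pop(permanent_layer[name], None)


def read(name: str) -> Optional[bytes]:
    door_key = permanent_layer.get(name)
    if door_key is None:
        return None
    return retractable_layer.get(door_key)
```

Under these assumptions the permanent record only proves that something was once published under that name, not what it was.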


#95

If it is public - owned by everyone - what gives anyone the right to unilaterally delete it?


#96

Hmm… if there is a public data in a forest, but no-one has seen it, is it really public? :wink: I mean I can publish something by accident, but that doesn’t mean it is yet public, if no-one has seen it, and I think I should have a possibility to try to correct my mistake.

And on the other hand, if I publish something and someone else sees it, why should it be anyone else’s - or the network’s - responsibility to keep a copy of it for them?


#97

I am struggling to understand how deleting something once it has been made public changes that copies have most likely been made if it is even of the slightest interest to another person.


#98

If there is no data element … what happens when my node drops offline … obviously I don’t lose my coins … so where are they stored - how is there no data element? Sorry for my stupidity, I just don’t get it. Also if you can do this magic with safecoin, then why not temp data?

In the end I don’t have enough understanding to get a feeling on what is better. However @neo has raised some points that I feel haven’t been addressed – maybe they can’t be addressed until the code is written and tested? For instance: speed (reconstruction cost), data growth (and data storage cost) of worthless data – essentially and IMO I think we all hope that the Safe Network can compete with the clear-net overall in the end (with cost, speed, and security issues all taken together into account).

I believe the concern here is that without some sort of ephemeral data storage we won’t be able to compete and we will miss out on a lot of growth … of course there is no way to know this, so all just gut feelings on one side versus gut feelings on the other – but that isn’t to say that rational arguments aren’t being presented here or that we shouldn’t do all that we can to close a perceived gap (again, in overall cost, speed, and security) between the Safe Network and the clear-net.

Yeah - great idea …

So,

  • Immutable Data for me is also the Perpetual Data (I’ve thought these two were the same thing - am I wrong?). I imagine this data type being used to store really valuable information - family photos, diaries, historical info.

  • Appendable data I sort of assumed was just an offshoot of Immutable data allowing for version-control.

  • Mutable data (or what I thought was mutable data, which it seems I was wrong about), IMO, should be an ephemeral data store that may or may not be private, could be used for storing temp data, and is accessible to all sharing it. Similar to Appendable data but deletable, and it would hopefully have less overhead in both speed and storage cost.

Cheers


#99

IMO, all data is ‘public’ on the Safe Network, but you’d never find it unless it was shared with you. So effectively all data is private. So deletion should still be a thing.

EDIT: never say never Tyler … given enough time, all data is NOT private! So again, being able to delete seems important for some people.


#100

FORGET SQL. It seems the focus is on SQL which really would be converted into another database type for SAFE anyhow.

The very fundamental processing of data by only having appendable data means that ANY collection of data records will have increasingly long access times as the data is mutated, because you can only append the changes. The implications of that are

  • to reconstruct the data you either
    • have to “nullify” the previous record and search through the appended data for the actual record. This may be many network requests away, since it is too big for one AD object
    • OR append each change to any field(s) of the record and reconstruct as you process the AD and subsequent ADs (as the record changes are too big for one AD)
  • This constitutes a forever increasing processing time and number of network accesses for each and every record as it changes.
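
The reconstruction cost described in this list can be sketched in a few lines of Python. The log layout (`record_id`, field name, new value) and the function name `reconstruct` are assumptions made for the example; the network cost of fetching several AD objects is not modelled, only the number of entries that have to be read.

```python
from typing import Dict, List, Tuple

# Hypothetical change entry: (record_id, field_name, new_value).
Change = Tuple[str, str, str]


def reconstruct(log: List[Change], record_id: str) -> Tuple[Dict[str, str], int]:
    """Replay an append-only log to rebuild the current state of one record.

    Returns the record and the number of log entries that had to be scanned.
    """
    record: Dict[str, str] = {}
    scanned = 0
    for rid, field_name, value in log:
        scanned += 1
        if rid == record_id:
            record[field_name] = value  # later changes override earlier ones
    return record, scanned


log = [
    ("r1", "name", "alice"),
    ("r2", "name", "bob"),
    ("r1", "name", "alicia"),
]
current, cost = reconstruct(log, "r1")
```

Here `cost` equals the length of the log, so every further change to any record makes every later read of `r1` a little more expensive, which is the concern raised above.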

Solution
Just have the promised MDs with mutation (modify == change in place) and append functions.

  • version-kept. A flag in the MD denotes whether a copy of each version is kept (perpetual data) or the MD is temporary/changing. 90+% of APPs will reject MDs that do not keep a copy of each version, treating them as temporary data. (<-- this could be at the api level, defaulting to retrieving only perpetual MDs but allowing other APPs like text editors, databases etc. to use the other kind)
    • Thus application temporary files (eg for text editors) can reuse MDs without adding (notes on paper, as you called it) extra MDs containing encrypted (once-only keys) data that is unreadable after the editing session. And the actual files (previous & new) are kept, thus keeping to perpetual data.
  • Browsers and most (90+%) APPs will take note of the version-copy-keep flag and appropriately deal with it.
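
A minimal Python sketch of the flag proposed above, assuming a hypothetical `MutableData` type with a `keep_versions` field and a hypothetical API-level `fetch` that refuses temporary MDs by default. None of these names come from the actual API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MutableData:
    """Hypothetical MD with a flag saying whether old versions are kept."""
    keep_versions: bool
    value: bytes = b""
    history: List[bytes] = field(default_factory=list)

    def mutate(self, new_value: bytes) -> None:
        if self.keep_versions:
            # Perpetual: the previous version is preserved.
            self.history.append(self.value)
        # Temporary MDs are simply overwritten in place.
        self.value = new_value


def fetch(md: MutableData, allow_temporary: bool = False) -> bytes:
    # Default behaviour for most apps: only accept version-kept MDs.
    if not md.keep_versions and not allow_temporary:
        raise ValueError("MD does not keep versions; treated as temporary data")
    return md.value
```

A browser would call `fetch(md)` and reject scratch data, while a text editor could call `fetch(md, allow_temporary=True)` for its own working files.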

Isn’t this trying to have your cake and eat it too?

The owner of data (any AD data object) and the history of ownership is just as important as the data itself.

For instance comment fields in a forum. You can completely change the flow of comments if you change the owners of the comment ADs.

For instance on a blog site

  • you make a blog entry about privacy and
  • some authoritarian makes a blog about the authorities must have ALL knowledge of its citizens,
  • Now the owner of the AD containing the authoritarian “authorities should have all knowledge” entry changes the owner of his/her AD to you.
  • Now the blog site has two entries owned by you with very contrary views.

Ownership information is definitely a part of perpetual data

Thus if you make allowances for safecoin then you have already broken the flawed model of appendable data only.

I can tell you that having done data mining for 3 years in a job taught me a bit about what is history and what is not. And ownership is very definitely an important aspect of perpetual data.

Again that is a very flawed argument.

Videos are immutable files and no chunk can be mutable.

So what are you implying here, that immutable chunks are being brought into the AD type and we no longer have the specific immutable chunks (immutable files)?

Except the argument had a flaw (see above)

It’s stored with your account information.