Appendable Data discussion

nbaksalyar · February 8, 2019, 6:36pm

I agree with the sentiment, but I’m not sure it has to be a part of the official API. We’ll have SPARQL for queries on Linked Data, and it closely resembles SQL by design. So I’m thinking there should be existing translators from SQL into SPARQL, or even ODBC drivers that can be easily integrated into Excel and other software.

So it’s a matter of tools & wrappers around the API we provide really. Personally, I think this is a very cool idea and it can be one of the first ‘killer apps’ for the SAFE Network, along with SAFE Drive

wydileie · February 8, 2019, 7:37pm

I have to say I agree with neo and the others here about the switch to appendable data. This seems like a bad idea in my mind. I also have issues with the app development residing on SAFE where one can never delete anything and just append in perpetuity. My real gripe, though, is with the ideological push.

While I understand the ideology behind the change, in practicality, it would crush the perception of the SAFE Network to the general public. One needs to keep in mind their userbase when designing anything. Sometimes you have to compromise the best solution in theory for the best solution for your users.

There will be no way to make this ideological argument to the larger public, or to change people’s minds in any significant way. There is really one chance to make a first impression, and you don’t want to see the talking heads on a news station talking about the “new Internet” where you can never delete anything you put on it. You know how many people regret that picture they posted to Facebook, or a mean tweet they sent? Whether or not that is actually truly deleted from the Internet is beside the point. People have to believe that it is for their own sanity…

happybeing · February 8, 2019, 8:38pm

It seems too early to come to any conclusions, and there is no need for anyone to worry. Maidsafe always think things through from many angles and also present important areas such as this for in-depth discussion before making decisions.

I’m curious. The only thing that I feel concerned about at this stage is the point @Seneca makes about everything in your private account being at risk. This would make it increasingly attractive for targeted attacks, so the ability to forget private data seems important from that respect. It need not be deleted as such, but I’d want the ability to make certain data inaccessible by anyone including the owner for this reason.

tfa · February 8, 2019, 10:26pm

Maidsafe has never had a consistent view on data deletion, going from one extreme to the other over the years.

A brief recap of history:

Initially an SD could be deleted, but everyone would know that the SD had existed and was deleted, and if someone recreated it then its version was incremented. This was IMO a reasonable implementation.
At one time Maidsafe made a change that removed these controls, so an SD could be deleted by removing it from the network. It could then be recreated at version 0 by anyone. This was what I would call an extreme implementation, where anybody is free to do what they want with their data, including completely rewriting history.
I had a hard time convincing Maidsafe to go back to the initial implementation; see these 2 long threads: the unsuccessful one, Deletion of SD objects, and then the successful one, Transparency or opacity of SD modifications.
The current MD implementation is the reasonable initial one at the entry level (an entry can be deleted but everyone knows that, and history cannot be rewritten).
What is proposed now is another extreme implementation, where data cannot be deleted at all by users. This ensures that history can be neither rewritten nor erased. The price is less freedom for users.

To politicize Maidsafe’s position on this specific problem (data deletion), I would say the evolution is: center party -> right wing -> center party -> left wing.

This may be the right thing to do, but in the past Maidsafe was arguing against it and now they are going even further than what I proposed. This has been a big waste of time.

dirvine · February 8, 2019, 10:56pm

We started the project calling it perpetual data in 2006, with perpetual coin in 2007 as well.

tfa · February 8, 2019, 11:15pm

So, the full circle is going to be completed!

dirvine · February 8, 2019, 11:19pm

I think there is great value in looking at various mechanisms though. Deletable public data is not good and was never wanted, but there is scope/space for editing metadata. The answer is in there somewhere for sure. The aim of ensuring public data is never deleted has never changed, though working out how to achieve that while allowing mutations of metadata or management data is tricky for sure. To start with, at launch it makes sense to keep things simple; an API that is rich and allows the apps we have seen so far, and hopefully much more, will be a good start.

TylerAbeoJordan · February 9, 2019, 12:07am

But it’s not forced as the clearnet still exists … which as @neo has pointed out will drive them to stay on the clearnet. Unless of course you are secretly working for the government and will use their guns to make us all use the Safe Network

It won’t lose data you choose for it not to lose. I’m fairly certain that most people don’t think having a choice is being in a weird place.

Wait wait wait … I thought we were talking about MD for private temp data - for apps and personal use … Why would this idea need to be expanded to the whole of the network for public data?

I don’t think many are opposed to appendable data at all … IMO the question is whether or not we have private temp data that can be erased.

Think about it this way … when you are working on an idea - say you are writing a paper on something that is rather political in nature … but since you are ‘working’ on it, you write a few things at first that you later come to understand are wrong … but if people down the track find your earlier views they will mud sling it all over the place and ruin your reputation … all because you started out with a brainstorm and wrote a bunch of nutty things down that you later regretted …

People have the right to privacy - so they need a right to delete.

You make my point exactly. Plus hacking is still possible either through deceit or coercion to gain access to someone’s accounts.

Appendable data seems great for collaborative projects and website backups, etc. but for private data it is not a substitute for deletable data.

neo · February 9, 2019, 1:42am

This I’ve suggested many times and solves a lot of the issue.

MD data allowed this and the idea of never actually deleting the containing MD with the version number provided a way to protect against recreating history. Or as @tfa points out delete the contents and set the owner to an owner with no known private key

The biggest issue I see with append only data is the growth of individual records.

For instance

a record with simple info takes up 100KB of data (yea a couple of blobs of data in that)
the data is encrypted so only the app can read it (thus discarded data is encrypted and meaningless to anyone else)
A change of fields occurs about once a month. The average change size is 25KB
There are about 1 billion records in this collection of data (some sort of database by another name)
For appendable data this results in 25TB per month of wasted (to anyone else) encrypted data
But worse, the app must trace through all the changes to a record to reconstruct the record when it is retrieved
- This means that after one month the updating app has to reconstruct the 1 billion records during the next month and process 25TB of extra data due to the reconstruction of each record (25KB average per record)
  - the next month it is 50TB of extra data to process
  - after a year it’s 300TB of extra data to process.
  - and this is just one of the 1000s and 1000s of massive databases that are being used
- After 2 years the users accessing data have to process an extra 600KB of unused data just to reconstruct the record they are reading
And even worse is if the data is organised in a relational manner, so that many records have to be read to present one set of data
This represents a massive waste of processing and energy worldwide that will rival today’s blockchain mining in scale. Blockchain mining is minor compared to the databases of the world.
This is the real barrier to adoption of SAFE as the storage medium if appendable data is the only way to store collections of data (database by another name)
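The figures in this example are easy to check with a few lines of Python. This is only a sketch: the function name and the decimal units (1 KB = 1000 bytes) are assumptions for illustration, while the record count, average change size, and one change per month are the numbers from the example.

```python
KB = 1000        # decimal units, assumed so the round numbers line up
TB = 1000 ** 4

def append_only_overhead(records, avg_change_bytes, months, changes_per_month=1):
    """Return (extra bytes per record, extra bytes for the whole collection)
    made up of superseded changes after `months` of append-only updates."""
    per_record = avg_change_bytes * changes_per_month * months
    return per_record, per_record * records

for months in (1, 2, 12, 24):
    per_record, total = append_only_overhead(1_000_000_000, 25 * KB, months)
    print(f"{months:>2} months: {per_record // KB} KB per record, {total // TB} TB in total")
```

The growth is linear in both the number of records and the number of changes, so a collection that mutates faster than once a month reaches these totals sooner.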

Hey, they are stored in immutable files and thus cannot be deleted or changed. You are cheating here

Let’s look at the idiot who stores a 4GB video in MDs. If it’s stored in a deletable data type (non-versioned-kept MD) then people KNOW it’s temporary and they will copy it if they want it; then the copies can be perpetual.

No your mutable data type was the answer. Yea we know it was not in alpha 2 but it was the plan

Actually they are not the same thing; they are two different concepts. Perpetual data can be achieved with fewer issues by version-keeping MDs. By this I mean keeping a copy of each version of the MD

If MDs were version-kept, so that a change to an MD left the previous version in existence as a copy, that would solve the problem outlined above (all the extra processing required to reconstruct a record).

ALSO for data bases this is not always a good idea because of the mutation rates that can occur in certain databases and how they already don’t lose data (journals and the like), but appendable data will multiply the time to process data through the databases. At least keeping a version copy of the MDs will not multiply the time.

If you version MDs by keeping a copy of the old MD upon change, then you can have a temp file type very simply by not keeping copies of each version of that particular MD

The browser app can refuse to display websites stored in non versioned-kept MDs thus forcing web page versions to be kept.

And all apps can do the same for data that should be kept perpetual. Remember this is only MDs and immutable data files already are kept perpetual

Exactly anonymity will be removed and all transactions will be visible.

Exactly temp files and databases able to operate at full speed not slowed down (by processing many times more data than the actual data) is an absolute necessity.

Exactly and agree with this.

the idea of temp files being deletable agrees with this
databases being able to mutate data agrees with this (even if versioned-kept MDs are used)
- appendable data causes any collection of records that are being updated to grow in size and access times.

No, that’s the cop-out answer. People WILL reuse IDs for various reasons, e.g. so family know who sent the payment. If appendable data or always-version-kept MDs are used then those transactions can be traced. All that is needed is one of the family’s IDs and then all transactions can be traced.

Also once I have one ID I can like blockchains follow the transactions gaining a lot of information along the way even if throwaway IDs are used. If I know 2 IDs then I’ve got you.

Nope no help in the case of payments - see just above. Not all cases will be use once only IDs for so many reasons and family scenario above is just one.

tl;dr

I am not against perpetual data. I am against append only data types replacing MDs (mutable data types)

append only data types require extra processing to reconstruct the data.
- waste of energy and processing. Particularly bad for database style of record keeping where it keeps multiplying the problems month after month.
The fundamentals only say public/published (implied shared with others) data is to be kept perpetual.
Mutable data does not go against the fundamentals if optional keeping of MD versions is used. Various methods can be used to prevent non-versioned-kept MDs from being used for websites and other general applications.
The big problem with an append-only datatype is the growth of the records/data, and the reprocessing of all that old changed data just to reconstruct the actual state of the data. That is OK if the data is small, but data that is large (even 1MB) will cause users to have to read mountains of data just to get to the data they are after. Keeping a copy of versioned MDs solves this problem.
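The difference in read cost between the two approaches can be sketched as below. Both classes are hypothetical models written for illustration, not the SAFE API: the first replays every appended change on each read, the second stores the current state whole and keeps superseded states as separate copies. A real append-only client could cache or snapshot to soften this, so the sketch shows the naive case only.

```python
class AppendOnlyRecord:
    """Hypothetical append-only record: changes are appended, never overwritten."""
    def __init__(self, fields):
        self.log = [dict(fields)]

    def update(self, changes):
        self.log.append(dict(changes))

    def read(self):
        # Replay every entry, oldest first, to rebuild the current state.
        state = {}
        for entry in self.log:
            state.update(entry)
        return state, len(self.log)   # (current state, entries processed)


class VersionKeptRecord:
    """Hypothetical version-kept MD: current state stored whole,
    each superseded state kept as a separate copy."""
    def __init__(self, fields):
        self.current = dict(fields)
        self.old_versions = []

    def update(self, changes):
        self.old_versions.append(dict(self.current))
        self.current.update(changes)

    def read(self):
        return dict(self.current), 1  # only the current version is touched


appended = AppendOnlyRecord({"name": "x", "blob": "v0"})
versioned = VersionKeptRecord({"name": "x", "blob": "v0"})
for month in range(1, 25):            # two years of monthly changes
    appended.update({"blob": f"v{month}"})
    versioned.update({"blob": f"v{month}"})

print(appended.read()[1], versioned.read()[1])
```

Both models keep the full history; they differ only in how much of it a reader has to touch to see the current state.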

TylerAbeoJordan · February 9, 2019, 2:01am

Thanks for that detailed explanation, Neo. I didn’t understand the full benefit of MDs. MDs plus versioning seems like a nice solution to replace the idea of appendable data.

Nigel · February 9, 2019, 2:01am

This is what I thought was meant by the fundamentals aside from the obvious immutable data.

I definitely agree that this is a major issue that truly goes against what has long been said on this forum for years.

I’m open to the RFC but I think the community is making solid points.

neo · February 9, 2019, 2:06am

I just looked and there is append only in it. But that comes after all the language that does not suggest append only. So it’s contradictory language, which is the reason why so many missed it.

neo · February 9, 2019, 2:16am

Alternative idea to append only data type.

Keep the current commonly understood idea of MD - mutable data type

Allow a copy of previous versions to be kept as the default
allow MDs to be created with version copies set off
- Once the MD is set to version copy keeping on then it cannot be turned off unless no mutations done since it was turned on
- For version keeping off then an optional version keep flag can be set to allow a version kept on a case by case basis.
allows applications to access the data without the need to reconstruct the said data.
- this allows collections of data records (ie database of any sort) to run at maximum speed without the increasing slowdowns caused by reconstruction of data and the associated lag time when reading additional MDs caused in time by append only data types
applications including the browser know the status of the version keeping flag in the MD and can reject such MDs if desired or flag them to the user as temporary data.
applications using collections of records (ie any sort of database) can either use version-keeping or not depending on the type of application the records serve
- eg a private database of ones music collection does not need to have versions kept and is up to the person keeping the collection records to decide.
- eg a health record database would definitely have version keeping turned on.
- in both cases neither is slowed down by the having to trace through previous changes to reconstruct the current data.
allows the concept of private temp files that can be deleted and the MDs reused without multiplying the data stored on the network
Coin MDs never keep version copies; for other MDs it is optional.
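The flag rules in this proposal can be written down as a small model. This is a sketch under stated assumptions: the class, method, and attribute names are hypothetical and are not part of any SAFE API; only the rules themselves (version keeping on by default, no switching off once a mutation has happened while it was on, an optional per-change keep flag while it is off) come from the list above.

```python
class MutableData:
    """Hypothetical model of the proposed MD, for illustration only."""
    def __init__(self, value, keep_versions=True):
        self.value = value
        self.keep_versions = keep_versions   # default: keep copies
        self.versions = []                   # copies of superseded values
        self._mutations_since_on = 0

    def mutate(self, new_value, keep_this_version=False):
        # Keep a copy when keeping is on, or when asked case by case while off.
        if self.keep_versions or keep_this_version:
            self.versions.append(self.value)
        if self.keep_versions:
            self._mutations_since_on += 1
        self.value = new_value

    def set_keep_versions(self, on):
        if on and not self.keep_versions:
            self.keep_versions = True
            self._mutations_since_on = 0
        elif not on and self.keep_versions:
            if self._mutations_since_on > 0:
                raise PermissionError("version keeping cannot be turned off after a mutation")
            self.keep_versions = False

    def is_temporary(self):
        # What a browser or app could check before treating data as permanent.
        return not self.keep_versions
```

Reading `value` never involves the stored versions, which is the point of the proposal: history is available when wanted but is not on the path of a normal read.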

Immutable files (chunks) are a separate data type and not covered by the above

They will be used for the main file storage and fulfils the perpetual data for that data type
Many web pages will be stored as immutable files and thus kept anyhow

TylerAbeoJordan · February 9, 2019, 3:23am

@neo Love the proposal. What about storage cost (and cost implementation/mechanism) for MD? Any thoughts on those problems? It seems like it would be complicated EDIT: but maybe current farming mechanism works for both immutable and MD?

I wonder how much of global data would be MD versus immutable.

Mods - can we have a thread for all of this?

Nigel · February 9, 2019, 8:09am

From a philosophical perspective this is really interesting. It’s about preserving truth, radical honesty, and accountability. In so many ways this would enhance society but yet it can be used as a tool against us and our freedom. I can’t help but wonder if this perspective is derived from seeing any objective naturally occurring event as a natural design pattern. Where I think it may miss a little is the human element which is also a natural design. We want to have the freedom to say and do as we choose, grow, change our minds, preserve our public personas (out of ego), without having it used against us. This is especially applicable to young folk, IMO. Maybe the implementation won’t change that and I do believe people live under the illusion that once they delete something it’s gone but then again most aren’t considered important enough for the lesser amount of sophisticated people capable of exposing some small mistake they made, etc. to do so.

Since it’s people who have to adopt the network I think the implementation and marketing of these kinds of changes should reflect that as much as possible. Again, I thought this was already possible with MD versioning. From a societal impact POV, I just feel silly it was under my nose in the fundamentals all this time and I haven’t thought of it in this way up until now.

neo · February 9, 2019, 8:24am

That’s a question that will be discussed when safecoin is being implemented. There was a suggestion above that the costs will be different between writing to an existing MD and creating one. The suggestion was also that there might be a difference between writing a small piece of data and writing 1MB of data.

I am getting the impression that this drive for appendable data is viewing the storage of PUBLIC files/data such as web data/sites and the desire to prevent the public being disadvantaged by deletion/changing of data they previously saw. Plus the fundamental of perpetual data. That is all well and good and I support it for public data/files. But the issue around private data is ignored above, and perpetual data is being equated with appendable data as if appending is the only solution.

Unfortunately it seems that the debate is being reduced to this view of SAFE’s data, and valid concerns around other areas are being trivialised. Now maybe this is a focus for alpha 4 (Maxwell) and a way to expedite that. Unfortunately if appendable data is the only way then it is going to cause a lot of problems for a lot of applications and for speed. It would also hurt the adoption of SAFE by 90% of businesses, which is where the money is when they start storing the massive amounts of data they keep in data collections/warehouses.

neo · February 9, 2019, 8:31am

Once people start storing their private and public videos on SAFE then immutable files might exceed appendable data. It’s no longer mutable data (by definition)

david-beinn · February 9, 2019, 9:50am

Agree that it’s odd this discussion should start here! Certainly seems that whatever is decided it would be strange if a network that advertised the immutability of data had something called mutable data!

By instinct and after reading the various arguments in this thread I would lean towards the side of keeping at least private data as deletable, technical issues notwithstanding. If all data is immutable I think the network will still have a huge role to play in the world, but I think it might be a different and perhaps more limited one than many people are hoping (though that may not be a bad thing.)

One point I would like to put forward though is that I think it is misguided to think of the immutability of data helping to preserve ‘truth.’ The 1984 argument convinced me initially, but I think the way we decide what ‘truth’ is is much more nuanced. What we agree to be true is and always will be a consensus based on triangulating many points of information, many of which need to come from outside the digital realm.

If we come to trust that, for example, the SAFE network holds a record of truth, then that could be a very dangerous place to be. Not sure if this fairly trivial one is a good example (it’s just the first that springs to mind), but if I publish 1000 documents saying that Obama was born in Africa, refuting the 1 that says he was born in the USA, then how will history know that the single 1 contains the truth?

To look at it a different way, we are living in an age where science, technology and education have given more people in the world than ever access to information that is as verifiably ‘true’ as it is possible to be, yet we still seem to be living in an age where truth is in crisis. I would argue that there is too much truth in the world, rather than too little. As individuals we just can’t handle the sheer amount!

I do however completely agree with the slightly more trivial argument that it should be made clear that as soon as we publish something on a network, we no longer have the power to delete it. Also, even private things, for example in my Google drive, are not currently deletable, even though we imagine them to be. Perhaps if private data was genuinely deletable, this would be another good way to distinguish the SAFE Network from the clearnet.

dirvine · February 9, 2019, 1:25pm

I think perspective is important as well. From my perspective, we have the network fundamentals and these should be agreed on fully. Yes, that makes some things able to be reduced to “some ideological” stance, but that is exactly what the fundamentals are, the ideology of the network/proposal.

Can they change/evolve, well I would hope so, given enough evidence they should.

Appendable Data

This work at the moment is to allow people to have the safety features they have with the current alpha II network. That is appendable data, where the list of entries grows, but the delete call nullified an entry, so the entry was still there, but empty.

This allows folk to build apps that seem exciting, so great.

Now we are saying we will not nullify the entry, but we will add in multisig capabilities and give the apps the same capabilities they had, except to say an entry did not exist or deny what was in it.

That is pretty much it.
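A rough way to picture the difference described here is sketched below. Both classes are assumed models for illustration and are not the network’s data types: the first mimics the current behaviour, where a delete empties an entry but leaves its slot, and the second assumes that under the new scheme a removal is recorded as a further entry, so that earlier content stays visible. Multisig is left out entirely.

```python
class PseudoAppendable:
    """Assumed model of the current behaviour: delete nullifies an entry in place."""
    def __init__(self):
        self.entries = []

    def append(self, value):
        self.entries.append(value)
        return len(self.entries) - 1

    def delete(self, index):
        self.entries[index] = None        # the slot remains, the content is gone


class Appendable:
    """Assumed model of the proposed behaviour: nothing is nullified."""
    def __init__(self):
        self.entries = []

    def append(self, value):
        self.entries.append(("put", value))
        return len(self.entries) - 1

    def delete(self, index):
        # Hypothetical: the removal is itself appended, so history stays readable.
        self.entries.append(("delete", index))
        return len(self.entries) - 1


old, new = PseudoAppendable(), Appendable()
i, j = old.append("draft"), new.append("draft")
old.delete(i)
new.delete(j)
print(old.entries)   # the entry exists but nobody can say what it held
print(new.entries)   # both the original content and the removal are visible
```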

Public or Private Data

This is another miscommunication I think, they are both the same to the network. Vaults do not know the difference.

That is actually good: the larger the pool of possible stuff, the harder it is to find the stuff that you think is valuable or that belongs to somebody you know. This is not listed as a fundamental, but it is one; it reverses the usual security practices.

So move from an attackable silo like a server, surrounded by firewalls, in a nuclear bomb shelter being the “most secure” to here is the world’s data, all of it, encrypted, chunked and obfuscated. Which bit belongs to which file, well we don’t know.

So switch from hoarding stuff in a seemingly secure place to put it all public.

This is what we have always meant by secure the data, not the servers.

Conclusions

It seems there is confusion where folk think we are taking something away. We are not taking away what people think, but adding features here. The only thing we will take away is the ability to make an entry null (but still exist). So instead of saying that never happened and we won’t tell you what “that” was, this means all changes are there to be seen.

Plus we will add in multi-sig proper etc.

tl;dr what we currently have is pseudo appendable data without multisig; what we will provide is actual appendable data with multisig. This should mean all apps still work, and the apps should continue to appear and be more powerful.

Actual removal of network data or temporary data can be done: we can edit in place, with PARSEC as consensus to ensure those edits/deletes are handled efficiently. There are even some CRDT patterns like add/remove sets, ORSWOT etc., and moves to allow some of these in Byzantine settings, but all of that is a future consideration. Right now we can launch a network with vaults from home, safecoin and the security we need. Then that can evolve as it should.
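For readers unfamiliar with the add/remove set pattern mentioned here, a minimal observed-remove set is sketched below. This is the textbook version with tombstones, a simpler relative of the tombstone-free variant; it is not Byzantine fault tolerant, has nothing to do with PARSEC, and is not a description of anything the network implements. The class and method names are illustrative.

```python
import uuid

class ORSet:
    """Observed-remove set CRDT with tombstones (illustrative sketch)."""
    def __init__(self):
        self.adds = set()       # (element, unique tag) pairs
        self.removes = set()    # tombstoned (element, tag) pairs

    def add(self, element):
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element):
        # Tombstone only the tags this replica has already observed.
        self.removes |= {pair for pair in self.adds if pair[0] == element}

    def contains(self, element):
        return any(e == element for (e, _tag) in self.adds - self.removes)

    def merge(self, other):
        # Merging is a union, so replicas converge regardless of message order.
        self.adds |= other.adds
        self.removes |= other.removes


a, b = ORSet(), ORSet()
a.add("x")
b.merge(a)
b.remove("x")        # b removes the copy of "x" it has seen
a.add("x")           # a concurrently adds "x" again under a fresh tag
a.merge(b)
b.merge(a)
print(a.contains("x"), b.contains("x"))
```

Because a removal only covers tags that were observed, the concurrent re-add survives the merge on both replicas, which is the "add wins" behaviour this family of sets is known for.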

Nigel · February 9, 2019, 3:55pm

I think I follow. My main concerns are like what neo mentioned.

Will it affect application speed (for the database type applications he mentioned or otherwise)?
Could you traverse a safecoin’s history (even if using a one-time throwaway ID)?
