Appendable Data discussion

appendable-data
immutable-data
mutable-data

#8

And where can a website running on the SAFE Network store short-term data? An Android app or any other app does not have a problem here, since it can store the data elsewhere, but how will the Safe web handle such very short-term data?


#9

Dropped that and ran :slightly_smiling_face:

Is this official or a one night stand?


#10

AppendableData ! Again !

History repeats itself: Structured Data -> Appendable data -> Mutable Data -> Appendable Data

So, you are going to define the 4th version of this kind of data. This is typically over-engineering. In French there is an expression to describe a situation like this: the better is the enemy of the good.

At least you can take advantage of it to put signatures back in the new structure. I have mentioned several times that removing them was a regression compared to the initial SD. Without them:

  • no multi-sig
  • no static check

#11

I like this idea at first glance.

Choosing an arbitrary xorname for MD was a concern to me (eg data density attacks). Being able to use the content of AD to derive the xorname should be more secure. It also allows simpler caching of AD, whereas caching MD is very complex and challenging.

Having an initially blank field next_data_xor_name that can be updated just once (ie the append operation, which would not affect the original xorname) allows arbitrarily large appendable data sets, but comes with a cost of needing more lookups to fully download the data or reach the latest point in the chain. I think there are ways to simplify this though, eg via communications rather than storage, maybe some overlay network recording AD-LATEST-XORNAMES. I feel like there’s a strong parallel between bitcoin segregated witness and the ‘appendable’ part of AD. Navigating the ‘next_data’ direction of AD is an interesting problem.
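
To make the lookup cost concrete, here is a minimal sketch of such a chain. It is only an illustration under stated assumptions: the `Store` class stands in for the network, SHA3-256 is an assumed choice of hash for deriving the xorname from content, and apart from `next_data_xor_name` every name is hypothetical rather than taken from any SAFE API.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


def xor_name(content: bytes) -> bytes:
    """Derive a 32-byte name from content (SHA3-256 is an assumption)."""
    return hashlib.sha3_256(content).digest()


@dataclass
class Chunk:
    content: bytes
    next_data_xor_name: Optional[bytes] = None  # blank until the single append


class Store:
    """Hypothetical stand-in for the network: maps xorname -> chunk."""

    def __init__(self) -> None:
        self.chunks: Dict[bytes, Chunk] = {}

    def put(self, content: bytes) -> bytes:
        name = xor_name(content)
        self.chunks.setdefault(name, Chunk(content))
        return name

    def append(self, tail_name: bytes, content: bytes) -> bytes:
        tail = self.chunks[tail_name]
        if tail.next_data_xor_name is not None:
            raise ValueError("next_data_xor_name can only be set once")
        new_name = xor_name(content)
        if new_name in self.chunks:
            # Simplification of this sketch: identical content would create a cycle.
            raise ValueError("content already stored")
        self.put(content)
        tail.next_data_xor_name = new_name  # the name of `tail` itself is unchanged
        return new_name

    def latest(self, head_name: bytes) -> Tuple[bytes, int]:
        """Walk the chain from the head; return (latest name, lookups needed)."""
        name, lookups = head_name, 1
        while self.chunks[name].next_data_xor_name is not None:
            name = self.chunks[name].next_data_xor_name
            lookups += 1
        return name, lookups
```

In this sketch, reaching the latest entry of a chain with n entries takes n lookups from the head, which is the cost that something like a record of latest xornames would be trying to avoid.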

This maybe has implications for safecoin history and privacy since the new owner must be appended rather than replaced. But with a suitable signature scheme privacy should be retained. Hard to imagine this would still permit free txs though.

AD also really challenges (and strengthens) the idea of volatile vs permanent data. Is SAFE used to ‘store data forever’ or is it used to ‘securely arrange meetings with other people and then transfer data p2p in a volatile way’? A bit of both I guess… I think AD progresses this concept in the right direction. A bit like how the lightning network is used to transfer the day-to-day data (volatile), then every so often the aggregate result is written to the ledger (permanent), where the bitcoin blockchain and lightning are acting as “a highly accessible and perfectly trustworthy robotic judge and conduct most of our business outside of the court room” (source). MD feels like a pre-lightning-network bitcoin, AD like bitcoin + lightning-network because AD encourages volatile data transfer using efficient ‘off-chain’ ways.

Will we get to hear more details about how AD has been discussed within maidsafe?


Very cool. I know nothing of this technology, but having read the medium article posted by @dirvine it will be interesting to see if this particular drawback ends up being significant or trivial:

BLS signature verification is order of magnitude harder than ECDSA. Signature aggregation for the whole block with 1000 transactions still requires to compute 1000 pairing, so verifying one tiny signature in a block may take longer than verifying 1000 separate ECDSA signatures.


#12

This is not at all decided yet though.

There are many ideas floating around, and this is just one of them (admittedly the one I am advocating for personally, for multiple reasons :slightly_smiling_face:). The primary goal here is to store the data perpetually, as per the network fundamentals. But I also think secondary goals like having the backwards API compatibility with Mutable Data are very important too.

How exactly this data type will be named and implemented is yet to be decided though, so for now we’re at the discussion and pre-RFC stage.


#13

EDIT: some of the information in this post is misguided due to ADs not being implemented as data objects but as link objects. So an AD should be called an ALD, and this changes some of the logic/assumptions my post was based on.

Not smart in my view. Remember we want to support devices that do not have a large disk capacity and/or users with privacy concerns, so that no trace is left on the device.

One notable use case is database operations. Some very large databases do thousands of mutations per minute or even per second during their peak hours. If you make all data immutable then the space required for these types of databases will balloon out and swamp the storage capacity. It’s one reason these multi-terabyte databases do not keep a log of every mutation made on them and only take snapshots of the data. And even these snapshots are kept only for a set period of time.

In my opinion you must provide the ability for fast changing data (bases) to not keep every change done on them. At least have it as an option.

Also, databases on append-only data have no simple field-change function. Either you have a procedure that tracks through all the changes to reconstruct the record (which may need to read many MDs == very slow access due to lag time), and this is time consuming when done on every record read. All that energy wasted. OR the database makes a complete copy of the record and appends it, so that the procedure to reconstruct the record is easy.
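
The two options can be sketched side by side. This is a toy illustration, not any real database or SAFE API: the log is a plain Python list, all function names are hypothetical, and "entries touched" is only a rough proxy for network reads.

```python
from typing import Any, Dict, List, Tuple

Change = Tuple[str, Any]  # (field name, new value); hypothetical log entry


def read_by_replay(log: List[Change]) -> Tuple[Dict[str, Any], int]:
    """Option 1: rebuild the record by walking every appended change."""
    record: Dict[str, Any] = {}
    for field, value in log:
        record[field] = value  # later changes overwrite earlier ones
    return record, len(log)  # second value: entries touched on this read


def append_full_copy(versions: List[Dict[str, Any]], field: str, value: Any) -> None:
    """Option 2: append a complete copy of the record with one field changed."""
    latest = dict(versions[-1]) if versions else {}
    latest[field] = value
    versions.append(latest)


def read_latest_copy(versions: List[Dict[str, Any]]) -> Tuple[Dict[str, Any], int]:
    """Option 2 read: only the newest entry is needed."""
    return versions[-1], 1
```

With option 1 every read touches the whole change log, so read cost grows with the number of mutations. With option 2 a read touches one entry, but each write stores the full record again, so the cost moves from read time to storage.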

To build just one object for display may require 10s to 100s of relational database records to be read from multiple files, and index records too. If the database is very active then the work to reconstruct each of those hundreds of records could require more than one MD per record. And it’s not a parallel situation, since the index field for other files is held in the records being read.

Then in my opinion there is a use case for temporary files too. For instance, editors that store a temp file and discard it once the editing session is finished. The temp file is useless once discarded, since the saved file and the previous file are the actual files. Remember SAFE will be run on devices that cannot have large temp files on disk and/or by users with privacy concerns.

Also these application temporary files are often heavily mutated, some on every small group of characters changed and others on larger changes. This is for recovery purposes, and if someone wants privacy (whistle blowers, ordinary people) on a shared device then temp files have to be on the network. If you keep all these mutations then this represents a lot of wasted space for no benefit. The changes are saved when the file is saved and the session is over. Think of all those 100s of millions of Word documents that office staff work on each day, and you want to save all the character/line/paragraph changes (for no benefit). That’s many terabytes or more a day of useless data (never accessed again) (no information gained/lost by keeping or not keeping it).

And deleting this very temporary information does not take away meaningful information, since each version of those documents is still kept in immutable data as the files. This reinforces the fact that it’s keeping data with no benefit to the people using those applications or to the future world.

The world of data storage is a lot more than web sites.

So in my view keeping web site changes is good. BUT not EVERY character or word or tag that is changed during an editing session. Just keep the saved files for goodness sake.

tl;dr

  • Remember one of the early promises was that you could log in on any device and when you logged out there would be no trace left. Having the requirement that temp files are stored on the device means there is a trace. With SSD devices/memsticks, wiping files using overwrite methods doesn’t work properly and files can often be recovered. EDIT: even if you encrypt the temp files, the fact they even existed (metadata) can cause problems. Remember the ex-NSA chief who said we kill people based on metadata.
  • Databases will require a method to reconstruct records by tracing through all the appended changes and building the record.
    • This dramatically increases the time to access data since a lot of those records will now be multi MD in size because of changes done to the record.
    • Index files now become almost useless (speed-wise & size-wise) due to having to reconstruct the index record. Just read up on how they optimise index records and you might get an idea of the problems append-only data will cause.
    • The multi-terabyte databases with 1000s of mutations per minute or second will result in dataspace blowout for no measurable benefit.
  • The world of data is so much more than web sites and I agree that each version of the website should be kept, but not all the temporary data/files involved in the edit of each web page.
  • Privacy & Security
    • once you have append only data then you force temporary files back onto the device, if indeed the device can support it. This has serious implications for those in the world who want privacy and security of their data. Not every activist or whistle blower can have a device that can support large temp files and that also cannot be taken from them. Often they use shared devices or phones/tablets that can be taken from them. If the temp files are not on SAFE but the device then the device can betray them.
    • But if you put all temp files on SAFE then data space will balloon. For instance if I have 10 MB of documents and I edit each on average 3 times a year then I end up adding 60-300 MB of appended temporary files, plus 3 MB to 30 MB of immutable data (the various versions). Now I have 60-300 MB of pure wasted space stored on SAFE, many times more than the versions themselves require.
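
The figures in the last bullet can be reproduced with a back-of-the-envelope calculation. The post does not say how they were derived, so the two parameters below are assumptions fitted to its numbers: temp-file churn of 2x to 10x the document size per editing session, and 10% to 100% of each document stored as new immutable data per saved version.

```python
from typing import Tuple


def yearly_growth_mb(
    docs_mb: int,
    edits_per_year: int,
    temp_multiplier: int,  # assumed temp-file churn per edit, as a multiple of document size
    changed_percent: int,  # assumed share of a document stored anew per saved version
) -> Tuple[int, int]:
    """Return (appended temp data, new immutable data) in MB per year."""
    temp_mb = docs_mb * edits_per_year * temp_multiplier
    versions_mb = docs_mb * edits_per_year * changed_percent // 100
    return temp_mb, versions_mb


low = yearly_growth_mb(10, 3, temp_multiplier=2, changed_percent=10)
high = yearly_growth_mb(10, 3, temp_multiplier=10, changed_percent=100)
print(low, high)  # (60, 3) (300, 30)
```

Under these assumptions the retained temp data is roughly 10 to 20 times the size of the versions that actually matter.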

#14

Will Fleming still use the existing model?


#15

Fabulous! I’m always impressed by the open-mindedness of the team and hence the ability to learn and adapt new tech for the project. Possibly a lot of this is simply due to not yet being in beta and perhaps this attitude will shift a bit when beta is launched, however I think there is also a sense of humility here - knowing that you can’t figure everything out yourselves - so you keep looking at other projects for better solutions. Maidsafe isn’t the only team in the cryptosphere this mature-minded for sure, but it’s good to see.

Can we have a “pros versus cons” thread on this? @neo has made some interesting ‘pro’ (keep MD) points. I wonder about the costs and cons of MD to the network though – and importantly how they are or may be addressed without giving up MD altogether.

Great to see the new videos, events, website dev, and general marketing pressure maintaining itself week after week. It will pay off handsomely down the track, so do keep it up!

Awesome update as usual. Thanks to the whole team for the hard efforts.

Cheers


#16

One cost I forgot to mention is that it will make the barrier to adoption of SAFE a lot higher for large businesses and systems. Even small database systems will not touch SAFE, because their developers will not write the code to reconstruct records.

So the cost is non-use of SAFE by some/many including those seeking privacy and security from leaving traces of the temp files on their device.


#17

Is it insane to have immutable data, mutable data, AND appendable data? It seems they both have irreplaceable qualities that have been the cause of continuous debate.


#18

I’m only guessing, but perhaps one of the cons of MD is more network traffic? And of course more code complexity.

Given the seemingly strong pros, though, I’m scratching my head at why those cons would be much of a trade-off. Which is why I’d like to see a clearly laid out thread on the pros versus the cons. Maybe this discussion has happened in the past and it’s already on the forum, but a quick search found nothing.


#19

But of course if it means that adoption fails when it comes to real database applications and web sites then …

Yes and it seems that the problems with removing mutable data have been forgotten.


#20

Would not surprise me at all. It’s a long running project and people have left and new people have joined. Hopefully this can be resolved quickly and the solution cemented firmly in place - it’s frustrating to see this coming up again now. I’m not discounting that there may be a good reason, but would like to know what it is.


#21

Exactly that and many more. Without an option to store cheap temp data this whole project is just another super-expensive blockchain-like database. This will also kill anonymity by forcing devs to keep footprint data on client devices. If I need to change a single byte of data and I have to pay for 1 MB of permanent storage for that, then good luck with the story about a cheap alternative. Even a blockchain with sharding will be cheaper, since on a blockchain I have to pay for the transaction bytes, not for a whole 1 MB block. I also can’t imagine how to create dynamic web services on such a network. For me this is a disastrous idea, killing anonymity and usability and cutting the possible use cases of the network by an order of magnitude. This whole concept of storing everything forever at any cost is a nightmare.


#22

This would be helpful. I understand that keeping data forever is one of the big ideas, but already it’s not an absolute rule since metadata and messaging won’t be kept - presumably that includes all the machine-to-machine sensor data that will grow exponentially as the IoT comes online. So there’s already a dividing line between data that will be kept and data that won’t, and as @neo points out having only the immutable data option could potentially be restrictive.

From an end-user’s point of view this would seem to be the ideal scenario. What would be the downsides? More complexity presumably … any others? Interested in your thoughts @nbaksalyar.


#23

Houston, we have a problem


#24

Is this feasible / sensible without keeping MD as well?


#25

Exciting stuff! I am really curious to see an RFC outlining this, especially the backward compatibility with MD.

I assume AD must have XOR predictability for ‘compatibility’ with MD. E.g. DNS would be impossible without it. Without an option to pre-determine the XOR I see no way one can do high-level communication.
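
A small sketch of why predictability matters for something like DNS. It assumes SHA3-256 as the hash, and `type_tag` is a hypothetical namespace number; neither is taken from an actual SAFE API.

```python
import hashlib


def address_from_name(public_name: str, type_tag: int) -> bytes:
    """Predictable address: anyone who knows the name can compute it."""
    return hashlib.sha3_256(f"{type_tag}:{public_name}".encode()).digest()


def address_from_content(content: bytes) -> bytes:
    """Content-derived address: only computable once the content is known."""
    return hashlib.sha3_256(content).digest()


# Publisher and visitor independently arrive at the same location from the name alone.
publisher_side = address_from_name("example", type_tag=1)
visitor_side = address_from_name("example", type_tag=1)
assert publisher_side == visitor_side

# A content-derived address moves whenever the content changes,
# so a visitor cannot know it in advance.
assert address_from_content(b"version 1") != address_from_content(b"version 2")
```

So if AD names are derived purely from content, a lookup by a well-known name would need some extra fixed entry point whose address can be computed ahead of time.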


The crux is about whether we want to allow the network/user to delete data. I’ve slowly assumed that data would never be deleted; MD is to have versions and to be addressable per version. It’s only natural that you would call that ‘appendable’, because data isn’t actually deleted/mutated — thus ‘immutable’. What’s in a name?

The points made by several members so far lead me to have mixed feelings about it. @neo puts forward a good case for truly mutable data — and thus deleting data.


#26

If published data were, all or even in part, included in an MD object that could be deleted or edited, then history and proof are broken. If Mutable Data is append-only then there is no place to hide: what you publish will stay and can easily be found. Like a built-in Internet Archive with no missing bits.

For small throwaway data, keep it locally and throw it away, but when you publish then it is publicly available forever. That is the proposal really.


#27

I don’t know, I thought that I would be the owner of MY data. So if I want to modify it or delete it, then I should be able to do it. It feels like a new Google to me.