Immutability - does it change everything?


#1

Like quite a few people here (I’m guessing), I’ve been nodding along and stroking my beard (optional) in the great MD vs AD debates (links below), kind of getting it - but mostly not, at least in terms of the big picture.

I think a step back would be useful: what’s the point of immutability?

Intuitively, the idea of never deleting anything puts one in mind of those documentaries where we see into the homes of people who just can’t throw anything away. “Why are they living in squalor amongst those broken umbrellas, yellowed newspapers and dog poo?” we ask. The obvious answer is that they’re slightly nuts, with an impractical attachment to the past.

But hoarders have their own internal logic, which they find hard to explain to others, though they might point to some examples. BBC archivists thought it made sense to tape over Beatles footage and Dr Who to make space for Terry and June (apologies to non-Brits and young people).

Computer scientists attached to immutability will similarly struggle to explain their apparent data hoarding to others. Why keep hold of rubbish that you could chuck away to make space for useful stuff? It’s counterintuitive, to say the least. But most people aren’t great at exponentials, and computing power and storage per unit cost have been increasing exponentially, and probably will for a while. If our houses increased exponentially in size over time, it would make perfect sense to keep the broken umbrellas and old papers, because there’s a chance they’d be useful at some point in the far distant future, and there’d always be space for them. We’d be considered sensible, not nuts, for keeping them.
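The house analogy can be put into numbers. A back-of-the-envelope sketch, assuming (purely for illustration) that the cost of keeping a given amount of data stored for one year is $c_0$ today and falls by a constant factor $r < 1$ every year from then on:

$$
C_{\text{forever}} = \sum_{k=0}^{\infty} c_0\, r^{k} = \frac{c_0}{1 - r}
$$

If, say, that cost halved every two years, then $r = 2^{-1/2}$ and $C_{\text{forever}} = c_0 / (1 - 2^{-1/2}) = (2 + \sqrt{2})\, c_0 \approx 3.4\, c_0$: keeping the data forever would cost roughly what 3.4 years of storage costs at today's price. If prices stop falling ($r = 1$), the sum grows without bound, so the argument stands or falls on the exponential trend continuing.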

See the brilliantly titled Immutability Changes Everything by Pat Helland

But what about the dog poo? Surely, even accounting for exponential jiggery-pokery, it doesn’t make sense to store that? It might be vaguely useful as fertiliser short-term, but it will soon decay and rot into nothing, becoming worthless.

Getting to the point, I’d like to see some explanation of how SAFE might work in IoT situations where sensors are churning out readings every 1/100 of a second. This sort of data will increase massively, but it’s dog poo: there’s surely no point in keeping it in its raw form?

Will sensor data be dealt with in a client-to-client way, being aggregated by some app before being stored forever, or will that be outside the scope of SAFE? If so, where does that app live? I don’t think SAFE should be limited to static archival storage, as that’s already relatively cheap and simple and there will be no market for it. So how will SAFE deal with streaming data?
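The “aggregated by some app before being stored” option can be sketched as a windowed summary. A minimal sketch, assuming readings arrive as time-sorted `(timestamp_ms, value)` pairs; `summarise` and `Summary` are hypothetical names, not part of any SAFE API:

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Summary:
    """What a hypothetical aggregating app would keep instead of raw readings."""
    window_start_ms: int
    count: int
    minimum: float
    maximum: float
    average: float


def summarise(readings, window_ms=1000):
    """Collapse time-sorted (timestamp_ms, value) readings into one Summary per window."""
    summaries = []
    current_start, values = None, []
    for ts_ms, value in readings:
        start = ts_ms - ts_ms % window_ms  # start of the window this reading falls in
        if current_start is not None and start != current_start:
            summaries.append(_summarise_window(current_start, values))
            values = []
        current_start = start
        values.append(value)
    if values:
        summaries.append(_summarise_window(current_start, values))
    return summaries


def _summarise_window(start_ms, values):
    return Summary(start_ms, len(values), min(values), max(values), mean(values))


# Three seconds of readings at 100 Hz (one every 10 ms) become three summaries.
raw = [(i * 10, float(i % 100)) for i in range(300)]
summaries = summarise(raw)
```

Here 300 raw readings collapse to 3 records. Whether one record per second is an acceptable loss of resolution depends entirely on the application.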

I think how SAFE fits into the paradigm shift of exponentially expanding compute/storage needs to be explained more clearly, and separately from the technical arguments about data types and so on. There are further discussions to be had about what immutability means for the more abstract notions of freedom and access to information, but that’s maybe for a different thread.



#2

Yes, we need to properly define what we mean by perpetual data, which is driving the AD scheme.

David gave a very brief set of thoughts in his recent interview with John @fergish, and one of the points, amongst others, was that it did not mean what you write on a scrap of paper. How that translates into SAFE is not precise.

In my opinion we need to tailor what is kept according to the desired meaning of “perpetual”. In my view it means documents (including web pages), meaningful data/context stored in databases and so on, and a history of the actual ID used to put those documents/data up, since that ID can change. NOTE: this is not ownership attributed to a person, but keeping a history of changes to the ID owning the data, in the objects where that ID is kept (e.g. ADs, not immutable chunks, since those are anonymous).
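The “history of the actual ID” idea amounts to an append-only list of owner IDs kept on the mutable object itself. A hypothetical sketch; `OwnedObject` and `transfer` are illustrative names and not the AppendableData API:

```python
from dataclasses import dataclass, field


@dataclass
class OwnedObject:
    """Hypothetical appendable object that keeps every owner ID it has had.

    Ownership changes append to `owner_history` instead of overwriting it, so
    the chain of IDs (keys, not people) stays visible alongside the data.
    """
    data: bytes
    owner_history: list = field(default_factory=list)

    @property
    def current_owner(self):
        return self.owner_history[-1] if self.owner_history else None

    def transfer(self, requester: str, new_owner: str) -> None:
        # Only the current owner ID may hand the object on.
        if requester != self.current_owner:
            raise PermissionError("only the current owner can transfer ownership")
        self.owner_history.append(new_owner)


page = OwnedObject(b"<html>my page</html>", ["id-A"])
page.transfer("id-A", "id-B")
```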


#3

It seems to me like there is a requirement for some stuff to last and other stuff to rot, just like in nature. Some stuff you need to have chiseled in stone, other stuff carved into wood, with some stuff barely in need of recording at all.

I understand the need to make data addressable via the hash of its contents. This makes caching simple and provides a consistent approach to data storage. I also get that multiple people may access the same data with different expectations of how long it will last. However, I don’t think this is mutually exclusive with allowing data to decay where appropriate, thus allowing garbage collection.
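Addressing data by the hash of its contents is what makes caching and deduplication fall out for free. A minimal sketch, using SHA-256 from Python's standard library purely for illustration (an assumption; it is not necessarily the hash SAFE uses), with a hypothetical `ContentStore` class:

```python
import hashlib


class ContentStore:
    """Minimal content-addressed store: a chunk's address is the hash of its bytes."""

    def __init__(self):
        self._chunks = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        # Identical bytes give an identical address, so a repeat PUT is deduplicated.
        self._chunks.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._chunks[address]
        # Any holder or cache can verify a chunk simply by re-hashing it.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("chunk does not match its address")
        return data

    def __len__(self):
        return len(self._chunks)


store = ContentStore()
first = store.put(b"same bytes")
second = store.put(b"same bytes")
```

Two people storing the same bytes get the same address and share one copy, which is exactly why they may hold different expectations about how long that copy should live.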

I think this idea has been mentioned before (maybe by @neo), but a flag to mark data as temporary would get us a good way towards this goal. Data stored this way would persist with lower guarantees of immutability. It could be refreshed, or made fully immutable, if others store the same data again with or without the temporary flag set.
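The flag rules in this paragraph (re-storing refreshes temporary data; storing it without the flag makes it fully immutable) can be sketched as follows. This is a hypothetical illustration: the `FlaggedStore` and `Chunk` names, the integer `now` clock and the use of SHA-256 for addressing are assumptions, not SAFE's API.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Chunk:
    data: bytes
    temporary: bool
    last_refreshed: int  # logical time of the most recent PUT


class FlaggedStore:
    """Hypothetical store in which every chunk carries a 'temporary' flag.

    * A PUT without the flag makes the chunk permanent, even if it was stored
      as temporary before.
    * A PUT with the flag of data that already exists only refreshes it.
    * A permanent chunk never goes back to being temporary.
    """

    def __init__(self):
        self.chunks = {}

    def put(self, data: bytes, temporary: bool, now: int) -> str:
        address = hashlib.sha256(data).hexdigest()
        existing = self.chunks.get(address)
        if existing is None:
            self.chunks[address] = Chunk(data, temporary, now)
        else:
            existing.last_refreshed = now
            # Stays temporary only if every PUT so far asked for temporary.
            existing.temporary = existing.temporary and temporary
        return address


store = FlaggedStore()
addr = store.put(b"draft", temporary=True, now=1)
store.put(b"draft", temporary=True, now=5)   # refresh: still temporary
store.put(b"draft", temporary=False, now=6)  # someone stores it for good
store.put(b"draft", temporary=True, now=7)   # cannot be downgraded again
```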

This decay model would be similar to how IPFS treats all non-pinned data: it seeds the original temporary data and then leans on the caching subsystems to retain the data while it is popular. As far as I understand, this approach should be relatively easy to implement in SAFENetwork.

So, data could then be temporary unless made permanent. Temporary data would persist while it remained popular. Unpopular temporary data would decay to dust. Storing the same temporary data again would start the cycle over again.
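The decay cycle described here (temporary data persists while popular, unpopular data decays, storing it again restarts the cycle) can be sketched as a toy store. A minimal sketch, assuming a logical clock in integer ticks and a fixed idle limit; `DecayingStore`, `max_idle` and `sweep` are hypothetical names, permanent data is left out for brevity, and this is not how SAFE or IPFS garbage collection is actually implemented.

```python
import hashlib


class DecayingStore:
    """Toy decay model: temporary chunks survive only while they stay popular.

    Every GET or re-PUT resets a chunk's idle timer; `sweep` deletes chunks
    idle for more than `max_idle` ticks.
    """

    def __init__(self, max_idle: int):
        self.max_idle = max_idle
        self.chunks = {}     # address -> bytes
        self.last_used = {}  # address -> tick of the last GET or PUT

    def put(self, data: bytes, now: int) -> str:
        address = hashlib.sha256(data).hexdigest()
        self.chunks[address] = data
        self.last_used[address] = now  # storing again restarts the cycle
        return address

    def get(self, address: str, now: int):
        if address not in self.chunks:
            return None  # decayed to dust
        self.last_used[address] = now  # popularity keeps it alive
        return self.chunks[address]

    def sweep(self, now: int) -> int:
        """Delete every chunk idle for longer than max_idle; return how many went."""
        stale = [a for a, t in self.last_used.items() if now - t > self.max_idle]
        for address in stale:
            del self.chunks[address]
            del self.last_used[address]
        return len(stale)


store = DecayingStore(max_idle=10)
popular = store.put(b"popular", now=0)
unpopular = store.put(b"unpopular", now=0)
store.get(popular, now=8)       # a read resets its timer
removed = store.sweep(now=15)   # only the unpopular chunk has been idle > 10 ticks
```

Because the address is derived from the content, re-storing the decayed data puts it back at exactly the same address and restarts its cycle.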


#4

With the temporary flag mentioned before, maybe you meant this post?
Everybody should be able to read the flag and know that the data can be deleted, but they could make that data immutable by storing it themselves. Simply changing the flag on such a (then deduplicated) data chunk would be enough.

Edit: ‘Everybody should’ of course assumes the solution in the referenced post. And for me this has lower priority anyway: first get a good working system with immutable data only.


#5

I think there is still some confusion. To me, if you publish then it’s immutable, and I think we all agree on that. Right now, for the network to achieve this and launch quickly, that will mean all data. You still control access to private data, even to the extent of zero access for anyone including you (effectively rendering the data useless).

Temp data, such as sensor stuff, is all temporary and should likely not be published. There are ways to handle this off-network (client side) right now with some thought, but they are not clean and not as simple as they should be.
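The off-network, client-side handling of temporary data mentioned here can be as simple as a bounded local buffer. A minimal sketch, assuming the app only ever needs a recent window of readings; `LocalSensorBuffer` is a hypothetical name:

```python
from collections import deque


class LocalSensorBuffer:
    """Keeps a bounded window of recent readings on the client only.

    `deque(maxlen=...)` silently discards the oldest reading once full, so the
    temporary data never piles up and is never PUT to the network.
    """

    def __init__(self, capacity: int):
        self._readings = deque(maxlen=capacity)

    def record(self, value: float) -> None:
        self._readings.append(value)

    def latest(self, n: int) -> list:
        """Return up to the n most recent readings, oldest first."""
        if n <= 0:
            return []
        return list(self._readings)[-n:]

    def __len__(self) -> int:
        return len(self._readings)


# Five seconds of history at 100 Hz is 500 readings; feed it ten seconds' worth.
buffer = LocalSensorBuffer(capacity=500)
for i in range(1000):
    buffer.record(float(i))
```

The catch, as the post says, is that such data then only lives on that one device, which is why sharing it with others is not as simple as it should be.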

We could do many things, like a data type that will never survive a full churn of elders, or similar. That is quite easy, but far from the only solution.
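The idea of a data type that never survives a complete churn of elders can be illustrated with a toy model. This is a hypothetical sketch, assuming ephemeral entries live only with the elders present when they were stored and are never replicated to newcomers; `Section`, `store_ephemeral` and `churn` are illustrative names, and this is not how SAFE's elders actually work.

```python
class Section:
    """Toy model: an ephemeral data type held only by the elders present when it was stored.

    Ephemeral entries are deliberately NOT handed to elders who join later, so
    once every original holder has churned out, the entry is gone.
    """

    def __init__(self, elders):
        self.elders = set(elders)
        self.ephemeral = {}  # key -> (value, set of elders still holding it)

    def store_ephemeral(self, key, value):
        self.ephemeral[key] = (value, set(self.elders))

    def churn(self, leaving, joining):
        self.elders.discard(leaving)
        self.elders.add(joining)  # the newcomer receives no ephemeral entries
        for key in list(self.ephemeral):
            value, holders = self.ephemeral[key]
            holders.discard(leaving)
            if not holders:
                del self.ephemeral[key]

    def get(self, key):
        entry = self.ephemeral.get(key)
        return entry[0] if entry is not None else None


section = Section(["e1", "e2", "e3"])
section.store_ephemeral("reading", 21.5)
section.churn(leaving="e1", joining="e4")
section.churn(leaving="e2", joining="e5")
still_there = section.get("reading")  # e3 still holds it
section.churn(leaving="e3", joining="e6")
gone = section.get("reading")         # every original holder has left
```

The lifetime of such data is "long but indeterminate": it depends on how fast elders churn, not on any fixed timer.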

So the trick right now is to protect published data by making all data fully immutable (except login packets, where we have some wiggle room).

When we solve this properly, temp data types are simple, but they will require some debate to make them as effective as possible. Right now we don’t feel the time is right to do this, but who knows? If there is enough motivation to delay launch for it, then of course it would be done. I have a feeling it is required, and we could create a simple RFC that would satisfy the temp stuff while ensuring it is actually removed over a long but indeterminate period.

tl;dr Don’t publish dog poo right now :smiley: :smiley:


#6

There’s also this article by the same guy :slight_smile:

The idea is that people are less likely to upload dog poo when they have to pay for PUTs.

Can’t the devices communicate directly, node-to-node? Why use PUTs for this??

That’s an interesting question, but it seems like streams are already moving towards immutability on the Internet. For example, most twitch.tv streams are recorded and users can watch past streams (with the whole chat experience and everything) whenever they want.

The Internet is increasingly becoming immutable, it seems to me. Facebook stores everything you put on it forever, even after it’s “deleted”. The idea of immutability is getting easier for people to accept as they are becoming used to it from the internet they already know.

This is really interesting. A problem I see with this is that someone can download the data, make a slight change, and reupload it – the hash will be completely different. When your original data rots away you may be misled into thinking it’s gone forever. Again, we would be feeding wrong user expectations about what “deletion” means.
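The point about a slightly modified re-upload is easy to see with any cryptographic hash. A minimal sketch using SHA-256 from Python's standard library (chosen for illustration; any content-addressed scheme behaves the same way, but the specific hash is an assumption):

```python
import hashlib


def content_address(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


original = b"The quick brown fox jumps over the lazy dog"
tweaked = original + b"."  # a one-byte change

a, b = content_address(original), content_address(tweaked)
# Roughly 1 in 16 hex digits would line up by chance between unrelated digests.
matching = sum(x == y for x, y in zip(a, b))
print(a)
print(b)
print(f"{matching} of {len(a)} hex digits match")
```

The two digests look unrelated: nothing about the second address reveals that it is a near-copy of the first, so the original decaying tells you nothing about whether the copy survives.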

This scheme seems to seek to answer the question of how to distinguish between stuff which should and shouldn’t be deleted (like temporary files). And to me, it seems like you can make the distinction by whether or not you choose to pay for the PUTs.


#7

Does it have to be a binary choice, data paid for or not? Could there be different tariffs, or could you get (part of) your money back when you delete your data?


#8

OK, I thought that thread wasn’t primarily about storing data (or deleting it, thus freeing space), but rather about privacy (you can’t take a post back once published, even if you published it 30 seconds ago).


[…] sensors are churning out readings every 1/100 of a second. This sort of data will increase massively, but it’s dog poo: there’s surely no point in keeping it in its raw form?

you could, e.g.:

  • correlate it with other data and generate new data by doing so; better data, or higher resolution, would yield better results
  • generate statistics over long periods of time
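To illustrate the first point: two sensors observing the same signal with a small delay can only be aligned if the raw resolution is kept. A minimal sketch using a plain Pearson correlation and a brute-force lag search; `pearson`, `best_lag` and the synthetic data are illustrative only:

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_lag(a, b, max_lag):
    """Delay of b behind a, in samples, that maximises their correlation."""
    n = len(a)
    return max(range(max_lag + 1), key=lambda lag: pearson(a[: n - lag], b[lag:]))


# Synthetic data: sensor B sees the same signal as sensor A, 3 samples (30 ms at 100 Hz) later.
rng = random.Random(1)
signal = [rng.gauss(0.0, 1.0) for _ in range(1003)]
sensor_a = signal[3:]      # sensor_a[i] == signal[i + 3]
sensor_b = signal[:1000]   # sensor_b[j] == sensor_a[j - 3] for j >= 3
lag = best_lag(sensor_a, sensor_b, max_lag=10)
```

Once the same readings are averaged into one-second summaries, a 30 ms offset falls inside a single window and can no longer be recovered.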

#9

Yes, that’s another issue that needs fleshing out, but I thought it would be useful to take a step back and go over the benefits (and limitations) of immutability more generally, as that seems to be the way things are going.

Interesting, I didn’t know that.


#10

Yeah, maybe it’s just a cultural thing. Maybe we need to get used to things being public forever after we post them; we should think about what we post before we post it.


#11

These are definitely two different discussions.

It’s true, but I’m not sure it’s such a good thing to be inspired by :slight_smile:
Also, the word immutability could be swapped for almost anything in that sentence:

The idea of surveillance is easier for people to accept as they are becoming used to it from the internet they already know.

I don’t think Facebook’s practices serve as a good measure of what is desirable. They might be doing some things that are good and some that are not, but that’s all there is to it.

And besides, wasn’t it said that it’s boring to do things the way they’re already being done :wink: (just showing that the argument isn’t really good; it can be used in all sorts of ways).

The weightiest and most concrete arguments I see are:

  • Focus on releasing sooner rather than later
  • Network economy shows (?) that the numbers add up.
  • (Possibly) Data storage advances show that this is a complete non-issue (do they?)

(mostly in scope of performance / sustainability)

The first point is not much to discuss IMO; only the most critical features justify delaying release, I would say. One thing at a time is a good rule.
The second point, though… The one thing that can show whether it is sane or not is the numbers, IMO. Do the numbers add up?
And it is about this idea:

It is one fragment of the wider idea about the network economy:
If people are prepared to pay for the PUT, then the network capacity will be there (provided by those who want the rewards) to

1. Hold the data
2. Serve the data

In a sufficiently performant and sustainable way.

So, has it been shown to be sufficiently performant and sustainable? Or have we identified where it is not, and clearly stated why we compromise on it and what is gained by doing so?

That would, theoretically, then solve the question about excess duplication coming out of immutability.
If people pay for it, then there will be resources to handle it.
It’s not that it isn’t producing wasteful garbage - it’s just that, if someone thought it was more expensive to set up another non-SAFE layer of security and redundancy for their indexes and current state, they would instead pay for storing it on SAFE, and that means the resources to cater for it will be there regardless, because people will want the safecoin and so will contribute the resources.

In that case, we don’t actually need to say “Don’t publish dog poo right now”, because the idea would then be: dog poo or not, if you pay for it, the resources will be there to handle it. But of course, not publishing dog poo might stem from knowing that it is a waste of resources. Just because you pay for it doesn’t mean it isn’t waste.

So, if I know humanity right, CryptoKitties and crap consumerism will drain as many resources as always, and there will be duplication of data (draining resources).

The most important thing is that the network can handle it.
Because if the useful stuff can happen, well then OK, the “useless” stuff in a way also enables it to happen.
The next thing (a bit more of a luxury) is to consider whether it’s possible to come up with a smarter design: not to do things as they are already done (costs didn’t prevent CryptoKitties), and somehow make it technically less wasteful for humans to indulge in their stupidities (as they always have and always will). That’s a fairly large and imprecise issue, so maybe not realistic to think about for a very long time.

To wrap it up, my question would simply be about the numbers: are they solid? I would think that is where effort should be placed, just to make them add up. That would redirect the entire question about immutability towards the network economy, i.e. “It all works, look at the numbers.”
The rest is mostly various levels of philosophy, politics and metaphysics.

EDIT: (I am focusing mostly on performance / sustainability aspect here)
EDIT2: One more very strong argument: Simpler and more robust code!


#12

Again, we would be feeding wrong user expectations about what “deletion” means.

That is correct, but it is not black and white. You could have a mutable data implementation that is mainly focused on sustainability / performance, but could still have (slight) advantages on the privacy / security front.
Not having a guarantee that data you marked as deleted is immediately and completely removed from the SAFE Network (let alone that someone could have copied it before your deletion) does not mean it has no advantages on the privacy / security front…


#13

There are many benefits to these possibilities, but these are also the possibilities that can turn “thought-to-be-harmless” pieces of data into weapons used against you.

I wonder whether the ID that uploaded public data is published alongside the actual data. If it is, then no one can upload the exact same data twice. I mean that if an ID uploads the information that “The head of our secret police took bribes.”, the metainformation about which ID uploaded it is very relevant too. Now if someone else copies it and uploads it again, the information will have different metainformation. But can the uploader ID be hidden in the first place? That way the person uploading this stuff could protect herself against accidentally revealing her true identity, and correlation and statistics would be a bit less effective against her.
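Whether a re-upload is distinguishable from the original depends on what goes into the address. A hypothetical comparison of two addressing schemes; neither is claimed to be SAFE's actual design, and SHA-256 and the `alice`/`bob` IDs are illustrative assumptions:

```python
import hashlib


def address_content_only(data: bytes) -> str:
    """Scheme 1: the address depends only on the bytes; the uploader is not part of it."""
    return hashlib.sha256(data).hexdigest()


def address_with_uploader(data: bytes, uploader_id: str) -> str:
    """Scheme 2: the uploader ID is mixed in, so every uploader gets a distinct address."""
    return hashlib.sha256(uploader_id.encode("utf-8") + b"\x00" + data).hexdigest()


message = b"The head of our secret police took bribes."
uploads = [("alice", message), ("bob", message)]

scheme_1 = {address_content_only(data) for _, data in uploads}
scheme_2 = {address_with_uploader(data, uploader) for uploader, data in uploads}
```

Under content-only addressing both uploads collapse to one address that says nothing about who stored it; mixing the uploader ID in makes every upload distinct, and therefore attributable.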


#14

It was a post from years back.

Immutable just seems too permanent for some data. I think the devs accept this by suggesting it is covered by local storage, but that only partially covers the use case.

For logs or some such, local storage may be fine, but what about an instant message? How about logs which may need to be viewed by others? How about a safe, quick file share with someone? How about stuff you would rather not be remembered (something encrypted, but requiring temporary persistence)? Etc…

Clearly, IPFS and other technologies have identified this as a common use case, as their tech stack is dedicated to addressing it. IPFS seems to be growing in use too. Why not just have a way to cheaply publish stuff without guarantees of immutability? Seems useful to me.


#15

I think MaidSafe has suggested it is absolutely a possible direction, just that we need to take one thing at a time, and that thing can still wait. It has even been said that if the community wants to delay release to broaden the scope a bit, that is also possible. But I mean, discussing this within the community, and how it can be solved, is still valuable work, which can be used later when the time comes.


#16

I agree, but the fact that threads like this are being created by moderators suggests the message is a bit murky.


#17

I wonder if it isn’t bound to be like that. It’s just a thought that hit me. There are so many possibilities and unknowns of various sizes, and we all know we can’t say when things will be done; it’s been research and exploration.
So it seems to me to follow naturally that it is equally hard to say precisely what the network will do, or what we can aim at. It’s quite possible that we talk about future possibilities when describing the network, and that it is never quite clear which of them will actually make it into the first release.
Meanwhile, not much attention gets paid to things that don’t seem immediately necessary, so, for example, this question might not have been un-murkied in the way it perhaps could have been. Not all gray areas are delved into.

I don’t know, maybe it’s not like that, it’s just an idea I got.

While some seem to be waiting for the RFC about AppendableData before saying much, I for one don’t think it will cover the larger picture. I think it’s a much more narrow topic, I would guess it is more a technical specification of how AD should work. I don’t expect it to be about risks, benefits and drawbacks of immutability in a distributed autonomous data network (maybe I’m wrong).

Anyway, that could also be why it is maybe a bit “murky” still.


#18

Hmm, I would rather say it’s the other way around: their tech isn’t able to do truly permanent data. But they still advertise themselves as “The Permanent Web”.

IPFS provides an interface as simple as the HTTP web, but with permanence built in.


#19

Well, so much for a Twitter clone for now. :stuck_out_tongue_winking_eye:


#20

Yes, agreed - that’s my point. We can do non-permanent data in SAFENetwork in a similar way, for when people don’t want permanent data.

It strikes me that it is much easier to work from immutable to temporal than the other way around - it is easier to remove guarantees than add them.