RFC: ImmutableData deletion support


#1

An RFC about the possibility of deleting immutable data:

Very interesting conversation …

Some thoughts and questions…
.-A truism, but why call it Immutable Data if it ceases to be immutable? Maybe, if this RFC goes ahead, the name should change to something like Archive Data, leaving Immutable Data as the name only for data that has Owner::Network.

.-Will we be able to upload data with Owner::Network already set?

.-This RFC introduces some security concerns. First, there is less anonymity because data and owner are linked, and there is the possibility of blocking content based on ownership. Are any solutions in sight?

.-Will there be some easy way to take ownership of the data without re-uploading? If not, DEDUPLICATION_LIMIT = 20 seems an extremely high number. Something close to 5 appears more reasonable. There is nothing more frustrating than possessing the Datamap and being unable to download the data because it has disappeared.
In fact this RFC breaks one basic commandment of the SAFE Network: if you have the Datamap, you have the data.

.-This RFC leaves me with mixed feelings. There are some positive aspects, such as the possibility of recycling old, useless space, but others raise many doubts for me. I have the feeling that the network could lose some of its original magic.


MaidSafe Dev Update 9th August 2016
Editable data and rewarded deleting
#2

I really hope they can make it happen.


#3

True, in some sense.

immutable -> cannot be changed, so this is no different. The data is immutable, but not persistent. To be persistent there are a couple of options as well (data_chains can make data persist etc.). If people share it then it will persist as well. However, for data known to be short-lived, then perhaps???

I am also in two minds here, as you are, and this is why the RFC process is great. We can ID some immutable data, but that brings up a couple of neat points.

Refunds: if we want refunds we need to identify ourselves (oops, so not great). We can potentially refund to a throwaway address, but then it’s a rabbit hole. We can also use a SQRL-type thing to ID data and delete it if necessary, but with no refund.

So this may introduce 3 flavours of immutable data: persistent anonymous data, short-lived owned immutable data, and short-lived owned anonymous immutable data. There may be a place for all of them? We do need to be careful though.


#4

I still think there is magic. Want something that can never be deleted? Then create a throwaway account, post publicly, then forget the credentials. This is only to incentivize the deletion of older versions or unnecessary data, and to let the network itself handle data in an efficient manner.


#5

I think this is a misconception

There will be a significant number of videos stored privately that may also have been uploaded by someone else. Maybe they didn’t know it exists publicly, or it was simply uploaded by another person. Some may just upload the video into their private collection because they couldn’t be bothered searching through public data, or consider their tastes rare, when more than 200 other people have the exact same tastes and thoughts. So some vids uploaded privately are actually 199 duplicates of the first one uploaded privately.

The point: PRIVATE != unique.

In fact APPs will be written to check if some public uploads are duplicates by encrypting the first 3 chunks and checking if they exist on SAFE. So just because there is only one copy uploaded doesn’t mean it’s the only use of the file.


This is also a misconception

Back to the example of a contractor producing data (e.g. the latest logo designs) who gives the client the datamap by some means. It could even be sneakernet, so SAFE never knows the datamap was copied. The contractor then says “well, I don’t want that anymore” and deletes it. So is the client going to find that the work he paid for has disappeared???


Basically you cannot identify deletable data by its method of upload or the number of times de-duplication occurred. There needs to be a well-defined method by which the uploader can identify the data that can be considered deletable. Maybe a “TEMP” flag that is reflected in the datamap, so anyone who receives the datamap KNOWS the file is temporary.


Another misconception and a huge one.

NO NO NO wrong.

Back to my very simple example that we can see with the logo competition. A data file produced/created by “A” is sold to “B”, and the datamap is handed over by a means that SAFE cannot track (messaging, sneakernet, etc.), so in fact “B” is now the legal owner of the private file created by “A”. “A” decides to delete it, but has been paid by “B” and is no longer the rightful (moral) owner.

If the datamap indicated it’s a temp file then “B” knows to re-upload it, or maybe the network can charge a “PUT” to let a person who has been given the datamap remove the temp flag.


Isn’t this losing anonymity by a thousand cuts?

Maybe OK if the file is uploaded as a temp file, since it’s expected to be deleted soon and that’s the price one has to pay for the feature (a potential minor anonymity loss).

But NOT for every file.


TL;DR

Do not make it automatic, but allow users to mark files as temporary, and these can carry an incentive of returning at least one PUT (and at most 3) to their balance when deleting the file: one PUT if the file is one chunk, 2 if the file is > 3 chunks, and 3 PUTs if the file is > 6 chunks.

It has to be that uploads are intentionally marked as temp, because datamaps WILL BE shared outside of SAFE “monitoring”, and some (or many) times that sharing changes who the file is for (the owner not in the attribute sense but in the moral, legal sense).
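The tiered refund suggested above can be written down as a small function. This is only a sketch of the numbers in this post, not anything from the RFC; the name `delete_refund` is made up, and treating 2-chunk and 3-chunk files as the one-PUT tier is an assumption, since the post only promises "at least one PUT" for them.

```python
def delete_refund(chunk_count: int) -> int:
    """Return how many PUTs go back to the balance when a temp file is deleted."""
    if chunk_count < 1:
        raise ValueError("a stored file has at least one chunk")
    if chunk_count > 6:
        return 3  # largest tier: files of more than 6 chunks
    if chunk_count > 3:
        return 2  # middle tier: files of more than 3 chunks
    # Assumption: files of 1 to 3 chunks all get the minimum refund of one PUT.
    return 1
```

The refund is capped well below the upload cost of a large file, so it works as a nudge to clean up rather than as a way to get storage for free.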


#6

I think there is a major under-estimation of de-duplication when the network is global. Many files will be de-duplicated 1,000 times, maybe 10,000 times.

Take faceless and the number of re-uploads of videos and pictures. People just do not check and they simply don’t care; a pic might be a couple of PUTs, so why spend time seeing if it’s already there?

At the other end of the scale the smart punters will use an APP that checks if a duplicate exists for files. The APP will self-encrypt the first 3 chunks and request them; if they exist then it doesn’t upload them, otherwise it uploads the whole file. Rinse and repeat for the rest of the file.

Point is that the de-duplication count/owner list DOES NOT INDICATE THE NUMBER OF REAL “OWNERS”. It only works if people stick to a narrow set of “rules”, but that will never happen. A list of 20 or even 100 owners will never be enough. AND an owner list will make enemies of those who have had moral ownership transferred to them, and also of those who use APPs to prevent re-uploads of chunks that already exist, which saves bandwidth for the network and saves them PUT charges.
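A rough sketch of such a duplicate-checking APP, under heavy assumptions: the network is a plain dict, a SHA-256 hash of the chunk stands in for real self-encryption, and the names `upload_deduplicated`, `CHUNK_SIZE` and `PROBE` are invented for illustration. It reads "rinse and repeat" as working through the file three chunks at a time.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # hypothetical chunk size, not a network constant
PROBE = 3                 # chunks checked per round, as described in the post


def chunk_names(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split data into chunks and pair each one with a content-derived name."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # Stand-in for self-encryption: the name is just the SHA-256 of the chunk.
    return [(hashlib.sha256(c).hexdigest(), c) for c in chunks]


def upload_deduplicated(data: bytes, network: dict,
                        chunk_size: int = CHUNK_SIZE) -> int:
    """Upload only the chunks the network lacks; return the PUTs charged."""
    puts = 0
    named = chunk_names(data, chunk_size)
    for start in range(0, len(named), PROBE):
        group = named[start:start + PROBE]
        if all(name in network for name, _ in group):
            continue  # this group already exists: no upload, no owner entry
        for name, chunk in group:
            if name not in network:
                network[name] = chunk
                puts += 1
    return puts
```

A user of this APP holds a working datamap without ever issuing a PUT for chunks that already exist, so such a user would never show up in any owner list.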


#7

I’ve been a lurker here for a while, but just registered to comment on this.

I’m sure that the MaidSafe team has already thought a lot about the economics of various storage strategies, but I might as well add my own opinion.

Deleting immutable data seems very inelegant and fraught with complexities as detailed by other posters here.

I think the cost in safecoins to store data should mirror the actual cost as much as possible. Although data may be immutable in an abstract sort of way, the nature of our storage systems is very mutable and transient. It costs money to perpetuate data, not just store it.

So the cost model of data storage suggests renting storage, rather than buying it for all perpetuity. This might go against the grain of some of the design of the system, but it takes care of the storage bloat issue: old and unused data is just naturally reclaimed when no rent is paid.

This would make one aspect of the system obnoxious: you’d have to remember (or have some automation) for paying rent. But perhaps you could ameliorate that by giving the option of paying for extremely long time horizons, and also giving the option for “unpaying” rent.


#8

Great to see you post, welcome out of the shadows :smiley: I tend to agree with some of what you say, but this is not a point for you directly, but for everyone really. The RFC process is not what MaidSafe as a company thinks or proposes, even when it’s a MaidSafe dev who creates it. There will and probably should be contentious RFCs out there to solicit feedback.

As the community grows these will get more eyes on them; it’s great that the community sees the deeper implications here and also the pros and cons of such a scheme. I think it may lead to improvements, but do not get me wrong, I completely share your hesitation here.

TL;DR MaidSafe as a company does not do RFCs, but individual Engineers do, and that is fantastic. A MaidSafe Engineer carries no more weight than anyone else posting an RFC, and we must reinforce that realisation when we can. Spandan here has created a really neat RFC and I expect there will be a lot of debate on this one, as there should be.

So welcome again, good to hear another voice, whichever opinion you have here, it’s always valid esp when put across in balanced terms, as you have. Thanks for that.


#9

The contractor’s client has to upload the data herself so that she is included in the list of owners and the data cannot be deleted from the network.

There is a threshold in the owner count above which the owner list is not maintained anymore and the immutable data cannot be deleted at all (DEDUPLICATION_LIMIT=20). To put it differently, above this limit the behavior remains the same as today.
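A minimal sketch of the owner-list behaviour described above, to make the contractor/client case concrete. Only `DEDUPLICATION_LIMIT` comes from the RFC discussion; the `Vault` and `Chunk` classes, the rule that a repeated PUT by the same owner is counted once, and the rule that a chunk is removed when its last listed owner deletes it are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

DEDUPLICATION_LIMIT = 20  # value quoted in this thread


@dataclass
class Chunk:
    # None means the list is no longer maintained: the chunk is network-owned.
    owners: Optional[list] = field(default_factory=list)


class Vault:
    """Toy in-memory store; hypothetical, not the real vault code."""

    def __init__(self):
        self.chunks = {}

    def put(self, name: str, owner: str) -> None:
        chunk = self.chunks.setdefault(name, Chunk())
        if chunk.owners is None:
            return  # already above the limit, nothing to record
        if owner not in chunk.owners:
            chunk.owners.append(owner)
        if len(chunk.owners) > DEDUPLICATION_LIMIT:
            chunk.owners = None  # stop tracking: undeletable, as today

    def delete(self, name: str, owner: str) -> bool:
        """Return True only if the chunk was actually removed."""
        chunk = self.chunks.get(name)
        if chunk is None or chunk.owners is None or owner not in chunk.owners:
            return False
        chunk.owners.remove(owner)
        if not chunk.owners:
            del self.chunks[name]
            return True
        return False
```

With this model, `put("logo", "contractor")` followed by `put("logo", "client")` means the contractor's later delete only removes the contractor from the list; the chunk stays until the client deletes too.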


#10

Some good points here.

Moving the data about is the cost, not the storing of it. Maybe popular data should drift towards archive nodes as a result, rather than worrying about removing it (to cope with churn).

Additionally, if we can get a rental model to work while retaining anonymity, it would certainly help to resolve the issue. Maybe balances should be allowed to go negative, but files can only be retrieved once the balance is paid off and/or data is deleted. That would give an incentive to clean up old data, without risking accidental loss.

I suspect retaining anonymity is the primary problem with the rental model though.


#11

I don’t know why this discussion is still ongoing; the concept was to pay once and retain forever. Changing that for any reason other than technical necessity would be to retroactively alter the contract people have entered into, and under which this project was pitched.


#12

People bought into a proof of concept, so I don’t see the problem. There was never a ready SAFE version. There is still none.


#13

The rental model has a lot of technical problems: not only anonymity, but also the need to introduce time, more metadata and much more control (more messages between nodes).

The pay-by-delete model is cleaner and safer, but has problems too.

Maybe, as David suggests, the solution will be to create different kinds of Immutable Data, similar to the different kinds of Structured Data: a perpetual immutable (anonymous and undeletable), an owned immutable (not anonymous and refunded on deletion) and an anonymous immutable (anonymous and erasable).


#14

Assuming the primary cost is moving the data, perhaps storing stale data on archive nodes is sufficient. Storage is cheap when the data is dormant.


#15

Currently it is possible to update at no cost a file of size < 3076. I wonder if immutable data deletion will allow the same for a file of any size. The procedure would be the following:

  • copy the data map of file in a variable

  • delete the file with NFS API

  • apply immutable data deletion to each chunk stored in the data map and get a refund for them

  • use this credit to create new version of file with NFS API

Will this use case be supported? In particular is the refund equal to the cost of a put?
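The arithmetic behind that question can be sketched as follows. Everything here is hypothetical: `net_update_cost` is an invented name, the per-chunk refund is exactly the open point being asked about, and the sketch assumes the uploader is the only listed owner of every old chunk so that each deletion succeeds.

```python
def net_update_cost(old_chunks: int, new_chunks: int,
                    put_cost: float = 1.0,
                    refund_per_chunk: float = 1.0) -> float:
    """Net charge for replacing a file via delete-then-re-upload.

    Assumes every old chunk is refunded (sole owner) and every new
    chunk is charged a full PUT.
    """
    return new_chunks * put_cost - old_chunks * refund_per_chunk
```

The update is free only when the refund equals the PUT cost and the new version has no more chunks than the old one: `net_update_cost(5, 5)` gives `0.0`, while a half refund, `net_update_cost(5, 5, refund_per_chunk=0.5)`, leaves `2.5` to pay.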


#16

One of the main concepts was to end the plague of 404.

For many it might not seem important, but we need a system where published information is available for the coming generations. The current internet fails in this aspect completely.


#17

Thanks @dirvine for the information.

Having a bit more background now, I think I’m more on the side of not allowing deletion.

Some minor points on a rental model:

  • Privacy probably isn’t a huge concern. If you want to re-rent data space, just publish the same data with a nonce.
  • The current economics make bandwidth costs much more of a concern than storage space, but I’m not sure that will hold in the future.
  • Keeping track of time in computer systems is usually a huge headache, but it seems like in this instance it should be fairly simple.

Is there an existing thread about the finer points of Safe Network economics? Such as, how to incentivize automatic data replication based on frequency of Get, etc.

Perhaps we could deploy a Safe Network based on data permanence, and then some time in the future port the safecoin ledger into a newer Safe Network prototype based on a rental model if storage bloat seemed like a legitimate concern.


#18

But under the current scheme the client only has to be given the datamap in order to be given the file. If deletion only checks for those who uploaded, then sharing datamaps is no longer a reliable way to send a file to another person. We will then require people to re-upload a file they don’t have. How do they do it? Do they have to be sent the datamap and then take another step of re-uploading the file they were just sent?

There will be a lot of sorry and angry people who get sent a file (sent the datamap) and find it gone after a while.

Ah OK, that makes sense. But still, keeping a list is expensive and a degree of natural anonymity is lost. People who should be concerned but don’t care enough will use the same ID for many files, and linkages can be made. Thus loss of anonymity by a thousand cuts.


tl;dr

Let the uploader decide if the file is temporary and whether they have the right to delete it. Keep the list of 20 as suggested, but only for files specifically designated as temporary (capable of deletion).

And if you are sent a datamap of a “temporary” file then you know that you have to re-upload it.


#19

The condition is the same as structured data where a public signing key can also be provided by the uploader, so no regression here.

But anyway, let’s call the mode above the DEDUPLICATION_LIMIT threshold network owner mode. In this mode the owner list disappears completely. Maybe we can ask MaidSafe to develop 2 PUT commands: in one of them the chunk would be created directly in this mode, and the other one could be called temp mode (this is what you were proposing in your post above).

Yes, the chunks will have to be reuploaded to be sure that the files don’t get deleted, unless the original uploader is known to upload files in network owner mode.


#20

Yes, it would be an attribute when using the API, I would expect. Also network ownership would be the default for public data, and let’s work out whether private should default to temp or permanent. (I would argue for the default to be permanent/network owner.)

I think the choice should be in the hands of the uploader.

I see the biggest problem being that until now we have accepted and worked with the concept that to copy files one gives the datamap, and that makes the system quick, slick, and economical. But if every file (chunk) is deletable and thus considered temporary, then we have to add extra cost to the users to re-upload the chunks so they can be assured the file will not be deleted while they still want it.

[EDIT: also, how many times will someone send a file to another person and then delete it, thinking the sending was actually giving the file to the other person? And so when the other person receives the datamap (days or more later), the chunks are gone. Lots of angst against SAFE.]

Yes, my question was more to point out the extra major step for the user just to be sent a file. It’s also extra work for the nodes every time a file is “copied”/sent to another account or even another directory in the same account. It can get quite confusing; I’ve often deleted a file not realising another package needed it, so I got in the habit of copying files for little-used packages rather than having the little-used package access the other directory. (Obviously I don’t for big files that are clearly used elsewhere.)

The nodes will have extra work for all the files being sent to other people (or “copied”).

Best to make it optional and not the default. People can be encouraged to store as temp, with the incentive being that they get a portion of their PUTs returned to their balance when deleting.

Not really; with SDs you understand that when you create the SD, and it’s a necessary attribute when you WANT to be the OWNER. But with immutable files (public especially) you ONLY want to be the “OWNER” if you WANT to delete later on. So rather than forcing ownership when the person doesn’t want to be the owner, and reducing the previously understood absolute anonymity “by 1000 cuts”, let’s make it optional for a file to be deletable and have an ID associated. I liked the idea of complete anonymity by default.