Abusive scenario about de-duplication

Here is the scenario. I know it sounds like a bug report, but this one is more of a question. User A puts file X on the network privately, so he pays for it. User B puts the same file on the network privately. I assume he will pay for it too, but the file is de-duplicated.

As I understand it, the location of each chunk is derived from some kind of hash, so the location is the same for both users.
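
To make the hash-based location idea concrete, here is a minimal sketch. It assumes a chunk's address is simply the SHA-512 hash of its bytes and that chunks are fixed-size; the real network encrypts chunks before naming them, so `chunk_addresses` and `CHUNK_SIZE` are hypothetical names for illustration only.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, for illustration only


def chunk_addresses(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and name each one by its SHA-512 hash."""
    return [
        hashlib.sha512(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


file_x = b"the same file contents"
addresses_a = chunk_addresses(file_x)  # computed by User A
addresses_b = chunk_addresses(file_x)  # computed independently by User B

# Same bytes in, same addresses out: this is what makes de-duplication possible.
print(addresses_a == addresses_b)
```

Because the addresses depend only on the content, neither user needs to know about the other for the chunks to land in the same place.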

The next scenario is the same as above, except that User B first checks whether the file is already available. Once all the encryption and hashing stages are done, he has the addresses of all the chunks and looks for them before storing anything on the network. He finds them. Instead of paying to store file X, he pays only to update his Data Map.

The same could be done if User B deletes file X and then stores the same file again: he doesn’t pay again.

Correct me if I’m wrong please!

Is this a case that can happen?

Why is this a problem?

The first person stored his data and paid for it. Just as he expected to do.

Person B stores his data (no matter if he somehow checks first), and since it is de-duplicated he has nothing to pay.

This happens in real life too. Group A pays for a statue to be erected in a park. Group B has the same idea later on, and when checking the park sees the statue is already there, so they pay nothing.

NOTE: it is my understanding that de-duplicated data does not need to be PUT so nothing to pay.

It’s not a problem @neo, but it’s kind of unfair in some way, don’t you think? Couldn’t that be abused? I just want to know if it can happen. I want the SAFE Network to be fair and secure for everyone.

EDIT: Maybe it’s already there and it’s done like that. I don’t know. That is what I want to know.

NOTE: it is my understanding that de-duplicated data does not need to be PUT so nothing to pay.

EDIT: That may be true too.

After thinking about it: as I understand it, there is no way to know about it (I mean on the other side, not on the client).

I am not sure, but I think you pay for a PUT whether the chunks are already there or not, in which case this scenario is correct.

I agree it creates unfairness, but think about it. Would you bother? How often would it save you anything? It depends on the cost of a PUT and on the likelihood that the data will already be there.

I think it’s certainly something for us to be aware of.

Good one!

This is correct. The data manager won’t charge for a PUT it already knows.

Actually it will. Locally you don’t need to put more than once, but on the network you will. The first uploader, though, will get rewarded via pay-the-producer. It’s important that everyone pays, as it means we can maintain a no-search-allowed network. This is a big thing that prevents a lot of abuse.

If you are copying a file from SAFE though or storing your own copy there is no charge (all you need is the datamap).

@dirvine

locally you don’t need to put more than once but on the network you will

and

or storing your own copy there is no charge

What does this mean? What else would you store but your own copy?

Better to think of it like this: for any data you store from outside SAFE, you pay. Any data you store/copy from within SAFE is free.
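
The rule above can be sketched as a toy model. This is only an illustration under stated assumptions, not the network's actual code: `ToyNetwork`, `Account`, `copy_datamap`, and the flat `put_cost` of 1 are all hypothetical, and the reward to the first uploader is left out.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Account:
    balance: int


@dataclass
class ToyNetwork:
    put_cost: int = 1  # hypothetical flat price per chunk
    chunks: dict = field(default_factory=dict)

    def put(self, account: Account, chunk: bytes) -> str:
        """Charge for every PUT, whether or not the chunk is already stored."""
        if account.balance < self.put_cost:
            raise ValueError("insufficient balance")
        account.balance -= self.put_cost
        name = hashlib.sha512(chunk).hexdigest()
        self.chunks.setdefault(name, chunk)  # de-duplicated: stored only once
        return name


def copy_datamap(datamap: list[str]) -> list[str]:
    """Copying data already on the network only duplicates the chunk names."""
    return list(datamap)


network = ToyNetwork()
alice, bob, carol = Account(10), Account(10), Account(10)
chunk = b"file X, chunk 0"

alice_map = [network.put(alice, chunk)]  # data from outside: Alice pays
bob_map = [network.put(bob, chunk)]      # same data from outside: Bob pays too
carol_map = copy_datamap(alice_map)      # copy within the network: free
```

In this model the network never has to answer "is this chunk already stored?", which is the point of charging everyone for a PUT.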

Is that because you are just copying the datamap rather than physically copying the contents?

What does this mean? That you can’t GET individual chunks?

Yes, otherwise (and we did implement this to try), the ClientManagers need to keep copies of names you have stored to check payment (huge state and security issues as well).

This way is very good, though, as it maintains the no-search-allowed rule. Again, this is something not yet made enough of: there is never a reason to search on the network, so if somebody asks for something that is not there, they are trying to hack and will be disconnected.

Only if they exist: you cannot try to GET chunks that may exist, they must exist.
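
As a rough sketch of that rule, assume a session is dropped the first time it asks for a name that is not stored. `ToySession` and its behaviour are hypothetical and only model the idea described in this thread, not the real routing layer.

```python
class ToySession:
    """A client session that is dropped on the first GET for unknown data."""

    def __init__(self, stored_chunks: dict[str, bytes]) -> None:
        self.stored_chunks = stored_chunks
        self.connected = True

    def get(self, name: str) -> bytes:
        if not self.connected:
            raise ConnectionError("session was dropped")
        if name not in self.stored_chunks:
            # Asking for data that does not exist is treated as probing.
            self.connected = False
            raise ConnectionError("requested data that does not exist")
        return self.stored_chunks[name]


session = ToySession({"known-name": b"chunk bytes"})
data = session.get("known-name")  # legitimate: the name came from a datamap

try:
    session.get("guessed-name")   # a guess: the session is dropped
except ConnectionError:
    pass
```

After the failed guess, even requests for data that does exist are refused, so probing for existing chunks costs the client its connection.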

Thanks for clearing that up.

As you can see these threads are useful, they clear up things for everyone.

No problem. I have never made enough of this facet, but it’s very powerful. Imagine trying to guess login details when with every wrong guess you are disconnected, etc. There is more to it, but the weird network behaviour is that at the data level in routing you always know the data exists, it must. So any type of search at that level is banned.

It’s unrelated to searches in upper layers (indexing data is fine, and then searching the indexes). So it does not mean we cannot have search engines, just that they will search over known data. I hope that makes sense. I realise I am not explaining this well here, but I am in the middle of crust/routing API issues, so I will come back and go into it more deeply, no problem. It is its own thread, I think, and a very useful one.

So if I have a file (from outside SAFE) that I know is likely already on SAFE (publicly), I have the choice between finding its datamap on SAFE (via a search index, for example) or paying SafeCoin for “uploading” it? Seems a bit odd from the user perspective, but from a security point of view it makes sense.

It does make sense, the explanation is fine!

If I understand correctly, it is not odd.
I think it’s like hitchhiking: you jump into someone’s car, arrive at the destination, buy him a beer, and then wonder why the heck you gave him the beer, because he had the same destination as you and was already travelling there by himself. But the answer is that YOU needed the ride, and that’s why you paid. You pay because you need the datamap.

I like it :smile:

I agree these discussions add to my understanding too.

But what if you use two accounts? One asks for the chunk, and if it is disconnected the other knows the chunk doesn’t exist. An alternative client could do that very easily, and you would never pay for data already uploaded.

Yes, I agree, but it’s a lot of work and a security issue, and it makes things very difficult. You could create a client, do a GET, get disconnected, etc. So it’s a hassle, and all you would achieve in the end is not paying for something already paid for anyway. Doing this for each file/chunk you want to try to store would be really annoying. If you are trying to, say, locate a possible user account, then there is a space of 2^512 to cover; that’s a lot of disconnections for sure. So there is a gap, but it’s one of those: if somebody wants to try it, they will upload many times slower than if they had paid, and when they do have to pay they need a registered account, etc. So doable, but …

I should add that we don’t need to tell you that you are disconnected: the network can keep you connected and bin all requests, etc. So a lot can be done; one idea is to always give you an encrypted “rick-roll” :slight_smile: There are a lot of things that can be done here, but at least we know what is happening.
