New data type: Free Data with PoW signature


#1

I’d like to discuss the possibility of a new data type for the network. I’m aware this will be controversial, especially given the potential for abuse. However, I believe that the benefits outweigh the risks, and that the risks can be mitigated.

We have two types so far, with the following characteristics:

  • Immutable Data

    • has no owner
    • large chunks
    • pay once
    • live forever
    • guaranteed storage
    • content-addressed
    • best-effort caching
  • Mutable Data

    • signed by owner
    • smaller chunks
    • pay on create and (undecided?) on change
    • mutable and/or versionable
    • deletable by owner
    • guaranteed storage
    • address is assigned (selectable? I’m not sure)
    • limited caching (because mutable)

I believe we could use another type:

  • Free Data

    • signed by owner
    • very limited size (under half K or similar)
    • free to create (PoW signature to limit abuse)
    • immutable but deletable by owner
    • can explicitly expire
    • no guaranteed storage (“cache only storage” but read the notes below)
    • content-addressed (primary ID for caching)
    • addressed by owner id (“signed by”)
    • optional secondary addresses (“link” or “index” types)
    • extensively cached (the size limit helps here)

As this data type would not incur storage cost for its creator, it could serve as the backbone for search indexes, among other things. The index could be searched by a mix of keywords (secondary address) and creator ids (index providers) so users could freely choose between sources they trust.

Safecoin and similar transactions could also use this type, making them free without having to use an exception for MD. For this, however, PoW should come in two flavors: one with low complexity, matching the description as it is outlined above, and another with high complexity, that would guarantee storage forever. The requirements for these two levels would be negotiated within sections based on recent usage patterns, similarly to how PUT costs are calculated.

Since the size of this type would be very limited and entries could be freely disposed of (starting with those with low value, LRU), their cost to create, paid in CPU cycles, would be much lower than anything that would justify handling a Safecoin transaction. It would also explicitly remove the potential problem of a “paying with Safecoin to pay with Safecoin” cycle.
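A minimal sketch of what a hashcash-style PoW signature for such an entry might look like, assuming SHA-256 and a leading-zero-bits target. The names `mint`, `verify`, `LOW_BITS` and `HIGH_BITS` and their values are hypothetical and not part of the proposal; only the half-K size limit and the idea of two difficulty levels come from the description above.

```python
import hashlib
from itertools import count

MAX_SIZE = 512   # "under half K", from the Free Data list above
LOW_BITS = 12    # hypothetical difficulty for cache-only entries
HIGH_BITS = 20   # hypothetical difficulty for entries that must be kept


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def _digest(payload: bytes, nonce: int) -> bytes:
    return hashlib.sha256(payload + nonce.to_bytes(8, "big")).digest()


def mint(payload: bytes, difficulty_bits: int) -> int:
    """Search for a nonce; expected work is about 2**difficulty_bits hashes."""
    if len(payload) > MAX_SIZE:
        raise ValueError("payload exceeds the size limit")
    for nonce in count():
        if leading_zero_bits(_digest(payload, nonce)) >= difficulty_bits:
            return nonce


def verify(payload: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a proof costs a single hash, whatever the difficulty."""
    return leading_zero_bits(_digest(payload, nonce)) >= difficulty_bits
```

The asymmetry is the point of the design: the creator pays roughly `2**difficulty_bits` hashes, while whoever checks the entry pays one.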

Problems:

  • I’m not sure how to incentivize caching this type of data or disincentivize early deletion.
  • (more may be added as they come up in discussion)

#2

In my opinion this is still open to abuse, especially as it’s unlikely we’ll find a way to control it when spare space is low.

The cost to the sections of maintaining a huge number of these would also be high. An attack would do this without causing much of a rise (if any) in PUT cost, since very little space is used, yet the network cost to maintain them is high. A very effective amplification attack.

Also, the PoW must not be high if you expect all sorts of clients to be using SAFE and contributing to useful uses of the “free data”. A lowly phone must be able to do it in a very short time. After all, phones will likely be a very large portion of clients and account holders.

Even having two levels of PoW only opens it to greater abuse once the attacker knows which data will only require the lower PoW. Then have 1000 high performance machines constantly generate free data with XOR addresses that fall under one section’s control (you know, with rotating IP addresses via VPN, etc.).


Considering the network cost to maintain these tiny data blobs, it is still reasonable to charge to store them.

Just have the temp data concept and store a coin-account ID with the data.

Thus have this.

  • user (or APP) pays the PUT to store the data blob.
  • When the APP has a collection of these blobs then it packs them in some format into a single chunk and stores that.
  • The blobs are then deleted and the store cost is halved and returned to the IDs that paid to store each data blob.
    • you maybe could return the whole put cost used to store each one to each ID when the blobs are combined. Need to examine the effects of returning 100% or 50% of the PUT cost to store. NOTE: cannot use the current PUT cost as that can be an attack vector in and of itself.

Thus the attack vector is removed/reduced and there is no need for PoW, which on a worldwide scale (billions of users) would produce so much pollution that it’s criminal in my view.
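The pay-then-refund flow above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Ledger` class and its method names are hypothetical, balances are plain numbers rather than safecoin, and the refund is computed from the price recorded when each blob was stored, in line with the note that the current PUT cost must not be used.

```python
from fractions import Fraction


class Ledger:
    """Toy account book: balances plus the price recorded for each stored blob."""

    def __init__(self):
        self.balances = {}  # account ID -> balance
        self.paid = {}      # blob ID -> (payer ID, price paid at PUT time)

    def put_blob(self, blob_id, payer, price):
        """Charge the payer and remember who paid how much for this blob."""
        if self.balances.get(payer, 0) < price:
            raise ValueError("insufficient balance")
        self.balances[payer] -= price
        self.paid[blob_id] = (payer, price)

    def consolidate(self, blob_ids, app, chunk_price, refund_rate):
        """Pay for one combined chunk, delete the blobs, refund their payers."""
        if self.balances.get(app, 0) < chunk_price:
            raise ValueError("insufficient balance")
        self.balances[app] -= chunk_price
        for blob_id in blob_ids:
            payer, price = self.paid.pop(blob_id)
            # Refund uses the recorded price, never the current PUT cost.
            self.balances[payer] += price * refund_rate
```

With `refund_rate=Fraction(1, 2)` this models the 50% variant and with `refund_rate=1` the 100% variant, so the effect of each on an attacker’s net cost can be compared.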


#3

The PoW (“hashcash”) cost is controlled by the sections, and it can be set to be costly enough for a would-be attacker (who would need to create many entries to be effective) while leaving regular users, who create just a few entries during normal use, practically unaffected.

Moreover, the cost needs to be set so as to balance supply and demand. As a section is running out of space, it keeps raising the cost, thus slowing down the creation of new entries until a new balance is reached. (As the entries are already cache-only, the balance is expressed in something like the expected median lifetime in the cache.)
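One way a section might adjust the cost is sketched below. It is an assumption-laden illustration: the function names, the 80% target fill and the bit limits are all hypothetical, and the difficulty is measured in leading zero bits so that each step doubles or halves the expected work.

```python
def next_difficulty(current_bits, used_fraction,
                    target_fraction=0.8, min_bits=8, max_bits=40):
    """Raise the PoW difficulty when the section is fuller than the target,
    lower it when there is room. All thresholds here are illustrative."""
    if used_fraction > target_fraction:
        current_bits += 1
    elif used_fraction < target_fraction:
        current_bits -= 1
    # Clamp so the cost never drops to zero or grows without bound.
    return max(min_bits, min(max_bits, current_bits))


def expected_hashes(bits):
    """Average number of hash attempts needed to meet a given difficulty."""
    return 2 ** bits
```

Because the work is exponential in the number of bits, a handful of adjustment rounds is enough to move the cost by orders of magnitude.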

I’m not sure why you assume a high network cost. Whatever they would be used for would incur network cost regardless of the implementation.

An important differentiator from MD is that these entries are not modifiable, so it’s much easier to cache them. As the entries would be cached opportunistically as they travel through the network, subsequent queries touching the same XOR path would return earlier, so we have less network traffic.

It was explicitly stated that most of these entries (those with low complexity PoW) can be freely deleted, for example to store immutable or mutable data blocks. It was also presented as a cache which implies a limited size with the least used entries replaced by new ones.

Lowly phones have GPUs faster than supercomputers in the '90s. Creating one entry (e.g. for a payment transaction) is feasible within seconds or less, while the cost can still be kept high enough to put the volumes required for an effective attack out of reach.

If the XOR address is a function of the content and the signature appended to it, then this attack is infeasible.

One of the intended benefits of this proposal is removing the “paying for paying” loop without giving special treatment to payment transaction MDs.

Yes. I would have an extremely convoluted process that fails to replicate many of the benefits from the original post:

  • a possibility for distributed search
  • limited lifetime for things like index data
  • storage optimized by data access patterns
  • less maintenance compared to MD
  • more efficient caching because it’s read-only

I share your sentiment about pollution. It’s one of the reasons why I believe bitcoin as it is can’t solve anything.

However, PoW in this context is very different from how it’s used for bitcoin because it’s not about winning a race. It’s just to make a certain kind of attack a bad idea.

Think about the Cold War for an analogy. Nobody had to use nukes because everybody knew they could gain nothing by using them.

Similarly, if we have PoW protection in place, there’s no point launching a flooding attack, since it’s already known that it can’t possibly work: the costs would go through the roof. PoW cost can remain low because it could become high.

(Also, the attack vector was not there in the first place as I explained above. Also, that’s not an amplification attack.)


#4

It is when the attacker makes many millions of them with very similar XOR addresses, which is easy to do when they are only a few bytes. The section has to keep track of them and reach consensus on each creation/read of them. That is extremely inefficient: when you have a few it’s OK, but when you have many millions in one section, that section grinds to a halt while the attack is on, and is then very slow as each one is read. There is also a high cost for the section to maintain 8 copies of them and index them.

As I said, the PoW has to be fast for phones or no one will use APPs that create them. So a high performance desktop can swamp one section with millions over a few days or less.

Then have an attacker use a modest number, a few hundred, of these machines and bring a section to its knees. Increasing PoW would then make any other users creating blobs with similar XOR addresses inordinately slow.

I can then see this happening over many sections, with SAFE slowly ground down to a crawl for any APPs that use these blobs. Didn’t you mention search engines, and of course the associated usage rankings etc.?

No, the solution is not a polluter: just charge for the blobs and refund some/all when the APP combines the blobs into a single chunk.

Only if the attacker was stupid enough to do that.

But still much slower than any suitable desktop. And phone battery must be considered. So when you have a major APP (search engines) needing this so that phones can contribute to the search engine scores, the PoW still has to be easy enough for phones to use the search apps.

Nope, you said the blobs can be very small, e.g. a few bytes. That means creating blobs with XOR addresses close to each other is easy as pie. The smaller the blob the easier it becomes, until you get too small. So say there are 1 million sections (>30 million nodes): you simply divide the XOR space into millionths and ensure the resulting XOR address falls in the millionth you want.

And the only cause of concern I have at this time: too easy to abuse. Add in payment and refunds, leave it up to the APP to combine them in an indexed format, and you have a great idea.

NO, just have the users pay for these blobs. And simply refund when deleted.

It is the APP that simply combines them into a chunk. To read, simply treat the chunk as an array of 1000 or 10000 blobs.

No convolution


#5

Don’t get me wrong, it’s a good idea if you remove free and have pay/refund.


#6

Address: 64 bytes. Owner: similar. Payload: similar. Signature: similar. We’re up to at least 256 bytes. Add to that a few bytes of state info (e.g. expiry date, PoW complexity, etc). If we have a limit of a half K that leaves space for 3 more hashes to link other pieces of data and a few spare bytes.
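As a quick check of that budget, assuming every hash-sized field is exactly 64 bytes and the limit is exactly 512 bytes (the variable names are only illustrative):

```python
FIELD_BYTES = 64   # size of one address/owner/payload/signature field
SIZE_LIMIT = 512   # "half K"

core_fields = ["address", "owner", "payload", "signature"]
core_bytes = len(core_fields) * FIELD_BYTES   # 4 * 64 = 256
link_bytes = 3 * FIELD_BYTES                  # three extra link hashes = 192

# What remains is shared between state info (expiry, PoW complexity, ...)
# and spare bytes.
state_and_spare_bytes = SIZE_LIMIT - core_bytes - link_bytes  # 64
```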

Not to mention, the assumption that hashing cost would be bounded above by data size is unfounded. It can be set as needed.

signature = EXPENSIVE_HASH(data)
address = HASH(data + signature)

This is what I’m talking about. The address itself is a function of the expensive hash, so you can’t just generate data at random, throw out what doesn’t match your XOR range, then do the heavy lifting just for the rest. You gotta calculate the expensive hash to get the XOR address itself, and that increases the cost of computation a millionfold for your example.
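A rough sketch of why targeting becomes expensive under this scheme. PBKDF2 is used here purely as a stand-in for `EXPENSIVE_HASH`, and the iteration count, the salt and the helper names are all assumptions made for illustration; a section is modeled as a prefix of the address.

```python
import hashlib

ITERATIONS = 1000  # hypothetical cost knob for the stand-in expensive hash


def expensive_hash(data: bytes) -> bytes:
    """Stand-in for EXPENSIVE_HASH: deliberately slow to compute."""
    return hashlib.pbkdf2_hmac("sha256", data, b"free-data", ITERATIONS)


def address(data: bytes, signature: bytes) -> bytes:
    """address = HASH(data + signature), as in the pseudocode above."""
    return hashlib.sha256(data + signature).digest()


def section_prefix(addr: bytes, prefix_bits: int) -> int:
    """Model a section as the top prefix_bits bits of the address."""
    return int.from_bytes(addr, "big") >> (len(addr) * 8 - prefix_bits)


def grind_for_section(target_prefix: int, prefix_bits: int, max_tries: int):
    """Attacker loop: every candidate costs one full expensive hash,
    because the address cannot be known before the signature is."""
    for tries in range(1, max_tries + 1):
        data = tries.to_bytes(8, "big")
        signature = expensive_hash(data)
        if section_prefix(address(data, signature), prefix_bits) == target_prefix:
            return data, tries
    return None, max_tries
```

On average the loop needs about `2**prefix_bits` expensive hashes per accepted entry, which for a network of a million sections is on the order of a million times the cost paid by an honest client who does not care where the entry lands.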


#7

But still, just charge for the put and refund, and there is no avenue for high performance desktops with ASICs to swamp sections.

As I said, you have to have PoW work on phones, so high performance desktops + ASICs can do many orders of magnitude more than a phone.

Even if you have to create a billion of these to get a million blobs into 100 sections, doing so continuously will cause problems forever in the SAFE network.

Remember it’s cumulative and not solvable by more space. You can only dilute it by having more sections. And an attacker doing it continuously will cause any APP creating blobs to run super slow (and to slow down too) for those sections.

Remember that you aim this at APPs like a search engine that creates blobs, and one use is for ranking sites. So if you have 1 million sections and 1 billion users, then every day a few billion blobs are created just by 1 billion users using a search engine a few times a day. Add to that an attacker targeting a small number of sections, and parts of the network just slow down in a few days with indexed blobs that never get deleted.


#8

First, just wanted to say that this is a very creative variation on mav’s suggestion for having free PUTs. Nice work Joe. There are a lot of other neat ideas here, like “refund”.

However, I think there may be other more basic ways to achieve the same goal of easy write access. The common theme throughout both conversations is giving the user/client write access to the network even if they don’t have safecoin initially. PoW is one solution, but wasteful unless the work provides a benefit to network operations. After a little brainstorming on the free data concepts I think the simplest answer is to introduce a means for clients to earn safecoin via computation as part of the proof of resource protocol. That way they can still earn by contributing clock cycles if the amount of storage or bandwidth on their device is limited. This would also demonstrate a very basic distributed computation feature of the network at launch.

This might also work well for new account creation. When new clients sign up, they could be given some initial computations to do that would end up generating a certain amount of starting safecoin balance in their account.


#9

Not everything can be implemented efficiently (or at all) on ASICs. Your argument is technical (thus solvable) not theoretical.

I guess that’s what a leaky bucket rate limiter as a first line of defense can solve easily. The PoW cost is adjusted based on how many requests are being dropped by the limiter, but the section doesn’t get compromised because those requests are already dropped. The attack would amount to a very short-lived DoS attack before the rise in PoW cost makes it infeasible.
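A minimal sketch of that combination, assuming one bucket per section and a difficulty measured in bits. The class, the function and every parameter value are hypothetical; time is passed in explicitly so the behaviour is easy to follow.

```python
class LeakyBucket:
    """Each accepted request adds one unit; the bucket drains at a fixed rate."""

    def __init__(self, capacity, leak_per_second):
        self.capacity = capacity
        self.leak_per_second = leak_per_second
        self.level = 0.0
        self.last_seen = 0.0
        self.dropped = 0  # drops since the last difficulty adjustment

    def allow(self, now):
        # Drain for the time elapsed since the previous request.
        elapsed = now - self.last_seen
        self.level = max(0.0, self.level - elapsed * self.leak_per_second)
        self.last_seen = now
        if self.level + 1 > self.capacity:
            self.dropped += 1  # dropped before it can load the section
            return False
        self.level += 1
        return True


def adjust_bits(bits, dropped, tolerated=0, min_bits=8):
    """Raise the PoW difficulty while requests are being dropped, relax otherwise."""
    if dropped > tolerated:
        return bits + 1
    return max(min_bits, bits - 1)
```

The limiter absorbs the burst immediately, and the drop count gives the section a signal for raising the PoW cost on the next adjustment round.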

EDIT I think I responded to something else. To answer what you actually wrote, an attacker can’t target a section (can’t get the address without performing the PoW), therefore the efficiency of any such attack will be inversely proportional to the number of sections on the network.

You still seem to ignore that sections cannot be targeted cheaply because the address is computed after the PoW is already computed.

EDIT It would also be possible to split the PoW:

signature1 = EXPENSIVE_HASH_1(data)
address = HASH(data, signature1)
signature2 = EXPENSIVE_HASH_2(address, section_key)

Where section_key is a time-dependent random value, refreshed every 5 minutes, accepted for 10 minutes (these numbers are examples). This would make it impossible for a would-be attacker to pre-generate a large number of entries over a longer period of time, and then flood the network with them.
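The second stage could be sketched as below. Several things here are assumptions rather than part of the proposal: the section key is derived from a section secret with HMAC instead of being drawn at random, `EXPENSIVE_HASH_2` is modeled as a hashcash-style nonce search so that checking stays cheap, and all names are hypothetical. The 5 and 10 minute values are the example numbers from above.

```python
import hashlib
import hmac
from itertools import count

REFRESH_SECONDS = 300  # a new section key every 5 minutes
ACCEPT_SECONDS = 600   # each key is accepted for 10 minutes


def key_for_epoch(section_secret: bytes, epoch: int) -> bytes:
    """Assumed derivation of the time-dependent section key."""
    return hmac.new(section_secret, epoch.to_bytes(8, "big"), hashlib.sha256).digest()


def accepted_keys(section_secret: bytes, now: float):
    """Keys a section still accepts at time `now`: the current and previous epoch."""
    current = int(now // REFRESH_SECONDS)
    window = ACCEPT_SECONDS // REFRESH_SECONDS
    return [key_for_epoch(section_secret, e)
            for e in range(current, current - window, -1) if e >= 0]


def _meets(address: bytes, key: bytes, nonce: int, bits: int) -> bool:
    digest = hashlib.sha256(address + key + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - bits))


def solve(address: bytes, key: bytes, bits: int) -> int:
    """Client side: signature2 is a nonce bound to the address and the key."""
    for nonce in count():
        if _meets(address, key, nonce, bits):
            return nonce


def verify(address: bytes, nonce: int, bits: int,
           section_secret: bytes, now: float) -> bool:
    """Section side: accept only proofs made against a still-valid key."""
    return any(_meets(address, key, nonce, bits)
               for key in accepted_keys(section_secret, now))
```

A proof made against a key that has rotated out of the window no longer counts as valid, so work done months in advance has to be redone.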

Why do you insist every search has to create another piece of data? Yes, that would be a possible way to implement a certain feature in a search engine, but it’s clearly not an absolute requirement for all possible implementations.


#10

Site ranking. How many people visit a particular site is an essential attribute that allows search engines to effectively curate their lists. To prevent that from being used once you provide the functionality is silly.

Anyhow, if you can create a PoW that works on cheap phones (without fast GPUs, or with ones that cannot be used for your function), is ASIC resistant (if desired, an ASIC can be created), does not cause phones to crawl, does not let fast desktops with 4 GPUs (e.g. 1080s, plus non-existent ASICs, hehe) go fast in the calcs, AND uses almost no power doing so, yet is fast but not too fast in operation, AND sections can limit the number of blobs yet the APPs still work, then maybe it might work with FREE data storage.

I am simply looking at the logistics and have no need to know how good your functions are because the environment it has to work in shows that it can be exploited. Once you crack down enough then you greatly limit its usefulness in applications and cannot be used for the applications that would gain the most use from it.

It has to work on phones that may not have usable GPUs and that run not fast 8 core CPUs but merely 2 core 32 bit ARMs, and it must not cause delays in the user experience, i.e. sub 100 ms.

Then you have to stop high performance desktops that can calculate massively more, and HOPE that no one designs an ASIC to do your function. (Yes, an ASIC can be made to do any of these if the price is right, and some government attackers have deep pockets.)

For vaults we know the computer/phone has to meet some performance specs, which knocks out all these low end devices, so the range is orders of magnitude smaller than in your suggestion here. On top of that, vault initialisation can take hours, unlike the web APP, from which the user expects an “instant” response.

From an engineering overview, free data isn’t going to fly no matter how good your functions are. Unless of course you can prevent APPs that want to make terrific use of the functionality, i.e. make it a lame duck feature that only well behaved and low usage APPs can use, which of course is silly.

tl;dr
Just have the blobs paid for upfront and refund when they are deleted, and it could be a terrific feature.


#11

I’m not smart enough to take sides here … love the discussion though - in the end it’s helping to build a better understanding of the strengths and weaknesses of different models and the Safe Network itself.

In relation to phone speed though, what about doing PoWs in advance? Or is that not possible in this model? E.g.: https://developers.nano.org/guides/proof-of-work/

Also, if both PoW and pay-refund have advantages and disadvantages, what about combining them in some way?

Edit … here’s a crazy idea, if a lack of funds is a problem for pay-refund, then what about some sort of credit system? Build up too much debt and you’re booted (or perhaps just prevented from doing more until you’ve paid up or deleted data and refunded).


#12

Or the APP pays and then consolidates the blobs into chunks and gets refunds from the blobs thus deleted. So then there might be 10000 blobs created then refunded when those 10000 are indexed/combined into one chunk. This is what I’d expect a good acting APP to do if it requires a large number of blobs.

So effectively 1 PUT charge for 10000 blobs written.

And absolutely no need for free blobs. Just pay and refund, which solves the attack vector, and eventually for an APP with large blob usage the only charge is for the consolidated chunks and the few blobs not yet consolidated.

Still provides an attack. Just precalc PoW for a few months and then attack.


#13

I like the idea of a trusted yet limited referral program. If you’re a trusted node (certain node age) then you can gift free account creation, but the gifts should somehow be limited in number and not be able to accumulate to an amount that could be used to attack. If kept limited in number, even if an attacker somehow cons enough people to gift them a decent amount of account creations, it would still only be enough for account creation, which is relatively limited, I reckon.

Besides account creation, what really is the barrier to entry? You get some people working hard to be good nodes to get others online and they get a free account (hopefully with vault software) so they can do the same. If the goal is to get people on the network and the ball rolling.

Having an absolutely free data type, even if limited, seems like a hefty challenge. Nice to see it being thought about though @JoeSmithJr. Think I need to actually refresh myself with the OP since I read it late the other night.


#14

And of course we could have the charge for a blob as a fraction of the PUT charge, say 1/3 or 1/4.

Thus the usage of blobs is less stressful for the ordinary user, but the charge is not small enough to be a real benefit to an attacker over spamming chunks.


#15

On the other hand, PoW for small stuff would reduce the stress on the network because it’s a simple hash check instead of a complicated payment transaction that involves coordinating with at least one other section.


#16

How about requiring an increasing number of PoW proofs with every x free data PUTs, with the ability to pay a charge to break out of the loop?
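One possible reading of this idea, sketched with hypothetical names: the number of proofs demanded grows by one after every `step` free PUTs, and paying a charge resets the count.

```python
class FreePutCounter:
    """Tracks free PUTs for one client and how many proofs the next PUT needs."""

    def __init__(self, step):
        if step <= 0:
            raise ValueError("step must be positive")
        self.step = step  # the "x" in "every x free data PUTs"
        self.puts = 0

    def proofs_required(self):
        # One proof to start with, one more after every `step` free PUTs.
        return 1 + self.puts // self.step

    def record_put(self):
        self.puts += 1

    def pay_to_reset(self):
        """Paying the charge breaks out of the escalation loop."""
        self.puts = 0
```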


#17

Because a payment transaction is not that costly on the network. Both require consensus, and payment is simple enough. But the free blob means that it will be attacked, and all that extra consensus due to the attack will be much more work than the minor work for payments.

Also, all the PoW is more expensive for the users, slowing the APPs’ performance and wasting energy. It’s the attack vector that makes free fail.

KISS, really I mean that, KISS: to use pay/refund means you are just using the code that is already there and adding a subroutine call for the refund, which is just payment in reverse. Anyhow, as seen in Fraser’s safecoin, refund functionality has to be provided for when there is an error in receiving ID.

To have free, you then need to add network code for asking for the hash and network code to check the hash.

And if the section is just doing “a simple hash check”, then another attack vector is to have good acting nodes and submit blobs that will land in one of their sections. The good acting nodes know the required hash, and the attacker can just use that to send off blobs, thus bypassing your PoW and flooding certain sections full steam ahead. No, the section has to do the PoW calc too, so that the attacker cannot learn the hash to insert from the node they have in the section.


#18

We don’t even know what payment transactions will look like.

What we know is that they require work from at least two sections: one to store the data, another to handle the payment. That’s exactly double what PoW requires.

Don’t get me wrong, I’m all for payments when it’s worth the trouble, I just don’t think that covers all cases.

“Let’s just add some more work.”

That’s really not all that complicated.


#19

That’s rich, because your free method adds PoW in the client and adds 2 extra unwritten functions to the code, whereas mine is simply a line of code to call the refund function (really just the spend in reverse). And then your free method has to have code written to attempt to stop attacks on it. And as I showed, you need to do the PoW in the section too, otherwise it can be bypassed.


#20

Free is not always free. I tender that the energy cost (real $$$) of your PoW will be more over time than the cost of pay to write blobs with refund on delete.