Next step of safecoin algorithm design


#61

What about indirection?
I may sound like a broken record on this topic and past indirection discussions, but I think it would be good to take a hard look at intra-section chunk redirection strategies as a means for achieving multiple objectives. The following graphic is a bit dated, but in principle it would seem that redirection is feasible.

If I understand this correctly, the Pmidmanagers will make an attempt to PUT a chunk on the PmidNode. If they receive an “out of space” error then the PmidManagers would select the next nearest PmidNode in XOR to send the chunk to, and so on.

In the past, our talk on “indirection” was about finding protection mechanisms for diverting chunks from sections that were filling up too fast (too stressed) to neighboring sections with excess capacity. I suppose the “redirection” mentioned above is just an analogy for offloading filled vaults to vaults with excess capacity, within the same section. A preferred nomenclature for both strategies might be “inter-sectional and intra-sectional load management”.

Are we off-topic yet?


#62

This might get a bit off-topic, sorry for that, but the following worries me from the user perspective:

I don’t want to think about the price every time I’m going to upload a file. I want to have an automatic backup of photos from my phone for example - without having to worry how much it is going to cost. On the other hand, I understand that some price fluctuation is inevitable. Maybe there will be possibility to set a price limit in an app that handles the backup, but then I never know when the next backup is going to happen. The need for the user to worry about the price might undermine the longer term appeal of the network for the general public.

EDIT: Would there be any way to “pre-order” for example 50 GB of storage in a set price?


#63

I wonder @mav if this is worth diving further into. I do like the overall objective here but have an aversion to magic numbers as you will probably know by now :slight_smile: So in that vein, I propose

On joining a section (new node or relocated one) then that node issues a vault space available measurement for itself. Every node in the section then votes for that, but also issues their own vault space measurement . These votes will accumulate and then be known via PARSEC to us all.

Then the nodes agree on a value for min vault size (in terms of space) and this is calculated as 50% of the median value of the node space in the section.

why median
To prevent outliers, especially huge nodes (centralisation) and small nodes having too much an impact.

why 50%

To prevent at least half the nodes being immediately excluded. The 50% value says to smaller nodes, if you cannot have at least this amount then you must leave. We will know the vaults we should exclude with this measure as they voted via PARSEC to tell us their space.

Occasionally this may be more than 1 vault, especially at start up where this measurement like many others should not happen before we have a section that has group size elders, none of which are infants. When it is more than 1 vault, we would have to do something to soften this penalty, but we cna leave that, for now, to see if the basic idea seems sound?


#64

I was thinking about this today and what you have said sounds like a good starting point. I suspect though that we should also be asking the neighbouring sections so that no one section gets overly large or overly small median size compared to the network as a whole. Statistically we may get the situation that at some point in time one section in the network could have a very low median due to the vaults being relocated to it and suddenly get an influx of chunks and MDs being stored/created in that section (maybe 30-50% above average) and the section finds itself lacking.

So by taking notice of the neighbouring median size we will actually see the network having a measure of consistency in the median size, yes there may be some sections still with double the median size of other sections. Without that though we may see many times the differences between some sections. And because of that we could problems of multiple exclusions when sections of large median size difference merge


#65

What prevents cheating the network with a non-existent space?


#66

I’m not into the details of the discussion covering vault size etc, so probably nothing new here.
But I also think how much not yet used space a vault says it has, that can be easily cheated.
So the ‘space measurement’ should try to avoid rewarding vaults which indicates to have much more space than other vaults, certainly if that space isn’t needed immediately.
But such a ‘vault space available’ can certainly be useful, if the potential reward is in proportion with the ability to check/punish in the near future as the total number of chuncks a section has to store, grows.


#67

While it is an infant and its vault not contributing to the network have it store a series of chunks equal to its reported size. Then challenge it with asking for a hash of certain bytes in each chunk. Time is not such an issue as it has to do this before ageing. And it also tests bandwidth and cpu performance.

This can be repeated later by asking for hashes of other bytes in each chunk.


#68

Sorry for neglecting to put this in the list, my biases are showing! I don’t really see indirection as a viable solution, but am open to being convinced. Redirection delays the problem but doesn’t solve it. There’s still a need for more space in that part of the xor graph (and a need for reduced complexity in xor space by eventually removing redirects). Seems like it’s in the short-term-pain for long-term-gain basket along with price increases. Possible for sure, but imo not worth the extra complexity.

Sounds good on first impression.

What about rather than voting on ‘minimum vault size’ instead vote on ‘target vault size’? Use that target size to adjust the rate new vaults are accepted / disallowed (and thus the degree of dilution of chunks)?

In the ‘vote for minimum size’ idea the tendency will probably be to err toward voting higher rather than lower since voting lower risks being excluded but voting higher does not. Maybe there should be a small penalty if you happen to vote more than x% above the median for this round. Just enough to reduce the tendency to hedge your bets on the higher side when voting. Really large vaults should be happy with the compromise of being penalized now for voting big vs being able to store more chunks later and capitalize on excluding smaller vaults from rewards.

I’m cautious about voting because it can lead to betrayal in confidence, and confidence is critical for the success of the economics.

For example, a large part of the confidence in bitcoin is that it’s delivered the promise of 21M scarce coins. Keeping that promise is extremely important to the ongoing survival of bitcoin. So is the promise of a small block size since this is a (slightly weaker) promise that bitcoin will remain accessible to most participants and not just data center operators. The promise matters.

Voting allows those with the most resources to have more power, and over time even a very gradual compounding of power can lead to runaway consequences. Voting reduces the confidence in promises so I’m pretty wary of voting mechanisms.

I’d be interested in the possibility where rather than just next-in-line vaults adding their vote also allowing many potential next-in-line participants to have a vote toward median size. But this has the problem of not being supported by proof and / or work so is open to playing games. The idea being those who want to take part in the future should also have some say in the current state of the network so that when their turn comes they aren’t disenfranchised.

Currently yes reported spare space can be cheated, but the work by chia.net suggests that spare space can be measured accurately without cheating. Have not yet seen any implementation though so as of right now you’re correct. I would love to see this work since being able to know how much spare space is available makes planning and understanding stress much simpler. But right now I would say it’s not reliable enough to build an economic system around measuring spare space. Maybe that will change.


#69

While it is an infant and its vault not contributing to the network have it store a series of chunks equal to its reported size.

That is a possible test, but it taxes the network a lot, even if spread over a long period of time.
However it also tests possible maximum upload volumes (like 150GB a month).


#70

If the code was smart then those chunks would include the chunks that will be required to be stored in the vault anyhow (as in the sacrificial chunks) but not relied upon nor asked for (yet) when a GET occurs, which will only occur when it reaches child age. Thus some of the load is going to happen anyhow and the vault comes up to speed earlier as a result.


#71

Maybe some of these test chunks can be useful (later). But only until the next relocation of the vault.
And what if the spare space is much bigger than the total size of unique chunks in the section.


#72

There would be a way crypto style to get a device to fill its data space (vault) and it be too difficult/long for the vault to regenerate it on the fly and cheat. And the parameters to fill the space is much smaller than the data itself to save on bandwidth. Any proof of storage is just what needs to be solved and I gather it has elsewhere. Thus the vault has to prove it has the storage and if it takes hours/days to fill its vault with the data then it cannot cheat. in a reasonable fast time.


#73

Ok, I don’t know anything about how files actually work, but would it be possible to send a compressed file (ZIP) and have proof that the vault has had in extracted? That way we could use less bandwidth and have the space measured?


#74

The cheating vault stores the zip file only and uncompresses it when asked for proof and does this on the fly. Thus it only needs to store the smaller amount and thus cheat. It needs to be something that takes longer to fill the disk than is the timeout period of the proof. That way the only way to beat the timeout period is to actually store the data on disk.


#75

But is a wasteful of resources for the network because the infant node will be relocated before became elder and all this work will be useless. Besides it’s not only work for the infants nodes but for the rest of the nodes of the section that must deliver the data and control all infant node.

The truth is that most of those ideas seem to me a regression from the use of sacrificial chunks that is a very clever idea that solves several problems at the same time.


#76

I came up with this quick idea for verifying the free space of vaults. It is inspired by the Chia thing @mav mentioned in the first post in this thread.

Let’s fill the space with pseudorandom data (seeded by the section id) in a way that the placeholder data for each storage block depends on some data from all previous blocks, or at least a sufficiently long pseudorandom list of such blocks. This makes it impossible to quickly recalculate a storage block without actually having all the free space before it.

Chunks are always stored in the last free storage blocks so as to not overwrite data required to regenerate the placeholder data for later blocks. Similarly, newly released blocks are moved to the end of the free space and then their placeholder data is recalculated from their preceding free blocks. (None of this requires actual copying, just a linked list of free blocks.)

Checking for free space is done by asking for the values of a list of positions within the free space the vault claims, and then comparing the answer to what the asking vault knows as the correct value.

It is a spot-check so vaults can cache a long list of positions and their correct values at initialization so as to allow for checks about blocks that the asking vault already uses for storage. The vault will then slowly “eat away” on this cached list during its lifetime. The cache will contain samples biased towards the end of the storage space because it’s expected that more of the earlier values (not yet used for storage) will be available for sampling directly, thus saving the more valuable cached samples.

The downside to this idea is that only vaults that are at least the size of the claimed free space of the target vault can verify that claim. This may not be such a show stopper as it may sound as I expect that even the largest vault won’t have more free space available than the total storage space of the smallest vault.

Potential attack: If someone owns multiple vaults in the same section, they can use a shared copy of the pseudorandom fill and thus pretend to have a multiple of the space they actually have. This attack probably applies to all free-space checking methods though.


Proof Of Storage prototype
#77

I side with the idea of pre-paid storage, but maybe it’s as simple as setting a particular wallet to back storage requests. No more than what I have in that wallet can be spent on storage and it’s trivial to convert the amount of coins to an approximate number of MBs or GBs based on the most recent amounts paid for storage.


#78

We are assuming 2/3 nodes are honest. This is where the median is nice. Also if a node says it has X space then we can as a group know it should be able to save that amount or be penalised for not doing so.

Not quite sure how that would work or what you mean here @mav can you elaborate a bit, please.

I think the section should be able to store exactly the amount a vault says/claims it has. Not super simple, but certainly doable and we can confirm it has stored with easy proofs i.e. we take a chunk it says it has, we prepend some data, hash it and ask the node to do the same and give us the result. That is a simple proof that requires little work and is hard (likely impossible today) to defeat.

I certainly do not disagree with you here.


#79

By target size I mean how we define ‘normal’ in the phrase ‘return to normal’.

By minimum size I understand that to mean the way vaults may be excluded / killed to maintain a degree of quality.

The two are quite similar in operation but I feel they have important differences in the way of thinking about the algorithm.

Setting a minimum implies continual increase (since below minimum is ‘undesirable’). Target implies variation is to be expected but gradually corrected. Minimum requires us to say what we don’t want, target requires us to consider what we do want.

Having typed a bunch of thought experiments trying to break it, I think vote for minimum works fine. I kept some that are useful provocations and still a bit unclear to me:

There’s no consequence for false voting. If 51% of vaults vote for 100 PB minimum, the only effect is that some unaware newcomers get excluded because they voted sensibly. Once they wise-up to the trick they also start voting 100 PB just so they can be included. There’s no value to the voting mechanic any more.

People will want to vote extremely high because it’s a way to (temporarily) exclude newcomers unaware of the trick and (temporarily) retain more of the action to themselves.

If 51% of vaults vote 1 MB minimum (and their section is already storing 100 GB) what happens? Presumably nothing since anyone could pass the minimum. It effectively removes the purpose of the vote and turns it into busy work.

I don’t think vault size can directly be controlled / measured, but it can be indirectly controlled by the rate that vaults join / depart since this determines chunk distribution. Should votes be for size or should they be for rules of membership? Is it the same thing? Which way of thinking about it is most useful for a) doing design b) doing implementation? Probably the two ideas of size and membership are both controlled by one mechanism. But it’s hard to talk about if it’s still in a 2 x 2 matrix of ideas x mechanisms. I guess I haven’t collapsed the 2x2 matrix into a single mechanism in my mind yet, even though I accept it probably will do so eventually.


Related to votes, monero has a vote mechanism for block size. Great explanation of the mechanism and an intense research document about the mechanism.

This is an interesting precedent to look into. Do most miners utilize the vote or just go with the non-vote default? Why? One of the main mechanisms is there’s a logical tradeoff built in - larger blocks trade off a known reduction in block reward now for an unknown increase in transaction fees later. And of course also the underlying principle that larger blocks are ultimately an exclusionary force.

I really like voting mechanics as a general idea. But should (and how would) the money=vote vs person=vote dynamic be managed?


Is this still vulnerable to time inversions? I guess all previous blocks is intended to prevent this.

The generator must be able to operate on large data, since processing 1GB, then 1.001GB, then 1.002GB is not so bad but processing 500GB, then 500.001GB, then 500.002GB etc could be quite a labor when it’s only intended to prove space not computation.

Here’s a quick explanation of the time inversion attack because it’s an elegant thing and I think people would enjoy understanding it without all the maths from the chia papers: I’m asked to prove I can store 10 chunks of data. The tester gets me to store a seed chunk then generate the next 10 chunks after it, where each new chunk depends on the prior chunk(s). But I cheat and after I store the 5th chunk I throw away chunks 2, 3 and 4. I only store chunks 1 and 5. If I’m asked to prove I have 10 chunks of storage by providing random chunk number 7, I use the value of chunk 5 + some calculation to derive chunk 7. So instead of storing 10 chunks of data I only stored 2 chunks plus used some calculation time. All proof of storage mechanics can be reduced by some factor depending on the derivation scheme, the computational power available to the attacker, and the timeout for the proof. Claiming I stored 100 GB might actually mean I stored 1 GB with some powerful computation to fill in the gaps. The tester can never know the truth since it all happens on the prover machine.

And ‘computation’ might also be replaced with ‘bandwidth’ since the prover can ask a friend storing the same value to provide it. This is where sacrificial chunks face a (mild) challenge since the prover can possibly use the primary chunk to pretend they have the sacrificial chunk when really they don’t.


#80

Let’s say each block depends on 1,000 blocks from before, but which 1,000 blocks those are depends on which block it is. If consecutive blocks don’t share many or any of the blocks that they depend on then any sufficiently long series of blocks will require all of the previous blocks to exist. Recomputing them on they fly would become prohibitively costly.

And that is the point. It’s not important to make cheating impossible, it’s enough if it’s cheaper to actually have the claimed amount of free space than to lie about it..

The seed for the algorithm would depend on the section ID so that section members can verify each other. This makes this kind of cheating impossible unless one can place two or more vaults in the same section.


Proof Of Storage prototype