- more possible hashes than atoms in the universe
- Deduplication means the same file uploaded by different users is only stored once
- Uploading costs safecoins, will be more costly if little storage is available
- Garbage collectors will be implemented at some point
One thing that could happen, though, is a sudden dip in network storage capacity. Say the network needs to store 50PB and has the capability to store 100PB, but then something happens so that storage capability suddenly drops to 30PB. Does anyone know how the network would handle that?
There is a clear risk that the hunger for data is vastly underestimated. Nothing invented so far would come close to SafeNet storage. No data ever lost: instead of keeping track of 4-5 computers, 10 memory cards, and trying to make sure nothing is lost when upgrading a phone, PC or tablet, you would know that what you store is there and private forever, with no hassle ever again. And the hassle we have now has in itself caused people to store far less digital data than they would if storage worked like SafeNet will.
This fact alone may surprise everyone, and the hunger for data could be huge. From what I can gather, if I buy a hard drive with 500 GB of storage and make it entirely available to SafeNet, I will get safecoins every so often when someone gets data. I will surely earn more Safecoin from making my 500 GB drive available to the net than I would pay to store my 500 GB of homemade porn on SafeNet. Thus it would always be cheaper for me to store my data on SafeNet and use my hard drive to earn Safecoin.
It's not that people today think it's expensive to buy hard drives. It is not the price of hard drives, memory sticks and memory cards that keeps people from storing more of what they film with their mobile phones. Not at all, in fact. It is the hassle of moving all the movies when they upgrade their phone or PC. It's the hassle of buying and installing a new hard drive. It's the hassle of connecting to whatever storage devices they own. These are the reasons people's data collections are kept small, not the price of buying storage. SafeNet storage will thus be both cheaper and more convenient, and the amount of data the average person keeps stored will skyrocket as a result.
I would also like to point out that VR headsets will redefine personal data completely. The biggest impact of VR headsets will be a higher value placed on personal data. Today data is stored in a way that makes self-produced digital data feel less personal and less attached. VR headsets will change all this, since they will make the sorting of personal data much more intuitive, much more akin to how our brains work. The effect will be more personal data stored. In fact, this is the killer app for VR headsets.
The way they will make our personal digital data feel will change everything. When you put on your VR headset, you will sort the pictures you took today with your phone by simply dragging them from your phone into a book you just marked "summer of '16". You will sit in front of your VR coffee table, with your phone and a book on that table, and store your pictures by dragging them into the book. Then you will close the book, mark it, and go place it in one of your bookcases. Movies you own, you will fetch from your VR movie collection and watch on your 65-inch VR screen, while also checking the VR interpretation of your phone on the VR coffee table in front of you. This will change everything when it comes to personal digital data. It will feel real and important on a different level because it will be intuitive, and that is in stark contrast to how it is today. The effect, when combined with cloud storage, will change our world, and it will happen soon and fast.
The price of storage will reflect supply vs demand.
If people want to store more, they will be able to, but only at a price which is sustainable.
As for behavior changes regarding data storage, this is speculative. What we do know is that people have lots of spare storage, which sits idle until it is eventually filled up. This spare storage could be put to good use on the SAFE network.
I think there was some discussion of archive nodes that will specialise in holding large quantities of basically inactive chunks: the kind of data people store but rarely or never request. Those archive nodes will likely run on old rotating-platter hard disks, which already offer multi-terabyte capacity at mass-market prices. For the kind of data that is in high demand, regular vaults will cache it, and they will use solid state drives to provide fast access.
So a combination of standard and archive vaults should provide near-unlimited capacity whilst retaining fast access to popular files, so long as the incentive system works as intended.
Yes. I am sure the price of storage will be cheap enough that no one would think twice about storing data. But debating whether that is a correct assumption is not what I want right now.
Could someone better at math and more knowledgeable than me about SafeNet give me any number on the risk of duplicate hashes if SafeNet in a few years had somewhere around 160 trillion chunks in storage? At what number of total chunks would we start seeing a problem with duplicate hashes? How many chunks could SafeNet store with the present tech implementation before the risk of duplicate hashes would break it all down?
@Dirk83 just answered this above.
The hash address space is mind bogglingly massive.
This question comes up every so often.
You might want to read that topic
I read through the thread you linked to. I see no info on how many chunks the current implementation of SafeNet can handle before duplicate hashes corrupt too much data. I also cannot understand how the number of unique hashes can be greater than the number of atoms in the universe. Can someone link me something that explains this, or at least something that verifies it to be true?
Above all, I would like a number. How many chunks can the present code handle before the number of duplicate hashes would be too large to consider further data storage practical?
Anyone who says this is not a problem should be able to give me this number. How many chunks? If you don't know how many chunks, then how can you know it is not a problem? When do we have so many chunks on the net that there is at least a one-in-a-hundred chance that one chunk in a 1.3 GB uploaded file hashes to the same value as a chunk already present on the net (even though it's obviously different data)?
Your question is giving my desktop calculator a really bad morning. If this helps at all: the original white paper mentioned various hashing algorithms that could be used, and the version of the safe vault on this week's test network is storing chunks with 88-character file names, which leads me to believe they are currently using SHA-512 as the naming hash. So what you're really asking is: what is the probability of a hash collision using SHA-512 for a given quantity, say your example of 160 trillion different (1MB) chunks…
The result: my calculator shows all zeros, which just means you need a better calculator than I have access to. This old article http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/145-de-dupe-hash-collisions.html, talking about database collisions when using hashes as identifiers, came up with a minimum of 71 quadrillion blocks to get a clash.
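A desktop calculator underflows to zero here, but the figure is easy to get in log space. A minimal sketch using the standard birthday approximation (collision probability ≈ k²/2^(n+1) for k hashes of n bits); the 160 trillion chunk count is the example from the question above:

```python
import math

def log10_collision_p(k: int, n: int) -> float:
    """log10 of the birthday-bound collision probability k^2 / 2^(n+1),
    computed in log space so nothing underflows to zero."""
    return 2 * math.log10(k) - (n + 1) * math.log10(2)

# 160 trillion chunks, SHA-512 naming hash
print(log10_collision_p(160 * 10**12, 512))  # ≈ -126.0, i.e. p ≈ 1e-126
```

So the probability is around 10^-126, which is why an ordinary calculator simply displays zero.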
One additional note: when the SAFE network chunks a file, it includes a measure of redundancy, so that if a small subset of the blocks you are requesting is lost in transit, it is still possible to reassemble the original state of the file. This could be of use in the unlikely situation where a hash collision resulted in one of your requested blocks containing another file's data; the reassembly process would (presumably?) just interpret the data that didn't fit as corrupted and discard it in favor of the backup data.
The number of unique hashes is 2^512 ≈ 1.340781e154…
Every packet itself should be unique, because its data is tied to local account data, time, packet sequence, etc., hashed together…
@neo is better at this, I believe; maybe he can help/correct…
That is right
2^512 is approx 10^154
10^80 == # of atoms in visible universe
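To sanity-check the comparison above with a couple of lines of Python (the 10^80 atom count is the rough estimate quoted above, not an exact figure):

```python
import math

hash_space = 2 ** 512  # number of possible SHA-512 values
atoms = 10 ** 80       # rough estimate of atoms in the visible universe

print(math.log10(hash_space))  # ≈ 154.1, i.e. 2^512 ≈ 10^154
print(hash_space // atoms)     # every atom could label ~1.3e74 distinct hashes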
For a hash to be guaranteed unique, all input data would have to be shorter than the hash length. The “secure” part of hash algorithms like SHA refers to the fact that they should not collide; a hash is not secure in the sense of encryption. It is not encryption or compression, purely a fixed-length “unique” representation of any data.
A good way to think of a hash technique is this: given any random input, a hash will produce results that are evenly distributed across an address range, a binary tree whose depth is the length of the hash.
I hope that makes some sense. Collisions can happen; the more secure the hash, the less likely a collision will be. Add in lengths being limited, and the attack vector or error area is much reduced.
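As a small illustration of that even-distribution property (a sketch with Python's `hashlib`; the input strings here are made up), near-identical inputs hash to completely unrelated points in the 2^256 space:

```python
import hashlib

# Two inputs differing by one character produce unrelated digests.
for s in ("chunk-0", "chunk-1"):
    digest = hashlib.sha256(s.encode()).hexdigest()
    print(s, "->", digest[:16], "...")
```

Note this uses SHA-256 for brevity; the same holds for SHA-512, just with 128 hex characters instead of 64.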
Thanks, good answers. Frostbyte, dirvine and 4M8B gave answers that clear things up and make me feel much better. Still, the counter-intuitiveness of the birthday paradox, where the chance of two people sharing a birthday is already 50% when just 23 people are in a room, would make it interesting to calculate how many terabytes every human on earth could store on SafeNet before we would run into problems.
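The 23-person figure is easy to verify with the exact product formula for the birthday paradox (a quick sketch, nothing SafeNet-specific):

```python
def birthday_collision(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday."""
    p_unique = 1.0
    for i in range(n):
        p_unique *= (days - i) / days  # i-th person avoids all earlier birthdays
    return 1.0 - p_unique

print(round(birthday_collision(23), 3))  # ≈ 0.507
```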
I for one would love it if someone with the skills could come up with the number of terabytes each and every human could store before we would run into problems. I would find that number interesting, and possibly good to have for explaining SafeNet when it goes live.
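Not a definitive figure, but the birthday bound can be inverted to sketch it: for a tolerated collision probability p, you can store roughly k ≈ sqrt(p · 2^(n+1)) chunks. The assumptions below are all mine and arbitrary: 1 MB chunks, SHA-512 naming, 8 billion people, and a tolerated collision chance of one in a billion:

```python
import math

def chunk_budget(p_num: int, p_den: int, hash_bits: int) -> int:
    """Invert the birthday bound p ≈ k^2 / 2^(bits+1): k ≈ sqrt(p * 2^(bits+1))."""
    return math.isqrt(p_num * 2 ** (hash_bits + 1) // p_den)

chunks = chunk_budget(1, 10**9, 512)    # one-in-a-billion collision chance
tb_total = chunks // 10**6              # 1 MB chunks -> terabytes
tb_per_person = tb_total // (8 * 10**9)
print(f"{tb_per_person:.3e} TB each")   # on the order of 1e56 TB per person
```

In other words, even with a very strict collision threshold, the per-person terabyte budget is so astronomically large that hash collisions are not the limiting factor; physical storage supply is.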
This is much harder than it seems. There will not be a single number but a probability of falling within a range. It very much depends on how much is deduplicated etc. on one side, and how much is unique (live-stream-your-life type data) on the other. All this is also linked with the number of nodes, their quality, how long they are online, etc.
The way this is worked around at the moment is that the network charges more to store when resources are scarce, and less when resources are plentiful.
There are several things to consider
- Time to upload/pay versus quantity of data stored.
- Should old data die? Perhaps named immutable data is valid (we were discussing this in house, but these are early discussions yet).
- Is there a valid tertiary storage requirement (archive nodes ++), etc.?
Good news: all of this can be handled via evolution of the algorithms and does not all need to be answered on day 1. We do need to keep routes open to allow all this, though, and this is where correct code and strong types matter. Fast, rushed, messy code would die in this situation. We are not far from comfort here, but there is always a way to go. The track is right, though.
Don't think you'd be able to do so. If space were scarce, you might have to pay 1 USD of SAFE value for one MB. How much would you upload then?
EDIT: oops, didn't see the answers for some strange reason. Has been answered sufficiently, I guess.
Old news, and not a collapse: some missing data did not bring Dropbox to its knees. So I'm not so worried if some data goes missing at some point on the Safe Network too. http://www.zdnet.com/article/dropbox-sync-glitch-results-in-lost-data-for-some-subscribers/
I for one would love it if you would do that. It doesn’t require special skills, it is just arithmetic and a willingness to do a bit of research.
The amount of data produced isn’t constrained by anything other than our capacity to capture and store it, and that is increasing exponentially all of the time… So as soon as you find an answer it will be wrong…
I don’t think SAFE’s economics are sound. I don’t think it has the ability to create economic back-pressure in all of the places it may need to create economic back-pressure… If it is lucky Moore’s law and abundance of resources will make it a non-issue. If not, then it could succeed to death… Time will tell…
To answer just the question in the OP: if we use SHA-512 (as it is now, I think) and we assume it's a “good” hash (its values are evenly distributed), then by the birthday bound you would need on the order of 2^256 blobs (115792089237316195423570985008687907853269984665640564039457584007913129639936) before a collision becomes likely, and reaching that quantity is very unlikely to happen within the lifetime of the universe. I'll try to get the birthday-paradox answer as well; just gimme a second.
Sometimes it’s said that when a probability is smaller than the chances of getting killed by a meteorite, it’s time to move on and worry about something else
UPDATE: based on the approximate formula p²/2^(n+1) from Stack Overflow: the global chance of a collision with p = 160 trillion messages and SHA-512 (n = 512) is around 9.5e-127. With SHA-256 it is “only” 1.1e-49 (0.00000000000000000000000000000000000000000000000011).
The number for SHA-512, in everyday format, just for kicks: 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000095
To spice it up even more, let's assume a trillion times more blocks and “only” SHA-256: the chance that the hashes of any two of our 160 trillion trillion messages collide is around 0.00000000000000000000000011. I think we're safe on this front.
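For anyone who wants to reproduce these figures, a quick check of all three using the same p²/2^(n+1) birthday approximation, with exact rational arithmetic so nothing underflows:

```python
from fractions import Fraction

def p_collide(k: int, n: int) -> float:
    """Birthday approximation: chance of any collision among k hashes of n bits."""
    return float(Fraction(k * k, 2 ** (n + 1)))

print(p_collide(160 * 10**12, 512))  # ≈ 9.5e-127 (160 trillion chunks, SHA-512)
print(p_collide(160 * 10**12, 256))  # ≈ 1.1e-49  (160 trillion chunks, SHA-256)
print(p_collide(160 * 10**24, 256))  # ≈ 1.1e-25  (a trillion times more, SHA-256)
```

`Fraction` keeps the huge numerator and denominator as exact integers, and Python's big-int division rounds the final float correctly, which is what a fixed-precision calculator cannot do here.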
Thanks for the reply. Very good to know that this is not an issue.