Can the SAFE network replace the current internet?


#10

Thanks in advance for your patience with this thought exercise. A quick follow-up question…

Theoretically speaking, if one or two groups are that powerful (e.g., ~30% capacity), could they collude to wreak havoc kind of like Bitcoin miners have done (e.g., threaten to go offline all at once unless certain stipulations are met)? In the mid- to long-term, the intersection of supply, demand, and price would restore stasis to the network, but in the short term, how might the network react/cope?


#11

Since the data is distributed approximately randomly across all sections, this would be an impossible situation. On the other hand, if one or two data centres had that capacity, then look up the “google attack” topic for a lot of good discussion about what would happen if an attacker tried to do this.

Since one of Maidsafe’s goals is home farming, I am confident the reward structure will be such that data centre farming, where the operator has to pay for all of its inputs, is unable to compete successfully on a global scale against home vaults with their 100Mbit/s to 1Gbps to 10Gbps bandwidths. So while the data centres could turn a profit, they would still look to earn income from renting their services out to those who still need them, as that would be more profitable, and thus would only farm with their “hot” spare capacity.

Also, the distribution of home vaults across all the ISPs means that bandwidth choke points are more distributed. A massive data centre with tens or hundreds of thousands of vaults has all of them sharing a limited number of 10 or 40Gbps links to the backbone, whereas the same tens or hundreds of thousands of home vaults are spread across hundreds of countries, each with tens to hundreds of ISPs and their own backbone links into the internet.
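As a rough illustration of that choke-point argument, here is a back-of-envelope sketch; the vault count, uplink count, and per-home upstream figures below are made-up assumptions, not network parameters.

```python
# Back-of-envelope comparison of aggregate upstream bandwidth (all figures are
# illustrative assumptions, not SAFE network parameters).

VAULTS = 100_000

# Data centre: all vaults share a handful of backbone links.
dc_links = 8                      # assumed number of uplinks
dc_link_gbps = 40                 # assumed speed per uplink
dc_aggregate_gbps = dc_links * dc_link_gbps

# Home vaults: each vault has its own (smaller) upstream share.
home_upstream_mbps = 20           # assumed usable upstream per home vault
home_aggregate_gbps = VAULTS * home_upstream_mbps / 1000

print(f"data centre aggregate: {dc_aggregate_gbps:>6} Gbps for {VAULTS} vaults")
print(f"home vault aggregate:  {home_aggregate_gbps:>6.0f} Gbps for {VAULTS} vaults")
# data centre aggregate:    320 Gbps for 100000 vaults
# home vault aggregate:    2000 Gbps for 100000 vaults
```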


#12

That was a nice set of approximations to see played out, thanks mav.

A more recent set of figures can be found in IDC’s Data Age 2025 study, sponsored by Seagate, April 2017. They estimate that the current total amount of digital data is somewhere around 25ZB and that it will grow to 160ZB by 2025, probably higher. I wonder if these figures include redundancy and data duplication? Probably no and yes, respectively. So 8x redundancy on the network, coupled with an 8x reduction due to deduplication, would roughly cancel out, which probably makes this a reasonable estimate.
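A quick sanity check of that cancellation, treating the 8x figures purely as assumptions:

```python
# If the 25 ZB estimate already counts duplicated copies, deduplication shrinks
# it, and network replication grows it again; with equal factors they cancel.
raw_zb = 25          # IDC Data Age 2025 estimate for "now" (ZB)
dedup_factor = 8     # assumed reduction from deduplication
redundancy = 8       # assumed copies kept by the network

stored_zb = raw_zb / dedup_factor * redundancy
print(stored_zb)     # 25.0 ZB -> the original figure is still a fair estimate
```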

jlpell’s other guesstimates:
Most dedicated desktop users who want to get involved will throw 500GB to 10TB at SAFE on launch. Prosumers will throw about 10TB to 100TB, and the mobile, timid, or just curious will be about 10GB to 100GB. Business ventures will be in the PB range. Depending on timelines, these numbers might be higher by a factor of 2x. I don’t think storage will be the issue, and the surplus storage will allow for extra redundancy to help spread out the bandwidth load. My hypothesis is that working with ISPs, and forming new ones or mesh networks, to get low latency and stable connectivity in an evolving regulatory landscape will be more difficult as popularity rises.

Yes, from a basic user’s point of view no one wants to sit and wait for a 500GB download. I think about 1 hour is a psychological limit for most, before they need to start seeing some kind of safecoin flow their way no matter how small. Current typical broadband speeds allow for a single 10GB vault to be filled in about an hour. Multiple 10GB vaults could be run in series to fill up a 1TB drive, but there are limits to going the multi-vault route based on the number of processor cores and computational requirements for each node process.
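The fill-time figure follows directly from the vault size and the sustained connection speed. A minimal sketch, with arbitrary example speeds:

```python
# Time to fill a vault of a given size at a given sustained download speed.
def fill_time_hours(vault_gb: float, speed_mbps: float) -> float:
    seconds = vault_gb * 8_000 / speed_mbps   # GB -> megabits, then divide by Mbit/s
    return seconds / 3600

for speed in (25, 50, 100, 1000):             # Mbit/s, illustrative broadband tiers
    print(f"10 GB vault at {speed:>4} Mbit/s: {fill_time_hours(10, speed):.2f} h")
# 10 GB vault at   25 Mbit/s: 0.89 h   (~the one-hour psychological limit)
# 10 GB vault at   50 Mbit/s: 0.44 h
# 10 GB vault at  100 Mbit/s: 0.22 h
# 10 GB vault at 1000 Mbit/s: 0.02 h
```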

Tiered groups ranked by performance level might alleviate this, i.e. smaller groups of higher-performance nodes vs. large groups of lower-performance nodes. This keeps Google competing with Amazon, and us competing with each other. Kind of like weight classes in sumo wrestling :smile:. Computation is going to need nodes that are clustered by performance level anyway, although nothing says that computation nodes and storage nodes need to coincide on the same machine.


#13

My first thought is how much of that is backup data. I’d expect that a lot of big businesses keep a number of full copies of their data, some of it 5, 10, or 15 years old, with the latest full backup containing all the data from the previous one plus the changes. Then of course there are all the incremental backups.

Does that figure include the RAID parity blocks?

Maybe then we could halve the 25ZB figure when excluding all the backup media and RAID parity blocks.


#14

I do not like FaceBook, Twitter, etc. This is why I joined safenet. Thanks for the calculations and explanations, they help a lot. Thank you.


#15

How much of that was pointless data only of value to Facebook and their advertisers?


#16

It’ll never, ever replace it; this is what people don’t understand: it’s a layer. Just because we have airplanes we don’t replace cars, and the same goes for this. That is why creating conduits via browsers, especially existing browsers, but also on clearnet websites and apps, is critical. It’s what’s going to define our success. If we can come up with ways to do this, we’ll change the world.


#17

The clearnet is the insecure net.

It’s also the sponsored, captured, censored, spied-upon net.

Layers means Quantum SAFE, SAFE on soft radio mesh, SAFE on LiFi, SAFE on longer-distance line-of-sight free-space optical links, and SAFE on some of these new open sat nets.


#18

I see an interesting algorithm question here.
When a new node joins a section, how will priority be set for data to be stored in it?

First of all, I would assume we want to ensure the minimum number of copies of a chunk.
That would always be the priority.
After that, storing new data should be important for at least a couple of reasons:

  1. Providing storage for the network as soon as possible.
  2. Enabling faster feedback (rewards) for the joining farmer, so as to increase uptake and minimise dropout.

But if a section does not require all nodes to keep a copy, and a new node takes part of the stream of incoming new data, does that mean some older node misses out (given that it still has free space), i.e. sees a decrease in its receiving rate? There could be a couple of opposing interests and needs here.
The rationale behind a priority for this is an interesting question, with lots of dynamics.


#19

The rule is simple: a node stores a chunk of data if the address of the node is near the address of the chunk, more precisely, if the node belongs to the group of the 8 nodes nearest to the chunk’s address. There isn’t any priority, and this rule applies to any data (old or new) and any node (old or new).

8 is a network parameter that may need to be adjusted after some simulations and tests. Also, the datachain will add the notion of archive nodes. But beyond those two elements, I don’t expect this rule to be fundamentally modified in the future.
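A minimal sketch of that rule, assuming XOR distance as the closeness metric (Kademlia-style addressing) and a group size of 8; the toy addresses below are made up:

```python
# Which nodes store a chunk: the GROUP_SIZE nodes whose addresses are closest
# to the chunk's address in XOR distance. No priority, no old/new distinction.
GROUP_SIZE = 8

def closest_group(chunk_addr: int, node_addrs: list[int]) -> list[int]:
    return sorted(node_addrs, key=lambda n: n ^ chunk_addr)[:GROUP_SIZE]

def stores_chunk(node_addr: int, chunk_addr: int, node_addrs: list[int]) -> bool:
    return node_addr in closest_group(chunk_addr, node_addrs)

# Toy example with small integer addresses instead of 256-bit XOR names.
nodes = [0b0001, 0b0010, 0b0100, 0b0111, 0b1000, 0b1011, 0b1101, 0b1110, 0b1111]
chunk = 0b0110
print(closest_group(chunk, nodes))   # the 8 nodes nearest to the chunk's address
```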


#20

How about redundancy? It’s 4x by default, right? That would be 2 TB. The content you mentioned is personal content, so deduplication doesn’t help. But maybe the Facebook figures were with redundancy; then 500 GB is correct. Somebody mentioned RAID-like blocks for Safe Network that would give more efficient redundancy.

Correct. On to the next question:

Not at all. Most farmers will be small (running one vault), some will be bigger (several vaults), fewer will be bigger still (many vaults), and a very few will be huge (a data centre full of vaults). If we apply Pareto’s Law recursively for a back-of-envelope calculation:

  • 80% of the data will be served by 20% of the farmers.
  • 64% of the data will be served by 4% of the farmers.
  • 51.2% of the data will be served by 0.8% of the farmers.
  • 40.96% of the data will be served by 0.16% of the farmers.
  • 32.77% of the data will be served by 0.032% of the farmers.
  • 26.21% of the data will be served by 0.0064% of the farmers.
  • 20.97% of the data will be served by 0.00128% of the farmers.
  • 16.78% of the data will be served by 0.000256% of the farmers.
  • 13.42% of the data will be served by 0.0000512% of the farmers.
  • 10.74% of the data will be served by 0.00001024% of the farmers.

Some of those in words:

  • Over half of the data will be served by less than 1% of the farmers.
  • Almost a third of the data will be served by less than 1/3000th of the farmers.
  • Over a fourth of the data will be served by less than 1 in 15,000 farmers.
  • Every tenth chunk will come from about 1 in ten million farmers.

Averages are useless for this.
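The list above is just the 80/20 rule applied to itself repeatedly; a short sketch that reproduces those numbers:

```python
# Apply the 80/20 rule recursively: at each step, 80% of the remaining share of
# data is served by 20% of the remaining share of farmers.
data_share, farmer_share = 1.0, 1.0
for step in range(1, 11):
    data_share *= 0.8
    farmer_share *= 0.2
    print(f"step {step:>2}: {data_share:7.2%} of data from {farmer_share:.6%} of farmers")
# step  1:  80.00% of data from 20.000000% of farmers
# ...
# step 10:  10.74% of data from 0.000010% of farmers
```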


#21

This is not what I meant.
The node will have to catch up by receiving data already present in the group. This could take a while if there is a lot of data. So, would it prioritise catching up, or would it also take in some new chunks - and if so, at what rate?


#22

The way that a new node gets the chunks it is responsible for doesn’t matter because it must get all of them (old ones + new ones added while it was getting the old ones). I would say that the most practical method should be chosen.

I think the current implementation is good: a new node receives from its neighbours the IDs of the existing chunks it must store. Then the node asks them for the complete data, ID by ID and not all at once. This process is slightly parallelised (one request per data holder), and the resulting data is returned asynchronously. This means that the node can receive new data in parallel as soon as it arrives, which is what we want.
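A minimal sketch of that catch-up flow, assuming hypothetical helper names rather than the real vault API: holders are queried concurrently, one chunk at a time each, while new chunks can still be accepted in parallel.

```python
# Sketch of a joining node catching up on existing chunks: requests go out to
# each data holder in parallel, but only one chunk is fetched per holder at a
# time, and new chunks can still be accepted while this runs. All names here
# are hypothetical placeholders, not the real vault API.
import asyncio
from collections import defaultdict

async def fetch_chunk(holder: str, chunk_id: str) -> bytes:
    await asyncio.sleep(0.01)                     # stand-in for a network round trip
    return f"data-for-{chunk_id}".encode()

async def drain_holder(holder: str, chunk_ids: list[str], store: dict) -> None:
    for cid in chunk_ids:                         # one request at a time per holder
        store[cid] = await fetch_chunk(holder, cid)

async def catch_up(assignments: dict[str, str]) -> dict[str, bytes]:
    store: dict[str, bytes] = {}
    by_holder: dict[str, list[str]] = defaultdict(list)
    for cid, holder in assignments.items():
        by_holder[holder].append(cid)
    # All holders are queried concurrently; newly arriving chunks could be added
    # to `store` by another task at the same time.
    await asyncio.gather(*(drain_holder(h, cids, store)
                           for h, cids in by_holder.items()))
    return store

chunks = {"chunk-1": "node-a", "chunk-2": "node-a", "chunk-3": "node-b"}
print(sorted(asyncio.run(catch_up(chunks))))      # ['chunk-1', 'chunk-2', 'chunk-3']
```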


#23

It’s the Pareto principle: 80% don’t do very much while the other 20% hit the accelerator. Nobody really understands why this is the case.


#24

I think you make an important point. In the underlying structure of the network averages matter very much, but participation by users will (probably) not be average.

The Pareto law is also an assumption which may or may not hold true, but it seems (to me anyhow) a closer model for participation than a simple average.


#25

Gotta be one or the other :wink: if it’s valuable to someone then it’s valuable, right?!


#26

If you mean it for individual vaults, which hold a set fraction of the network data, then it’s a tentative “yes.”

Tentative, because it matters whether two vaults belong to the same farmer. Vaults owned by the same farmer are more likely than otherwise to have a very fast connection between them, and if there are extremely large sets of vaults connected at LAN speed, then the average bandwidth between vaults is meaningless: most traffic will occur at LAN speed but some at a much lower speed, so there may be a bottleneck well below the average.
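A tiny numerical illustration of that point, with arbitrary assumed figures:

```python
# Illustration of why the average inter-vault bandwidth can be misleading.
# Figures are arbitrary assumptions: most vault pairs share a LAN, a few don't.
lan_pairs, lan_mbps = 9_900, 1_000      # same-farmer pairs at LAN speed
wan_pairs, wan_mbps = 100, 10           # cross-farmer pairs on a slow uplink

average = (lan_pairs * lan_mbps + wan_pairs * wan_mbps) / (lan_pairs + wan_pairs)
print(f"average pair bandwidth:  {average:.0f} Mbit/s")   # ~990 Mbit/s
print(f"cross-farmer bottleneck: {wan_mbps} Mbit/s")      # two orders of magnitude lower
```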

Yes, it’s a likely assumption because everything similar follows the Pareto Law. However:

  1. It’s a model with an infinite upper limit on speed and storage, which is unrealistic.
  2. It’s probably incorrect for small values; however, errors there have no significance.
  3. It uses a specific parameter for the tail exponent; the real value is probably different.

It’s a good first approximation for moving the discussion onto more realistic ground.
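For anyone who wants to explore those caveats, here is a small sketch that samples vault capacities from a Pareto distribution with an adjustable tail exponent and a finite cap; the exponents, cap, and scale are arbitrary assumptions:

```python
# Sample vault capacities from a capped Pareto distribution and see what share
# of total capacity the top 1% of vaults holds, for different tail exponents.
# The exponents, cap, and scale are arbitrary assumptions for exploration.
import random

def sample_capacities(n: int, alpha: float, scale_gb: float = 10.0,
                      cap_gb: float = 100_000.0) -> list[float]:
    # random.paretovariate(alpha) returns values >= 1 with tail exponent alpha;
    # the min() imposes a finite upper limit (caveat 1 in the list above).
    return [min(scale_gb * random.paretovariate(alpha), cap_gb) for _ in range(n)]

random.seed(0)
for alpha in (1.2, 1.5, 2.0):
    caps = sorted(sample_capacities(100_000, alpha), reverse=True)
    top_1_percent = sum(caps[:1000]) / sum(caps)
    print(f"alpha={alpha}: top 1% of vaults hold {top_1_percent:.0%} of capacity")
```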


#27

I love the idea! I hope it gets better browser and standards support. Wake up, FF and Google!!!


#28

Could someone tell me: if half of the above is true, what does that mean for the safecoin price? I have nothing to hide - I am here for a profit.


#29

That’s a discussion for the topic we limit price and speculation to. Ask specific questions there; I think yours was a little too general for an answer.