Current unsolved SAFE questions

#1

In a recent conversation with @neo I was talking about how the SAFE network is (to me) the ultimate nerd snipe (fatal cerebral distraction) full of really interesting catchy problems.

Many of those problems are solved or well on the way, but there’s still a lot of room for investigation and snipery.

If you know any nerds, maybe this list of unresolved questions will appeal to them.

Network Topology

Should relocation of vaults be to the most-in-need section, or should it be random?

How many nodes should sections contain, and what portion of those should contribute to consensus?

If consensus were to be partitioned (eg network membership consensus operates separately to data mutation consensus) what would that look like?

Is there an optimum secure messaging strategy?

Does the network naturally centralize and what causes this? For example, does latency cause section vaults to gradually become geographically concentrated?

How much impact does churn have on network operations and how far can the effect be minimised (or rather, end-user-related operations maximised)?

Is there a maximum network size?

With respect to an attacker flooding the network with malicious nodes, at what rate does more vaults joining the network become dangerous? How does this change under differing initial conditions?

What is the difference between a healthy network and an unhealthy network?

Safecoin algorithm

Given the existing proposal in RFC-0012, what is the short-term, mid-term and long-term distribution of safecoin likely to be?

Can RFC-0012 be improved and how so?

What are the compromises being negotiated in the creation vs spending of safecoin?

Should safecoin parameters be adjusted periodically and if so how should that be done?

How should pricing of storage be calculated?

What’s the best way to make safecoin divisible?

Growth and Development

Bitcoin led to the development of SHA256 ASICs. What technology will a successful SAFE network give rise to that would otherwise go undeveloped?

What portion of vaults will be operated by home users compared with professional farmers?

What is the likely capacity of the network (spare / unused disk space) and how much will it differ from the optimal farming capacity?

At what rate will data be uploaded, by who, and why?

How can vaults be upgraded and what happens when there’s a bug?

Can the network be forked?

If a disaster turns off a significant part of the network (eg 25%) for one hour then those vaults come back online, how quickly can the network return to normal operation? Can this be generalised for X% and Y hours?

User Experience

Can secure key management be simple for users to perform?

How much SAFE usage will be done in the web browser compared to apps?

What will the experience be for users who decide they want to take the first non-read-only step, eg sending an email or upvoting a reddit post? What will their onboarding experience look like?

Can identities be managed in a trustless way? How sure can I be that who I’m communicating with really is that person, eg my bank?

How much can SAFE add to operational security compared to existing systems like TOR?

Can bitcoin or ethereum run on SAFE?

What is the optimum amount of parallel processing a client can do on the network?

How can computation over massive data sets be done without locally caching that dataset?

How will push notifications be implemented?


Think you have some answers? Got some questions of your own? I’d be keen to hear.

31 Likes

#2

I think you could expand this to include calls and chat. OR do you mean internode messages as part of the SAFE protocols???

For non SAFE internal messaging
My thought is that secure messaging of single messages should use the “normal” hopping, ie keep it simple and use the same process as for requests and responses by clients.

For chats, gaming, calls, etc then my preferences are for
client <-> relay <----> relay <-> client
For this the relays have to be close (in lag time) to the respective clients, and the clients also need to choose relays that have good bandwidth (ie good with respect to the slowest of the clients).

This means that a bad actor acting as a “client” cannot easily discover the other client process, but also means that 2 clients will get reasonable response times.

Maybe allow another where the clients are given the IP addresses of each other and talk encrypted directly. Probably greatest usage would be for gaming.

So 3 levels for differing purposes.

OR even do it as specifying the relays. Each client can specify if they want a relay node or not.

I would think the network has to be built with practically infinite size in mind. By this I mean not infinite, but when considering the foreseeable future it’s practically indistinguishable from infinite. For example the 256-bit address range is practically infinite for real life purposes, yet is certainly not mathematically infinite.

A derivative of Fraser’s method seems to be the best at the moment. That is, without the silly 1/4 of 10^-9, and keeping to the pure decimal that all/most humans on the earth are taught. It will bring confusion to most non-binary thinking people.

As many as practical should be home users. That is after all one of the early concepts for SAFE. But if the currently experienced resource check of 50 or more Mbits/sec is kept then it’s unlikely there will be many for at least 5 - 10 years.

You should define “forked”

  • software forked but still use the same underlying network
  • New complete network with same or modified software.

These are two completely different types of fork. For SAFE fork is really a different thing to blockchain forking.

But the answer is already yes for both scenarios. Each test net is essentially a network & software fork combined. The network ID determines a network fork.

Better be.

Certainly not as they run now.

To use SAFE as the full back end for a miner would slow them down way too much. For a low-activity node it should be fine, but a high-activity node may have problems.

I would think that there needs to be a change to the software so that caching on the computer is performed to provide the speed, and SAFE can be used for the unchanging blockchain database. Dedup would mean that there is only one actual copy of the database stored.

Then changes to the messaging between nodes to utilise SAFE messaging would also be required.

Maybe this can be done with a special “block chain” layer that intercepts the requests for these various services to incorporate caching and to direct messages across SAFE.

TOR needs people to be exit nodes and these exit nodes can view users who send unencrypted data over TOR. Whereas SAFE encrypts without the browser or user doing anything. TOR requires setting browser to https only and relies on apps being ultra careful and specifically use TOR channels. SAFE Apps will have to specifically bypass SAFE.

EDIT:

nerd snipe
I worked out a solution to this. But you have to make some assumptions, including but not limited to the effects on the speed of electricity flow (~ 1/2 speed of light), the ability for electrons to actually flow when potential differences are so low.
.
Between the 2 points is ( 4/pi - 1/2 ) ohms.
.
So now nerds can say Meh I know the answer and live a full and healthy life
.
And for anyone interested
across one resistor it is (1/2) ohms
diagonally opposite is ( 2/pi ) ohms
.
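For anyone who wants to check those three values numerically, here is a minimal sketch. It assumes the standard double-integral formula for an infinite square grid of 1-ohm resistors, R(m, n) = 1/(4π²) ∫∫ (1 - cos(mx + ny)) / (2 - cos x - cos y) dx dy over [-π, π]², and evaluates it with a plain midpoint rule. The function name and the `steps` parameter are made up for illustration, and none of the physical assumptions mentioned above (finite signal speed, tiny potential differences) are modelled.

```python
import math


def lattice_resistance(m, n, steps=400):
    """Approximate resistance in ohms between nodes (0, 0) and (m, n)
    of an infinite square grid of 1-ohm resistors (midpoint rule)."""
    h = 2 * math.pi / steps
    total = 0.0
    for i in range(steps):
        # Midpoints never land on x = 0, y = 0, where the integrand is 0/0.
        x = -math.pi + (i + 0.5) * h
        for j in range(steps):
            y = -math.pi + (j + 0.5) * h
            total += (1 - math.cos(m * x + n * y)) / (2 - math.cos(x) - math.cos(y))
    return total * h * h / (4 * math.pi ** 2)


if __name__ == "__main__":
    print("adjacent     :", lattice_resistance(1, 0), "vs", 1 / 2)
    print("diagonal     :", lattice_resistance(1, 1), "vs", 2 / math.pi)
    print("knight's move:", lattice_resistance(2, 1), "vs", 4 / math.pi - 1 / 2)
```

The knight's move case (two across, one up) is the one from the comic; the other two are the adjacent and diagonal cases listed above.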

10 Likes

#3

This may depend on how much it is possible to earn by farming. It has to be a lucrative deal, because the concept of “excess disk space” kind of doesn’t exist in my experience as a digital device user. No matter how much there is, it is eventually going to be full. If the earnings are not significant enough, the nuisance of having to re-arrange your files sooner could override the motivation to farm.

Professionals on the other hand dedicate all the space to farming and are in general more willing to make an effort for that, since it is their profession, their way to make a living.

Then again, what is pocket money in one part of the globe is a month’s salary somewhere else. So if there are good enough connections and devices in a poor country, there might be many more home farmers there.

Maybe it is going to be roughly: industrial farming north, peasant farming south?

1 Like

#4

I beg to differ. For some this is true, but in my experience of fixing people’s computers (friends, relatives, word of mouth) it’s a mixed bag. Ever since desktops came out with 1TB drives I have found amongst non-geeks etc that the drive was mostly empty: usage was rarely above 200 or 300GB, and of those above 300GB some dropped below 300GB after a disk cleanup.

The point is that there is a great percentage of users out there who have bought a desktop/laptop with more than 1TB of drive space and have over 50% free. This is because now there is such a range of people buying computers: to surf the web, for their kids’ schooling, doing the company’s books and so on. These people almost always use less than 1/3 of the modern disks. And with 2-4TB drives becoming the new norm this free space will be even greater.

Some of these people can backup their data on a 64GB usb drive or SD card. And most can on a 128GB stick/card.

Those with a passion for photographs are another group that will use up disk space, but a lot of them have 1-2TB USB drives which they use to hold their photos rather than keep them all on the computer. These I expect will store their photos on SAFE rather than on their computer drive. Thus plenty of space for farming.

Yes, there are a number of other people like ourselves who never have enough space, but we are becoming the minority of desktop/laptop users.

Oh and with 6TB becoming well and truly affordable the issue of space is becoming a non-issue even for geeks. And there is the option of the 14TB drives for those of us who use a lot. Honestly home drive space in 5 years will not be a sticking point for home vaults.

4 Likes

#5

For this one, I think each mutable data would need an associated list of subscribers. Maybe this could be stored as part of the MD meta data (requires removal of MD size cap, which was recently mentioned as a possibility on dev forum). Subscribers would indicate if they are interested in any changes in the whole mutable data, or just a certain key (so basically some filtering criteria to minimize network overhead. Keep filtering simple so as not to provide high overhead for vaults / open vault DOS attack vectors).
Subscribers would pay a fee when setting up or renewing a subscription, which would be good for X mutation notifications. If that is too contrary to the “free GETs” ethos, maybe a notification would be just treated as a GET? It would be preferable to having a bunch of polling going on after all…but what would stop spamming the subscription list?
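A rough sketch of what such a subscriber list could look like, purely as an illustration. The class name, the `credits` counter (standing in for "good for X mutation notifications") and the single-key filter are assumptions, not anything from the SAFE codebase, and the payment and anti-spam side is left out entirely.

```python
class SubscribedMD:
    """Toy mutable data item with an attached subscriber list (hypothetical)."""

    def __init__(self):
        self.entries = {}
        # Each subscriber: callback, optional key filter, remaining prepaid notifications.
        self.subscribers = []

    def subscribe(self, notify, key=None, credits=10):
        """key=None means "any change in the whole MD"; otherwise only that key."""
        self.subscribers.append({"notify": notify, "key": key, "credits": credits})

    def mutate(self, key, value):
        self.entries[key] = value
        still_active = []
        for sub in self.subscribers:
            if sub["key"] is None or sub["key"] == key:
                sub["notify"](key, value)
                sub["credits"] -= 1
            # A subscription lapses once its prepaid notifications are used up.
            if sub["credits"] > 0:
                still_active.append(sub)
        self.subscribers = still_active
```

Keeping the filter to a single exact key match is the "keep filtering simple" idea: the per-mutation work for the vault stays proportional to the number of subscribers and nothing more.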

2 Likes

#6

Random will work if the network is big enough because temporary inefficiencies will get smoothed out.

Sounds like a valid concern. Vaults farther away have a higher probability of timeout due to network glitches, for example. It may be mitigated by allowing for multiple strikes, discounting clustered timeouts, and having a (probably self-adjusting) threshold that’s well over the round-trip we expect for the farthest vaults.
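Those three mitigations can be sketched in a few lines. This is only an illustration under assumed names and defaults: the threshold here is a multiple of a smoothed round-trip estimate kept per vault (a simplification of "well over the round-trip we expect for the farthest vaults"), and timeouts that arrive within `cluster_window` seconds of the previous one count as a single strike.

```python
class TimeoutMonitor:
    """Hypothetical liveness tracker for one vault (names and defaults are assumptions)."""

    def __init__(self, max_strikes=3, cluster_window=5.0, margin=3.0,
                 alpha=0.1, initial_rtt=0.2):
        self.max_strikes = max_strikes        # strikes tolerated before flagging the vault
        self.cluster_window = cluster_window  # seconds; timeouts this close together count once
        self.margin = margin                  # threshold = margin * smoothed round-trip time
        self.alpha = alpha                    # smoothing factor for the round-trip estimate
        self.smoothed_rtt = initial_rtt
        self.strikes = 0
        self.last_timeout_at = None

    @property
    def threshold(self):
        """Self-adjusting timeout threshold in seconds."""
        return self.margin * self.smoothed_rtt

    def record(self, now, rtt):
        """Record a response that took `rtt` seconds (None means it never arrived).

        Returns True once the vault has used up its strikes.
        """
        if rtt is not None and rtt <= self.threshold:
            # On-time response: fold it into the round-trip estimate.
            self.smoothed_rtt = (1 - self.alpha) * self.smoothed_rtt + self.alpha * rtt
        else:
            clustered = (self.last_timeout_at is not None
                         and now - self.last_timeout_at <= self.cluster_window)
            self.last_timeout_at = now
            if not clustered:
                self.strikes += 1
        return self.strikes >= self.max_strikes
```

A burst of timeouts caused by one network glitch then costs a distant vault one strike rather than several.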

That would go against the self-governing nature of the network. Either the parameters need to be got right at first, or there needs to be a way to introduce a new method (instead of new parameters) to adjust to the new circumstances.

I liked the idea of coin denominations. I saw it explained somewhere but I can’t remember who brought it up. We’d start with a very few coins with really big face value (for example 10,000 safecoin) and they would be followed by longer and longer “stripes” of smaller and smaller valued coins, so the value of a coin would become smaller as its address got bigger, on average.

Depending on the current value of storage, the network could select which address range to pick the rewards from. There was also an argument that even an infinite supply is not necessary to cause inflation because we’d just have to grow the number of smaller coins slower than the rate their value goes lower.

This is strongly connected to divisibility.

Rewards are discrete, coin or no coin. With fixed valued coins, after the network is big and there are millions of them, miners would have to wait for months or years before they would finally get their reward as a single coin that’s worth a lot. It’s easy to see how fixed value coins favor centralization when the network is big.

With coin denominations, we would have a fixed frequency of rewards with coins of smaller numerical value, so even individual miners would get “something” often enough that it would be worth running a vault.
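To make the "stripes" idea concrete, here is a minimal sketch. Everything numeric in it is an assumption chosen for illustration (ten coins in the first stripe, each later stripe ten times longer and worth a tenth as much); the original proposal may have used quite different proportions.

```python
def stripe_of(address, first_stripe_len=10, growth=10):
    """Index (0-based) of the stripe containing `address`.

    Stripe k holds first_stripe_len * growth**k consecutive coin addresses.
    """
    stripe, start, length = 0, 0, first_stripe_len
    while address >= start + length:
        start += length
        length *= growth
        stripe += 1
    return stripe


def face_value(address, top_value=10_000, shrink=10):
    """Face value of the coin at `address`: top_value / shrink**stripe."""
    return top_value / shrink ** stripe_of(address)


if __name__ == "__main__":
    for address in (0, 9, 10, 109, 110):
        print(address, face_value(address))
```

With these particular numbers every stripe carries the same total value, so the network could move its rewards to a later stripe as the value of a single safecoin rises and still pay out small amounts often.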

Real life identities cannot be managed in a trustless way because trust is how we define identity: "I trust you are whom you say you are."

We’ll need a way to represent trust in the network, and I believe it will be through chains of real-life relationships expressed within the RDF graph.

For your example, my friend-of-a-friend-of-a-friend may be one of the employees of the bank who knows his manager who knows her manager who knows the CEO. Now, if the CEO says “this is the bank’s account” you can probably trust it really is. If enough of this chain is supported by corroborating evidence through a dense social network, that’s probably as much trust as we can hope for.

It must be or there will be no Safe Network to talk about. So, I’ll use this opportunity to push for my evil agenda of capability based access control again because it’s the Right Thing©.

Users could give access to their data to their apps or their friends in what’s similar to everyday language in both form and implications:

  • “let this photo app add new pictures to my photos folder”
  • “let this group of my friends look at my holiday pictures for the next two weeks”

Users could be certain that nothing else is implied, only what’s explicitly stated (principle of least privilege).

This type of access control, that includes delegating rights without side effects, is impossible to do with ACLs.
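A minimal sketch of the delegation idea, with made-up names throughout. A capability here is just an object naming one resource, a set of rights and an expiry; handing one on goes through `attenuate`, which can only narrow it. In a real system the unforgeability of the token would have to be enforced by the network, which this toy does not attempt.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    resource: str        # e.g. "photos/holiday"
    rights: frozenset    # e.g. frozenset({"read"})
    expires_at: float    # seconds on whatever clock the caller uses
    token: str           # unguessable reference (not verified in this sketch)

    def attenuate(self, rights=None, expires_at=None):
        """Derive a capability to delegate; it can never exceed this one."""
        new_rights = self.rights if rights is None else self.rights & frozenset(rights)
        new_expiry = self.expires_at if expires_at is None else min(self.expires_at, expires_at)
        return Capability(self.resource, new_rights, new_expiry, secrets.token_hex(16))

    def allows(self, resource, right, now):
        """Only what is explicitly stated is permitted (least privilege)."""
        return resource == self.resource and right in self.rights and now < self.expires_at


def grant(resource, rights, expires_at):
    """Owner-side helper that mints a fresh capability (hypothetical)."""
    return Capability(resource, frozenset(rights), expires_at, secrets.token_hex(16))
```

The two everyday-language examples map onto `grant("photos", {"append"}, ...)` for the photo app and `owner_cap.attenuate(rights={"read"}, expires_at=two_weeks_from_now)` for the group of friends.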

As far as I know, mobile app stores don’t usually allow apps that download code. Stuff that’s running in the browser is an exception, probably because it is sandboxed.

Everything can be done in the browser at native speed nowadays with WebAssembly so I assume all apps will run in the browser, at least eventually, after devs have given up on experimenting with other things. I may be wrong but I would be very surprised.

8 Likes

#7

[quote=“Toivo, post:3, topic:28228”]
This may depend on how much it is possible to earn by farming. It has to be lucrative deal, because the concept of “excess disk space” kind of doesn’t exist in my experience as a digital device user. No matter how much there is, it is eventually going to be full. If the earnings are not significant enough, the nuisance of having to re-arrange your files sooner could override the motivation to farm.[/quote]

… your files will be stored on the SAFENet.

6 Likes

#8

It’s not either-or. My simulations showed that a good algorithm for network growth and homogeneity is a random hop followed by another hop to the section most in need in the neighborhood of the target of the first hop.

A small percentage of local relocations improves network homogeneity even further.
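The two-hop choice can be written down in a few lines. This is a sketch, not the simulation code: the section ids, the `neighbours` map and the numeric `need` score (for instance, how far a section is below its target size) are all assumptions, and the small share of purely local relocations is left out.

```python
import random


def pick_relocation_target(sections, neighbours, need, rng=random):
    """Two-hop relocation: a random section first, then the neediest one nearby.

    sections:   list of section ids
    neighbours: dict mapping section id -> list of neighbouring section ids
    need:       dict mapping section id -> numeric need score (higher = needier)
    rng:        anything with a choice() method, to allow seeded runs
    """
    first_hop = rng.choice(sections)                      # hop 1: uniformly random
    candidates = [first_hop] + list(neighbours[first_hop])
    return max(candidates, key=lambda s: need[s])         # hop 2: most in need locally


if __name__ == "__main__":
    sections = ["A", "B", "C", "D"]
    neighbours = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
    need = {"A": 1, "B": 5, "C": 2, "D": 3}
    print(pick_relocation_target(sections, neighbours, need, random.Random(1)))
```

The random first hop keeps an attacker from steering where a vault lands, while the second hop still pushes nodes towards sections that need them.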

10 Likes

#9

Interesting perspective. My understanding of the current default-opinion of why storage is better than POW is that production of hdd units is scarce but within each unit the disk space is an abundant resource. The abundance is what will allow SAFE to be cheap. I like the way you contrast these two dynamics here of ‘production and distribution’ vs ‘consumption’ with respect to scarcity/abundance.

I think lots of free space is also becoming more common thanks to the increasing popularity of streaming. SAFE allows even more data to become streamed (eg personal photos are currently stored on phones rather than streamed) so maybe free disk space will become even more abundant once SAFE is live…?

Does this imply some necessity to avoid extremely large vaults? No good having 1/3 spare of your 4TB drive if each vault is required to store 5 TB.

Sorry I should clarify I mean automated adjustments like bitcoin difficulty, not human intervention via software patches or configurations.

Do you consider bitcoin to have done a good job with secure key management?

2 Likes

#10

That’s not a good comparison.

Identities, relationships, access rights and their delegation to apps and users are crucial for the Safe Network to fulfill its role as the “new internet”.

Bitcoin started with a much more restricted scope (even less than what it’s been shoehorned into since its inception) and yet we ended up with HD wallets because there was a need for better key management.

3 Likes

#11

I certainly fall into this category, though not by passion, but by profession. But I don’t like USB drives, I wish to keep it all in one place, and have backups on separate disks of course. Hope that SAFE will solve this :wink:

Or then manufacturers will start to make smaller disks. Or larger, so that they can sell the devices as earning machines. Pure speculation on my part, difficult to know.

My point in general is that the “actual excess” is different from “experienced excess”. I don’t know how widely my views can be generalized, but I’m lazy and tend to follow the path of least effort. That’s why I don’t really experience the excess of free space, though it may actually be there. And if SAFE became a huge success I might get rich enough to not care about every penny I could make by farming. Some humming noise or a blinking light in my home might be just too much. But it wouldn’t be for those who really have to collect every penny. That’s why I think that the global distribution of home vaults could be affected by the local average income vs. connectivity. Say one can gain 5-10€ / month by farming with one’s excess HD space - who cares about the money and who cares about the nuisance it brings?

Let’s say that there are plenty of people who don’t really need that bit of extra, but it is not that much of a nuisance either. Then a rumor spreads that the money you gain doesn’t cover the costs. It doesn’t have to be true, just a rumor. This might prompt many people to turn their vault off. Not the professionals of course, because they do their math, but lazy folks might fall for that.

So, one unsolved question is how can SAFE exist in the irrational sphere of human behaviour? I don’t think it can be solved, it is an experiment to come and certainly an exciting one.

2 Likes

#12

The price of an HDD consists of 2 parts: base parts (chassis, controller, circuit board,…) and actual data storage (magnetic platters, memory chips). The first part is almost constant, the second part is almost linear with capacity. For every HDD generation there is a certain useful size range, where going over means paying an “early-adopter” price and going below gets you a smaller HDD for almost the same price.

For running a vault (earning money) you need HDD space, connectivity, electricity and a few other things we can ignore in this case (CPU, RAM,…). In order from best to worst:

HDD price:

  1. somebody who bought PC for different reason and has unused space
  2. HDD manufacturer
  3. investor, who bought HDD for running vault

Connectivity (throughput, latency):

  1. datacenter
  2. home/company with fiber connection
  3. home/company with ADSL, wifi, satellite connection

Electricity:

  1. somebody who already runs computer 24/7
  2. somebody who runs computer part time and will switch to 24/7 for running vault
  3. investor who runs a computer only to run a vault

Because the fastest vault takes the reward, I would say latency is the key. Here are a few examples:

  • datacenter to datacenter in same city: <1 ms
  • home fiber connection to datacenter in same country: 8-10 ms
  • home connection with several wireless hops: 30-40 ms
  • different country, same continent: add 20-40 ms
  • different continent: add 100-300 ms
  • spinning HDD access latency: 5-10 ms
  • SSD access latency: <1 ms

So, who will earn most? What will a typical vault look like? I don’t know, but here are a few things I think we can deduce:

  1. I don’t think vaults will “centralize” in one side of the world or on one continent. At least continent-wise, the distribution of vaults will copy network usage. Distance latency gives an advantage to vaults in locations with many users and a less than average number of vaults, so there is a bigger motivation to run a vault in such a place.

  2. Users with wireless, ADSL and other types of connection with higher latency don’t have much chance of earning something by running a vault. They will be competing with a lot of people with fiber connections on the same continent.

  3. Centralization will heavily depend on price. A higher price means it will be profitable to pay for datacenter space and use the advantage of low latency. A low price means decentralization to users who run computers for a different reason, where running a vault will cover part of the costs they already have to pay.

  4. Professional vaults will run as close as possible to internet exchange points in highly populated areas. They will try to have the lowest possible latency to the highest number of networks (people).

5 Likes

#13

This is not how SAFE will view the times. Unless the whole section’s elders are in one data centre, your analysis is incomplete and leads to errors.

The issue is that the elders are scattered. As such in a typical section the elders will be in different locations and countries. This will have the effect of evening out those lag times and it is more where the Vault is located to where the closest majority of elders are located. The closest majority of elders is particular to each vault and will also have an evening effect.

In the end, having the fastest rig in the fastest datacentre could still be slower than, say, an EU home user with a 4-year-old SATA rotating drive on a 1Gb/s optic fibre link. And of course it may not be.

This will mean that the home user does have a better chance of being faster than what it first appears to be.
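A small worked example of the "closest majority of elders" point. The figures are invented for illustration (seven elders, latencies loosely in the ranges quoted in the previous post) and the model is deliberately crude: the time that matters is the disk access plus the latency to the elder that completes a majority.

```python
def time_to_majority(latencies_ms, disk_ms=0):
    """Time until a majority of elders have heard from a vault.

    latencies_ms: latency from the vault to each elder of its section
    disk_ms:      time to read the chunk from local storage
    """
    majority = len(latencies_ms) // 2 + 1
    # The majority-th fastest elder is the one that decides the response time.
    return disk_ms + sorted(latencies_ms)[majority - 1]


if __name__ == "__main__":
    # Hypothetical section whose elders are spread over several continents.
    datacentre_ssd = time_to_majority([1, 30, 120, 130, 150, 160, 200], disk_ms=1)
    home_hdd = time_to_majority([10, 25, 35, 40, 150, 160, 180], disk_ms=10)
    print("datacentre with SSD:", datacentre_ssd, "ms")
    print("home user with HDD :", home_hdd, "ms")
```

With these made-up numbers the datacentre vault needs 131 ms and the home vault 50 ms, because the home vault happens to sit nearer to four of the seven elders. Shift the elders around and the result flips, which is the point.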

Also the actual metrics for determining who is paid has not been decided yet. It has been said that pure speed may not be the main metric.

Also it was thought that if, say, Fraser’s division is implemented then all vaults responding with the chunk may get an appropriate portion of the reward.

The point is that it’s not clear cut in relative lag times (for fastest), in how the “winning” vault is determined, and in how the reward may be paid out (maybe proportioned out).

Also there are ways to assist home vaults in competing for the rewards too. An original goal was for it to be home users using spare resources and so I’d expect that the algorithms will be tailored for this. Obviously a resource/capability test has to be passed before becoming a vault since network performance is important. Although absolute speed is not the only consideration when creating a fair scenario.

5 Likes

#14

Maybe, if the compute layer of SAFE ended up being each section running something like an Ethereum shard. A Bitcoin SPV can run in an Ethereum contract. Or taking it further, a token can be mined directly from a contract.
In the nearer term, SAFE could function as something akin to the Blockstream satellites, providing an alternative, hard to detect, means of block propagation.
SAFE can also help with bootstrapping nodes through sharing of peers files.
With lightning network, one of the key challenges seems to be backing up the state of channels. People have accidentally broadcast old channel states, which results in loss of all funds in the channel. Maybe running your lightning node on a SAFE virtual disk would help with this issue. Or maybe the latency of the drive would be too much for that application, especially for large, well connected lightning nodes.

2 Likes

#15

Having a Bitcoin SPV in SAFE would probably be useful for migrating MAID to safecoin actually…

3 Likes

#16

Some history of how updates have been received by clients on the regular web (chronology is roughly in line with my own usage patterns, not necessarily their time of invention):

First we had manually pressing the refresh button in the browser.

Then javascript allowed the window.location to be set periodically which would automatically reload the whole page with the new data.

Then we had ajax using XMLHttpRequest which allowed fetching data in the background and updating just a portion of the page. This is like ‘invisible’ polling.

Then we had ajax long polling, where the server would receive a request, keep it open, and only when new data was available send the response. This allows data to be received as soon as it’s available (instead of at the polling interval), but the server must be able to hold onto many open connections for a reasonably long time. It reduced the total number of requests but increased the total amount of time spent requesting. Main benefit was live update when the events happened.

Then websockets came along and allowed the connection to remain open and two-way. The server could send the client data any time without closing the connection.

Then webrtc came along which allowed client-to-client communications and improved the performance of data transfer over unreliable or low quality connections.

Then there were webhooks which turned servers into clients, having the server make a request whenever a specific event happened. The difficulty is the endpoint needed to be able to be clearly specified for long intervals (easy for servers, hard for most clients), and there’s some extra handling of error conditions and unsubscribing etc. This is mostly a server-to-server solution.

Then there were server sent events. I haven’t used them so can’t really comment on the use case or benefits or drawbacks.

I don’t see a pay-to-subscribe option working for SAFE. People would just use polling instead since free+often should be cheaper than pay+sporadic.

I can easily imagine a long-poll type situation where the GET request is for a later version of the MD than is currently available, eg MD is currently at version 100, I send a GET for MDv101, and the vault holds onto the routing request until that version becomes available then responds. How does this affect memory consumption and timeouts for vaults etc… I don’t know but this model seems possibly easy to incorporate rather than using new protocols or routing algorithms.
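Roughly what that could look like on the vault side, as a toy sketch only. The class, the callback style and the `max_pending` cap are all invented for illustration; request timeouts and routing are not modelled, and a GET for an older version simply gets the current one.

```python
class LongPollVault:
    """Toy vault holding one mutable data (MD) item and its parked GETs."""

    def __init__(self, version=0, value=None, max_pending=1000):
        self.version = version
        self.value = value
        self.max_pending = max_pending  # crude guard on memory consumption
        self.pending = {}               # wanted version -> list of response callbacks

    def get(self, wanted_version, respond):
        """Answer at once if the version exists, otherwise park the request.

        Returns False if the request is refused because too many are parked.
        """
        if wanted_version <= self.version:
            respond(self.version, self.value)
            return True
        if sum(len(waiting) for waiting in self.pending.values()) >= self.max_pending:
            return False
        self.pending.setdefault(wanted_version, []).append(respond)
        return True

    def mutate(self, new_value):
        """Apply a mutation and release every GET that was waiting for it."""
        self.version += 1
        self.value = new_value
        for wanted in sorted(v for v in self.pending if v <= self.version):
            for respond in self.pending.pop(wanted):
                respond(self.version, self.value)
```

So with the MD at version 100, `vault.get(101, callback)` returns without calling the callback, and the next `vault.mutate(...)` fires it. The memory question shows up directly as the size of `pending`.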

6 Likes

#17

I think you’re right.

Side note - we always say that GETs are free, but really they are paid for via inflation (paid by safecoin holders, which the GET-er may not even be; it’s the price we pay for easy network accessibility)

1 Like

#18

Again with my very limited understanding of the project still surprised by:

Can the network be forked
What to do if there is a vault bug
How to parallel compute without local data cache
How much more secure than Tor

0 Likes