Introduction to MaidSafe: what it is, how it works, and how it compares to Bitcoin


#1

15-20 minute read

  1. Preface
  2. Bitcoin
  3. What is the SAFE network?
  4. SAFE Advantages
  5. How does the SAFE network work?
  6. Conclusion
  7. More Resources

Preface

Like many others, I discovered Bitcoin in early 2013, and it led me down the rabbit hole of decentralized applications. It got me thinking about a world that enables true freedom of speech, personal privacy, sound money, and much more, all powered by trustless apps free from human manipulation and corruption. The only problem is that current decentralized applications, including Bitcoin, are built upon an insecure, centralized system: the internet as we know it. They are, in effect, putting band-aids on a much bigger problem. MaidSafe is an organization I came across recently whose lofty goal is to reverse this approach: build a decentralized, secure internet infrastructure, called the SAFE network, upon which all apps will automatically be secure and decentralized.

I couldn’t find any good, easy-to-understand resource that explains the ins and outs of SAFE, and the information that does exist is scattered across various websites, forums, and videos. I’ve also seen a lot of misinformation about it floating around, so I hope not only to clarify how it works for others, but also to solidify my own understanding while writing this. Since many people researching the SAFE network come from a Bitcoin background, I will bring it up throughout the post and explain the major differences. Let’s get started.

Bitcoin

Let’s have a quick primer for how bitcoin works (skip this section if you’re already well-versed). Bitcoin is a currency that solves difficult problems inherent to all digital currencies: how could it be that a digital currency is not controlled by any person or organization? If it’s just software, what’s to stop someone from hacking it to make it seem like they have more than they have, or making infinite copies of coins? The genius technology behind it is a public ledger (the blockchain) that everyone has a copy of. In other words, every node in the network is aware of every wallet balance and transaction. So if you attempt to send more money than your wallet (or address) has, not a single node in the Bitcoin network will approve the transaction or include it in the blockchain.
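To make the ledger idea concrete, here is a minimal sketch of the overspend check that every node performs. It assumes a simple account-balance model, and the `Ledger` class is hypothetical. Real Bitcoin tracks unspent transaction outputs rather than balances, but the rejection logic is similar in spirit.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Toy public ledger; in Bitcoin every node would hold an identical copy."""

    balances: dict[str, int] = field(default_factory=dict)

    def apply(self, sender: str, recipient: str, amount: int) -> bool:
        # Reject non-positive amounts and anything exceeding the sender's balance.
        if amount <= 0 or self.balances.get(sender, 0) < amount:
            return False
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount
        return True
```

With `Ledger({"alice": 1})`, the first transfer of 1 succeeds. A second transfer of the same coin to anyone else is rejected, because every honest copy of the ledger already shows a zero balance.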

But what if your bitcoin address has 1 bitcoin in it, and you send it to two addresses at the same time? Which transaction will the network approve? It is the Bitcoin miners’ job to approve transactions and add them to the ledger. Each miner takes a list of recent transactions (including only one of yours) and repeatedly changes an arbitrary number attached to it (a nonce) to compute a unique hash (digital fingerprint) that meets a certain criterion. On average, it takes miners around the world, collectively generating literally quadrillions of hashes per second, about 10 minutes to find one that meets the minimum criterion (the difficulty). Once a valid hash is found, the transactions used to calculate it are accepted by all the nodes of the network and added to the ledger. The miner who discovered the hash is rewarded with all the transaction fees plus freshly minted bitcoins. Because it takes time and hard work for computers to do this, there is enough time for all nodes to reach a consensus on the current state of the ledger, and it’s practically impossible to maliciously reverse transactions that are in it.
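A toy version of the mining loop shows why finding a valid hash is expensive while checking one is cheap. This sketch makes two simplifications. It applies a single SHA-256 to the joined transaction strings, and it expresses difficulty as a `difficulty_bits` parameter. Bitcoin itself double-hashes an 80-byte block header against a compact target. The function names `block_hash`, `mine`, and `verify` are hypothetical.

```python
import hashlib


def block_hash(transactions: list[str], nonce: int) -> int:
    """Hash the transaction list together with a nonce; return the digest as an integer."""
    payload = "|".join(transactions).encode() + nonce.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(payload).digest(), "big")


def mine(transactions: list[str], difficulty_bits: int) -> int:
    """Try nonces 0, 1, 2, ... until the hash falls below the target."""
    target = 1 << (256 - difficulty_bits)  # each extra bit halves the target
    nonce = 0
    while block_hash(transactions, nonce) >= target:
        nonce += 1
    return nonce


def verify(transactions: list[str], nonce: int, difficulty_bits: int) -> bool:
    """Anyone can check the miner's work with a single hash."""
    return block_hash(transactions, nonce) < 1 << (256 - difficulty_bits)
```

Each extra difficulty bit doubles the expected number of attempts. This is the knob the network turns to keep the average near 10 minutes as total hash power changes.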

What is the SAFE network?

While Bitcoin’s purpose is a currency, the SAFE (Secure Access for Everyone) network, according to its homepage, is “a fully decentralized platform on which application developers can build decentralized applications. The network is made up by individual users who contribute storage, computing power and bandwidth to form a world-wide autonomous system.” Users will be able to store any kind of information in a decentralized manner on the network, whether files, text, websites, or applications. An example application is a basic file-storage service like Dropbox. When you upload a file using an app powered by SAFE, behind the scenes it will break up your file into small chunks, encrypt each chunk such that no one knows what they are, and send them to the network. At least 4 copies of each chunk will be stored on nodes on the network around the globe. If a node or chunk becomes unavailable, the nodes connected to it immediately detect this and make another copy from one of the others. This same process is used for all data stored on the network. It also means that there are no centralized hosting providers, such as AWS, storing your data. The SAFE network is your cloud provider.

The network will have a single app already built in: a currency named Safecoin. Like Bitcoin, it can be used to make any kind of purchase, but more importantly, it facilitates payments for storage and services on the SAFE network. Unlike Bitcoin, however, there is no blockchain. It takes an entirely new approach that will be discussed in detail in a later section. There are absolutely no transaction fees, and transactions are confirmed and irreversible at network speed, usually in less than a second. Any user can become a farmer who offers a portion of their hard drive as storage for the network and gets paid in Safecoin when files are retrieved from their computer. The rank of a farmer’s node(s) increases over time based on availability, disk space, CPU, and speed (bandwidth). In this network, the farmer also acts as the miner of safecoins: the higher a farmer’s rank, the more freshly minted safecoins he will earn. Thus, similarly to Bitcoin, the network is bootstrapped by monetary incentives to provide value to the network. The total number of safecoins that will be created is capped at 4.3 billion.

Although mining plays an important role in Bitcoin, its major downsides are that a) it requires expensive, specialized hardware, b) it continuously expends a lot of energy to compute hashes, and c) most importantly, the hashing serves no function other than validating Bitcoin transactions. On SAFE, anyone can join the network with existing computers and earn money for providing real value to the network by sharing their hard drives and bandwidth.

File storage is but a single use for the network. Most of what can be done with the current internet can be done with SAFE. Facebook, Twitter, and LinkedIn-type social websites can be built upon SAFE, as can real-time communication (chat/email), e-commerce stores (eBay/Amazon), media/streaming (YouTube/Twitch), news (CNN), mobile/desktop apps, and anything else you can think of. The SAFE network inherently provides the decentralized database, authentication system (logging in/out of apps), and security system (automatically encrypting data at rest and in transit).

Another major innovation is the incentive to create useful applications on the network. In addition to earning safecoins for providing storage and bandwidth, application developers will earn safecoin simply for creating applications used by others. In other words, it will now be possible to earn money for creating useful open-source applications. The more an application is used on the network, the more safecoin the developer is rewarded. Additionally, there are no ongoing hosting costs that the developer needs to worry about. All of that is already taken care of by the network, making it much more compelling to innovate on SAFE than on the conventional internet. This opens the gates to an entirely new business model that rewards creators who love what they do.

SAFE Network Advantages

How do we benefit from all of this?

  1. Privacy. Everything is automatically encrypted end-to-end. Developers are not burdened with encryption overhead inside their applications. It is already taken care of for them. The network is resistant to IP address identification.
  2. Reliability. Due to the redundancy built into the network, the chances of your data being erased are near zero. That is not the case today, when your data is stored on centralized computers that are prone to malfunction or to being corrupted or erased through human intervention and error.
  3. Easy authentication to services. It will no longer be necessary to sign up separately for every website you use; your authentication on the SAFE network itself will work for most services.
  4. No hosting required for apps and all the hassles that come with it. That also means no system administrators required.
  5. New business model that rewards application developers.
  6. Sony-type hacks are not possible.
  7. Your data is yours. The Facebook-like website you use on the SAFE network cannot use your private information to track you or sell your information to others.
  8. You will not experience any downtime for popular websites and files, such as the famous “reddit hug of death” (there is no possibility of overloading centralized servers).
  9. Due to the decentralized nature of the network, all apps are inherently censorship-free. Because there are no centralized servers and all information is encrypted before it even touches the network, it is impossible for an oppressive government to shut it down, let alone pinpoint the location of any data on the network. A major step-up for freedom of information and speech in the 21st century.

How does the SAFE network work?

Who owns what data? How is it possible to have instant-confirmation, zero-fee safecoin transactions? How is double-spending prevented without a blockchain? What if a file undergoes a DDoS attack? Let’s dive in and look at how the network works in order to answer these questions.

When you join the SAFE network, a public/private key pair is created for you. Each node you create will have its own key pair derived from this one (so that they all belong to “you”), and your master key pair can invalidate any of them at any time. For the average user, your personal computers and devices will be your only nodes. As soon as one of your nodes comes online, it will automatically be assigned a completely random ID in addition to the key pair. The pool of available IDs is astronomically large: 2^512 possible values to choose from! That’s far more than the estimated number of atoms in the observable universe (roughly 10^80)! Your personal identity is not tied to this ID in any way, so you remain anonymous.
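The size of that ID space is easy to check. The sketch below simply draws a uniformly random 512-bit integer with Python’s `secrets` module. That is an assumption standing in for however the network actually derives node IDs, and `random_node_id` is a hypothetical name.

```python
import math
import secrets

ID_BITS = 512


def random_node_id() -> int:
    """Uniformly random ID in [0, 2**512 - 1]; a stand-in for the network's real derivation."""
    return secrets.randbits(ID_BITS)


digits = math.log10(2 ** ID_BITS)
print(f"2**{ID_BITS} is about 10**{digits:.1f}")  # prints: 2**512 is about 10**154.1
```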

When you log into the network, a virtual hard drive will be mounted on your computer. All the files that you add to the network will display here. Although it will look like all your data is there, in reality it is all broken up, encrypted, and dispersed throughout the network, ready for you to call it up when you need it. All data will be shared at the file system level, which means there will be no need for HTTP, FTP, SMTP, etc.

You will have the option to specify how much of the network’s data you’re willing to store on your hard drive, thus becoming a farmer and turning your node into a vault. You will earn safecoin by sending out mining requests to the network after successfully responding to GET (file retrieval) requests for data on your hard drive. The amount of safecoin you earn is based on a set algorithm whose rate will fluctuate according to the demands of the network. This algorithm is designed to decrease the rate of mining as time goes by, eventually stopping at the 4.3 billion safecoin mark. Over time, your vault will be ranked higher by the network according to your uptime, CPU, disk space, and bandwidth (speed). The higher your rank, the more safecoins you will be paid. If you provide fewer resources to the network than you consume for your own data, you will need to pay for the excess using purchased safecoins. Discussions are under way on whether or not to give away some free space to new accounts.
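The exact farming algorithm is not specified here. The following is therefore a purely hypothetical sketch of one way a reward rate could fall as the supply approaches the 4.3 billion cap. Under this rule, a successful GET earns a reward only when a hash of the request lands on a multiple of a divisor that grows as fewer coins remain. `TOTAL_CAP`, `farming_divisor`, and `get_earns_reward` are all invented for illustration.

```python
import hashlib

TOTAL_CAP = 4_300_000_000  # "capped at 4.3 billion", taken from the text above


def farming_divisor(issued: int, base: int = 10) -> int:
    """Hypothetical: average number of GETs per reward; grows as fewer coins remain.

    Returns 0 once the cap is reached, meaning no further coins can be minted.
    """
    remaining = TOTAL_CAP - issued
    if remaining <= 0:
        return 0
    return max(1, round(base * TOTAL_CAP / remaining))


def get_earns_reward(chunk_name: bytes, request_id: bytes, issued: int) -> bool:
    """Hypothetical lottery: a GET earns only if its hash lands on a multiple of the divisor."""
    divisor = farming_divisor(issued)
    if divisor == 0:
        return False
    digest = hashlib.sha512(chunk_name + request_id).digest()
    return int.from_bytes(digest, "big") % divisor == 0
```

In this sketch the divisor doubles by the time half the supply has been issued. That schedule is an arbitrary choice, and the real one may differ.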

When your node is ranked highly enough, it is considered to be validated as a trustworthy node. At this time, your vault can take on one or more other personas: a client manager, data manager, vault manager, or transaction manager. All personas manage each other in the network. Let’s go over exactly what happens when you upload a file to the network:

  1. The SAFE client on your computer will split your file into chunks no larger than 1 MB in size. Each one is hashed and encrypted. To further obfuscate each chunk, every chunk is passed through an xor function using the hashes of other chunks. Each chunk is then broken into 32 pieces in a smart way, such that any 28 pieces are enough to reassemble the chunk. Key -> value pairs are added to a table on your computer that serves as a data map, i.e. it describes the locations of each chunk on the network. The key is the original hash of a chunk, and the value is the xor’ed value, which I’ll refer to as the chunk’s ID (which also acts as its location, as you’ll see). At this point, the files cannot be accessed by anyone except the holder of the private key (you). All of this happens before the data even leaves your computer.
  2. All of your pieces are then passed to your 32 client manager nodes. These are 32 machines that are the closest to your node ID in xor distance. In layman’s terms, if your node has an ID of, say, 100, the existing nodes closest to you may be nodes with IDs of 96, 98, 99, 101, 103, etc. (Note that when we talk about distance here, we mean it in the mathematical, not geographical, sense. Nodes 100 and 101 may actually be on opposite sides of the globe.) Their job is to take your chunks and send them out to the network.
  3. A minimum of 28 out of 32 of these client manager nodes will then pass their chunk pieces to groups of 32 data managers whose IDs most closely match the chunk IDs (which is why the chunk ID also acts as its location, as mentioned above), using xor networking (described later). This way, the transfer can withstand a small loss (up to 4 pieces) without retransmitting the whole chunk (a technique used in many places). This process is called the scatter <-> gather approach and uses Rabin’s Information Dispersal Algorithm. The data managers’ job is to distribute the chunk they received to nodes on the network (always taking vault ranks into consideration), and to continuously make sure there are always at least 4 copies available. At this point, each chunk’s pieces have arrived at that chunk’s own data manager group, which reassembles them into the whole chunk.
  4. The data managers will choose four vaults to send their chunk to, but not before getting a 28-of-32 consensus from the group. Instead of communicating directly with the vaults, however, the data managers will send the chunk to each vault’s group of 32 vault managers that are responsible for the chosen vault (again, these are the nodes closest to the vault in xor distance). All 32 data managers store the IDs of the four vaults holding their chunk. Only they know the locations of the chunks; not even you!
  5. The 32 vault managers’ job is to send the chunk to the vault for storage and to continuously communicate with the vault to make sure it’s online and that the chunk has not become corrupt. They do this by sending the vault a random string and asking for the hash of the chunk combined with that string. The correct value can only be returned if an intact copy of the chunk exists. As soon as the vault managers detect that the node or a chunk has gone offline, they immediately inform the chunk’s data managers, who then duplicate one of the other copies to another vault.
  6. The vault receives the chunk. There are now 4 copies of each chunk distributed throughout the network. Each vault gets a chance to earn safecoin when it is the fastest to deliver the chunk during GET requests.
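Step 1 can be sketched in a few lines. This is a toy under stated assumptions, not MaidSafe’s self-encryption library. Each chunk is encrypted by XORing it with a keystream derived from the hashes of the two preceding chunks (wrapping around). The keystream uses SHA-256 in counter mode, standing in for a real cipher such as AES. The chunk’s ID is the SHA-512 hash of the encrypted result, and the data map pairs each original hash with that ID. `self_encrypt` and `self_decrypt` are hypothetical names, and the 32-piece split is left out.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # chunks no larger than 1 MB, per step 1


def _keystream(key: bytes, length: int) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode); illustration only, not a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _key_for(i: int, pre_hashes: list[bytes]) -> bytes:
    # Key material comes from the hashes of *other* chunks (the two preceding ones).
    n = len(pre_hashes)
    return pre_hashes[(i - 1) % n] + pre_hashes[(i - 2) % n]


def self_encrypt(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Return (data_map, store): data_map lists (original hash, chunk ID) pairs."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    pre_hashes = [hashlib.sha512(c).digest() for c in chunks]
    store: dict[bytes, bytes] = {}
    data_map: list[tuple[bytes, bytes]] = []
    for i, chunk in enumerate(chunks):
        encrypted = _xor(chunk, _keystream(_key_for(i, pre_hashes), len(chunk)))
        chunk_id = hashlib.sha512(encrypted).digest()  # the ID doubles as the location
        store[chunk_id] = encrypted
        data_map.append((pre_hashes[i], chunk_id))
    return data_map, store


def self_decrypt(data_map, store) -> bytes:
    """Fetch each chunk by ID and undo the XOR using keys rebuilt from the data map."""
    pre_hashes = [pre for pre, _ in data_map]
    parts = []
    for i, (_, chunk_id) in enumerate(data_map):
        encrypted = store[chunk_id]
        parts.append(_xor(encrypted, _keystream(_key_for(i, pre_hashes), len(encrypted))))
    return b"".join(parts)
```

The stored chunks reveal nothing useful on their own. Whoever holds the data map (in SAFE, protected by your keys) has everything needed to fetch and decrypt them.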

All of the above happens seamlessly in the background. Retrieving the uploaded file will follow the same kind of route. To the average user, it’ll look like a file is uploaded or retrieved in a matter of seconds, or less.
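The 28-of-32 scatter/gather in step 3 rests on Rabin’s Information Dispersal Algorithm. The sketch below is a minimal, self-contained instance of that technique. It assumes arithmetic in the prime field GF(257) and a Vandermonde encoding matrix. The names `disperse` and `recover` are hypothetical, and the real routing code may use a different field, matrix, or piece format. Each piece is about 1/28 of the chunk (in 9-bit symbols, since values can reach 256), and any 28 of the 32 pieces rebuild it.

```python
P = 257        # prime just above 255, so every byte is a field element
M, N = 28, 32  # any M of the N pieces rebuild the chunk


def _row(x: int, m: int) -> list[int]:
    """One Vandermonde row: [x**0, x**1, ..., x**(m-1)] mod P."""
    return [pow(x, k, P) for k in range(m)]


def _invert(matrix: list[list[int]]) -> list[list[int]]:
    """Gauss-Jordan inversion of a square matrix over GF(P)."""
    n = len(matrix)
    aug = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] % P)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = pow(aug[col][col], P - 2, P)  # inverse via Fermat's little theorem
        aug[col] = [v * scale % P for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(a - factor * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def disperse(data: bytes, m: int = M, n: int = N) -> tuple[dict[int, list[int]], int]:
    """Split `data` into n pieces, each about len(data)/m symbols long."""
    padded = data + bytes(-len(data) % m)
    blocks = [padded[i:i + m] for i in range(0, len(padded), m)]
    pieces = {}
    for j in range(n):
        row = _row(j + 1, m)  # evaluation point x = j + 1 (distinct and nonzero)
        pieces[j] = [sum(a * b for a, b in zip(row, block)) % P for block in blocks]
    return pieces, len(data)


def recover(pieces: dict[int, list[int]], length: int, m: int = M) -> bytes:
    """Rebuild the original bytes from any m surviving pieces."""
    if len(pieces) < m:
        raise ValueError(f"need {m} pieces, got {len(pieces)}")
    chosen = sorted(pieces)[:m]
    inverse = _invert([_row(j + 1, m) for j in chosen])
    out = bytearray()
    for b in range(len(pieces[chosen[0]])):
        y = [pieces[j][b] for j in chosen]
        out += bytes(sum(c * v for c, v in zip(inv_row, y)) % P for inv_row in inverse)
    return bytes(out[:length])
```

Losing any four pieces in transit is harmless. With a fifth piece missing, `recover` refuses to guess and raises an error.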

Quick summary:

  1. Your machine (client) breaks a file into chunks (let’s say 3 chunks), which are encrypted and broken up into pieces.
  2. All pieces are passed to client managers (the nodes closest to you).
  3. They send the appropriate chunk pieces to their respective data managers (each chunk is sent to its own group of data managers).
  4. Each data manager group reassembles the pieces and chooses four vaults to store its chunk on.
  5. They send the chunk to the chosen vaults’ vault managers, who then forward it for storage on the vaults.
  6. From now on, the vault managers keep an eye on the vault and the chunk. If it disappears, they tell the data managers to make another copy.
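The vault managers’ ongoing check (step 5 of the upload walkthrough) amounts to a challenge-response protocol. This sketch assumes the managers precompute a few single-use pairs of a random nonce and its expected hash while they still have the chunk in hand, so they do not need to keep the chunk afterwards. The vault must then hash its stored chunk together with each nonce it receives. `IntegrityChecker` and `answer` are hypothetical names, and SHA-512 is an arbitrary choice.

```python
import hashlib
import hmac
import secrets
from typing import Callable


def answer(chunk: bytes, nonce: bytes) -> bytes:
    """What an honest vault returns: SHA-512 of its stored chunk plus the nonce."""
    return hashlib.sha512(chunk + nonce).digest()


class IntegrityChecker:
    """Hypothetical vault-manager helper holding single-use challenges."""

    def __init__(self, chunk: bytes, rounds: int = 8):
        # Precompute (nonce, expected answer) pairs while the chunk is still in hand.
        self._pending = []
        for _ in range(rounds):
            nonce = secrets.token_bytes(32)
            self._pending.append((nonce, answer(chunk, nonce)))

    def check(self, vault_respond: Callable[[bytes], bytes]) -> bool:
        """Spend one challenge; True only if the vault's reply matches."""
        if not self._pending:
            raise RuntimeError("out of challenges; a real design would need to refresh them")
        nonce, expected = self._pending.pop()
        return hmac.compare_digest(vault_respond(nonce), expected)
```

A vault that has lost or altered the chunk cannot produce the right digest for a nonce it has never seen. It therefore fails the check, and the data managers can re-replicate the chunk.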

As you’ve probably noticed, there is a pattern here: all communications are done through groups of 32 nodes. This prevents a rogue node from creating problems on its own. This is the foundation of security on the SAFE network. It is impossible to choose the ID of your own nodes, or to decide which data you store on them, as that is all decided with the help of the network. Every time a node disconnects from the network and reconnects, it is assigned a totally new, random ID. Again, a) it takes a 28-of-32 node consensus to do anything with data, and b) it’s impossible to decide which roles and IDs your nodes take on. It is for this reason that you’d need to control 88% of the network in order to reliably attack it (compared with Bitcoin’s 51% attack). The larger the network, the stronger it becomes.
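The intuition behind the 28-of-32 rule can be quantified. Assume each group’s 32 members behave like independent random draws from the whole network, which simplifies the random, churn-driven ID assignment described above. Then the chance that an attacker holding a fraction f of all nodes controls at least 28 seats in a given group is a binomial tail. `group_capture_probability` is a hypothetical helper, and it ignores node ranking.

```python
from math import comb


def group_capture_probability(f: float, group: int = 32, quorum: int = 28) -> float:
    """P(at least `quorum` of `group` members are attacker nodes), binomial model."""
    return sum(
        comb(group, k) * f**k * (1 - f) ** (group - k)
        for k in range(quorum, group + 1)
    )


for share in (0.10, 0.51, 0.875):
    print(f"attacker holds {share:.1%} of nodes -> "
          f"P(controls a given group) = {group_capture_probability(share):.3g}")
```

Under this model, even a majority attacker almost never controls a particular group. The capture probability rises steeply only as f nears 28/32 = 87.5%, the threshold behind the 88% figure.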

By now, you may already have an understanding of how safecoin works without a blockchain. Instead of everyone having a copy of every transaction as in Bitcoin, every address is handled by a group of 32 transaction managers. The only difference with transaction managers is an additional layer of security: a 7-group chain, in which the first group of transaction managers must get permission from another group of 32 nodes, and so on. This means that to get the balance of an address or send money, only a limited number of nodes need to be involved at each step of the process, instead of the entire network. This methodology is extremely scalable, as scalable as the network itself. While Bitcoin is currently limited to roughly 7 transactions per second, safecoin is limited only by the number of nodes in the network, and can easily scale to thousands, and eventually hundreds of thousands, of transactions per second.
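Here is a minimal sketch of a single transaction-manager group. It rests on three assumptions:

- The group is the 32 node IDs closest to the address by XOR distance.
- Each manager keeps its own copy of the balance.
- A transfer is applied only if at least 28 managers approve.

The 7-group chain is not modelled, and `closest_group`, `TransactionGroup`, and the `faulty` parameter are hypothetical names.

```python
GROUP_SIZE = 32
QUORUM = 28


def closest_group(address: int, node_ids: list[int], size: int = GROUP_SIZE) -> list[int]:
    """The `size` node IDs nearest to `address` by XOR distance."""
    return sorted(node_ids, key=lambda node: node ^ address)[:size]


class TransactionGroup:
    """Hypothetical single group of managers for one address (no 7-group chain)."""

    def __init__(self, managers: list[int], balance: int):
        # Every manager keeps its own copy of the address's balance.
        self.copies = {manager: balance for manager in managers}

    def transfer(self, amount: int, faulty: frozenset[int] = frozenset()) -> bool:
        """Debit `amount` only if at least QUORUM managers approve."""
        if amount <= 0:
            return False
        # Honest managers approve only when their copy covers the amount;
        # `faulty` managers (hypothetical attackers) approve anything.
        approvals = sum(
            1 for manager, balance in self.copies.items()
            if manager in faulty or balance >= amount
        )
        if approvals < QUORUM:
            return False
        for manager in self.copies:
            self.copies[manager] -= amount
        return True
```

A second spend of the same coin finds the honest copies already debited, so it cannot gather a quorum. In this model, that is how double-spending is prevented without a global ledger.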

Further enhancements are implemented into SAFE which make the network faster and increase overall security. Some include:

  • Network caching. Intermediate nodes continuously retrieving the same chunk (due to popularity) will cache the chunk themselves, bringing it closer to requesting nodes.
  • Flood prevention. A node that sends enough traffic to overwhelm other nodes gets disconnected by the nodes closest to it. This helps prevent DDoS attacks.
  • Churn is an advantage. Nodes throughout the network are constantly going offline and coming online. This increases security as the IDs of nodes throughout the network are constantly changing.
  • Protocol rule enforcement. If any node, no matter the persona, breaks the rules by trying to do something that’s not allowed, it is immediately de-ranked or disconnected.
  • Holes fill quickly. When any node becomes overwhelmed and unreachable, it is immediately replaced with another one. This also mitigates DDoS attacks. As you can see, the network is very capable of self-healing!
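Network caching can be pictured as a small bounded cache on each relay node. The article does not say which eviction policy SAFE uses, so this sketch assumes least-recently-used (LRU) eviction built on `collections.OrderedDict`. `ChunkCache` is a hypothetical name.

```python
from __future__ import annotations

from collections import OrderedDict


class ChunkCache:
    """Hypothetical relay-node cache keeping the most recently requested chunks."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._chunks: OrderedDict[bytes, bytes] = OrderedDict()

    def get(self, chunk_id: bytes) -> bytes | None:
        chunk = self._chunks.get(chunk_id)
        if chunk is not None:
            self._chunks.move_to_end(chunk_id)  # mark as most recently used
        return chunk

    def put(self, chunk_id: bytes, chunk: bytes) -> None:
        self._chunks[chunk_id] = chunk
        self._chunks.move_to_end(chunk_id)
        if len(self._chunks) > self.capacity:
            self._chunks.popitem(last=False)  # evict the least recently used chunk
```

Popular chunks keep getting refreshed and stay cached near the nodes requesting them, while rarely requested ones age out.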

It may seem that sending and receiving files over the network would be a time-consuming process with so much going on, but routing over the network is quite efficient thanks to a Kademlia-like distributed hash table (see below). With millions of nodes, it might seem impossible to find the node closest to a certain ID. However, the number of hops required to find the node closest to a particular address is O(log n), where n is the total number of nodes in the network. Put simply, it takes only about 23 hops to locate a node with a particular ID in a network of 10 million nodes! Once located, the nodes can communicate directly with each other.
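As a quick check of that figure, assume each hop roughly halves the remaining XOR distance, which is the usual Kademlia intuition. The hop count h then grows like the base-2 logarithm of n:

```latex
h \approx \log_2 n, \qquad
\log_2\!\left(10^{7}\right) = \frac{7}{\log_{10} 2} \approx \frac{7}{0.30103} \approx 23.25
```

So a 10-million-node network needs on the order of 23 to 24 hops in this idealized model, and each tenfold growth in n adds only about 3.3 hops.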

A brief explanation on how a node can find any other node in the network quickly, using Kademlia: every node has its own list of nodes at varying, increasing degrees of distance from it. For example, node #1 will have the information for nodes #2, #4, #8, #16, and #32. If he’s looking for node #19, he’ll contact the closest node in his list, #16, and ask him if he knows about #19. Node #16 may have #17 and #20 in his list (since he’s closer to them), so although he doesn’t have information on #19, he’ll ask #20 for the info. #20 will have #19’s information (due to being as close as possible to the node). Thus, in only a few hops #19’s information is returned to #1.
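The lookup above can be simulated. Unlike the simplified numeric example, Kademlia measures closeness with XOR. This sketch therefore assumes a fully populated toy ID space of `bits` bits, so every ID is a live node. It also assumes one routing-table contact per bucket: bucket i holds a random node whose XOR distance from the owner has its highest set bit at position i. Under those assumptions, the bucket-i contact is always the closest known node to the target, and each hop clears the highest differing bit. A lookup therefore never needs more than `bits` = log2(n) hops. The function names are hypothetical.

```python
import random


def build_routing_tables(bits: int, rng: random.Random) -> dict[int, list[int]]:
    """Toy network in which every ID in [0, 2**bits) is a live node.

    tables[node][i] is one random contact whose XOR distance from `node`
    has its highest set bit at position i (Kademlia's bucket i).
    """
    tables = {}
    for node in range(2 ** bits):
        tables[node] = [node ^ (1 << i) ^ rng.randrange(1 << i) for i in range(bits)]
    return tables


def lookup(start: int, target: int, tables: dict[int, list[int]]) -> list[int]:
    """Greedy XOR routing; returns the path of node IDs from start to target."""
    path = [start]
    current = start
    while current != target:
        # The contact in the bucket of the highest differing bit is the closest
        # known node to the target; jumping there clears that bit.
        bucket = (current ^ target).bit_length() - 1
        current = tables[current][bucket]
        path.append(current)
    return path


tables = build_routing_tables(12, random.Random(42))  # 4,096 toy nodes
print(len(lookup(0, 4000, tables)) - 1, "hops")  # never more than 12
```

Real Kademlia keeps up to k contacts per bucket and tolerates missing IDs and offline nodes. The logarithmic bound illustrated here is the same idea.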

More Resources

Conclusion

I hope this article gave you a good understanding of what MaidSafe and the SAFE network are and how they function, and at least one precious “aha” moment. Although we covered a lot, there is still a lot to learn before the network goes live in early 2015. Just for fun, here are a couple of points to think about for the (distant?) future:

  • When the network becomes large, it can be made to function similarly to AWS: you’ll be able to buy a virtually unlimited supply of on-demand servers.
  • Connections currently rely on IPv4/6, but eventually, as mesh networks become larger and more usable, nodes will be able to communicate solely using SAFE IDs.

A big thank you to Nick Lambert and David Irvine from MaidSafe for proofreading this post and explaining many of the details, and a big thanks to the community for the ongoing support!


This post was taken from my blog at http://blanshey.com. I will try to keep both the blog and this post updated as time goes on.


#2

This is amazing man, I’m gonna send this to all my friends!


#3

Thanks for this breakdown @eblanshey! Although much of the underlying technology is new to me the more I read the more I get those “aha” moments :smile: Back to more reading!


#4

Just a small clarification here (sorry missed this earlier). All data is typed, by that I mean every data type is in its own address space (so the 2^512 is for every data/ node type).

So we assign types to data, so a vault key (actually 2 key pairs) is a PmidKeyType (the other is an AnPmidKeyType, used only to sign the PmidType to allow revocation). Therefore a PmidKeyType is stored for you on startup, so it takes up a small piece of the storage of PmidKeyTypes.

TL;DR: if there is a key already, you cannot store another, and the AnPmidKeyType means the collision chance is actually squared (extremely unlikely, well beyond millennia of all the computers in the world creating keys).

The network uses this for security, as it will not allow you to create a key to join a group if the key has more than 2 initial leading bits different from the rest of the group. This makes attacks where people try like mad to keep reconnecting to get a good key in a group they want beyond the normal realms of security. After that, rank does as you say, so it is pretty secure in that area (something we have improved a lot during testnet1 and 2 just to be sure).

Again great work, all this is very very hard to explain as you have done. With @ioptio having the overview she did, this and the system docs I think our documentation is getting pretty good now, at last I hear everyone say :wink:


#5

I remember I asked some time ago on another forum and someone said PUT doesn’t generate safecoins, but this post does say that. Did it change, or was it always like this? If that’s true, isn’t this a flaw? Theoretically data will be stored on nearby nodes, correct? Then you don’t need 88% of the network to exploit it (boost safecoins); you just need to own a few nearby nodes and send data to them.

Still about nearby/latency/caching: say I upload something to the network. Now say I have 10 nodes in some area very far from other nodes, be it because the network is too new or maybe because I’m just really too far from other possible nodes.

Now I start to download the content I just uploaded over and over. According to the cache theory, my 10 nodes will cache the content I uploaded and kept downloading.

Now I just keep downloading more and more, and since that content is on my own network, I won’t even spend my ISP bandwidth, thus only spending energy to generate safecoins.

If all I said is already covered, great, no need to reply in detail. Thanks in advance


#6

It costs to put (maybe free or (more likely) tiny for public data) -> user pays
It earns when people retrieve from your vault -> vault owner earns

GETs from cache do not earn safecoin.

There is no point in uploading and then GETting; this is the scale issue. Safecoin is not earned on every GET but on a modulus number, and the chance you guess which GET works is amortised across the address range. All you will do is fill up your own bandwidth, and if there is a vault there it will get de-ranked, etc. It’s purely the scale of the address range that makes this kind of theft really difficult (and hopefully very very boring :wink: )

Hope that helps.


#7

Can you clarify this part? Are you saying that the two initial leading bits must be the same as the rest of the group you’re trying to join? And what if there aren’t enough nodes that meet that criterion to fill a group?

Meanwhile, I’ve removed/simplified that part of the post.


#8

An id will look like this
00100100100010

A group may already be like
00100110100010
00100100100010
00100101100010
etc.
So they share
001001
As their leading bits
A node can join if it has not more than two bits different so in this case
we could have
00100100
001001001
001001000
etc. able to join but not
00110100 and so on


#9

Thanks for the post. I try to learn a bit more about Maidsafe everyday.


#10

Great explanation, thanks.


#11

someone had a productive weekend :smile:
thanks for writing this up. i really do think this should
be thrown out there for bitcoiners and the less technical
crowd.


#12

Does the owner earn if his vault is public (open for access to anyone)?


#13

Amazing! Is this your first post on your blog? I don’t see anything else.

I will certainly be sharing this within my corner of the bitcoin community! Thank you!

One thing I’m wondering about, though…

Vaults do not earn safecoin for PUTs, only GETs.
And what’s the difference between 1) and 2) here? I assume you’re referring to farming, right? The action of farming IS the action of a vault earning safecoin for users successfully getting data. Unless I’m misunderstanding what you’re trying to say.

Anyways, great job at the writeup and thanks for doing it! :slight_smile:


#14

Thank you so so much! This is the first time I’ve understood the 28-of-32 piece transmission mechanism. Where did you get this?


#15

This is an update to routing (routing_v2) which is being finalised in parallel at the moment. Should finalise testing this week all going well. It includes scatter gather and Information Dispersal for pretty large increases in speed and again significant efficiency savings (faster churn handling)

It also means the APIs for lower layers adopt an asio::async_result mechanism, which allows callbacks, futures, or co-routines to be used. This is a little techie, but it means asynchronism is more fluid and adaptable for each action type. It looks like not much, but it allows much faster and more efficient application logic for developers building on the API when using parallelism and concurrency patterns.


#16

First of all: Great post @eblanshey !! Thank u very much for that.

So, only 28 nodes (pieces) are needed to recreate the chunk that was dispersed into 32 pieces and sent to 32 nodes? Is this the same trick PAR files use to fix a broken file on usenet?


#17

Similar to RAID, really. The link takes you to the PDF describing Rabin’s algorithm. It’s pretty cool: not secret sharing, but information dispersal.


#18

Yes, the first word that came to my mind when I read that part was RAID. And it is a “flying RAID”…:smiley:


#19

Amazing! Is this your first post on your blog? I don’t see anything else.

Yup, first blog post :smile:

Vaults do not earn safecoin for PUTs, only GETs.

Ah I misunderstood David’s explanation; he stated that it costs money to PUT so I assumed it goes to the vault. Who gets the money?

And what’s the difference between 1) and 2) here?

I was under the impression that getting paid for GET requests and getting coins that are “created” (new coins) are different things. Maybe @dirvine can clarify?

And thanks for the praises everyone, I’m glad it helped you!


#20

As I understand, it goes back into the network farming pool and this is the mechanism for recycling.

The network disperses new coins via GET requests which is the “proof of resource” mechanism. http://maidsafe.net/SystemDocs/system_components/proof_of_resources.html