Hi @Blindsite2k, good questions!
- I think @drehb answered pretty well. But I’ll also say that I’m deliberately leaving the accessibility a bit rough for now, because it is an unstable alpha, so there can be blue screens and the like. It’s better if those who try it out now are a bit used to that. But naturally, if you’d like to try anyway, we can help you set it up.
- As @Nigel said, exabyte. A very large number. The reason it shows `2,0 TB free of 7,99 EB` is that it has no conception yet of how much total space there is, so it is using `long.MaxValue` bytes (9223372036854775807) minus the current process’s 64-bit `VirtualMemorySize` (which showed about 2,0 TB on my VM for some reason), which gives 7,99 EB.
- This could absolutely be a possibility. It’s quite open for suggestions here. How we want it to be displayed might be configurable as well.
- You can delete and move around files and folders just like on any drive. The data that has already been stored to the network will always be there, but your drive will consider it erased just as if it was a local drive that you removed something from. There are some more details on limits and file sizes for practical usage with current version, but I’ll explain more in the details of the inner workings that’s coming up!
EDIT: And you’ll always have as much space, on any single drive, as you can pay for, until the network runs out of space.
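The exabyte arithmetic above is easy to verify. Here’s a quick sketch (assuming binary 1024-based units and truncation to two decimals, which is how Windows Explorer displays sizes; the ~2,0 TB VirtualMemorySize is just the value from my VM):

```python
import math

# Back-of-the-envelope check of the "7,99 EB" figure. Assumptions: binary
# (1024-based) units and truncation to two decimals, as Windows Explorer does.
LONG_MAX = 2**63 - 1   # long.MaxValue = 9223372036854775807 bytes
TIB = 2**40
EIB = 2**60

virtual_memory_size = 2 * TIB            # ~2,0 TB reported for the process
total = LONG_MAX - virtual_memory_size   # what the drive reports as capacity
shown = math.floor(total / EIB * 100) / 100  # truncate to two decimals
print(f"{shown} EB")  # -> 7.99 EB
```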
Wow, that’s great @Dimitar, thanks!
Thanks @ravinderjangra, yes that might be a good start. Cross platform GUI is a bit of a quest. I have also been looking at Xamarin Forms 3 for this: Xamarin Blog - An open source mobile platform for building Android, iOS, macOS, watchOS, and tvOS apps with .NET.
It will add full support for desktop apps on Windows, macOS, and Linux (current status here: Platform Support · xamarin/Xamarin.Forms Wiki · GitHub).
It’ll still be some time before I have the backend ready for that.
Thank you Mark! Yes I was thinking the same! Together we now have those platforms covered.
I’ve been planning to get in contact with you about FUSE as soon as I get ready for it. I’m sure there are plenty of things we can discuss regarding the general problems also.
Thanks everyone for the comments!
Well, if it does everything you say it does, then it certainly has the potential to become one of those go-to apps everyone uses. It could certainly replace the need for external hard drives and thumb drives in a lot of ways, though one would probably want those for backup and archival purposes in case the net went down. Have you tested the data transfer speeds this gives? I mean, if it takes me an hour to copy an MP3 file, then that kind of defeats the purpose of having mass space.
The inner workings - Overview
Storage architecture
Event sourcing
Being an event-sourced system, every change to the filesystem is recorded as an event. These are encrypted and stored in a local SQLite database, as an append-only log, a.k.a. WAL (write-ahead log), and subsequently, in a transaction, applied to an in-memory representation of a filesystem. This means that as you work with data on the drive, reading and writing it, it will be manifested in the in-memory filesystem. This also leads to some limitations that we will cover further down.
One of the reasons for storing into a WAL, and building the current state as an in-memory filesystem, is to get minimal latency when working with the network drive; the aim is to make it feel as snappy as if it were a local drive. Another benefit of this WAL approach is that after the initial connect, you can be offline without noticing any difference, and your changes will be synced to the network as soon as the connection is back.
The event sourcing, and the perpetuity of the data in the network, also mean that you would be able to reconstruct your drive as it looked at any point in history, by just replaying the events, change by change - i.e. restore it to a previous version.
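To make the replay idea concrete, here is a minimal Python sketch (the event types, the in-memory filesystem class, and the paths are all made up for illustration; the real implementation is in C# and far more involved):

```python
# Minimal sketch of event sourcing: every change is an event in an append-only
# log, and any state - current or historical - is just the log replayed.
from dataclasses import dataclass

@dataclass(frozen=True)
class FileCreated:
    path: str
    content: bytes

@dataclass(frozen=True)
class FileDeleted:
    path: str

class InMemoryFs:
    def __init__(self):
        self.files = {}

    def apply(self, event):
        if isinstance(event, FileCreated):
            self.files[event.path] = event.content
        elif isinstance(event, FileDeleted):
            self.files.pop(event.path, None)

def replay(log, upto=None):
    """Rebuild drive state from the log; 'upto' restores a past version."""
    fs = InMemoryFs()
    for event in (log if upto is None else log[:upto]):
        fs.apply(event)
    return fs

log = [
    FileCreated("/a.txt", b"hello"),
    FileCreated("/b.txt", b"world"),
    FileDeleted("/a.txt"),
]
current = replay(log)           # /a.txt is gone in the current state...
previous = replay(log, upto=2)  # ...but fully recoverable from history
print(sorted(current.files), sorted(previous.files))
```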
Event synchronization
A background job detects activity on the drive, and as soon as you leave it idle for a few moments, it starts synchronizing the events and the content to SAFENetwork.
If your machine were to go down, you won’t risk losing any changes, as the WAL is kept encrypted locally, and synchronization to the network will continue on the next start.
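The idle-triggered synchronization can be sketched roughly like this (the timings, names, and queue-based WAL are invented for the example; the real worker keeps entries encrypted on disk until they are confirmed by the network):

```python
# Sketch of idle-triggered sync: a background worker watches the time of the
# last drive activity, and once the drive has been idle for a moment, drains
# the pending WAL entries to the network.
import queue
import threading
import time

IDLE_SECONDS = 0.2  # a real implementation would use a longer window

class WalSyncer:
    def __init__(self, upload):
        self.wal = queue.Queue()
        self.upload = upload  # function that sends one entry to the network
        self.last_activity = time.monotonic()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, entry):
        """Called on every drive operation; queues a WAL entry."""
        self.last_activity = time.monotonic()
        self.wal.put(entry)

    def _run(self):
        while not self._stop.is_set():
            idle = time.monotonic() - self.last_activity
            if idle >= IDLE_SECONDS:
                while not self.wal.empty():
                    # in the real system, the entry stays on disk until acked
                    self.upload(self.wal.get())
            time.sleep(0.05)

    def stop(self):
        self._stop.set()
        self._worker.join()

uploaded = []
syncer = WalSyncer(uploaded.append)
syncer.record("event-1")
syncer.record("event-2")
time.sleep(0.5)  # leave the drive idle; the worker drains the WAL
syncer.stop()
print(uploaded)
```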
The events are stored into StreamADs (appendable data) of the recently presented SAFE.AppendOnlyDb project. If the written content of a file is larger than what fits into a slot in a StreamAD, it will instead be stored as immutable data, and the datamap stored to the StreamAD.
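The size-based routing might look something like this in sketch form (the slot limit, the fake network class, and the tuple-based stream are all hypothetical stand-ins, not the actual StreamAD API):

```python
# Sketch of size-based routing: small writes go straight into a StreamAD
# slot, larger ones are stored as immutable data, with only the datamap
# appended to the stream.
SLOT_LIMIT = 1024  # hypothetical max payload per StreamAD slot

class FakeNetwork:
    """Stand-in for SAFENetwork immutable-data storage."""
    def __init__(self):
        self.immutable = {}

    def put_immutable(self, content):
        datamap = f"datamap-{len(self.immutable)}"  # fake address
        self.immutable[datamap] = content
        return datamap

def append_write(stream, network, content):
    if len(content) <= SLOT_LIMIT:
        stream.append(("inline", content))       # fits in a slot
    else:
        datamap = network.put_immutable(content) # too big: immutable data
        stream.append(("datamap", datamap))      # only the datamap in-stream

stream, net = [], FakeNetwork()
append_write(stream, net, b"small entry")
append_write(stream, net, b"x" * 5000)
print(stream[0][0], stream[1][0])
```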
The SAFE.AppendOnlyDb is an infinitely expanding data structure, which uses common indexing techniques to give you good access times to your data, even as it grows very large.
As you use a drive, events are produced, and a history of all the changes builds up. Any time you connect to the network, from any device, you will download this log - without the actual data - and build up the folder and file hierarchy locally. Using a technique called snapshotting, this will be a rather small amount of data and a fast download, regardless of how long and how many changes you have applied to your drive. It would take a very large folder tree with a huge number of files to make this initial synchronization notably slow. (But naturally, the limit to how large this folder hierarchy can be - without the actual file content, remember - will be bound by how much working memory your machine has.)
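Here is a rough sketch of how snapshotting keeps the initial download small (the snapshot interval, the event shape, and the fold function are invented for the example):

```python
# Sketch of snapshotting: the state is periodically folded into a snapshot,
# so a fresh device only downloads the latest snapshot plus the events that
# came after it - not the entire history.
SNAPSHOT_EVERY = 100  # hypothetical snapshot interval

def fold(snapshot, event):
    """Apply one (path, content_or_None) event to a snapshot dict."""
    path, content = event
    if content is None:
        snapshot.pop(path, None)  # None means the file was deleted
    else:
        snapshot[path] = content
    return snapshot

def connect(snapshots, log):
    """What a fresh device downloads: last snapshot + tail of the log."""
    if snapshots:
        index, snapshot = snapshots[-1]
        state, tail = dict(snapshot), log[index:]
    else:
        state, tail = {}, log
    for event in tail:
        fold(state, event)
    return state, len(tail)

# Simulate a drive accumulating 250 change events, snapshotting as it goes.
log = [(f"/f{i}.txt", b"v") for i in range(250)]
snapshots, state = [], {}
for i, event in enumerate(log, start=1):
    fold(state, event)
    if i % SNAPSHOT_EVERY == 0:
        snapshots.append((i, dict(state)))

rebuilt, downloaded = connect(snapshots, log)
print(len(rebuilt), downloaded)  # full state rebuilt from only 50 tail events
```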
The actual content of files is downloaded on demand as you access it, whereupon it is cached in memory while you use it. (Cache eviction is still on the to-do list.)
Merge conflicts
You might be guessing by now that, by choosing this strategy, we have traded speed for complexity: when the WAL is asynchronously uploaded to SAFENetwork, any changes you (or a team mate, family member, etc.) might have made to the same drive from another device might lead to a conflict, which isn’t detected until after you have happily continued your work as if the changes went through just fine.
This is a big area which will probably need most of my focus from now on. First of all, identifying all compatible changes that can be merged automatically. Second, identifying and implementing a strategy for dealing with the conflicting changes that cannot be automatically merged. This is not a new problem; on the contrary, it is quite a common problem today. So there will be plenty of resources to dig through, to see how they can most sanely be applied in this situation.
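As a toy illustration of that first step - classifying concurrent changes - here is a deliberately oversimplified rule (changes to different paths merge automatically, same-path changes conflict); the real strategy will need to be much more nuanced:

```python
# Sketch of classifying concurrent changes from two devices as
# auto-mergeable or conflicting. The rule "different paths never conflict,
# same path always does" is an oversimplification, just to show the shape
# of the problem.
def classify(local_changes, remote_changes):
    local_paths = {path for path, _ in local_changes}
    merged, conflicts = [], []
    for path, op in remote_changes:
        if path in local_paths:
            conflicts.append(path)     # needs a resolution strategy
        else:
            merged.append((path, op))  # compatible, apply automatically
    return merged, conflicts

local = [("/report.txt", "write"), ("/notes.txt", "write")]
remote = [("/report.txt", "write"), ("/photo.jpg", "create")]
merged, conflicts = classify(local, remote)
print(merged, conflicts)
```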
Drive data handling
As a LocalEvent is produced, it is encrypted into a WAL entry, which is stored in a local db file (one per drive). Asynchronously, this log will be worked down and uploaded to SAFENetwork, in the form of a NetworkEvent. Unless you shut down the application before the last entry has been synced to the network, all local drive data will be wiped as the application is shut down.
Security and configuration data
There is currently a convenience approach to this, and there is room for improvement.
You create a user on your machine, by providing a username and a password. The password will be used to encrypt the user and drive configuration that you store locally on the machine, as well as the data in the local WAL db.
In your encrypted configuration file (one per user), you will store the SAFENetwork credentials to each drive.
It’s certainly possible to go about this in some other way - for example, using several drives per SAFENetwork account, or not storing the network credentials locally, etc. I’m fully open to ideas and requests, so as to craft the solution that is most desirable for personal or collaborative use.
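For the curious, password-based encryption of local files typically starts with a key derivation step along these lines (PBKDF2 from Python’s standard library is shown purely as an illustration; the actual scheme used here may differ):

```python
# Sketch of deriving an encryption key from the user's password, the kind of
# step used before encrypting a local config file or WAL db. Parameters are
# illustrative, not the project's actual choices.
import hashlib
import os

def derive_key(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Stretch a password into a fixed-size key using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)

salt = os.urandom(16)  # stored in the clear alongside the encrypted file
key = derive_key("correct horse", salt)
print(len(key))  # 32-byte key, e.g. for AES-256
```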
Performance
My initial experience with this alpha version is that it actually does feel very snappy, thanks to it basically being an in-memory drive. The write throughput to the network will primarily be restricted by your upload bandwidth, and secondarily by CPU and local implementation details, which I hope are sufficiently optimized for practical usage, but which could surely be improved otherwise. This is also something we will find out in better detail as it is being used.
Limitations
The SQLite local database for intermediate storage of WAL entries has a limit of 1 GB per row. I have currently not implemented splitting of larger files into multiple rows, so for now it can only handle files up to 1 GB. But this is a priority feature, so it will soon be able to take larger files than that.
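The planned splitting could be as simple as chunking the content at the row limit (scaled down here from the real ~1 GB so the sketch runs instantly; only the chunk-boundary idea is shown):

```python
# Sketch of splitting a large file across multiple WAL rows to get around
# a per-row size limit, and joining the rows back on read.
ROW_LIMIT = 10  # stand-in for the real ~1 GB SQLite row limit

def split_into_rows(content: bytes):
    """Cut the content into row-sized chunks, preserving order."""
    return [content[i:i + ROW_LIMIT] for i in range(0, len(content), ROW_LIMIT)]

def join_rows(rows):
    """Reassemble the original content from its rows."""
    return b"".join(rows)

data = b"a file larger than one row"
rows = split_into_rows(data)
print(len(rows), join_rows(rows) == data)
```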
Being an in-memory filesystem also presents some challenges and limitations to how large files can be worked on at any given time. The available RAM will restrict how large files can be handled, and currently also how large a proportion of the filesystem you can access during a session, since there is currently no cache eviction. This is also a priority feature, so as not to put a limit on how much content can be accessed during a session.
Also, it currently only works with MockNetwork (which stores SAFENetwork data in a local file). Naturally, it will eventually be possible to configure which (real) network to connect to.
Cloud drive and file system framework
I was able to find a cloud drive abstraction, and a good implementation of IDokanOperations, with related tests, that I could use as a base: the CloudFS project (GitHub - viciousviper/CloudFS: The CloudFS library is a collection of .NET assemblies as gateways to various publicly accessible Cloud storage services.) and GitHub - viciousviper/DokanCloudFS: A virtual filesystem for various publicly accessible Cloud storage services on the Microsoft Windows platform. It provides virtual drives over various cloud storage providers, and was an excellent template for my work. It is much more generic than I had need for, as it is supposed to be able to handle any additional implementations of cloud storage providers. I’m just interested in one.
I have refactored the mentioned code, and used it in some new ways. It sits on top of the storage architecture described in the previous section (the event sourcing, WAL synchronization, local current state as an in-memory filesystem, etc.). I have cleaned up a lot of unused functionality and updated the code base to fit the newest C# language features and my personal coding style. There are still a few parts of unused code to clean up. I can probably also do some architecture improvements and simplifications, since it was written to be very generic, and SAFE.NetworkDrive does not aim to be generic. There’s no other storage provider needed when you have SAFENetwork.
Deeper dive
I’ll post this for now. It would probably be nice to go even further into the implementation details, with code examples, as well as some visual representations, but I’ll do that in another post in that case.
So just to be clear, the purpose of this app is so you can save/retrieve files on the SAFE network, and it will appear/act like a thumb drive (with impossibly large capacity)? Sounds like just the kind of thing we need to bring SAFE to the cavemen like me who don’t know much about coding and are used to self-explanatory boxes. On these forums we are a minority, but I assure you the majority of people in the world need good interfaces like this, not an open engine and a wrench.
That’s right! You start the program, in your explorer the drive(s) will show up like any other drive, and behave more or less exactly like it.
I completely agree. Most often, it should be no more than the simplest way it can help people do what they need to do.
Wow that’s a great explanation of what’s going on under the hood! Thanks for that
The future is here. amazing
Oh that’s looking nice, keep it up! we’ll be rooting for ya.
Just to answer this directly (it’s also addressed in the inner workings description above):
With smaller files you’d see the data transferred as if between two real drives, i.e. fast. In theory the same would go for bigger files, but I haven’t tested that with chunking the data for the SQLite row size limit.
The upload of the data to the network happens with a slight delay, it’s like Dropbox synchronization actually.
That transfer speed would be limited by your upload bandwidth. Unless you need to log in, upload and quickly log out, or access the data directly on another device, you wouldn’t notice it.
But basically: actual data transfer to network won’t be slower than any other way to upload data from your machine to some remote location, and the user experience will hopefully be almost as if it was a local drive.
Bravo. You get it oetyng. Ever consider working for MaidSafe?
@oetyng, You and Mark @happybeing are going to create the killer product that will make SAFE useful to just about everyone who has ever used an extra drive, USB drive, thumb drive or wanted to with their phone.
It will be one of the most used programs for the SAFE network in my opinion.
Might even get people to backup their data
I take my hat off to both of you.
Amazing work, this is a very powerful but simple interface for users. That’s no easy thing to achieve so congrats on the work so far. I think (and probably you too) that the backend is the ‘interesting’ part but the UI is the most important so I’m happy to see it looking so tight!
What is ‘Location’ and ‘Secret’ in this prompt?
The location is your account secret, and the secret would be your account password?
I’m concerned with how it says each SAFE drive is tied to one SAFE account. From what I understand people can create multiple public IDs to preserve their privacy and interaction with one another. Could one tie a drive, partition, or folder to a particular public ID? What I’m thinking here is if you give permissions to, or share as, a particular Public ID, a certain face if you will, you don’t want your different identities and their data getting mixed up. You don’t want your employer going through your activist materials or your porn collection and you might not want to advertise on the SAFE Dating site your theory that the moon landings were a hoax perpetrated by the ghost of Elvis reincarnated as an alien or something. Point being keep your reds, blues and greens separate. So with physical thumb drives you can give different people different drives, that’s why we love them. But if the SAFE drive pulls from one entire account that could be a problem as a lot of different data could be stored on there. So by what mechanism does the SAFE drive “sort” this data for sharing with other users?
For that matter if I want to set up a safe site do I need to use the SAFE web hosting app thing or can I link directly to the file since it is uploaded to the SAFE network after all? And if so where is the url?
@Blindsite2k I’d say that the employer example etc. is not going to happen, since your personal account would not be used as the business account that the boss, or girlfriend, sees, etc.
Of course you should be able (later on) to “send” files to other people, which would be sending them the datamap.
But you make a good point. It would be good to have multiple drives in the one account that can be treated as one treats thumb drives. So one for business, one for personal (relationship) sharing, one for that secret stuff, etc. Oh, and ones to give to other people, as one does with thumb drives.
Another idea for a SAFE drive would be a drive where you could grant someone access to it for a given time period. Kind of like having someone borrow a thumbdrive with the expectation they would return it after a given period. That way the drive becomes reusable much like the thumb drive one would loan out would be as well.
Given that files are permanently stored on the SAFE network, and that once someone copied the data to their drive it would be theirs indefinitely, this might have limited applications. Still, the idea is that one might want to give someone only temporary access to a given amount of data.
This can certainly be changed, but the initial idea was exactly this, to create a new account for the drive. It is not tied to anything or connected to any of your other accounts. So it is the absolute isolation of that drive. If you share the credentials, it is like sharing a thumb drive, because after all, someone who’s got the thumb drive can do anything with it.
If you want to give a copy, then create a new drive, copy over the data (will be deduplicated so minimal waste of space) and give the credentials to that new drive.
In addition to that, or instead, one could choose to share access to the MDs, since they have fine grained access control. But if it’s about sharing write access, that would be a much more convoluted and work intensive way to do the same thing.
As for sharing read access, it would have to be implemented in the protocol to maintain a certain access level to all MDs (ADs), according to input from owner. It’s definitely a use case, so something like it will be there.