GitHub code: FileSystem - Unlimited Size (working on Mock Network)


#1

(Cross-platform, .NET Core 2.1)

Very early stage, tests passing:

  • Creating files and folders (even empty folders)
  • Writing to files
  • Reading from files

Inherent in this is quite a lot of functionality, for example the basic indexing of files by filename.

Infinitely expanding structure

As you might know, I have crafted an infinitely expanding structure on SAFENetwork, on top of the MutableData structures available, which are otherwise each limited to 1 MiB in size or 1k key-value pairs.

I was inspired by @happybeing, who has been building a filesystem, and he mentioned that he still had to solve the limits of number of files.

I wanted to see for myself how I could use my structure for this purpose.
Now, I don’t code javascript at all (it’s not my cup of tea). So I looked up some existing FileSystem implementations in C#, and found a couple; one more advanced in dot net core, and one simpler.

So, to start with I used the IFileSystem interface and filesystem path from the simpler one, but as I get more acquainted with the problem space, I’ll move towards the more advanced solution.


[Diagram: Filesystem over SAFENetwork architecture]

Architecture

The basis for it is the DataTree structure, which expands vertically (appending heads) and horizontally (adding leaves) as either gets full.

  • Every new head is a new level, and every level expands the potential size of the structure by ten to the power of three (1k new entries in the new head). So 1 level gives 1k, 2 levels give 1 million, 3 levels 1 billion, 4 levels 1 trillion, and so on.
  • Each level, from the head down to level 1, stores MdEntryPointers under keys from 1-999 (entry 0 holds the level), each of which points to a specific entry in an IMd at the level directly below it.
  • Level 0 is the leaf level, where StoredValues are kept in the entries of actual MutableData (under the IMd abstraction; the I stands for interface, and IMd is not to be confused with ImmutableData).

This way, the tree is filled slot by slot, Md by Md, level by level… infinitely. What is stored is always a reference to the current head, and with that you hold the keys to the entire structure.
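To make the shape a bit more concrete, here is a rough C# sketch of the DataTree idea. The names (IMd, MdEntryPointer, StoredValue, the factory delegate) are simplified stand-ins for illustration, not exactly what is in the repo:

```csharp
// Simplified sketch of the DataTree described above; not the actual repo code,
// names like IMd, MdEntryPointer and StoredValue are stand-ins.
using System;
using System.Threading.Tasks;

public class StoredValue { public byte[] Payload { get; set; } }
public class MdEntryPointer { public byte[] MdAddress { get; set; } public int EntryKey { get; set; } }

public interface IMd
{
    int Level { get; }                                 // 0 = leaf level
    Task<bool> IsFull();                               // ~1k entries used
    Task Set(int key, MdEntryPointer pointer);         // keys 1-999, entry 0 holds the level
    Task<MdEntryPointer> Add(StoredValue value);       // delegates down to level 0
}

public class DataTree
{
    IMd _head;                                         // a reference to the current head unlocks the whole tree
    readonly Func<int, Task<IMd>> _createMd;           // creates a new, empty Md at a given level

    public DataTree(IMd head, Func<int, Task<IMd>> createMd)
    {
        _head = head;
        _createMd = createMd;
    }

    public async Task<MdEntryPointer> AddAsync(StoredValue value)
    {
        if (await _head.IsFull())
        {
            // Vertical expansion: append a new head one level up,
            // multiplying the potential capacity by ~1k.
            var newHead = await _createMd(_head.Level + 1);
            await newHead.Set(1, new MdEntryPointer());  // first entry refers to the old head
            _head = newHead;
        }
        // Horizontal expansion (adding leaf Mds) happens inside Add,
        // as the Mds at the levels below fill up.
        return await _head.Add(value);
    }
}
```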

On top of that is logic for managing multiple DataTrees for various things:

Indexer

The basic indexing is to find any item stored in a DataTree just by specifying its key. Without indexing, we would have to search through, on average, half of all entries in the structure to find what we are looking for.
We solve this by using more DataTrees: one for the indexer itself to keep track of all the different types it is indexing (we call it the TypeStore), and one for each specific index.
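Roughly, the lookup side could look something like this sketch (IIndex and the method names are made-up stand-ins, not the actual code):

```csharp
// Sketch of the indexer idea: the TypeStore tracks which indices exist,
// and each index maps keys (e.g. paths) to stored values.
// IIndex and the member names here are illustrative stand-ins.
using System.Collections.Generic;
using System.Threading.Tasks;

public interface IIndex
{
    Task AddAsync(string key, byte[] value);
    Task<byte[]> FindAsync(string key);
}

public class Indexer
{
    // Loaded from the TypeStore on startup: index type name -> its DataTree-backed index.
    readonly Dictionary<string, IIndex> _indices;

    public Indexer(Dictionary<string, IIndex> loadedIndices) => _indices = loadedIndices;

    // Find e.g. a directory by its full path, instead of scanning
    // on average half of all entries in the structure.
    public Task<byte[]> FindAsync(string indexType, string key)
        => _indices[indexType].FindAsync(key);
}
```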

A file or directory path is a unique index; there will only ever be one of any exact path (in this file system). That means we create a DataTree which holds only one value.
In effect, that means there will only be one MutableData in the tree.

On startup, the indexer loads its types into a dictionary, and every time you want to find a specific path, you access it via this dictionary. What it returns will be quite an interesting thing…:

Directory

The directory is what this filesystem structure builds upon:

  • Root path is a Directory
  • A Directory has a DataTree, where it stores any number of references to other DataTrees.
  • A Directory adds a subdirectory by adding a reference to a new Directory, i.e. to its head in the DataTree.
  • A Directory adds a file by adding a reference to an IMd to its DataTree.
  • When interacting with the filesystem, you call the root path and ask for the parent directory of the path you wish to act on. The root Directory structure then recursively derives a Directory instance representing the parent directory, on which you then call Create, Exists, Find, Delete and so on (see the usage sketch below).
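The usage sketch mentioned above could look roughly like this (IDirectory and its members are hypothetical names, just to illustrate the flow):

```csharp
// Hypothetical usage of the Directory flow described above.
// IDirectory and its members are stand-ins, not the actual repo API.
using System.Text;
using System.Threading.Tasks;

public interface IDirectory
{
    Task<IDirectory> GetParentOfAsync(string path);   // recursively derived from the root
    Task<bool> ExistsAsync(string name);
    Task CreateFileAsync(string name, byte[] content);
}

public static class FileSystemUsage
{
    public static async Task CreateNoteAsync(IDirectory root)
    {
        // Ask the root for the parent directory of the path we want to act on...
        var parent = await root.GetParentOfAsync("/docs/notes/todo.txt");

        // ...then call Create, Exists, Find, Delete etc. on that parent.
        if (!await parent.ExistsAsync("todo.txt"))
            await parent.CreateFileAsync("todo.txt", Encoding.UTF8.GetBytes("hello"));
    }
}
```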

Files

Files are currently a byte array stored in the entry of a MutableData. A more advanced version will adjust how a file is stored based on its size: maybe the byte array is split up over multiple MutableData, maybe it goes into an ImmutableData. We’ll have to find out what works best.
Currently there is no process or OS lock on the files, but that code is available in the more advanced example I saw and could probably be used. For network concurrency, the MutableData version would be a basic concurrency-management approach, but that surely has many dimensions to it.
Writing to the network is buffered and flushed at convenient times.

The files are encapsulated in a Stream implementation so as to model the above; within it are the reader and writer functions that reach into SAFENetwork (or anything else you implement under the hood).
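Something along these lines, as a rough sketch; the reader/writer delegates stand in for whatever actually reaches into the network:

```csharp
// Rough sketch of a file exposed as a Stream, with buffered writes that
// are only sent to the network on Flush. The delegates are placeholders
// for the actual SAFENetwork read/write calls.
using System;
using System.IO;
using System.Threading.Tasks;

public class NetworkFileStream : Stream
{
    readonly MemoryStream _buffer = new MemoryStream();  // local buffer
    readonly Func<byte[], Task> _writeToNetwork;
    readonly Func<Task<byte[]>> _readFromNetwork;
    bool _loaded;

    public NetworkFileStream(Func<byte[], Task> writeToNetwork, Func<Task<byte[]>> readFromNetwork)
    {
        _writeToNetwork = writeToNetwork;
        _readFromNetwork = readFromNetwork;
    }

    public override void Write(byte[] buffer, int offset, int count)
        => _buffer.Write(buffer, offset, count);          // buffered, nothing hits the network yet

    public override void Flush()                          // flushed "at a convenient time"
        => _writeToNetwork(_buffer.ToArray()).GetAwaiter().GetResult();

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (!_loaded)                                     // lazily pull the file bytes once
        {
            var bytes = _readFromNetwork().GetAwaiter().GetResult();
            _buffer.Write(bytes, 0, bytes.Length);
            _buffer.Position = 0;
            _loaded = true;
        }
        return _buffer.Read(buffer, offset, count);
    }

    public override bool CanRead => true;
    public override bool CanWrite => true;
    public override bool CanSeek => true;
    public override long Length => _buffer.Length;
    public override long Position { get => _buffer.Position; set => _buffer.Position = value; }
    public override long Seek(long offset, SeekOrigin origin) => _buffer.Seek(offset, origin);
    public override void SetLength(long value) => _buffer.SetLength(value);
}
```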

Next up

Well, I will be adding functionality and refining code, piece by piece making it possible to use this. I actually have no idea how FUSE and Dokany work, so I’ve got some investigating to do before it can be hooked up that way.
This will be my playground to explore and evolve the patterns of data storage on SAFENetwork. Maybe I can come up with some ideas for additions or improvements, maybe I can see how RDF can fit in, or maybe I’ll just be learning, as always, how to code on SAFENetwork.


#2

Pardon my ignorance, but isn’t this the sort of thing NFS on SAFENetwork seeks to provide? Are there limitations (such as indexing) that NFS does not deal with well?


#3

Woah, this is fantastic news, bigger than SAFE Drive IMO, and @oetyng, if you want help sitting FUSE on top of this, I’m here. This might be a better way to go, certainly looks like it, so I’m happy to discuss ways to collaborate etc. My aim is to get a local file system mounted on SAFE; I’m not attached to the code I’ve written so far, but maybe there are ways I can help.

Looks like you may have cracked a few issues here. I’m aware there are changes likely in the API that might affect both approaches (re RDF, append only storage) but I’m not sure at what level or how.

High five man. Let me know if you want to chat at some point. I’ll plod on with SAFE Drive for the moment, but can easily switch my time to help you (though it is reducing significantly now).

Well done. Your hard work has paid off :slight_smile:


#4

That would be absolutely splendid @happybeing! There are so many things there that are still unknown to me, and you have gathered a lot of knowledge in the area. There should be plenty of ways we can collaborate. Great news :slight_smile:
I’m just starting here, so there are still many things to implement “properly” in the code for reliability, performance, threading, concurrency and so on, things that I simplified to get the first tests passing. There will also be a need to think about a slightly different way of handling unique indices. So, in a while, I’ll have a look at FUSE etc.

NFS has a few limitations, the most notable being that you have containers (MutableData) as the base structure, which are used like directories, instead of, as in my solution, DataTrees (ever-expanding trees of MutableData) that are used for various things, like directories, indices and such. The latter certainly requires indexing because of the potential size; it would be slow to use without it (with any real-world sizes of data). But of course that base also allows searching generally.
NFS will have a hard time growing like that; it is more like a minimal implementation.
So, this is like a redesign of NFS, for scalability.


#5

This is what we really need, I think: a SAFE file system that is available on all platforms, so different desktop OSs, web and mobile APIs, ideally using the same underlying API. This is why I chose to build on SAFE NFS.

If not, data stored using one SAFE file system app will not be automatically accessible to other apps or platforms. So, for example, I save stuff in my web editor, but it isn’t readable on my mounted SAFE drive, or vice versa.

So one question for me is: how could we enhance the underlying built-in APIs to solve the problems I’ve hit, which @oetyng has found solutions to? There are different options here and we don’t need the answers now, but if we want a universal SAFE file system I think that’s the key question, and it will need to involve @Maidsafe and their plans, which I understand are still evolving (wrt append-only data and RDF).

In the meantime, @oetyng and I can work together as well as on our separate projects, learning and flushing out further issues that need to be considered.

How does that sound?


#6

The more you two are able to prototype and find working solutions in whichever language is most comfortable/efficient for you, the easier it will be (for MS?) to refactor in Rust and integrate with the core NFS API. So in other words, I agree that what you’ve done so far is the best way one could go about doing it, given the time/resource constraints and learning curves involved. I hope to be able to join in on the fun at some point, but I have been learning a lot by watching your progress as I sit on the sidelines. Thank you!


#7

Yes, totally agree. Actually, I haven’t yet reached the part where I know how that could be done. So far I’m only beginning to explore how an expandable structure can be used.

There is so much of interest to get into with regards to indexing and searching, which will also affect our aims and wishes when it comes to data structures in the network. I will head towards that in a while (as well as deeper into the distributed RDF world); first, however, I need to get more solid results from these recent ideas, see that there are no lurking limitations, solve those I have already spotted, and get to acceptable reliability and performance.

Absolutely great. I need to focus on some other things for a while now, but after that I’d be very happy to have a chat and talk more about all these things!


#8

Me too; as noted elsewhere, my available time is reducing quite a bit now. Hopefully I’ll have enough to get a useful product (e.g. decentralised git, that’s my current aim); meanwhile we can feed into the Maidsafe processes and keep in touch.

Great work @oetyng, exciting to see and I’m very interested to see how it develops.


#9

Concerning distributed version control systems on Safe: I also think it is smart to focus on the very popular Git, but it’s best to also keep an eye on Pijul. That seems to be a better match technically because, if I understand it correctly, the same change in the controlled data causes more of the underlying data (that effectively has to be stored) to change in Git than in Pijul. I know HappyBeing is already aware of Pijul, but this could be informative for others who read this.
I wonder if it could be useful to use inotify on Linux in another, simpler implementation of a file-sync solution between local storage and the Safe Network. Inotify can probably only be used for syncing local file and directory changes to Safe, not in the other direction. It seems interesting to me to maybe try something with inotify on Linux, in combination with Python (Safe bindings).


#10

Is this something to think about?


#11

IMO rsync is about as easy as it gets for managing a local folder with a mounted SAFE drive.


#12

Yes, rsync is a very useful tool. But inotify adds something else: it makes it possible to run rsync automatically and immediately after a file changes, rather than periodically.