GitHub code: FileSystem - Unlimited Size (working on Mock Network)
(Cross platform dot net core 2.1)
Very early stage, tests passing:
- Creating files and folders (even empty folders)
- Writing to files
- Reading from files
Inherently in this is quite a lot of functionality, like for example the basic indexing of files by filename.
Infinitely expanding structure
As you migfht know, I have crafted an infinitely expanding structure on SAFENetwork, on top of the MutableData structures available, which otherwise are each limited to 1MiB in size or 1k key-value pairs.
I was inspired by @happybeing, who has been building a filesystem, and he mentioned that he still had to solve the limits of number of files.
I wanted to see for myself how I could use my structure for this purpose.
Now, I don’t code javascript at all (it’s not my cup of tea). So I looked up some existing FileSystem implementations in C#, and found a couple; one more advanced in dot net core, and one simpler.
So, to start with I used the IFileSystem interface and filesystem path from the simpler one, but as I get more acquainted with the problem space, I’ll move towards the more advanced solution.
Filesystem over SAFENetwork architecture
Architecture
The basis for it is the DataTree structure, that will expand vertically (append heads) and horizontally (add leafs), as either get full.
- Every new head, is a new level, and so every level means an expansion of the potential size of the structure, of ten to the power of three (1k new entries in the new head). So 1 level gives 1k, 2 levels gives 1 million, 3 levels : 1 billon, 4: 1 trillion … .
- Each level from head and down to level 1, stores MdEntryPointers under keys fom 1-999 (entry 0 is level), which points to a specific entry in an IMd in the level directly below it.
- Level 0 is the leaf level, where StoredValues are kept in the entries of actual MutableData (under the IMd abstraction - I stands for interface, and IMd is not to be confused with ImmutableData).
This way, the tree is filled, slot by slot, Md by Md, level by level… infinitely. And what is stored is always a reference to the current head, and by that, you hold the keys to the entire structure.
On top of that is logic for managing multiple DataTrees for various things:
Indexer
The basic indexing is to find any item stored in a DataTree by just specifying the key. Without indexing, we would have to search through on average half of all entries in the structure, to find what we are looking for.
We solve this by using more DataTrees; one for the indexer itself to keep track of all different types it is indexing (we call it TypeStore), and one for each specific index.
A file or directory path, is a unique index; there is only ever going to be one exact same path (in this file system). That means that we will create a DataTree, which has only 1 value.
In effect, that means it will only be one MutableData in the tree.
On the indexer startup, it loads its types into a dictionary, and every time you want to find a specific path, you access it via this dictionary. What it returns, will be quite an interesting thing…:
Directory
The directory is what this filesystem structure builds upon:
- Root path is a Directory
- A Directory has a DataTree, where it stores any number of references to other DataTrees.
- A Directory adds a subdirectory, by adding a reference to a new Directory, i.e. to its head in the DataTree.
- A Directory adds a file by adding a reference to an IMd to its DataTree.
- When interacting with the filesystem, you call the root path, and asks for the parent directory of the path you wish to act on. The root Directory structure then recursively derives a Directory instance representing the parent directory, where you then call Create, Exists, Find, Delete and so on.
Files
Files are currently a byte array stored in the entry of a MutableData. A more advanced version will adjust how the file is stored based on the size of the file. Maybe the byte array is split up over multiple MutableData, maybe it goes to an ImmutableData. We’ll have to find out what would work best.
Currently there is no process or OS lock on the files, but that code is available in the more advanced example I saw, and could probably be used. For network concurrency, the mutable data version would be a basic concurrency management approach, but that surely has many dimensions to it.
Writing to network is buffered, and would be flushed at convenient times.
The files are encapsulated in a Stream implementation as to model the above, and within it, it has the reader and writer functions that reaches into the SAFENetwork (or any other thing you implemented there under the hood).
Next up
Well, I will be adding functionality, refining code, piece by piece making it possible to use this. I actually have no idea how FUSE and Dokany work, so I’ve got some investigation to do before it can be hooked up that way.
This will be my playground to explore and evolve the patterns of data storage on SAFENetwork, maybe I can come up with some ideas for additions or improvements, maybe I can see how RDF can fit in, maybe I’ll just be learning - as always - how to code on SAFENetwork.