Code: FileSystem - Unlimited Size (working on Mock Network)


#1

GitHub code: FileSystem - Unlimited Size (working on Mock Network)

(Cross platform dot net core 2.1)

Very early stage, tests passing:

  • Creating files and folders (even empty folders)
  • Writing to files
  • Reading from files

Inherent in this is quite a lot of functionality, such as the basic indexing of files by filename.

Infinitely expanding structure

As you might know, I have crafted an infinitely expanding structure on SAFENetwork, on top of the MutableData structures available, which are otherwise each limited to 1 MiB in size or 1k key-value pairs.

I was inspired by @happybeing, who has been building a filesystem, and he mentioned that he still had to solve the limits of number of files.

I wanted to see for myself how I could use my structure for this purpose.
Now, I don’t code JavaScript at all (it’s not my cup of tea). So I looked up some existing FileSystem implementations in C#, and found a couple: one more advanced in dot net core, and one simpler.

So, to start with I used the IFileSystem interface and filesystem path from the simpler one, but as I get more acquainted with the problem space, I’ll move towards the more advanced solution.


Filesystem over SAFENetwork architecture

Architecture

The basis for it is the DataTree structure, which expands vertically (appending heads) and horizontally (adding leafs) as either gets full.

  • Every new head is a new level, and every level multiplies the potential size of the structure by a thousand (ten to the power of three, since the new head holds 1k new entries). So 1 level gives 1k, 2 levels give 1 million, 3 levels 1 billion, 4 levels 1 trillion, and so on.
  • Each level, from the head down to level 1, stores MdEntryPointers under keys from 1-999 (entry 0 holds the level number), each pointing to a specific entry in an IMd in the level directly below it.
  • Level 0 is the leaf level, where StoredValues are kept in the entries of actual MutableData (under the IMd abstraction; the I stands for interface, and IMd is not to be confused with ImmutableData).

This way, the tree is filled, slot by slot, Md by Md, level by level… infinitely. And what is stored is always a reference to the current head, and by that, you hold the keys to the entire structure.
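To make the expansion mechanics concrete, here is a rough in-memory Python sketch of the principle. All names (`Md`, `DataTree`, `MD_CAPACITY`) are illustrative, not taken from the actual repo: a leaf holds up to 999 values, and when the head fills up, a new head is appended on top, multiplying the potential capacity by roughly a thousand.

```python
# Hypothetical sketch of the DataTree expansion principle.
# Entry 0 of a real MutableData is reserved for the level number,
# leaving 999 usable slots per Md.
MD_CAPACITY = 999

class Md:
    def __init__(self, level):
        self.level = level
        self.entries = []          # leaf: values; interior: child Mds

    def is_full(self):
        return len(self.entries) >= MD_CAPACITY

class DataTree:
    def __init__(self):
        self.head = Md(level=0)    # starts as a single leaf

    def add(self, value):
        if not self._try_add(self.head, value):
            # head is full: append a new head one level up,
            # multiplying potential capacity by ~1000
            new_head = Md(level=self.head.level + 1)
            new_head.entries.append(self.head)
            self.head = new_head
            self._try_add(self.head, value)

    def _try_add(self, md, value):
        if md.level == 0:                    # leaf: store the value itself
            if md.is_full():
                return False
            md.entries.append(value)
            return True
        # interior: descend into the last child, add a sibling when full
        if md.entries and self._try_add(md.entries[-1], value):
            return True
        if md.is_full():
            return False
        md.entries.append(Md(md.level - 1))  # add a leaf (branch) sideways
        return self._try_add(md.entries[-1], value)

tree = DataTree()
for i in range(1000):
    tree.add(i)
assert tree.head.level == 1    # the 1000th add grew the tree one level
```

The caller only ever holds a reference to the current head, matching the "you hold the keys to the entire structure" point above.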

On top of that is logic for managing multiple DataTrees for various things:

Indexer

The basic indexing is to find any item stored in a DataTree just by specifying the key. Without indexing, we would have to search through, on average, half of all entries in the structure to find what we are looking for.
We solve this by using more DataTrees: one for the indexer itself to keep track of all the different types it is indexing (we call it the TypeStore), and one for each specific index.
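As a sketch of that idea, with plain dicts standing in for the network-backed DataTrees and all names hypothetical:

```python
# Illustrative sketch of the indexer: a TypeStore maps a type name to
# its index, and each index maps a key (e.g. a path) to the head of
# the DataTree holding the value.
class Indexer:
    def __init__(self):
        self.type_store = {}   # type name -> index (itself a mapping)

    def index(self, type_name, key, value_ref):
        self.type_store.setdefault(type_name, {})[key] = value_ref

    def find(self, type_name, key):
        # O(1) dictionary lookup instead of scanning, on average,
        # half of all entries in the structure
        return self.type_store.get(type_name, {}).get(key)

idx = Indexer()
idx.index("path", "/docs/readme.txt", "datatree-head-xyz")
assert idx.find("path", "/docs/readme.txt") == "datatree-head-xyz"
```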

A file or directory path is a unique index; there is only ever going to be one of each exact path in this file system. That means we create a DataTree which holds only one value.
In effect, that means there will only be one MutableData in the tree.

On startup, the indexer loads its types into a dictionary, and every time you want to find a specific path, you access it via this dictionary. What it returns will be quite an interesting thing…:

Directory

The directory is what this filesystem structure builds upon:

  • Root path is a Directory
  • A Directory has a DataTree, where it stores any number of references to other DataTrees.
  • A Directory adds a subdirectory, by adding a reference to a new Directory, i.e. to its head in the DataTree.
  • A Directory adds a file by adding a reference to an IMd to its DataTree.
  • When interacting with the filesystem, you call the root path and ask for the parent directory of the path you wish to act on. The root Directory structure then recursively derives a Directory instance representing the parent directory, on which you then call Create, Exists, Find, Delete and so on.
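The recursive parent-directory lookup can be sketched like this (in-memory Python, illustrative names only, with a dict standing in for the Directory's DataTree of references):

```python
# Hypothetical sketch of the Directory structure: each Directory keeps
# references to subdirectories and files; acting on a path means first
# deriving the parent Directory from the root.
class Directory:
    def __init__(self, name):
        self.name = name
        self.entries = {}   # name -> Directory or file reference

    def add_directory(self, name):
        self.entries[name] = Directory(name)
        return self.entries[name]

    def add_file(self, name, ref):
        self.entries[name] = ref

    def find_parent(self, path):
        # walk from the root down to the parent of `path`
        parts = [p for p in path.strip("/").split("/") if p]
        current = self
        for part in parts[:-1]:
            current = current.entries[part]
        return current, parts[-1]

root = Directory("/")
docs = root.add_directory("docs")
docs.add_file("readme.txt", "imd-ref-1")
parent, name = root.find_parent("/docs/readme.txt")
assert parent is docs and name == "readme.txt"
```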

Files

Files are currently a byte array stored in the entry of a MutableData. A more advanced version will adjust how a file is stored based on its size: maybe the byte array is split over multiple MutableData, maybe it goes into an ImmutableData. We’ll have to find out what works best.
Currently there is no process or OS lock on the files, but that code is available in the more advanced example I saw and could probably be reused. For network concurrency, the MutableData version field would give a basic optimistic-concurrency mechanism, but that topic surely has many dimensions to it.
Writing to the network is buffered, and is flushed at convenient times.

The files are encapsulated in a Stream implementation so as to model the above, and within it are the reader and writer functions that reach into SAFENetwork (or whatever else you implement there under the hood).
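A minimal sketch of that wrapping, with an in-memory dict standing in for the network backend and `io.BytesIO` standing in for the Stream base class (the class and address names are hypothetical):

```python
import io

# Buffered file stream whose backing store could be SAFENetwork;
# writes accumulate in memory and are flushed "at convenient times"
# (here: on flush/close).
class NetworkFileStream(io.BytesIO):
    def __init__(self, backend, address):
        super().__init__(backend.get(address, b""))
        self.backend = backend
        self.address = address

    def flush(self):
        # push the buffered bytes out to the backend
        self.backend[self.address] = self.getvalue()
        super().flush()

    def close(self):
        self.flush()
        super().close()

store = {}
with NetworkFileStream(store, "md-entry-1") as f:
    f.write(b"hello safe")
assert store["md-entry-1"] == b"hello safe"
```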

Next up

Well, I will be adding functionality, refining code, piece by piece making it possible to use this. I actually have no idea how FUSE and Dokany work, so I’ve got some investigation to do before it can be hooked up that way.
This will be my playground to explore and evolve the patterns of data storage on SAFENetwork, maybe I can come up with some ideas for additions or improvements, maybe I can see how RDF can fit in, maybe I’ll just be learning - as always - how to code on SAFENetwork.


#2

Pardon my ignorance, but isn’t this the sort of thing NFS on SAFENetwork seeks to provide? Are there limitations (such as indexing) which NFS does not deal well with?


#3

Woah, this is fantastic news, bigger than SAFE Drive IMO, and @oetyng, if you want help sitting FUSE on top of this I’m here. This might be a better way to go, certainly looks like it, so I’m happy to discuss ways to collaborate etc. My aim is to get a local file system mounted on SAFE; I’m not attached to the code I’ve written so far, but maybe there are ways I can help.

Looks like you may have cracked a few issues here. I’m aware there are changes likely in the API that might affect both approaches (re RDF, append only storage) but I’m not sure at what level or how.

High five man. Let me know if you want to chat at some point. I’ll plod on with SAFE Drive for the moment, but can easily switch my time to help you (though it is reducing significantly now).

Well done. Your hard work has paid off :slight_smile:


#4

That would be absolutely splendid @happybeing! There are so many things there that are still unknown to me, and you have gathered a lot of knowledge in the area. There should be plenty of ways we can collaborate. Great news :slight_smile:
I’m just starting here so still many things to implement “properly” in the code for reliability, performance, threading, concurrency and so on, things that I simplified to get first tests passing. There will also be need for thinking about a slightly different way of handling unique indices. So, in a while having a look at FUSE etc.

NFS has a few limitations, the most notable being that it uses containers (MutableData) directly as the base structure, like directories, whereas my solution uses DataTrees (ever-expanding trees of MutableData) for various things: directories, indices and so on. The latter certainly requires indexing because of its potential size; without it, lookups would be slow at any real-world data volume. But sure, that base also allows general searching.
NFS will have a hard time growing like that; it is more like the minimum implementation.
So, this is like a redesign of NFS, for scalability.


#5

This is what we really need, I think: a SAFE file system that is available on all platforms (different desktop OSes, web and mobile APIs), ideally using the same underlying API. This is why I chose to build on SAFE NFS.

If not, data stored using one SAFE file system app will not be automatically accessible to other apps or platforms. So, for example I save stuff in my web editor, but it isn’t readable on my mounted SAFE drive or vice versa.

So one question for me is how we could enhance the underlying built-in APIs to solve the problems I’ve hit, which @oetyng has found solutions to. There are different options here and we don’t need the answers now, but if we want a universal SAFE file system I think that’s the key question, and it will need to involve @Maidsafe and their plans, which I understand are still evolving (wrt append-only data and RDF).

Mean time @oetyng and I can work together as well as on our separate projects, learning and flushing out further issues that need to be considered.

How does that sound?


#6

The more you two are able to prototype and find working solutions to in whichever language is most comfortable/efficient for you, the easier it will be (for MS?) to refactor in Rust and integrate with the core NFS api. So in other words, I agree that what you’ve done so far is the best way one could go about doing it given the time/resource constraints and learning curves involved. I hope to be able to join in on the fun at some point, but have been learning a lot by watching your progress as I sit on the sidelines. Thank you!


#7

Yes, totally agree. Actually I haven’t yet reached the part where I know how that could be done. So far only beginning to explore how an expandable structure can be used.

There is so much that is interesting to get into with regards to indexing and searching, which will also affect our aims and wishes when it comes to data structures in the network. I will head towards that in a while (as well as deeper into the distributed RDF world); first, however, I need to get more solid results from these recent ideas, see that there are no lurking limitations, solve those I have already spotted, and get to acceptable reliability and performance.

Absolutely great. I need to focus on some other things for a while now, but after that I’d be very happy to have a chat and talk more about all these things!


#8

Me too, as noted elsewhere my available time is reducing quite a bit now. Hopefully I’ll have enough to get a useful product (eg decentralised git - that’s my current aim) meanwhile we can feed into the Maidsafe processes, and keep in touch.

Great work @oetyng, exciting to see and I’m very interested to see how it develops.


#9

Concerning distributed version control systems on Safe: I also think it is smart to focus on the very popular Git, but it is best to keep an eye on Pijul too. Pijul seems to be a better match technically because, if I understand it correctly, the same change in the tracked data causes more underlying data (which has to be stored) to change in Git than in Pijul. I know HappyBeing is already aware of Pijul, but this could be informative for others who read this.
I wonder if it could be useful to use inotify on Linux in another, simpler implementation of a file sync solution between local storage and the Safe Network. inotify can probably only be used for syncing local file and directory changes to Safe, not in the other direction. It seems interesting to me to maybe try something with inotify on Linux, in combination with Python (Safe bindings).


#10

Is this something to think about?


#11

Imo Rsync is about as easy as it gets for managing a local folder with a mounted safe drive.


#12

Yes, rsync is a very useful tool. But inotify adds another function: it makes it possible to run rsync automatically, immediately after a file changes, rather than periodically.


#13

@oetyng I’ve been revisiting your old threads and am realizing just how forward thinking you’ve been.
The debate around AppendableData brought me back to this thread in particular.
Why can’t we implement your idea here for an infinitely expanding data structure?

What are your thoughts on file versioning? How would you handle it in the context of this thread?


#14

Hi @hunterlester, nice to hear :slight_smile:

The principle could probably be used.

I am currently investigating various things related to this.

One thing is that I am looking at how to implement SSTables and LSM trees without MutableData, i.e. with something that works the way AppendableData does, to the extent that I know how that would work.

Another thing is that I am working on a BTree as well as a B+Tree for the existing infinitely expanding data structure (IED), as a replacement for the current indexing. (The current implementation is not so write-efficient, and does not allow range queries as is.)

So, there are various things to reconcile:

What I have here basically forming the IED, is a heap based on MD.
To make this a searchable and practical storage, I need to combine it with indices.
It can then become basis for a database, or a filesystem.
Now, the IED doesn’t need to be a heap, and if MD is being replaced with AD, then it seems more complicated to build a heap (don’t know yet if it is practically doable).

Enter LSM trees.

However, LSM trees also use indexing, for which B+Trees are quite apt. And B+Trees, in the same way as these heaps, seem (as far as I can see right now) difficult to implement efficiently without MD (or something like it).
(Another complication is that the SSTables use timestamps as to get them reliably merged. If we’re assuming writers from more than one machine, the timestamps will not be reliable. Maybe that will have to be a problem for the application.)

An AD could well be an SSTable, if we disregard the timestamp issue.
And so an AD in the role of current state, could store a map of current SSTables (ADs). After every merge of the tables producing ever bigger tables, we append a new map to that AD with current tables.
So far so good. But there’s still the question of how to construct an index efficiently with AD and ImD as building blocks.

But I am still investigating, only started.

As to my thoughts so far about AD, I have been thinking about how I would want it to work so as to be most useful, within the bounds of how I understand it to have been suggested.
I am looking at it as both a simple value with an opaque history, as well as a stream. The API I propose to be useful would be something like:
ad.AppendInSitu(value) // always appends; value stored directly in the AD
ad.AppendToImd(value) // always appends; pointer stored in the AD, value in ImD
ad.TryAppendInSitu(value, expectedVersion) // optimistic concurrency
ad.TryAppendToImd(value, expectedVersion) // optimistic concurrency
ad.GetValue() => (value, version) // i.e. current version
ad.GetValueAt(version) => value
ad.GetAll() => array of values, where index represents version
ad.ReadForwardFrom(version) => array of values from [version] to current version
ad.ReadBackwardsFrom(version) => array of values from [version] to version 0
ad.GetRange(version_a, version_b) => array of values; the order of a and b determines the order of the returned array // a != b, a,b > 0
Possibly even a subscribe functionality, so that new versions are pushed. Polling would require a “hard poll” so as not to hit a cache.
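To pin down the semantics proposed above, here is a tiny in-memory model of the API (Python with snake_case names; this is my reading of the semantics, not an actual SAFE API, and the ImD variants and subscribe are omitted):

```python
class AppendableData:
    """In-memory model of the proposed AD API (sketch only)."""

    def __init__(self):
        self._versions = []    # version N is simply index N

    def append_in_situ(self, value):
        self._versions.append(value)

    def try_append_in_situ(self, value, expected_version):
        # optimistic concurrency: fail if someone appended in between
        if len(self._versions) - 1 != expected_version:
            return False
        self._versions.append(value)
        return True

    def get_value(self):
        # current value plus its version
        return self._versions[-1], len(self._versions) - 1

    def get_value_at(self, version):
        return self._versions[version]

    def get_all(self):
        return list(self._versions)

    def read_forward_from(self, version):
        return self._versions[version:]

    def read_backwards_from(self, version):
        return self._versions[version::-1]

ad = AppendableData()
ad.append_in_situ("a")
ad.append_in_situ("b")
assert ad.get_value() == ("b", 1)
assert ad.try_append_in_situ("c", expected_version=0) is False
assert ad.read_backwards_from(1) == ["b", "a"]
```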

About how the AD could be built internally:
We would be talking about the principle of the IED as a basis for AD, i.e. at the Rust layer, and that might give some more wiggle room: the vaults could perhaps be allowed to have mutable storage, just for internal use, and implement B+Trees for indexing.
MD is currently handled by a close group. Would an AD be as well?
It would be more scalable if an AD were internally built of fixed-length segments that are linked. If we reuse the MDs, remove the mutation feature, and just let the last entry be a link to a new MD, then we have something that could be an AD. It wouldn’t efficiently support the API I suggested above, though.
The way I make the IED be infinitely expandable, is by building a tree of the MDs, which can have a massive fanout with 999 slots pointing to an MD each, in turn pointing to 999 each, and so on.
There’s a little bit of a chicken-and-egg problem if we’re looking at the current IED as a basis for AD: the IED is not very practical without indexing, and without MD there is maybe no indexing, and then no [manageable] IED for AD? Well, that is the nut I want to crack :slight_smile:

The file versioning: if we look at how the storage solution is implemented here currently, it just stores data directly in an MD entry. I have already planned to change this so that data is stored in ImD, with only pointers in the MDs.
This way, a file will always be an ImD. My first idea for file versioning would be for each new ImD (i.e. new file version) to always include a pointer (well, that would be the datamap) to the ImD holding the previous file version.
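That "pointer to the previous version" idea forms a backwards-linked chain of immutable blobs, which a sketch can make concrete (class and field names are mine, and a plain object reference stands in for the datamap):

```python
# Sketch of file versioning via a chain of immutable versions: each new
# file version embeds a reference to the version before it.
class ImmutableFileVersion:
    def __init__(self, content, previous=None):
        self.content = content
        self.previous = previous   # reference to the prior ImD, or None

def history(version):
    # walk the chain from newest to oldest
    out = []
    while version is not None:
        out.append(version.content)
        version = version.previous
    return out

v1 = ImmutableFileVersion(b"draft")
v2 = ImmutableFileVersion(b"final", previous=v1)
assert history(v2) == [b"final", b"draft"]
```

Holding a reference to the newest version is enough to reach the entire history, at the cost of one hop per version when walking back.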

I have a lot more thinking to do here. I want to be able to either say: “Allright, scrap the MD, I will be able to build these datastore things with an AD this and this way.” or, alternatively: “I will not be able to build a datastore with the AD and ImD only, please do not scrap the MDs”.
The VFS stuff I have designed here is based entirely on this datastore functionality, so I would want to “secure” a way to implement the datastore before saying too much about the VFS.


#15

Hey @oetyng, I second @hunterlester here, really cool stuff. Would you be able to summarise what you would say are your tech requirements for the datastore?
I can guess some reqs ofc, like being able to add new versions to a certain piece of data indefinitely, and being able to do some fairly fast lookup of any version of it; what other things, in summary, would describe your reqs?


#16

Hey @bochaco thanks. Absolutely, let me get back to it, I’m just about to turn off the lights.

(On the AD subject, we could emulate the MD by keeping a root AD with a map of ADs forming the entries. It won’t have the transactionality that the current MD has, which was nice, for example, when creating a new index over existing data, since we could commit a large number of entries in a single commit. But it might have benefits for handling range queries, I think: every new version of the map in the root AD can easily be sorted after a key deletion, since it needs to be updated anyway, which was not the case with MD entries when keeping (key, value) as the mdvalue and the collection index as the mdkey.)
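A sketch of that emulation idea (my own class names; a Python list stands in for the root AD's version history, with each version holding a complete map):

```python
# Sketch of emulating MutableData on top of append-only data: the root
# AD's latest version is a full map of key -> value-AD address, so every
# mutation appends a new map rather than editing in place.
class EmulatedMd:
    def __init__(self):
        self._root_versions = [{}]     # root AD: history of maps

    def put(self, key, value_ref):
        new_map = dict(self._root_versions[-1])
        new_map[key] = value_ref
        self._root_versions.append(new_map)

    def delete(self, key):
        new_map = dict(self._root_versions[-1])
        new_map.pop(key, None)
        self._root_versions.append(new_map)

    def keys_sorted(self):
        # easy to keep sorted: the whole map is rewritten on each change
        return sorted(self._root_versions[-1])

md = EmulatedMd()
md.put("b", "ad-2")
md.put("a", "ad-1")
md.delete("b")
assert md.keys_sorted() == ["a"]
assert len(md._root_versions) == 4     # full history retained
```

Note the trade-off mentioned above: no multi-entry transactionality, but the full-map-per-version shape makes sorted (range-friendly) snapshots cheap.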


#17

Did you ever try reactive programming? It would be nice to have this implemented as an observable.


#18

I have mostly thought of this in terms of what can be done, with what we have. It’s been very investigative, and continues to be. For that reason I haven’t really set out a goal for what should be accomplished.

Now with MD going away it was the first time I was wondering if I could even get a basic datastore together. But I’m quite sure that ADs can be used to build MDs (like I wrote in previous post), however with slightly different properties, for better and worse. I’m just beginning to explore what implications it might have, and how to adapt to it.

As to building a very basic datastore, I currently don’t think there’s anything lacking; depending, of course, on what you mean by basic.

Things that would be valuable to have might add to the requirements, if we were to expect them.

Things you can expect from a regular datastore today is:

  • Concurrent writes from multiple clients.
  • Single writes to be atomic.
  • Thousands of reads and writes per second (some even millions).

I think this can be achieved, with various techniques. But that is basically what I am trying to find out.

A database called Noria (written in rust, implemented over RocksDb) has an interesting technique to avoid read-write locks.
(overview starts right here in this youtube presentation: https://youtu.be/s19G6n0UjsM?t=2000).

LSM trees write to in-memory tables before merging, and we could perhaps at some point use the same technique to avoid locks on the root ADs and B+Tree structures holding the indices, at the cost of duplicated data for the pointer maps. Otherwise we would perhaps need some distributed locks (and I have been thinking about those as well).

All of these things might introduce some detail to the requirements, as to work well, details we may not yet be aware of, that will emerge while developing.

One thing that isn’t trivially solved is latency. Write latency for a single writer can be somewhat circumvented with the LSM trees, but as soon as we introduce more clients, those writes will be delayed as long as that writer’s merging is (i.e. persisting the in-memory tables to the network), or alternatively as long as reaching consensus takes in a cluster to which all clients connect (in case we set out to replicate the in-mem table, which may or may not be a way to build such a cluster).

But I think it is the right direction to focus on write efficiency when choosing algorithms, as the reads can probably be served better with caching.

I feel like I’m in the very beginning of a long duration effort of both learning and experimenting here, so I’m unfortunately a bit vague with regards to requirements.


I’ll try get a summary though:

  • Indefinitely add new versions to a piece of data (this is what emulates mutability of the data, when we use a GetValue() API that returns the value as of the latest version).
  • O(1) on the above (both an append, as well as a read of latest value).
  • Fairly fast lookup of previous versions.
  • Writing multiple values in a single commit to a structure (for an AD, that would be like adding a range of values in one request, where each value is appended as a new version).

That way we can do something like this:

You load address of your datastore root AD (acting as ValueAD, i.e. we care only about value at current version), from your app container.

This AD contains a pointer to an ImD, which has a map of AD addresses (each acting as a StreamAD, i.e. an AD where we care about all the values/versions). Such a StreamAD can for example be a registry of what “types” your datastore contains. So you have an “Appointment” type, or a “TemperatureMeasurement” type, in your application; each instance of it you translate into a row, which is stored as an ImD, whose datamap is stored in an entry in (possibly) various StreamADs, thereby forming indices.

An entry in this example Registry table (i.e. a StreamAD) is a pointer to a ValueAD. So the entry representing the “Appointment” type points to the ValueAD for the Appointment type. The current value of that ValueAD points to an ImD which holds the following:

  1. The primary index ValueAD address; this ValueAD’s latest version points to the current StreamAD index structure.
  2. … n secondary index ValueAD addresses; these ValueADs’ latest versions point to their respective current StreamAD index structures.

This is loaded once at start.

Whenever a new index is created, the address to the new ValueAD is added, and this is serialized and put to a new ImD instance, and the ValueAD for the Appointment type is appended with the pointer to this new ImD.

You only append to the map of the StreamAD index structures (itself a StreamAD), not to the indices themselves, since you want them to stay sorted. So what you do is keep an in-memory table that you insert to (and query). Once a certain number of entries has been reached, or enough time has passed, you merge-sort this in-memory table with a persisted table (a StreamAD) of the same size, and keep merging while any smaller or same-sized persisted tables exist. Then you send the sorted result to a new AD instance (so this is the writing-multiple-values-in-a-single-commit feature), and append a new value to the ValueAD that points to this new StreamAD (the new table, containing the sorted entries of the previous table and the in-memory table), or add a new entry to the map if there was no merge (i.e. a flush only). This gives log(n) index StreamADs.
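The flush-and-merge-equal-sizes scheme can be sketched as follows (a toy LSM in Python; the capacity, class names, and list-of-tuples table format are mine, with sorted lists standing in for persisted StreamADs):

```python
# Toy LSM sketch: flush the memtable when full, then keep merging
# persisted tables of equal (or smaller) size, leaving O(log n)
# sorted tables.
MEMTABLE_LIMIT = 4

class TinyLsm:
    def __init__(self):
        self.memtable = {}
        self.tables = []              # each: sorted list of (key, value)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self._flush()

    def _flush(self):
        table = sorted(self.memtable.items())
        self.memtable = {}
        # merge while a persisted table of the same or smaller size exists
        while self.tables and len(self.tables[-1]) <= len(table):
            table = self._merge(self.tables.pop(), table)
        self.tables.append(table)     # "single commit" of the sorted result

    @staticmethod
    def _merge(older, newer):
        merged = dict(older)
        merged.update(dict(newer))    # newer entries win
        return sorted(merged.items())

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.tables):
            for k, v in table:
                if k == key:
                    return v
        return None

lsm = TinyLsm()
for i in range(8):
    lsm.put("k%d" % i, i)
assert lsm.get("k0") == 0
assert len(lsm.tables) == 1           # two size-4 tables merged into one
```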

The indices are tree structures made up of ValueADs pointing to StreamADs, each entry in the StreamADs pointing to a new sub-tree of ValueADs and StreamADs.
Basically, a ValueAD+StreamAD is like an MD entry pointing to another MD.
The leaf level of these trees points to an ImD with the actual data (in the case of the primary index; otherwise to the ValueAD of the primary index, so that only one place needs updating when modifying the underlying data), as well as (in the case of B+Trees) to the previous and next ValueAD at leaf level (remember, the entries are sorted), so that range queries can be made by first doing a point query and then scanning from there until the end of the range is hit. But there will be multiple indices to hit, actually log(n) of them, each taking log(n) to query. Using a Bloom filter will improve the query performance, which would otherwise be on the order of (log(n))^2.
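The point-query-then-scan pattern over linked, sorted leaves can be sketched like this (Python, illustrative names; `Leaf.next` plays the role of the next-ValueAD link described above):

```python
import bisect

# Sketch of a B+Tree-style range query: leaves hold sorted keys and are
# linked, so a range query is one point lookup followed by a scan
# across sibling leaves until the upper bound is passed.
class Leaf:
    def __init__(self, keys):
        self.keys = keys      # sorted keys in this leaf
        self.next = None      # link to the next leaf

def range_query(first_leaf, lo, hi):
    out, leaf = [], first_leaf
    while leaf is not None:
        # point query within the leaf: where would `lo` sit?
        start = bisect.bisect_left(leaf.keys, lo)
        for k in leaf.keys[start:]:
            if k > hi:
                return out    # end of range hit: stop scanning
            out.append(k)
        leaf = leaf.next      # continue into the sibling leaf
    return out

a, b = Leaf([1, 3, 5]), Leaf([7, 9, 11])
a.next = b
assert range_query(a, 3, 9) == [3, 5, 7, 9]
```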

That’s the first idea so far, of how to build a datastore using B+Trees and LSM trees based on AD. Needs a lot more work obviously :slight_smile: So, I’m currently not sure what more would be needed to achieve the above. I’ll post this for now as a start.


DataStore over AppendableData design
#19

Hey, I haven’t used observables much. It is more about bringing changes in underlying data to the UI, and I haven’t worked much with UI. But I’ve written a lot of event sourced code, which would be the backend part, the foundation for these streams, over which observables could be implemented.


#20

Now, after having discerned more here above, I can see that file versioning is very simple:
The file name (the full path) has its fixed ValueAD in the primary index, and each version of it points to an ImD. You have all the versions in that AD.
So, that would be my second idea :slight_smile:

There’s no real distinction between ValueAD and StreamAD, it’s just what they represent to us, i.e. how we use them. It’s just AD all the same.