For C# developers: SAFE.DataStructures


#1

I’ve continued my work on the SAFE.Datastructures, as outlined in the dev forum.
But there’s not as much discussion going on there as in this forum, and I’d like to discuss these things, so posting about it here instead!

I now have:

IPersistentStack<T>
public interface IPersistentStack<T>
{
    Task PushAsync(T item);
    Task<Result<T>> TryPopAsync();
    Task<Result<T>> TryPeekAsync();
    Task<List<T>> ToListAsync();
    Task<long> GetCountAsync();
}
IPersistentDictionary<TKey, TVal>
public interface IPersistentDictionary<TKey, TVal>
    {
        Task<long> GetCountAsync();
        Task<bool> ContainsKeyAsync(TKey key);
        Task<long> GetEntryVersionAsync(TKey key);
        Task<Result<bool>> SetAsync(TKey key, TVal val, long expectedVersion);
        Task ClearAsync(); // not implemented yet
        Task TryRemoveAsync(TKey key); // not implemented yet
        Task<TVal> GetOrAddAsync(TKey key, TVal value);
        Task<Result<TVal>> TryGetValueAsync(TKey key);
        Task<Dictionary<TKey, TVal>> GetDictAsync();
    }
IPersistentQueue<T>
    public interface IPersistentQueue<T>
    {
        Task EnqueueAsync(T item);
        Task<Result<T>> TryDequeueAsync();
        Task<Result<T>> TryPeekAsync();
        Task<List<T>> ToListAsync();
        Task<long> GetCountAsync();
    }

The Stack and the Queue are based on a construct which I call ChainedMd.
The principle is that each Md created as a ChainedMd, has a metadata entry which contains address to the next Md in the chain.
The Stack is fully functional and unit tests are passing.
The Queue still has to get a proper algo, it is currently a stack :roll_eyes:. I am evaluating a couple of options for implementing this efficiently (suggestions are welcome). I’ll update with progress.

The Dictionary is very similar to the latest EventStore code (with sharding and so on, to increase capacity), and I think I will actually be able to include the eventstore as a datastructure in this library (we’ll see).
All tests are passing here too. However, RemoveAsync and ClearAsync are not yet implemented.
But, I think I will refactor this code and make use of a construct which I call LinkedMd together with another one called ValueMd (both still in progress) - as to simplify the code.

When I have chosen a suitable algo for the Queue implementation, I will head on to a somewhat more complicated matter: transactions spanning over several datastructures.

With it, I want to be able to do something like this:

using (var tx = _stateManager.CreateTransaction())
{
    await _dict.SetAsync(tx, key1, value1);
    await _dict.SetAsync(tx, key2, value2);
    await _queue.EnqueueAsync(tx, item);
    await tx.CommitAsync();
}

…and all these operations would succeed or fail together.
Doing this on SAFENetwork is not going to be trivial. But I have an idea. The solution is probably going to be quite slow, but that’s were I’ll start. When the code is up, if anyone has any suggestions on how we can improve on it, please share them!

Repo is here: https://github.com/oetyng/SAFE.DataStructures


#2

Okay, so I chose a queue algo which keeps track of oldest and newest index.
The updated code can be seen here.

Here is a nice teaser snippet
 public async Task EnqueueAsync(T item)
        {
            var last = await _head.GetLastAsync(minEntryCount: 1);

            var newestIndex = -1;
            var niRes = await last.TryGetMetadata(NEWEST_INDEX);
            if (niRes.HasValue)
                newestIndex = Convert.ToInt32(niRes.Value);

            newestIndex++;

            await last.AddValueAsync(newestIndex.ToString(), item);
            await _head.SetMetadata(NEWEST_INDEX, newestIndex);
        }

The queue maintains FIFO in basic scenarios, but there is still some cases to cover.
This solution has some problems, for example if the queue is never empty so that indices can be reset, but I wanted a working queue. If anyone has a better idea please let us know :slight_smile:

So, now the last unit test of the Queue class (first_in_first_out) is passing.
I added another one to test what happens when we exceed the first MD max count.

Unfortunately, we hit the infamous garbage collection of delegates long before that, and I cannot unit test this specific case now.

I guess I will make sure the queue indices are reset when emptied, as well as complete the api for the dictionary, so that ClearAsync and TryRemoveAsync can be used, before I head over to the bigger challenge of transactions.

Edit: @neo gave some nice tips here for how I could modify the dictionary to use less immutable data writes. I’m considering it, at least as a variant, since one of the use cases I currently have, would potentially be quite wasteful with data storage.


#3

Update (Github repo)

SAFE.Datastructures now uses the SafeApp nuget pkg. A lot of consolidation of the conde, refactors and simplifications have been done. LinkedMd and ValueMd has been more or less completed, while most of ChainedMd was refactored out to common baseclass CustomMd, which all these three inherit from.

Additionally, a CategoryDb was implemented, which is a way to separate some of the responsibility of sharding, (I will explain in more detail furher down), and last but not least, the Dictionary was re-implemented making use of all these constructs, using 1/3 of the code in the Dictionary_v1 class (not all of it removed, but a lot placed in the reusable CustomMd base class). Dictionary_v1 was very similar to the current EventStore, which is next up for this refactor.

(And also, as I mentioned in previous post, TryRemoveAsync and ClearAsync was implemented for this Dictionary_v2. Still needs testing though, and GetCountAsync is not working yet in _v2. However queue index reset has not been implemented yet).

So, what is the purpose of these classes?

Well, they capture some common uses of Md:s, into a reusable component, which can be part of larger blocks.

For example, the LinkedMd is used in the CategoryDb, and the CategoryDb is used in the three current data structure implementations: Queue, Stack and Dictionary. The LinkedMd is used in the Dictionary, and the ChainedMd is used in the Queue and Stack implementations.
ValueMd is used mostly in the LinkedMd.

LinkedMd
This models the usecase when each entry in an Md is always supposed to be a serialized MDataInfo (i.e. a “link” to another Md). The LinkedMd is initiated with the number of levels of such links that the tree will form, and LinkedMds at each level will contain the metadata of which level it is, and which the max level of the tree is. The last level is called the last link, it will always have values which are ValueMds (or rather the serialized MDataInfo of them).

ValueMd
These are supposed to hold any kind of value, up to the developer how to use it.

ChainedMd
These are supposed to always have a NextMd link in their metadata, which points to the next ChainedMd. All the other fields can hold any type of data.

CategoryDb
This construct is the connection to the app database entry. It works similar to the Dictionary, in that it uses LinkedMds, but only for a single level and a single data type (the string representation of a category).

Uses
The LinkedMd can be used as to horizontally partition data with algorithmic sharding, over multiple Mds. The Dictionary implementation uses a non cryptographic hashing to generate a partition key, based on
1. The category (the value type of the dictionary) in combination with the level of the linked md.
2. The key under which the value should be stored in combination with the level of the linked md.
The CategoryDb is a way to achieve the same, but tailored specifically to categorisation of data, such as the value type of a dictionary, or the stream type of an EventStore. (This was part of Dictionary_v1 impl, but was now modeled as a proper class and responsibility, which is used by all the different datastructure implementations.)

The ChainedMd works well for Queue and Stack implementations, as we here want to form a chain of Md:s.

The use of ValueMd is to clearly distinguish the Md usage pattern from the two other.

Capacity and performance
Now, the depth of a LinkedMd tree is configurable, but the current Dictionary impl. allows for 1 quintillion entries per db, and with 1k dbs per account, that would give 1 sextillion entries per account. It is more than most would need (you can - theoretically - enter more than 300 million entries per second in 100 years without filling 1 db. So you can go ahead with same pace in 100 000 years with a single SAFE account. Realistically I think such rate of mutations will not be quite possible yet.).

I haven’t done performance tests yet, but for individual reads and writes this design doesn’t really impede much (like O(k log n) where k << n, but let me come back with a proper big o analysis). Trying to list and scan and so on would be a different story.

Other uses
I have made an EventStore implementation (currently in production in a FinTech) based on ServiceFabric IReliableDictionaries (which have been the base inspiration for the IPersistentDictionary over SAFENetwork), so I will see if I can refactor current SAFE.EventStore to use the SAFE.Datastructure dictionary. I know it would be a simplification of the code, and basically reusing designed components which solve the smaller problems of which the bigger problem consist. (As opposed to solving the whole big problem in one solution). So it has been an obvious path that the EventStore would go along, since the first version has mostly been an example and getting familiar with the SAFENetwork API.

General thoughts on development with the SAFENetwork API and the C# bindings
I’m getting more and more comfortable with the SAFENetwork API, but there’s still much that I haven’t properly handled or even touched upon (private/sharing and that part). But this last month since I scrambled some time to get the SAFE.EventStore upgraded to safe_app_0.6.0, and therefore saw it meaningful to spend time on coding on these things (didn’t want to work with obsolete/deprecated things), has been really rewarding. (And soon it’s safe_app_0.7.0 coming up, but it will be less of a change this time :slight_smile: ).

I still see the garbage collection of delegates (which has to do with how the SafeApp C# bindings have been put together) as a rather big obstacle in my productivity - and it has been, all since October last year when I first ran into it (dind’t know until a couple of months ago that it was GC of delegates though). As an example, I can’t unit test what will happen when the Queue has filled its 995 entries and needs to move to another Md, as a call to a GC:d delegate happens long before that, and the app simply crashes.

I’m in contact with MaidSafe developers who are investigating it, but it would be really awesome if someone else from the community with deep knowledge of C# interfacing with unmanaged code, could have a look at that problem. I unfortunately do not have much knowledge of the GC, the inner workings of the language itself and have not been working with unmanaged code or the interfacing with it, so I don’t really know what to do about it.

At any given rate, the foundation for getting in to some serious application development is being laid here - for me at least, as I see these constructs as mighty valuable for simple, robust and fast development.
One big caveat here though, is that I am not near any real decentralized applications. That will require signatures / consensus of some kind, and it’s a whole other story. Instead, we’re talking about individual/corporation use of the secure, replicated and anonymous data storage that SAFENetwork will provide. I was actually intending on creating a topic for this, to get some feedback from the community on what kind of applications on the brink between this old and new internet that they realistically could see themselves be using. I have it on my todo list :wink:


#4

All a bit over my head but I’m glad you’re making progress! Let us know when you need some testing done.


#5

Great work @oetyng!

@JPL among other things, he’s building the next level of structures we’ll need for scalable applications - at some point I expect somebody will be stealing these ideas into nodejs and other languages :grinning:


#6

@JPL, I think this can be explained in better ways. I am still brewing on all the concepts and am very early in getting things sorted, just started to define these custom mds one week ago.

But here is a picture of what CategoryDb does:

I was going to bed but really felt like I should draw a small diagram so here it is. I will do more to explain all of what I am working on :slight_smile:

(I think LinkedMd should be called BranchedMd or something similar instead btw…)


#7

Not sure if I am getting this. Is the upper right value md supposed to be 79?


#8

I’m celebrating tonight with a big drink.

Not only did I today release a feature at my job that has been in doing for many months, and has been planned for more than 2 years…

… I also found out what this Heisenbug in the safeapp c# bindnings was, that has been haunting me since October last year when I first began to notice it: you can read about the actual reason for the bug here: https://forum.safedev.org/t/c-garbagecollection-of-delegates-solved/1761/1

Oh I have been waiting for this day, it’s an absolutely marvelous coïncidence that both these things happen on the same day: The time draining project from work, on which I have spent all my free time is over and the exact reason for this bug has been found out. These two things equate to good stuff happening on SAFE app programming front :rofl::hugs::sunglasses:

( oh, and yes @intrz, that was a copy paste error :slight_smile: I added correct image now :slight_smile: )