For C# developers: SAFE.DataStructures

Update (Github repo)

SAFE.Datastructures now uses the SafeApp nuget pkg. A lot of consolidation of the conde, refactors and simplifications have been done. LinkedMd and ValueMd has been more or less completed, while most of ChainedMd was refactored out to common baseclass CustomMd, which all these three inherit from.

Additionally, a CategoryDb was implemented, which is a way to separate some of the responsibility of sharding, (I will explain in more detail furher down), and last but not least, the Dictionary was re-implemented making use of all these constructs, using 1/3 of the code in the Dictionary_v1 class (not all of it removed, but a lot placed in the reusable CustomMd base class). Dictionary_v1 was very similar to the current EventStore, which is next up for this refactor.

(And also, as I mentioned in previous post, TryRemoveAsync and ClearAsync was implemented for this Dictionary_v2. Still needs testing though, and GetCountAsync is not working yet in _v2. However queue index reset has not been implemented yet).

So, what is the purpose of these classes?

Well, they capture some common uses of Md:s, into a reusable component, which can be part of larger blocks.

For example, the LinkedMd is used in the CategoryDb, and the CategoryDb is used in the three current data structure implementations: Queue, Stack and Dictionary. The LinkedMd is used in the Dictionary, and the ChainedMd is used in the Queue and Stack implementations.
ValueMd is used mostly in the LinkedMd.

LinkedMd
This models the usecase when each entry in an Md is always supposed to be a serialized MDataInfo (i.e. a “link” to another Md). The LinkedMd is initiated with the number of levels of such links that the tree will form, and LinkedMds at each level will contain the metadata of which level it is, and which the max level of the tree is. The last level is called the last link, it will always have values which are ValueMds (or rather the serialized MDataInfo of them).

ValueMd
These are supposed to hold any kind of value, up to the developer how to use it.

ChainedMd
These are supposed to always have a NextMd link in their metadata, which points to the next ChainedMd. All the other fields can hold any type of data.

CategoryDb
This construct is the connection to the app database entry. It works similar to the Dictionary, in that it uses LinkedMds, but only for a single level and a single data type (the string representation of a category).

Uses
The LinkedMd can be used as to horizontally partition data with algorithmic sharding, over multiple Mds. The Dictionary implementation uses a non cryptographic hashing to generate a partition key, based on
1. The category (the value type of the dictionary) in combination with the level of the linked md.
2. The key under which the value should be stored in combination with the level of the linked md.
The CategoryDb is a way to achieve the same, but tailored specifically to categorisation of data, such as the value type of a dictionary, or the stream type of an EventStore. (This was part of Dictionary_v1 impl, but was now modeled as a proper class and responsibility, which is used by all the different datastructure implementations.)

The ChainedMd works well for Queue and Stack implementations, as we here want to form a chain of Md:s.

The use of ValueMd is to clearly distinguish the Md usage pattern from the two other.

Capacity and performance
Now, the depth of a LinkedMd tree is configurable, but the current Dictionary impl. allows for 1 quintillion entries per db, and with 1k dbs per account, that would give 1 sextillion entries per account. It is more than most would need (you can - theoretically - enter more than 300 million entries per second in 100 years without filling 1 db. So you can go ahead with same pace in 100 000 years with a single SAFE account. Realistically I think such rate of mutations will not be quite possible yet.).

I haven’t done performance tests yet, but for individual reads and writes this design doesn’t really impede much (like O(k log n) where k << n, but let me come back with a proper big o analysis). Trying to list and scan and so on would be a different story.

Other uses
I have made an EventStore implementation (currently in production in a FinTech) based on ServiceFabric IReliableDictionaries (which have been the base inspiration for the IPersistentDictionary over SAFENetwork), so I will see if I can refactor current SAFE.EventStore to use the SAFE.Datastructure dictionary. I know it would be a simplification of the code, and basically reusing designed components which solve the smaller problems of which the bigger problem consist. (As opposed to solving the whole big problem in one solution). So it has been an obvious path that the EventStore would go along, since the first version has mostly been an example and getting familiar with the SAFENetwork API.

General thoughts on development with the SAFENetwork API and the C# bindings
I’m getting more and more comfortable with the SAFENetwork API, but there’s still much that I haven’t properly handled or even touched upon (private/sharing and that part). But this last month since I scrambled some time to get the SAFE.EventStore upgraded to safe_app_0.6.0, and therefore saw it meaningful to spend time on coding on these things (didn’t want to work with obsolete/deprecated things), has been really rewarding. (And soon it’s safe_app_0.7.0 coming up, but it will be less of a change this time :slight_smile: ).

I still see the garbage collection of delegates (which has to do with how the SafeApp C# bindings have been put together) as a rather big obstacle in my productivity - and it has been, all since October last year when I first ran into it (dind’t know until a couple of months ago that it was GC of delegates though). As an example, I can’t unit test what will happen when the Queue has filled its 995 entries and needs to move to another Md, as a call to a GC:d delegate happens long before that, and the app simply crashes.

I’m in contact with MaidSafe developers who are investigating it, but it would be really awesome if someone else from the community with deep knowledge of C# interfacing with unmanaged code, could have a look at that problem. I unfortunately do not have much knowledge of the GC, the inner workings of the language itself and have not been working with unmanaged code or the interfacing with it, so I don’t really know what to do about it.

At any given rate, the foundation for getting in to some serious application development is being laid here - for me at least, as I see these constructs as mighty valuable for simple, robust and fast development.
One big caveat here though, is that I am not near any real decentralized applications. That will require signatures / consensus of some kind, and it’s a whole other story. Instead, we’re talking about individual/corporation use of the secure, replicated and anonymous data storage that SAFENetwork will provide. I was actually intending on creating a topic for this, to get some feedback from the community on what kind of applications on the brink between this old and new internet that they realistically could see themselves be using. I have it on my todo list :wink:

12 Likes