[RFC] Data Types Refinement

I’ll clarify over the coming days and we’ll see where we land :slight_smile:

3 Likes

Just wanting to mention that I am very happy to see this simplification for data types :face_with_monocle: :hugs:

6 Likes

Each key in a Map has a vec. The length of the vec is the expected version: that is our convention for denoting the version we expect the key to have next. (The current version of a key is the last index of the vec.)
A Map instance also has a data version, which increments on changes to any key in the Map.
So, any time you do an operation on a Map key, both the key version and the Map data version are bumped.

(Read more about expected version here)

Expected version is the value passed in for optimistic concurrency control: it is the version you expect the next write to have, not the current version. That is just a convention.
You could pass in the current version instead, but then you’d need a way to express “empty” for a key with no history, which is a complication. Additionally, with an unsigned integer it would not be representable.
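A minimal sketch of this convention (illustrative Rust only, not the actual safe-nd API; `expected_version` and `update` are names I made up):

```rust
use std::collections::BTreeMap;

// Each key's full history; the last element is the current value.
type History = Vec<String>;

/// Expected version of a key: the version its *next* entry will get.
/// A key that was never written has expected version 0, which is
/// perfectly representable as an unsigned integer; no "empty" sentinel
/// is needed.
fn expected_version(map: &BTreeMap<String, History>, key: &str) -> u64 {
    map.get(key).map_or(0, |h| h.len() as u64)
}

/// Append a value only if the caller's expected version matches.
fn update(
    map: &mut BTreeMap<String, History>,
    key: &str,
    value: &str,
    expected: u64,
) -> Result<(), String> {
    let found = expected_version(map, key);
    if found != expected {
        return Err(format!("version mismatch: expected {}, found {}", expected, found));
    }
    map.entry(key.to_string()).or_default().push(value.to_string());
    Ok(())
}
```

Note how a brand-new key has expected version 0, so the same unsigned field covers both “never written” and “written n times”.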

Here is the convention explained (only, it’s called index instead of version, something that got updated later): https://github.com/maidsafe/safe-nd/pull/126

Examples

Private and Public Map

You have keys B and C.

The underlying representation looks like this:

"B": ["item_0"]
"C": ["item_0", "item_1", "item_2"]

But the current state view would look like this:

"B": "item_0"
"C": "item_2"

The expected versions are these

"B": 1
"C": 3
Map instance: 5

This is the same for Private and Public scopes.


Ops on Private and Public Map

Now say you want to send in a transaction consisting of 1 insert, 1 delete and 1 update:

tx {
    "insert" : [ { "A" : "inserted_val" } ],
    "delete" : [ "B" ],
    "update" : [ { "C" : "updated_val" } ],
}

(I have excluded the details of concurrency control here. You could attach an expected version per key, or for the Map instance as a whole, for example; if any did not match, the entire tx would be rejected.)

The result for both a Public and a Private Map is the following:

Results of insert, delete, update:

The underlying representation looks like this:

"A": ["inserted_val"]
"B": ["item_0", "Tombstone"]
"C": ["item_0", "item_1", "item_2", "updated_val"]

But the current state view would look like this:

"A": "inserted_val"
"C": "updated_val"

The expected versions are these

"A": 1
"B": 2
"C": 4
Map instance: 8

You can see that the current state view considers the key B to be deleted.
But there is an API for accessing the key history, both the entirety, and a range, as well as accessing a single entry at a specific version of a key.

Again, this is exactly the same for Private and Public scopes.
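The whole flow above can be sketched like this (illustrative Rust only; the `Map` struct and its method names are my assumptions, not the actual safe-nd types). A delete appends a Tombstone so the key version still advances, and every key operation bumps the Map data version (creation counted as version 0):

```rust
use std::collections::BTreeMap;

enum Entry {
    Value(String),
    Tombstone,
}

#[derive(Default)]
struct Map {
    data: BTreeMap<String, Vec<Entry>>,
    version: u64, // Map data version: bumped on every key operation
}

impl Map {
    fn insert(&mut self, key: &str, val: &str) {
        self.data.entry(key.into()).or_default().push(Entry::Value(val.into()));
        self.version += 1;
    }
    fn delete(&mut self, key: &str) {
        // A delete appends a Tombstone, so the key version advances too.
        // (A real API would return KeyDoesNotExist instead of panicking.)
        self.data.get_mut(key).expect("key exists").push(Entry::Tombstone);
        self.version += 1;
    }
    fn update(&mut self, key: &str, val: &str) {
        self.data.get_mut(key).expect("key exists").push(Entry::Value(val.into()));
        self.version += 1;
    }
    /// Expected (next) version of a key: the length of its history.
    fn expected_version(&self, key: &str) -> u64 {
        self.data.get(key).map_or(0, |h| h.len() as u64)
    }
    /// Expected version of the Map instance itself (creation is version 0).
    fn expected_map_version(&self) -> u64 {
        self.version + 1
    }
    /// Current state view: the last entry per key, hiding tombstoned keys.
    fn current(&self, key: &str) -> Option<&String> {
        match self.data.get(key)?.last()? {
            Entry::Value(v) => Some(v),
            Entry::Tombstone => None,
        }
    }
}
```

Replaying the example (insert B, then C three times, then the tx of insert A / delete B / update C) reproduces the expected versions listed above: A: 1, B: 2, C: 4, Map instance: 8.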


Ops on Private Map

So, let’s look at hard_delete and hard_update. These are available only in the Private scope.

Let’s operate on the above result. And we want to send in this tx:

tx {
    "hard_delete" : [ "A" ],
    "hard_update" : [ { "C" : "hard_updated_val" } ],
}

Results of hard_delete and hard_update:

The underlying representation looks like this:

"A": ["Tombstone", "Tombstone"]
"B": ["item_0", "Tombstone"]
"C": ["item_0", "item_1", "item_2", "Tombstone", "hard_updated_val"]

But the current state view would look like this:

"C": "hard_updated_val"

The expected versions are these

"A": 2
"B": 2
"C": 5
Map instance: 10

You can see that with hard_delete, the actual value for A was deleted, which is represented with a Tombstone. Additionally, since we have to increment the version, we append yet another Tombstone.
For C, the hard_update meant that its actual value was deleted, and then the new value appended.
B didn’t change since we did nothing there.

Again, these operations are only available in Private scope.
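A sketch of what the hard operations do to a single key’s history, per the worked example above (illustrative names, not the actual API): hard_delete erases the current value in place and then appends a further Tombstone to bump the version; hard_update erases the current value and appends the new one.

```rust
#[derive(Debug, PartialEq)]
enum Entry {
    Value(String),
    Tombstone,
}

/// Erase the current value in place, then append a Tombstone so the
/// key version still advances.
fn hard_delete(history: &mut Vec<Entry>) {
    if let Some(last) = history.last_mut() {
        *last = Entry::Tombstone; // destroy the actual value
    }
    history.push(Entry::Tombstone); // bump the version
}

/// Erase the current value in place, then append the new value.
fn hard_update(history: &mut Vec<Entry>, new_val: &str) {
    if let Some(last) = history.last_mut() {
        *last = Entry::Tombstone; // destroy the actual value
    }
    history.push(Entry::Value(new_val.into()));
}
```

Applied to A: ["inserted_val"], hard_delete yields [Tombstone, Tombstone]; applied to C’s four-entry history, hard_update yields [..., Tombstone, "hard_updated_val"], matching the example.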


Details

Always write

Also, when implemented, you could specify ExpectedVersion.Any as a parameter, and the tx would always write, regardless of version, given that the keys to delete and update exist, and the one to insert does not.

Tombstone

If the current value of a key is Tombstone, that is interpreted as the key not existing when trying to operate on it. An update or delete on that key would thus fail with KeyDoesNotExist. Similarly, for an insert to succeed, the key must never have existed, or have current value Tombstone; otherwise it would fail with KeyAlreadyExists.
There is still an API for accessing the key history, both the entirety, and a range, as well as accessing a single entry at a specific version of a key.
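The existence rule could be sketched like this (illustrative Rust; `exists`, `check_insert` and `check_update` are hypothetical helpers, not the real API):

```rust
enum Entry {
    Value(String),
    Tombstone,
}

/// A key "exists" only if its latest entry is a concrete value.
/// A missing history, or a history ending in Tombstone, both count
/// as the key not existing.
fn exists(history: Option<&Vec<Entry>>) -> bool {
    matches!(history.and_then(|h| h.last()), Some(Entry::Value(_)))
}

#[derive(Debug, PartialEq)]
enum OpError {
    KeyDoesNotExist,
    KeyAlreadyExists,
}

/// update/delete require the key to currently exist.
fn check_update(history: Option<&Vec<Entry>>) -> Result<(), OpError> {
    if exists(history) { Ok(()) } else { Err(OpError::KeyDoesNotExist) }
}

/// insert requires the key to not currently exist (never written, or
/// currently tombstoned).
fn check_insert(history: Option<&Vec<Entry>>) -> Result<(), OpError> {
    if exists(history) { Err(OpError::KeyAlreadyExists) } else { Ok(()) }
}
```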

Permissions

A Map in Private scope will have unique permissions for hard_erasure (covering both hard_delete and hard_update).
If you want to collaborate with someone smoothly on some shared private data, but not allow hard_erasure, you’d simply not grant those permissions.
Even if you did grant those permissions, you’d always see by the versions and the Tombstones that a hard_erasure had happened.

3 Likes

Previous AppendOnlyData was a sequence of versions, and each version was a map of Key/Value pairs. My example was structured like the following:

V0                  V1
KeyA  ValueA        KeyB  New ValueB
KeyB  ValueB        KeyC  ValueC
(global versions listed horizontally)

In your proposal they are structured the other way around, with a map of keys, and each key is a sequence of versioned values. The same example is structured like this:

      KeyA        KeyB         KeyC
V0    ValueA      ValueB       ValueC
V1    Tombstone   New ValueB
(key versions listed vertically)

If my understanding is correct this means that:

  • Beforehand each global version was immediate to retrieve, but each version of individual key was complex to retrieve (but this was feasible).

  • Now it is the reverse: each version of a key is immediate to retrieve, but a global version is complex to retrieve (and I am not even sure this is feasible).

I guess there are cases where the new structure is better adapted, but here I am talking specifically of previous AppendOnlyData features for which the previous structure seems better adapted.

An example is implementation of directories: With current safe-api we can display the listing of files of a directory at any specific version. I don’t know how you will be able to do the same with new structure.

Concrete example: display version 2, 1 and 0 of a directory:

$ ./safe cat safe://hnyynyiq7dzd63qo3a6hpzeckyzcxkiqzmrt4z5fmnuf13wm3hpj8epdkrbnc?v=2
Files of FilesContainer (version 2) at "safe://hnyynyiq7dzd63qo3a6hpzeckyzcxkiqzmrt4z5fmnuf13wm3hpj8epdkrbnc?v=2":
+-------------------------+------+----------------------+----------------------+-------------------------------------------------------------------+
| Name                    | Size | Created              | Modified             | Link                                                              |
+-------------------------+------+----------------------+----------------------+-------------------------------------------------------------------+
| /dummy.txt              | 13   | 2020-01-25T21:55:13Z | 2020-01-25T21:55:54Z | safe://hbhydydu1ty15s4s1p6iyh9whg8pqy1h3ro3aeiu6j9gbpo9mmp93nbgex |
+-------------------------+------+----------------------+----------------------+-------------------------------------------------------------------+
| /img/safe_logo_blue.svg | 5851 | 2020-01-25T21:51:49Z | 2020-01-25T21:51:49Z | safe://hbwynod35mhrikn8ohpdh5iiu6ohnw6up1zfiqhhkaey98txmoh79haqub |
+-------------------------+------+----------------------+----------------------+-------------------------------------------------------------------+
| /index.html             | 639  | 2020-01-25T21:51:49Z | 2020-01-25T21:51:49Z | safe://hbhybynbpbbdgaee46jbdc8qcsd5dm9igyw8ku77mfukiz7a7y69w8cdmj |
+-------------------------+------+----------------------+----------------------+-------------------------------------------------------------------+
$ ./safe cat safe://hnyynyiq7dzd63qo3a6hpzeckyzcxkiqzmrt4z5fmnuf13wm3hpj8epdkrbnc?v=1
Files of FilesContainer (version 1) at "safe://hnyynyiq7dzd63qo3a6hpzeckyzcxkiqzmrt4z5fmnuf13wm3hpj8epdkrbnc?v=1":
+-------------------------+------+----------------------+----------------------+-------------------------------------------------------------------+
| Name                    | Size | Created              | Modified             | Link                                                              |
+-------------------------+------+----------------------+----------------------+-------------------------------------------------------------------+
| /dummy.txt              | 6    | 2020-01-25T21:55:13Z | 2020-01-25T21:55:13Z | safe://hbhydydw9t33q47dgax19r935say134ox89zwgj1dx7bcmfyp541duqhfj |
+-------------------------+------+----------------------+----------------------+-------------------------------------------------------------------+
| /img/safe_logo_blue.svg | 5851 | 2020-01-25T21:51:49Z | 2020-01-25T21:51:49Z | safe://hbwynod35mhrikn8ohpdh5iiu6ohnw6up1zfiqhhkaey98txmoh79haqub |
+-------------------------+------+----------------------+----------------------+-------------------------------------------------------------------+
| /index.html             | 639  | 2020-01-25T21:51:49Z | 2020-01-25T21:51:49Z | safe://hbhybynbpbbdgaee46jbdc8qcsd5dm9igyw8ku77mfukiz7a7y69w8cdmj |
+-------------------------+------+----------------------+----------------------+-------------------------------------------------------------------+
$ ./safe cat safe://hnyynyiq7dzd63qo3a6hpzeckyzcxkiqzmrt4z5fmnuf13wm3hpj8epdkrbnc?v=0
Files of FilesContainer (version 0) at "safe://hnyynyiq7dzd63qo3a6hpzeckyzcxkiqzmrt4z5fmnuf13wm3hpj8epdkrbnc?v=0":
+-------------------------+------+----------------------+----------------------+-------------------------------------------------------------------+
| Name                    | Size | Created              | Modified             | Link                                                              |
+-------------------------+------+----------------------+----------------------+-------------------------------------------------------------------+
| /img/safe_logo_blue.svg | 5851 | 2020-01-25T21:51:49Z | 2020-01-25T21:51:49Z | safe://hbwynod35mhrikn8ohpdh5iiu6ohnw6up1zfiqhhkaey98txmoh79haqub |
+-------------------------+------+----------------------+----------------------+-------------------------------------------------------------------+
| /index.html             | 639  | 2020-01-25T21:51:49Z | 2020-01-25T21:51:49Z | safe://hbhybynbpbbdgaee46jbdc8qcsd5dm9igyw8ku77mfukiz7a7y69w8cdmj |
+-------------------------+------+----------------------+----------------------+-------------------------------------------------------------------+

And a directory is an AppendOnlyData:

$ ./safe dog safe://hnyynyiq7dzd63qo3a6hpzeckyzcxkiqzmrt4z5fmnuf13wm3hpj8epdkrbnc
Native data type: PublishedSeqAppendOnlyData
Version: 2
Type tag: 1100
XOR name: 0x5dd1dc7ecba19c7b8dba18a05d8f555d75923abecab14cb2cd179e35274346a2
XOR-URL: safe://hnyynyiq7dzd63qo3a6hpzeckyzcxkiqzmrt4z5fmnuf13wm3hpj8epdkrbnc
3 Likes

Aha, yes I see what you mean. (Edit: But your description is not quite correct, and your example will work the same with this RFC, see next post.)
There are ways to do it; off the top of my head, this for example (using a different way of handling global version than I previously described):
The global version could be handled similarly to the key versions, as a vec, where every update to key(s) appends the keys involved and their new versions.

0: { "A": 0 }
1: { "A": 1, "B":0 }
2: { "C": 0 }
3: { "B": 1, "C": 2, "D": 0 }

To get the versions of all keys as of a specific global version, I’d start at that version and move down towards 0. Every time I see a key for the first time, that version is collected.
The result will be all key versions as of a specific global version.

Very long-lived / active constructs would after a certain number (10k, 100k? Something like that I’d guess) benefit from a snapshot.
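That backwards walk could be sketched like this (illustrative Rust, using the example log above; `snapshot_at` is a name I made up):

```rust
use std::collections::BTreeMap;

// Global version log: entry N records the keys touched at global
// version N, each with its new key version.
type GlobalLog = Vec<BTreeMap<String, u64>>;

/// Key versions as of global version `g` (inclusive): walk the log
/// backwards from `g`; the first time a key is seen, that is its
/// latest version at `g`, so record it and ignore older sightings.
fn snapshot_at(log: &GlobalLog, g: usize) -> BTreeMap<String, u64> {
    let mut seen = BTreeMap::new();
    for entry in log[..=g].iter().rev() {
        for (key, version) in entry {
            // or_insert keeps the first (i.e. most recent) sighting.
            seen.entry(key.clone()).or_insert(*version);
        }
    }
    seen
}
```

On the example log, the snapshot at global version 2 is { A: 1, B: 0, C: 0 }, with D not yet present.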

Edit: I’d add though that I’d probably rather see all of this managed in a db on disk, instead of using runtime constructs like this. Using a db would be the case I guess when moving away from storing these things into a chunk together with the data (i.e. in one file on disk). I think these things would be more efficiently managed and queried that way. We’ll see when we get there.

1 Like

This is not quite correct @tfa.

I did think there was something suspect about your representation when I saw it, because I knew the AD underlying representation is a single vec of an Entry struct with two fields, key and value. That is not what you describe.

But I was confused because I have not worked with safe-cli and safe-api, and I was on my phone and going to sleep. Now I’ve been at the computer to check this out though.

Summary

  • The functionality you describe is not changed with this RFC (upper layers simply keep doing the below described storage of FilesMap).

Because:

  • ADs are not structured the way you describe.
  • It’s a vec of key value pairs.
  • The version is the length of the vec minus 1.
  • Every new global version has a single key value pair.
  • When the AD is used as a files container, the value of one of those is a serialized FilesMap (but that’s app usage, not part of the AD structure).
  • The key of that pair is not the keys we talk about in this case (it is not used here).

Details

The cat cmd calls files_container_get with the supplied url (containing the version as a parameter); there, the content version and the xorname are extracted from the url. Then get_seq_append_only_data is called with these, and that calls get_adata_range with the version as start and version + 1 as end. That returns a single key and value (from the vec of key-value pairs that I mentioned above).

The files map is then deserialized from this value.

You get the entire files map, as per your example, because it is serialized in its entirety into one single value under a key.

That single value is this:

pub type FilesMap = BTreeMap<String, FileItem>;

And further:

// Each FileItem contains file metadata and the link to the file's ImmutableData XOR-URL
pub type FileItem = BTreeMap<String, String>;

Conclusion

If we’re just replicating the functionality you describe, we’d not use a Map, we’d use a Sequence, and just append new FilesMaps to it. Voilà. Simple.
Because the key in the above example is not used for anything.
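A sketch of that Sequence-based approach (illustrative Rust, not the safe-api; `Sequence` here is a bare stand-in): each global version is one appended, serialized FilesMap, so resolving `?v=N` is a single index lookup.

```rust
use std::collections::BTreeMap;

// Same shape as in safe-api: a FilesMap maps paths to file metadata.
pub type FileItem = BTreeMap<String, String>;
pub type FilesMap = BTreeMap<String, FileItem>;

#[derive(Default)]
struct Sequence {
    // Each entry holds one serialized FilesMap snapshot.
    entries: Vec<Vec<u8>>,
}

impl Sequence {
    /// Append a new container snapshot; returns the version just written.
    fn append(&mut self, serialized_files_map: Vec<u8>) -> u64 {
        self.entries.push(serialized_files_map);
        (self.entries.len() - 1) as u64
    }
    /// The container at a specific version is a plain index lookup.
    fn get(&self, version: u64) -> Option<&Vec<u8>> {
        self.entries.get(version as usize)
    }
}
```

The serialization format of the FilesMap is an app-level concern, exactly as it is with AppendOnlyData today.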

If you want to implement this in a Map instead, then that’s possible as well. But I think this suffices for this specific concern you had.

5 Likes

No, I don’t want that. I just want to know how Maidsafe will reimplement current features of safe api .

So, it seems to be OK for FilesContainer, which was implemented with a PublishedSeqAppendOnlyData structure and can be reimplemented as a Sequence.

But the exercise should be done for every data structure used by safe api and the pros and cons of the proposed structures should be carefully assessed.

Another example to explicitly clarify is previous MutableData:

  • MutableData could be used for temporary data. This isn’t the case anymore with the new Map structure because hard_delete and hard_update commands don’t delete previous entries.

  • The new Map structure isn’t adapted to the previous UnsequencedMutableData, where values are not versioned. Yes, it can manage this degenerate case, but it is a waste of resources for a simple standard map.

All these elements need to be consolidated in a more complete document, especially as:

  • In the past there have been numerous rewrites of the data structures.

  • Yes, the current implementation uses ad hoc structures, but they do their specific job well. As the saying goes: if it’s not broken, don’t fix it.

Also, some issues you raised could be solved by minor modifications and not a complete refactoring with still so many unknowns.

One of these small improvements could be your idea of not forcing the caller to pass the expected entry version when mutating a sequenced structure. There is no need to break everything for that. For example, you could change the mutate_entries function of SeqData to accept a general EntryActions instead of a specific SeqEntryActions (in the case of UnseqEntryActions the function would automatically compute the right version of each entry).

2 Likes

It’s a bit drastic to say that it isn’t the case anymore. Map can very much be used for temporary data. The RFC lists that a key history can be deleted. Additionally, the entire structure can be deleted.

It is true that Map is not designed to allow deletion of historical values. That is by design; we could extend the API with hard_delete_at and hard_update_at, for example, but currently that would introduce problems with versioning and require a change to that design.

If you want to see that change, feel free to suggest a solution. If only the versioning is solved I think it would be a desired addition.

Also, it would be great if you wanted to give input on how you would like to amend and adjust things in the RFC, to support things you wish to see. As an RFC, there are many parts that are not fully covered, and you could help out by trying to figure out how the solutions in the RFC could be extended or modified. For example, deleting the key history: should it leave the last seen version? I would think yes, but it is good if everyone can pick things like that up and suggest a way to solve them.

Considering that the versioning is a per-request decision, there isn’t much else to do than to keep track of versions on all these instances. I guess that could be called a waste of resources, but then so could practically everything else that requires resources to achieve a goal. We want to achieve the goal of giving the user a per-request choice for concurrency control, and that requires keeping versions.

Well, that view is a bit simplistic I’d say; it leaves a lot of room for interpretation of “broken”.
The RFC explains what things are not good, and how we improve them. There are many levels on which things can be “broken”, introducing different sorts of costs, to developers and to users. It can be runtime complications, it can be code quality, etc. It’s not binary, where either it works or it doesn’t (broken/not broken). All these things are on a continuous spectrum, and there’s always a cost-benefit relation.

The scope of your evaluation seems to not include the things mentioned in the RFC. But these are important aspects, and there’s been a lot of enthusiasm for getting them addressed.


This is a very small portion of the RFC. But regardless, IMO it’s building more confusion upon an already confusing code base. Now suddenly the sequenced data type can also be not sequenced? This sort of mending is not something I’d do unless really forced to (due to, for example, not having access to the code base, or a big user base that relies on things not changing, etc.).


Again, thanks for your opinions and suggestions @tfa. I’m looking forward to hearing your input on how various things in the RFC that are not yet clear could be made clearer (and it would be great if you have ideas for solutions to specific problems as well, should you find any).

1 Like

Changelog Data Types RFC

2020-01-30

  • Clarified Private | Public scope implications for Map.
  • Added description of possible extensions for deletion scheme in PrivateMap.
2 Likes

Just thinking out loud without much analysis, and without trying to suggest this is a high-priority thing to include: wouldn’t it just work if, instead of removing the value at version X, it’s replaced with a tombstone so that versions/indexes don’t change?

2 Likes

That’s basically what we’d have to do, if not changing the implementation of the versioning. But it would break the pattern that any such change bumps the key version. Such a historical delete would look different from a current delete, so that could be a bit messy.

3 Likes

Sorry but this is unclear in the OP: you say that PrivateMap allows for deleting a value and permanently removing it from the network with the hard_delete function. But then you have statements that contradict this:

(the adverb “virtually” indicates that the key isn’t physically removed from the network)

and even more explicit:

In contrast, current MutableData is clear:

  • Each key has only one value which can be updated and no history is retained

  • Each key can be deleted (together with its unique value)

This structure is useful for database like implementations or temporary data and it is still unclear to me if you intend to preserve this possibility or not.

Also, the presence of the verb MAY in many places in the RFC doesn’t help in assessing which currently implemented features will disappear.


I was talking of UnsequencedMutableData which doesn’t have versions

UnsequencedAppendOnlyData and SequencedAppendOnlyData can be merged into one unique structure (AppendOnlyData). The user always acts at the end of the sequence; he just chooses whether to pass the expected version or let the network perform the action whatever this version is.

For example a forum implementation might need both. Suppose that a topic is an AppendOnlyData:

  • a user creating a post will typically add it at the end of the topic without constraint on the version

  • but sometimes he might want to add it only if the version hasn’t changed, for example posting “FIRST…” in a Dev Update only if the version is still 1 (to be listed just after the OP or not listed at all).

I don’t understand why you argue about this feature, because it is included in your Sequence structure. The difference is that my proposal is a much smaller modification: it uses the existing SeqEntryActions and UnseqEntryActions structures, so callers are much less impacted because they already build them.


I was talking of UnsequencedMutableData which don’t have versions

Versioning is done by SequencedMutableData. This per-request decision could be implemented right now by passing a generic EntryActions instead of SeqEntryActions to the mutate_entries function of the SeqData structure. It is an even smaller amendment than the AppendOnlyData one, because UnsequencedMutableData and SequencedMutableData are not merged (they can’t be).

1 Like

It doesn’t contradict, because it refers to another part of the text, where it says a key can be deleted together with all its history.
I’m sorry, I’m not quoting, because it repeatedly seems that you don’t put much effort into actually reading the OP.

In every post you have been leaving out information and focusing on something out of context.

There is also code in safe-nd where this is implemented. I know you do some rust coding so it is well within your capabilities to read it there (I wouldn’t ask it of someone not having done some rust coding).

Your discussion about MutableData doesn’t make sense, and I’m sorry but I don’t find it very productive, it seems to just be harping on, on new tangents for every post.

It is clear you don’t want change. But your arguments have not been nuanced, talking of “breaking everything” and leaving out information so as to make claims that are not true.

It’s getting tiring you know.

If you can try be more productive we can continue this discussion.

I don’t necessarily read @tfa’s comments as such. My perception is that he is scrutinizing in good faith based on his viewpoint, understanding, time and energy. It’s also good to see your clarifications since it improves everyone else’s understanding too. I think it is also helping to prove out how the refinements can in theory match the same capabilities as before without a cornucopia of different datatypes.

3 Likes

Very probable, but I’m simply informing that the approach is getting tiring. This can be done in many ways, the specific way chosen by him here is not the only one.

I have other things to prioritize. I have been subtly hinting that opinions are nice, but please try nuancing them; the hints didn’t seem to be registered, so I went for a more straightforward message :wink:

Information has been left out, and incorrect facts still very rigidly claimed based on that. It is resource intensive to communicate with people wielding that approach, since the onus is then on me to publicly correct it. If there is not enough effort spent on trying to minimize such misunderstanding, then my energy is better spent elsewhere. Simple as that.
(The possibility that a certain portion of the audience finds the resulting discussion informative does not make it less resource intensive.)

I find that the energy spent on (or ability for) trying to minimize misunderstanding is too low in cases where someone, for example, is very binary about things and does not register or respond to the appeals for nuance and reasoning.

Now, this post here is really another waste of energy. I hope people understand, and do not keep harping on about that.

2 Likes

The problem is that I had it in my mind that hard_delete was the operation that deleted the key. Some elements in the OP strengthened this idea and some others didn’t. Hence the contradictions.

Also, both delete and hard_delete are in fact soft deletes, so maybe the hard_delete naming isn’t helpful.

The repo you mentioned in the second topic isn’t accessible (https://github.com/oetyng/safe-nd/tree/datatypes-refinement). For whatever reason I didn’t see any safe-nd repo in your account (maybe I was blind or maybe it was a private one). Now I see one. I will study it to better understand what you propose. There are 2 branches:

  • replace-ad-with-sequence, which is more recent but has fewer commits ahead
  • datatypes-refinement-demo, which is older but has more commits ahead

Which one is more in line with what you propose?

I wanted to explain that MutableData (Sequenced case) could be slightly modified to implement the per-request decision about passing the expected version or not. But I agree I did it very badly. I will come back later with better explanations (perhaps with some code).

5 Likes

datatypes-refinement-demo

This is a demo (not for merging) with all changes in one, so there you will find code for Map. Note that it does not reflect all of the RFC, and vice versa, the RFC does not reflect all of the code. It is WIP, moving towards convergence. (For example, delete key + history is only mentioned briefly in the RFC but not put into code, and the order in which operations within a commit are applied is also to be refined, among other things.)

The branch replace-ad-with-sequence is the first of these opened as PR, which implements only Sequence (and the permissions changes that it relies on). It’s intended for end-to-end implementation over all repos, and confirming test cases, before moving on with the other types.

Aha, will update :+1:.

Yeah, good point. Open to suggestions.

Super, much appreciated!

6 Likes

This post is about current Mutable Data model and proposed Map model.

I have taken the example of a key named “MyKey” mutated 9 times with values “MyVal1”, “MyVal2”, … “MyVal9”, and drawn diagrams showing the resulting data structures, to compare the pros and cons of each of them and the improvements that can be added to them.

Current Unsequenced Mutable Data

image

(“…” stands for other keys in the data structure)

This data structure is useful for database-like implementations where keys can be added, updated and deleted without consuming many resources in the network, because history is not retained, which also means that it necessarily has a private scope (= unpublished in current terminology).

I have nothing to say about it, except that it should be retained in the new version.

Current Sequenced Mutable Data

image

This data structure adds a version number to each entry value. Same here: as history is not retained, it consumes few resources and has a private scope. Two improvements could be added to this data structure:

  • Currently the user wanting to mutate an entry must pass the expected version, and the mutation fails if this version was changed by another user by the time the request reaches the vaults. Some use cases need this constraint, but others just want to update the current value whatever its version.
    The user should have the choice to decide this on a per-request basis, and this could be easily implemented by changing the mutate_entries fn in mutable_data.rs. I have done a PoC of it in sn_data_types/mutable_data.rs at 90e1485ca835b2f865bffbd581b971f54b04dba2 · Thierry61/sn_data_types · GitHub (partially implemented).

  • If a key is deleted and recreated, then the new value will restart at version 0. To correct this we should replace the value (currently a Vec<u8>) with an Option<Vec<u8>>. The None enum value would be inserted when the key is deleted (instead of just deleting the whole entry).
    Note: In this case the diagram should be updated by replacing “MyVal9” with “Some(MyVal9)”.
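That second improvement could be sketched like this (illustrative Rust; `SeqMutableData` here is a stand-in, not the real type): a delete keeps the entry with value `None`, so a re-created key continues its version sequence instead of restarting at 0. Only the latest value is stored; no history is retained.

```rust
use std::collections::BTreeMap;

struct Entry {
    version: u64,
    value: Option<Vec<u8>>, // None == deleted
}

#[derive(Default)]
struct SeqMutableData {
    data: BTreeMap<String, Entry>,
}

impl SeqMutableData {
    /// Insert or update; returns the entry's new version. The version
    /// always moves forward, even across a delete/re-create cycle.
    fn set(&mut self, key: &str, value: Vec<u8>) -> u64 {
        if let Some(e) = self.data.get_mut(key) {
            e.version += 1;
            e.value = Some(value); // overwrite: old value is gone
            return e.version;
        }
        self.data.insert(key.into(), Entry { version: 0, value: Some(value) });
        0
    }
    /// Delete keeps the entry (value = None) so the version survives.
    fn delete(&mut self, key: &str) -> Option<u64> {
        let e = self.data.get_mut(key)?;
        e.version += 1;
        e.value = None;
        Some(e.version)
    }
}
```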

Proposed Map - Public scope (and Private scope in Commit mode)

image

The strength of this data structure is that it allows the operations provided by current Mutable Data, but also in a public scope, which covers a deficiency. The trade-off is that history is retained and so it consumes a lot of resources when entries are often mutated.

One remark about implementation: the expected version number passed to mutation elements and the stored value are both managed by specific enums, and could use the standard Option enum instead.
Note: In this case the diagram should be updated by replacing all “Value(MyValN)” with “Some(MyValN)”.

Proposed Map - Private scope in Hard Commit mode

image

I don’t see the added value of this data structure because it doesn’t retain history but it consumes a lot of resources (the worst of both worlds!).

Current Sequenced Mutable Data provides the same feature in a less costly way. Unless I am missing something, the only thing it has that current MD doesn’t is the version not being reset to 0 when a key is deleted, and this feature is covered by one of the improvements I propose above.

Conclusion

To not lose existing features of MD and gain the new proposed features we should:

  • keep current Unsequenced Mutable Data
  • keep current Sequenced Mutable Data with the improvements I propose
  • add the new proposed Map in standard commit mode for both public and private scope
  • get rid of hard commit mode in proposed Map

I hope this will at least show visually and more concretely what is proposed and what is at stake, especially the possible loss of Mutable Data (just compare the first diagram to the last one, and imagine the same with thousands of updates instead of just 9).

Also, showing things visually may help in naming things. For example, we can see that current MD resembles a map more than the proposed Map itself does.

11 Likes

Nice!

I will respond quickly to a couple of things for now:

Commit and HardCommit are not two modes, but two APIs available in Private scope. So when you say you don’t see the added value of a Private Map in “hard commit mode”, that is because there is no such mode. You can commit and hard commit however you like on the same instance. One is for stepping to the next version; the other is for stepping as well as deleting the previous value.

The example with a long range of Tombstones is an extreme example, using only one of the APIs. Many cases would mix them; some would not, of course. But maybe we shouldn’t base assumptions about usability on the extreme case? (It surely merits more looking into, but to me it doesn’t seem fair, i.e. not the best average use case to compare with.)

“Consumes a lot of resources” is a vague statement. Storage is cheap nowadays. Information is worth keeping.
As the event sourcing veteran Greg Young says: when you overwrite that field in a SQL database, how often have you made a business decision that this data is useless in the future? How often do you know what you will need to know tomorrow about what happened today?

What you see as a cost I see as one of the biggest strengths, a Map which has history for its keys. Immensely useful in my world.

What I think would be better to focus on, because I don’t think the extreme case with a long range of Tombstones is perfect either, is how that can be optimised. I think it can, and that it is a smaller part of the overall idea. So, how to compact that, it doesn’t seem like an overly hard problem.

I’m on phone, so not a comprehensive answer, but wanted to address those things quickly.

3 Likes

I think this is not a correct representation of how databases are more and more being designed. They are increasingly retaining information.

I think most of the critique boils down to the concern about consumed resources.
Resource usage is fundamental, but I see this as a bit of premature optimization, because these things are part of upcoming ideas that we have not yet gone into, i.e. how data is stored to disk. There’s a lot of optimization that is done there.
For example, long ranges of the same value can be compacted to something like
0-7: 0
where 0 is the encoding for Tombstone, and 0-7 says that on indices 0 to 7 there are tombstones.
See, that is not taking much space.
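Such a compaction could be sketched as a simple run-length encoding (illustrative only, not an actual storage format):

```rust
#[derive(Clone, Debug, PartialEq)]
enum Entry {
    Value(String),
    Tombstone,
}

/// Compact consecutive equal entries into (first_index, last_index, entry)
/// runs, so e.g. eight Tombstones become a single "0-7: Tombstone" record.
fn compact(history: &[Entry]) -> Vec<(usize, usize, Entry)> {
    let mut runs: Vec<(usize, usize, Entry)> = Vec::new();
    for (i, e) in history.iter().enumerate() {
        // Does the current entry extend the last run?
        let extend = matches!(runs.last(), Some((_, _, last)) if last == e);
        if extend {
            runs.last_mut().unwrap().1 = i;
        } else {
            runs.push((i, i, e.clone()));
        }
    }
    runs
}
```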

There are a ton of these techniques available (people have worked with them for many, many years), so generally I don’t see this as an especially concerning problem.

Additionally, the concern about consuming resources seems to rest on an assumption that (generally) hard disk space is more valuable than information, while I’m pretty much of the opposite view (generally).

That said, I am sure there’s more that can be done with the in-memory representations, to make them even more efficient. But in all honesty, I thought that was OK to save for later.


I think I have addressed most of what you mention with this (the Option instead of an enum I’ve thought about as well; there are pros and cons, but it’s an utter detail).

The one thing maybe left for me to ask you at the moment is whether there are other reasons you’d like to remove hard_commit, other than “saving space”.

And also, I want to thank you for this thoroughly worked through contribution. That’s nice to see.

2 Likes