EPHEMERALITY and data-persistence?

Hi, and thanks for your reply. I’m just diving into some other posts about WebRTC.
Just a question, though: the following URL http://webrtc-security.github.io/ does seem to point to certain weaknesses in WebRTC’s security setup, such as:

4.3.5. A Weakness in SRTP
SRTP only encrypts the payload of RTP packets, providing no encryption for the header. However, the header contains a variety of information which may be desirable to keep secret.
One such piece of information included in the RTP header is the audio-levels of the contained media data. Effectively, anyone who can see the SRTP packets can tell whether a user is speaking or not at any given time. Although the contents of the media itself remains secret to any eavesdropper, this is still a scary prospect. For example, Law enforcement officials could determine whether a user is communicating with a known bad guy.
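To make the quoted leak concrete, here is a minimal Python sketch that reads the audio level from an RTP packet's header without ever touching the payload. It assumes the RFC 6464 audio-level extension carried in the RFC 8285 one-byte header-extension format. The extension ID is negotiated in SDP (`a=extmap`), so the ID passed in is an assumption, as is the helper name `audio_level`.

```python
import struct

ONE_BYTE_PROFILE = 0xBEDE  # RFC 8285 marker for one-byte header extensions


def audio_level(packet: bytes, ext_id: int):
    """Return (voice_activity, level_dbov) from an RTP packet's cleartext
    header extension, or None if that extension is absent.

    Only the fixed header and the extension block are parsed; the payload,
    which SRTP encrypts, is never read.
    """
    if len(packet) < 12 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    has_extension = (packet[0] >> 4) & 0x01
    csrc_count = packet[0] & 0x0F
    if not has_extension:
        return None
    offset = 12 + 4 * csrc_count  # skip fixed header and CSRC list
    profile, length_words = struct.unpack_from("!HH", packet, offset)
    if profile != ONE_BYTE_PROFILE:
        return None  # the two-byte form is not handled in this sketch
    pos = offset + 4
    end = pos + 4 * length_words
    while pos < end:
        header = packet[pos]
        if header == 0:  # padding byte between elements
            pos += 1
            continue
        elem_id, elem_len = header >> 4, (header & 0x0F) + 1
        if elem_id == 15:  # reserved ID: stop processing
            break
        if elem_id == ext_id:
            data = packet[pos + 1]
            # RFC 6464: top bit = voice activity, low 7 bits = level in -dBov
            return bool(data & 0x80), -(data & 0x7F)
        pos += 1 + elem_len
    return None
```

With the extension mapped to ID 1, an element byte of `0x9E` yields `(True, -30)`. An eavesdropper learns that the sender is speaking at -30 dBov, even though the payload bytes may be ciphertext.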

Does the fact that we’re using WebRTC over SAFE make that concern superfluous?

Is there any reason for having preferred WebRTC over, e.g., ZRTP?

Has anything been decided concerning the audio/video codecs: Opus, AV1?

I’m also pondering whether it’d be useful to use WebRTC’s DTLS layer for direct P2P file transfer. Could this solve the problem of deleting data from SAFE vaults once it has been put out? As far as I understood, only the datamap can be deleted, not the content itself. Any thoughts on that?


Yes; the more worrying part is signaling. Normally you use NAT traversal servers, and we remove those servers. This means we will not be using STUN/TURN as per those specs, but instead our own encrypted versions of slimmed-down variants of them.

No, none at all; we just wished to show secure person-to-person communication. WebRTC is an example, and for sure we can extend that. I have not used or read up on ZRTP, so I’m always keen to know more.

No, not as of yet; this is likely a client-side choice, although I would love to have as much client interoperability as possible.

It’s a double-edged thing. If you want to transfer a file, then it’s easy to just transfer the data map, since the content is already stored. If it is a file, you likely have it (in order to send it), and therefore deduplicating the storage is OK. I am not sure these files will be temporary; more likely they are more valuable than that, so why not store them?
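A toy model of why passing the data map is enough. Chunks are stored under their content hash, so re-storing a file the network already holds adds nothing, and the recipient only needs the list of chunk addresses. This is a hypothetical illustration (`ToyNetwork`, a 4-byte `CHUNK_SIZE`, plain SHA-256 addressing). As I understand it, SAFE's actual self-encryption also encrypts each chunk with keys derived from the hashes of neighbouring chunks and records those hashes in the data map; that step is omitted here.

```python
import hashlib

CHUNK_SIZE = 4  # tiny for illustration; real chunks are far larger


class ToyNetwork:
    """Content-addressed chunk store: a chunk's address is its SHA-256."""

    def __init__(self):
        self.chunks = {}

    def put_file(self, data: bytes) -> list:
        """Split data into chunks, store each by hash, return the data map."""
        data_map = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            addr = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(addr, chunk)  # identical chunks stored once
            data_map.append(addr)
        return data_map

    def get_file(self, data_map: list) -> bytes:
        """Reassemble a file; only someone holding the data map can do this."""
        return b"".join(self.chunks[addr] for addr in data_map)
```

"Sending" a file here means handing over the list returned by `put_file`. Note also that once that list is gone, nothing in `chunks` says which entries belonged to the file. This is the point made further down about deleting the data map.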


Thanks for clearing that up. I was wondering about the codecs because of bandwidth usage. Some apps (like Linphone) offer multiple audio codecs with different bitrates, adapting to the available bandwidth.
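As a rough illustration of that kind of adaptation (not Linphone's actual algorithm), a client could pick an Opus target bitrate from a fixed ladder according to the measured bandwidth. The ladder values, the 80 % headroom factor and the name `pick_bitrate` are hypothetical. Opus itself supports bitrates from about 6 to 510 kbit/s.

```python
# Hypothetical ladder of Opus target bitrates in kbit/s, highest first.
BITRATE_LADDER_KBPS = [64, 48, 32, 24, 16, 12, 8, 6]


def pick_bitrate(available_kbps: float, headroom: float = 0.8) -> int:
    """Pick the highest ladder step within a fraction of the measured
    bandwidth, leaving room for packet overhead and jitter."""
    budget = available_kbps * headroom
    for rate in BITRATE_LADDER_KBPS:
        if rate <= budget:
            return rate
    return BITRATE_LADDER_KBPS[-1]  # floor: lowest step even if it doesn't fit
```

A real client would re-run this as its bandwidth estimate changes, typically with some hysteresis so it doesn't flap between steps.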

For combining ZRTP with WebRTC: https://tools.ietf.org/id/draft-johnston-rtcweb-zrtp-02.xml

https://secushare.org/ has a mesh-like E2E approach for GNUnet. They developed CADET, a new transport protocol for confidential and authenticated data transfer in decentralized networks. This transport protocol is designed to operate in restricted-route scenarios such as friend-to-friend or ad-hoc wireless networks. Maybe it’s worth checking out?

Could you elaborate a bit more on the editing possibilities for mutable data?

Can a user delete from the SAFE network not only the pointers to the scattered chunks of the data (the datamap), but the actual content itself (all 8 copies of it)? If so, I admit it’d be a huge relief for me. This could be done manually or, as with some IM apps (and even secure webmail such as ProtonMail), with a built-in “time-bomb” feature: an expiration time for the message, or a burn-on-read (BOR) time, set before the data gets sent out.

Secondly, as I stated, some IM apps even allow recalling sent data from the recipients’ devices.
This really fine-tunes the ownership of your information and data, and it just seems so fitting to have these options in an all-in-one (AIO) solution like SAFE. Thanks for any comments on that.


No, not for immutable data; that is part of the immutable data specification. Only the datamap can be deleted. Once that is deleted, it is impossible to retrieve that immutable file: for one, there is no reference to the chunks, so they cannot even be found, and then there are no self-encryption keys to be retrieved. It would be easier to find a 1" piece of string in a 1000-foot-high haystack than to find any of your chunks.

Now, if things change, then whatever.


OK, and how about the mutable data ?

I’m also wondering how XOR-routing voice and video chats through multiple, potentially geographically remote, hops will affect latency…


You can add, delete, edit and create.

The APP you use will do these for you. Obviously you choose the APP that suits your needs. There will be so many APPs that the problem will be choosing.

Is that for sure? I can delete the container too, not just the datamap?

Another possibility for enhancing WebRTC security would be PERC (Privacy Enhanced RTP Conferencing): https://datatracker.ietf.org/wg/perc/about/ and https://fr.slideshare.net/alexpiwi5/perc-webrtc-e2e-media-encryption-with-sfu


Just this weekend I have been facing a situation where I wished for deletable data, while designing a reliable dictionary over SAFENetwork.

The problem comes from how we use such structures today: we store projections in them. The projections are of arbitrary size, and are fetched and stored again for every incremental change, which arrives in the form of a domain event. What that gives us is an almost identical mass of data, with just a small change to one property, that would nonetheless require an entirely new ImD.

There is MD, but MDs have a fixed size. ImD allows arbitrarily sized data storage.
I guess I could craft a way to split data up over n MDs, but I think it’s fair to say that it wouldn’t scale well.

So maybe this is not a problem at all, considering how data storage capacity keeps increasing, but once a certain percentage of network applications use this kind of storage, there will be a noticeable effect. We would see data duplicated at the same rate as new events come in (minus the occasional deduplication if the projections are simple).
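A rough back-of-envelope for that duplication rate uses hypothetical symbols: $N$ domain events, a projection of size $S$, and an average change of size $d$ per event, ignoring deduplication:

$$
\underbrace{N S}_{\text{new ImD per event}}
\qquad\text{vs.}\qquad
\underbrace{S + N d}_{\text{one snapshot plus deltas}},
\qquad
\frac{N S}{S + N d} \;\longrightarrow\; \frac{S}{d} \quad (N \to \infty)
$$

With illustrative numbers $S = 100$ KB, $d = 100$ B and $N = 10{,}000$, that is 1 GB versus 1.1 MB.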

We can assume it won’t happen (other types of applications will be more common) and do some estimations, but it is hard to predict how data storage solutions will be used when there are no direct negative effects on the user that would limit usage that is not beneficial to the network.

I would just like a way to solve this problem without creating new (sometimes) useless data at an insane speed.

And then there is the philosophy of not removing data, which I am very keen on too.


Back in the day, when storage was expensive and at a premium, and a lot of it was on mag tape, we also had to devise ways to add or change small bits of large data on tape. Actually, people did something similar with books, which you could not reprint in an instant (margin notes in pencil/pen).

While tape is not immutable, it was very expensive, both in elapsed time and in machine time, to rewrite whole tapes because you added a word in the middle of your data.

So what was used for, say, your dictionary was to store the current “known” facts on the tape and then have a secondary store for amendments on fast storage (e.g. drum/disk/DECtape, similar to, say, MD). The application would then look up the tape first, then check for amendments on disk/drum/DECtape in order to create the actual record.

In your case, I would say store the current version of the dictionary in immutable data, and then, as changes come in, have MDs hold the changes. You could derive the MD address loosely from the record being changed, so that you are not searching linearly through a hundred MDs but only a few.

Then, when the time is right, create a new immutable version of the data(base).
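A minimal Python sketch of that tape-plus-amendments pattern. It assumes plain dicts stand in for the ImD snapshot and for a fixed set of amendment MDs; the names `OverlayDictionary`, `NUM_BUCKETS`, `bucket_for` and the tombstone marker are hypothetical. The amendment MD is chosen by hashing the record key, so a lookup reads one MD rather than scanning all of them, and `compact` produces the next immutable version.

```python
import hashlib

NUM_BUCKETS = 16       # hypothetical number of amendment MDs
TOMBSTONE = object()   # marks a record deleted since the last snapshot


def bucket_for(key: str) -> int:
    """Derive the amendment-MD index from the record key, so a lookup reads
    one MD instead of scanning all of them."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS


class OverlayDictionary:
    def __init__(self, snapshot: dict):
        self.snapshot = dict(snapshot)  # stands in for the immutable "tape" (ImD)
        self.amendments = [{} for _ in range(NUM_BUCKETS)]  # stand-ins for MDs

    def set(self, key: str, value) -> None:
        self.amendments[bucket_for(key)][key] = value

    def delete(self, key: str) -> None:
        self.amendments[bucket_for(key)][key] = TOMBSTONE

    def get(self, key: str, default=None):
        # Read the snapshot first, then let any amendment override it.
        value = self.snapshot.get(key, default)
        bucket = self.amendments[bucket_for(key)]
        if key in bucket:
            value = default if bucket[key] is TOMBSTONE else bucket[key]
        return value

    def compact(self) -> "OverlayDictionary":
        """Fold all amendments into a new snapshot: the next ImD version."""
        merged = dict(self.snapshot)
        for bucket in self.amendments:
            for key, value in bucket.items():
                if value is TOMBSTONE:
                    merged.pop(key, None)
                else:
                    merged[key] = value
        return OverlayDictionary(merged)
```

When to call `compact` is the "when the time is right" judgement. For example, it could be triggered once the amendment MDs approach their size limits, or once the combined delta size is a meaningful fraction of the snapshot.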


I thought this was only an alpha network limitation?

Nope. From the RFC:

Maximum size for a serialised MutableData structure must be 1MiB;

(A better source would be the source code, since the RFC is out of date.)

Other things have changed (the entry count was raised from 100 to 1000), but the size is still fixed for MDs.


That is up to 1MB, not a fixed size of 1MB.


Yes, that’s what I meant: a fixed upper limit.
Sloppy formulation.


This is tied to the chunk size that the network uses, right? I.e., in order for it to be easily mutable, it must stay in one chunk, which is controlled by a single section.


I was thinking about the case of custom classes with any number of properties, whose types can themselves be such objects. So the nesting can be arbitrarily deep (even though at some point the code smell will become noticeable).

Writing such an object to MDs, where you need to un-nest it and both try to fill each MD and not exceed the size limit, will be OK with small objects, but the number of operations quickly grows.
Then, if some part of the object is a single piece of data larger than 1 MiB, you need to devise a way to split it.

EDIT: (This is what I meant by it not scaling so well.)
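To show what the un-nesting and packing involve, here is a hedged Python sketch. It flattens a nested object into path keys, then greedily fills MDs (modelled as dicts) without exceeding the 1 MiB and 1000-entry limits mentioned above. Any value too large for one MD is split into numbered parts. The size accounting (key bytes plus value bytes) is an assumption, and so are the names `flatten`, `pack` and `SUFFIX_ROOM`. Real MD serialisation adds overhead, so actual limits would need a safety margin.

```python
MAX_MD_BYTES = 1024 * 1024  # 1 MiB upper limit from the RFC quoted above
MAX_MD_ENTRIES = 1000       # entry limit mentioned above
SUFFIX_ROOM = 16            # bytes reserved for a ":partN" key suffix


def flatten(obj: dict, prefix: str = ""):
    """Un-nest dicts of dicts into (path, bytes) pairs such as ("a/b", b"...").
    Empty sub-objects are dropped, and '/' in keys is not escaped."""
    for key, value in obj.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict):
            yield from flatten(value, path)
        elif isinstance(value, bytes):
            yield path, value
        else:
            yield path, str(value).encode()


def pack(obj: dict, max_bytes: int = MAX_MD_BYTES, max_entries: int = MAX_MD_ENTRIES):
    """Greedily fill MDs (modelled as dicts) without breaking either limit.
    A value too big for a single MD is split into numbered parts."""
    mds, current, size = [], {}, 0
    for path, value in flatten(obj):
        room = max_bytes - len(path.encode()) - SUFFIX_ROOM
        if room <= 0:
            raise ValueError(f"key {path!r} is too long for one MD")
        parts = [value[i:i + room] for i in range(0, len(value), room)] or [b""]
        for n, part in enumerate(parts):
            key = path if len(parts) == 1 else f"{path}:part{n}"
            entry_size = len(key.encode()) + len(part)
            if current and (size + entry_size > max_bytes or len(current) >= max_entries):
                mds.append(current)  # this MD is full; start the next one
                current, size = {}, 0
            current[key] = part
            size += entry_size
    if current:
        mds.append(current)
    return mds
```

Reading the object back means fetching every MD, re-joining the `:partN` pieces and re-nesting the paths. That is the mirror-image cost of the write path, and it shows why this gets heavier than a single ImD as objects grow.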

It just seems overly complicated to me (as does reading from a data source designed like that), when immutable data solves it.
The downside, of course, is that you have no choice but to commit that data to the network eternally, regardless of your needs or intentions. That lack of freedom is there, and it would be nice to have a tad more freedom in this regard.
I mean, it’s not that committing it eternally would hurt me, but if I am producing loads of data that I need secure, replicated and globally available for a while, but that afterwards is pure junk, then I have no option other than to burden the network with that junk. I personally would like a way not to burden the network, even if doing so brought me no cost savings.


Same thoughts here with respect to scientific computing. Neo has a good idea for temp data attributes…


Yes, exactly.
It might produce very large datasets, and it could be that only afterwards is it known what, if anything, is worth keeping. In the meantime, massive storage is needed; replication, because the data is potentially very valuable; pan-availability, because of worldwide collaboration over the data; and security and anonymity, because the data could be highly sensitive.

All of this says: use SAFENetwork.
Except that thing about only a tiny fraction being worth keeping in the end.

Would it affect the network? Some estimations could surely be done with some digging.
But this kind of behaviour would be wasteful regardless, that is for sure (assuming we use immutable data for storage).

But then again, using purely MDs for it would also be possible; it just needs some more preprocessing client-side. I haven’t looked closely at the ImD implementation and flow; maybe the same kind of operations are needed, since everything is based on chunks, just managed in the network instead of client-side…? Or maybe the MD abstraction imposes a significant cost?

One way this used to be done was chaining, and it is easy enough to apply generically to MDs. Methods for working within storage limits abounded in the days when main memory was measured in KWords, cache was a drum of a couple of MWords, and each disk held <40 MWords.

So your data object, which could be of variable size since you want to allow for data types that contain blobs of no fixed size, would be a series of MDs.

The first MD of the data object would have an address (probably stored in an index scheme); the address of the second is then stored in a field of the first MD. The 2nd and subsequent MDs do not need a set address or tag type, since they are linked to from the one before.

In fact, you could have the 1st MD also contain links to (addresses of) all the chained MDs, and if there are too many, chain that list too.

Thus the first MD of the data object contains fields, and if more than one MD is needed, the 2nd and subsequent ones are chained. The data object starts at the 1st field after the links (addresses). You could even store the elements of the data object as fields too, and include an index of where each element is, placed after the links (addresses) but before the 1st part of the data object.

1st MD of a single data object requiring “n” MDs and holding “m” elements:

  • field 1: address of the next MD
  • fields 2 → n: the (n-1) addresses of MDs 2 → n
  • fields (n+1) → (n+m): addresses of the MDs containing the start of each corresponding element
  • field (n+m+1): 1st element of the data object

When there are not enough fields and/or space, continue the fields in the next MD. In each MD, the 1st field is the link (address) to the next MD in the data object chain.

This way, the indexing of your data objects only points to one MD per data object, and each data object is its own series of MDs with self-contained indexing.
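A minimal Python sketch of this chained layout, under stated assumptions. MDs are modelled as lists of fields in a dict keyed by random addresses, and `store`, `FIELDS_PER_MD` and all function names are hypothetical. Each element is assumed to fit in one field. Also, a reader otherwise has no way to tell where the chain addresses end and the element pointers begin, so the 1st MD carries one extra header field holding n and m right after the next-link. Field indices in the code are 0-based.

```python
import uuid

FIELDS_PER_MD = 6  # hypothetical capacity, kept tiny so the chain is visible
PER_MD = FIELDS_PER_MD - 1  # payload fields per MD; field 0 is the next-link


def locate(p: int):
    """Map payload index p to (MD number in the chain, field index in that MD)."""
    return p // PER_MD, p % PER_MD + 1


def write_object(store: dict, elements: list) -> str:
    """Lay `elements` (one field each) over a chain of MDs and return the
    address of the 1st MD, i.e. the one your index points at.

    Payload order: header (n, m), addresses of MDs 2..n, one pointer per
    element (address of the MD holding it), then the elements themselves.
    """
    m = len(elements)
    n = 1
    while n * PER_MD < n + 2 * m:  # the payload is n + 2m fields long
        n += 1
    addrs = [uuid.uuid4().hex for _ in range(n)]  # stand-ins for MD addresses
    pointers = [addrs[locate(n + m + i)[0]] for i in range(m)]
    payload = [(n, m)] + addrs[1:] + pointers + list(elements)
    for k, addr in enumerate(addrs):
        next_link = addrs[k + 1] if k + 1 < n else None
        store[addr] = [next_link] + payload[k * PER_MD:(k + 1) * PER_MD]
    return addrs[0]


def read_field(store: dict, first: str, p: int):
    """Fetch payload field p; MD k's address is itself payload field k."""
    k, f = locate(p)
    addr = first if k == 0 else read_field(store, first, k)
    return store[addr][f]


def get_element(store: dict, first: str, i: int):
    """Random access: header, then element i's pointer, then the element."""
    n, m = store[first][1]
    if not 0 <= i < m:
        raise IndexError(i)
    md_addr = read_field(store, first, n + i)
    return store[md_addr][locate(n + m + i)[1]]


def read_all(store: dict, first: str) -> list:
    """Sequential scan: follow the next-links and slice out the elements."""
    payload, addr = [], first
    while addr is not None:
        payload.extend(store[addr][1:])
        addr = store[addr][0]
    n, m = payload[0]
    return payload[n + m:n + 2 * m]
```

`read_all` walks the chain like a tape. `get_element` instead uses the stored addresses to jump to the MD holding a given element, which takes a handful of reads rather than a scan of the whole chain.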


Definitely time you got coding @neo 🙂


It’s great to read about how it used to be done (like your other post here before, good stuff!). The technical limitations really demanded great skill.

About this MD stuff: I’m aware of the possibility of doing it like this; I use a similar scheme for some parts of my SAFE.Datastructures. But there the chaining is for stacks and queues, where it’s inherently expected to work like that, and you don’t access all MDs in a chain in one request (as you would if they were fragments of a single object), but rather one at a time, when you need the next object. The other case, with a tree structure, is faster for large objects, but there I use immutable data in the end anyway, since it is a key-value store and it’s simpler not to have to analyse the value to be stored and split it up etc.

My concern is the performance compared to using immutable data. (The additional complexity is OK; someone will write a wrapper in every language.) But I don’t know yet whether my concern is unfounded.
