Syncer: a caching FUSE-based filesystem in Rust

Your approach stimulated more thinking so just scribbling as I go…

Maybe the SAFE API could include a way to signal locking of a FilesContainer to simplify keeping things in sync. I guess it may not be necessary with versioning, but I don’t know enough to think properly about that.

Then, while updates are being made locally, the SAFE FilesContainer could be locked until it has been updated, and its version checked for changes before it is locked again for any further local changes.
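The lock-then-check-version idea is essentially optimistic concurrency control. A minimal sketch, assuming a purely hypothetical in-memory `Container` with a version counter and a lock holder (none of these names are SAFE API calls), shows a client that last saw an older version being refused:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a remote FilesContainer: a version number,
/// an optional lock holder and a map of path -> content.
/// None of these names come from the SAFE API.
#[derive(Debug, Default)]
struct Container {
    version: u64,
    locked_by: Option<String>,
    files: HashMap<String, Vec<u8>>,
}

#[derive(Debug, PartialEq)]
enum SyncError {
    Locked(String),
    VersionChanged { expected: u64, found: u64 },
}

impl Container {
    /// Take the lock only if nobody holds it and the version is still the
    /// one this client last saw (i.e. nobody else changed it meanwhile).
    fn lock(&mut self, client: &str, seen_version: u64) -> Result<(), SyncError> {
        if let Some(holder) = &self.locked_by {
            return Err(SyncError::Locked(holder.clone()));
        }
        if self.version != seen_version {
            return Err(SyncError::VersionChanged { expected: seen_version, found: self.version });
        }
        self.locked_by = Some(client.to_string());
        Ok(())
    }

    /// Apply local changes, bump the version and release the lock.
    fn commit_and_unlock(&mut self, client: &str, changes: Vec<(String, Vec<u8>)>) -> u64 {
        assert_eq!(self.locked_by.as_deref(), Some(client), "commit without holding the lock");
        for (path, data) in changes {
            self.files.insert(path, data);
        }
        self.version += 1;
        self.locked_by = None;
        self.version
    }
}

fn main() {
    let mut fc = Container::default();
    // Client A saw version 0: it locks, writes and unlocks.
    fc.lock("A", 0).unwrap();
    let v = fc.commit_and_unlock("A", vec![("notes.txt".into(), b"hello".to_vec())]);
    println!("version after A: {}, {} file(s)", v, fc.files.len());
    // Client B still believes version 0, so its lock attempt is refused
    // and it must refresh its local copy first.
    println!("{:?}", fc.lock("B", 0));
}
```

In a real network the check and the lock would need to be a single atomic operation on the container; the sketch only illustrates the ordering of steps.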

I guess there are many ways to try and skin this cat. I recommend looking at the syncer code. It looks well designed to me, written by someone who knows how to do this stuff, and fairly easy to understand even for a non-rustacean.


I have been tasked with other stuff for now, but my first thoughts were to see where this might lead us and to wait for some consensus before spending any more time on SAFEgit.
I am deep in project documentation right now, but later tonight I will try to build from that git repo as you did earlier today.


If you have anything useful on safegit please share it and I’ll take a look. Might not be much effort to finish - I just can’t test anything unless I set up a local network which I’d rather not spend time on.


A thought in favour of caching SAFE chunks locally, but in a Syncer-like system: the content would stay encrypted on the local system.

The metadata could be cleared at any time, but rebuilt from the chunks when the user logs in and mounts the FUSE drive, or on demand as the mounted FilesContainer is accessed. So the chunks could be safely kept on disk, with the metadata held in memory (similar to your proposal) and deleted when the device shuts down.
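A rough sketch of that split, assuming a hypothetical `ChunkCache` that stores each chunk as a file named after the chunk and keeps only an in-memory index (chunk name to path) that is rebuilt by scanning the directory. SAFE self-encryption and data maps are not modelled; the bytes are treated as already-encrypted opaque blobs:

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical local chunk cache: chunk bytes live on disk, while the
/// index mapping chunk names to files is held only in memory.
struct ChunkCache {
    dir: PathBuf,
    index: HashMap<String, PathBuf>, // chunk name -> file on disk
}

impl ChunkCache {
    /// Open the cache and rebuild the in-memory index by scanning the
    /// directory, e.g. when the user logs in and mounts the FUSE drive.
    fn open(dir: &Path) -> io::Result<Self> {
        fs::create_dir_all(dir)?;
        let mut index = HashMap::new();
        for entry in fs::read_dir(dir)? {
            let entry = entry?;
            if entry.file_type()?.is_file() {
                let name = entry.file_name().to_string_lossy().into_owned();
                index.insert(name, entry.path());
            }
        }
        Ok(ChunkCache { dir: dir.to_path_buf(), index })
    }

    fn put(&mut self, name: &str, bytes: &[u8]) -> io::Result<()> {
        let path = self.dir.join(name);
        fs::write(&path, bytes)?;
        self.index.insert(name.to_string(), path);
        Ok(())
    }

    /// Returns None on a cache miss; the caller would then fetch from the network.
    fn get(&self, name: &str) -> io::Result<Option<Vec<u8>>> {
        match self.index.get(name) {
            Some(path) => fs::read(path).map(Some),
            None => Ok(None),
        }
    }
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("chunk-cache-demo");
    {
        let mut cache = ChunkCache::open(&dir)?;
        cache.put("abc123", b"encrypted chunk bytes")?;
    } // cache dropped: the in-memory index is gone, the chunks remain on disk
    let cache = ChunkCache::open(&dir)?; // index rebuilt from the chunks on disk
    println!("{:?}", cache.get("abc123")?.map(|b| b.len()));
    fs::remove_dir_all(&dir)
}
```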

Am I daft, or would it be quite easy to add the ability to SAFE libs to cache certain chunks locally - all chunks related to a given FilesContainer for example?


Certainly this is possible, and the chunks are already encrypted. For extra safety it would be wise to encrypt the whole local cache as well. I am still thinking about this one; it seems feasible and unlikely to reduce security, except perhaps by revealing when SAFE was last accessed?


Thanks David, I was concerned about the security of SAFE activity and also thought it would be wise to at least make the cache an encrypted area. I’m glad it isn’t a significant issue.

As option 1 looks very straightforward, I think I’ll try that. As well as being simple to get working, I think it would work really well as a backup system that preserves every version of every file within moments of any mutation. People could use it like an extra big drive, or copy/rsync to it to do a backup. If anyone wants to do this for themselves, by all means jump in and I’ll support you. I have plenty of other things to keep me happy and busy.

Also, if anyone wants to dig deeper into ways to support a SAFE-files-compatible syncer (option 2) I’m keen to help. Not sure I should lead given zero Rust skills and so many other pies with my finger holes in them!

If nobody else picks these up I may think again, but there’s so much I could be doing, so if you fancy having a go, be my guest and let me know if you’d like my input!

@danda quick question as I’ve forgotten: is there a limit to the size of a FilesContainer or the number of entries it can hold? I’m hoping I can just store all the syncer nodes and blobs in one container and forget about it at least for now. Probably not good in the long run, but I’d like to know if there are any hard limits.

Update - rsync features needed for option 1

The following rsync features are needed from safe files sync in order to be able to modify syncer to use SAFE CLI to implement its backend/remote storage.

All rsync commands begin:

rsync --quiet --timeout=5 --whole-file

Variations add the following parameters in the given order…

Single file transfer:

file directory

Multiple file transfer:

file1 [... fileN] directory

example:

rsync --quiet --timeout=5 --whole-file /some/path/file1 /some/path/file2 remote:/data/blobs/

Recursive transfer:

-r --exclude=metadata* directory1 directory2
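For reference, the three variations above could be assembled with `std::process::Command` roughly as follows. This is a sketch of the invocation shapes only, not syncer’s actual code; `rsync_base`, `rsync_files`, `rsync_recursive` and `args_of` are hypothetical helpers, and the commands are built but never run:

```rust
use std::process::Command;

/// Common prefix shared by every invocation.
fn rsync_base() -> Command {
    let mut cmd = Command::new("rsync");
    cmd.args(["--quiet", "--timeout=5", "--whole-file"]);
    cmd
}

/// Single or multiple file transfer: file1 [... fileN] directory
fn rsync_files(files: &[&str], directory: &str) -> Command {
    let mut cmd = rsync_base();
    cmd.args(files).arg(directory);
    cmd
}

/// Recursive transfer: -r --exclude=metadata* directory1 directory2
fn rsync_recursive(src: &str, dest: &str) -> Command {
    let mut cmd = rsync_base();
    cmd.args(["-r", "--exclude=metadata*", src, dest]);
    cmd
}

/// Collect a command's arguments as Strings (Command::get_args, Rust 1.57+).
fn args_of(cmd: &Command) -> Vec<String> {
    cmd.get_args().map(|a| a.to_string_lossy().into_owned()).collect()
}

fn main() {
    let cmd = rsync_files(&["/some/path/file1", "/some/path/file2"], "remote:/data/blobs/");
    println!("{:?}", args_of(&cmd));
    let cmd = rsync_recursive("directory1", "directory2");
    println!("{:?}", args_of(&cmd));
}
```

Because `Command` does not go through a shell, the `*` in `--exclude=metadata*` reaches rsync unexpanded, which is what rsync expects.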

@danda can you confirm that the above are supported, except as noted here:

--quiet  # I don't see this, can we have it?
--timeout=5 # Not supported, could be useful?
--whole-file # Probably not applicable so no problem
-r --exclude=metadata* # Note the wildcard '*'. I think I can test without that, so no hurry, but I think this is a very useful feature (and --exclude can be specified multiple times).

Just to confirm, can we handle the multiple files example? This is essential, so if not, it is a blocker for now. I don’t think anything else is a blocker.


@anon57419684 you know Rust, correct? Not trying to volunteer you but not sure if you’ve seen this thread yet.


Unfortunately none of the files commands (including sync) support multiple file arguments at present. I would like to add this support because it is necessary for bash brace expansion eg photo{1,2,3}.jpg to work. But it will be a fair amount of work to implement.

You can of course call files sync once for each file.
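A minimal sketch of that one-command-per-file workaround, based only on the usage line `safe files sync [FLAGS] [OPTIONS] <location> [target]`. The helper names and the placeholder target URL are hypothetical, and how `[target]` is interpreted for a single file should be checked against `safe files sync --help`:

```rust
use std::process::Command;

/// Build one `safe files sync <file> <target>` command per file.
fn per_file_sync_commands(files: &[&str], target: &str) -> Vec<Command> {
    files
        .iter()
        .map(|file| {
            let mut cmd = Command::new("safe");
            cmd.args(["files", "sync"]).arg(file).arg(target);
            cmd
        })
        .collect()
}

/// Run the commands in turn, stopping at the first failure.
fn sync_each(files: &[&str], target: &str) -> std::io::Result<()> {
    for mut cmd in per_file_sync_commands(files, target) {
        let status = cmd.status()?;
        if !status.success() {
            return Err(std::io::Error::new(
                std::io::ErrorKind::Other,
                format!("{:?} failed with {}", cmd, status),
            ));
        }
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    let files = ["/some/path/file1", "/some/path/file2"];
    let target = "safe://<container-xorurl>/data/blobs/"; // placeholder URL
    for cmd in per_file_sync_commands(&files, target) {
        println!("{:?}", cmd);
    }
    // Only actually invoke the safe CLI when asked to.
    if std::env::args().any(|a| a == "--run") {
        sync_each(&files, target)?;
    }
    Ok(())
}
```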

Also, sync does not presently have an --exclude option.
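Until an `--exclude` option exists, the caller could filter names locally before syncing. This sketch supports only a single trailing `*` (enough for `metadata*`) and is not a reimplementation of rsync’s much richer exclude rules; `matches` and `filter_excluded` are hypothetical helpers, and the sample names come from the syncer listing later in this thread:

```rust
/// Match a bare name against a pattern with at most one trailing `*`.
fn matches(pattern: &str, name: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => name.starts_with(prefix),
        None => name == pattern,
    }
}

/// Keep only names that match none of the exclude patterns.
fn filter_excluded<'a>(names: &[&'a str], excludes: &[&str]) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|name| !excludes.iter().any(|p| matches(p, name)))
        .collect()
}

fn main() {
    let names = ["metadata.sqlite3", "metadata.sqlite3-wal", "blobs", "nodes"];
    let kept = filter_excluded(&names, &["metadata*"]);
    println!("{:?}", kept);
}
```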


I have syncer init working with a local SAFE testnet. It seems to have initialised the SAFE container as expected, so I’m a bit confused because I’m also seeing an error from the safe CLI in the console.

As far as I can see everything has completed OK (judging by a listing of the syncer content on SAFE), but it looks like ‘safe files sync’ is returning an error code to syncer, as well as printing the error to the console.

@danda if there was a parameter error like the one below, could safe files sync still have uploaded to the container, or would it abandon before trying? And should the safe CLI return proper exit codes for use by a script?

I’m a bit confused by the console error because the error message suggests the safe files sync is being given a safe URI with /data/blobs/ at the end which I’m not expecting. I need to find a way to see exactly what is being passed to safe CLI!

Good progress though!

syncer init ~/.syncer-test "safe://hnyynyib9mr43r7cthdodtci7kur7rgkc7ayzdkjax1j5dse451d3ft1wgbnc" 1000
FilesContainer synced up (version 1): "safe://hnyynyib9mr43r7cthdodtci7kur7rgkc7ayzdkjax1j5dse451d3ft1wgbnc?v=1"
+  /home/mrh/.syncer-test/data                                                  
+  /home/mrh/.syncer-test/data/blobs                                            
+  /home/mrh/.syncer-test/data/blobs/082ad992fb76871c33a1b9993a082952feaca5e6  safe://hbyyyynwwg1cyjt979cdykr3xz3hmichw7ekqk9hsbcqa4mhuf3x8481hh 
+  /home/mrh/.syncer-test/data/blobs/675e110cbd20023c206bea2a1788c8ab304a7a5d  safe://hbyyyynmme1ei9wisfd9fnnpiqxxtam7se97a4dab9h61mabdpuf771i4t 
+  /home/mrh/.syncer-test/data/metadata.sqlite3                                safe://hbyyyyn3ay4wu4rowxhizia4sxhpjsq319ip758maken19kbyewoo7mny9 
+  /home/mrh/.syncer-test/data/metadata.sqlite3-shm                            safe://hbyyyyn7jpqz5nmbxtq8bsmfpu8iorbxu3owwgbp89fpatpyp3a159fhpn 
+  /home/mrh/.syncer-test/data/metadata.sqlite3-wal                            safe://hbyyyyd8yj4xgo6o4ok9zhmqzjsgwxefwsgr4yjfddqd59rywcni13xdra 
+  /home/mrh/.syncer-test/data/nodes                                            
error: Found argument 'safe://hnyynyib9mr43r7cthdodtci7kur7rgkc7ayzdkjax1j5dse451d3ft1wgbnc/data/blobs/' which wasn't expected, or isn't valid in this context

USAGE:
    safe files sync [FLAGS] [OPTIONS] <location> [target]

For more information try --help

I can probably work around this but will need to write some Rust :dizzy_face:

I can test without this, but I think it’s a useful feature for the todo list.


afaik, parameter validation is done before the operation begins. Is it possible safe-cli is being invoked twice?

I think you need to find where the safe cli command is being invoked and print or log it.


I’m trying but my lack of Rust is in the way. Can you modify this code fragment to output the command line before it is run. I’m trying to figure it out but am struggling to understand the docs around Debug, fmt etc.

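use std::io::{Error, ErrorKind};
use std::process::Command;

// Method of the struct that wraps the command; `self.args` holds the extra arguments.
pub fn run(&self) -> Result<(), Error> {
    // Crude error recovery: retry up to 10 times, returning as soon as one run succeeds.
    for _ in 0..10 {
        let mut cmd = Command::new("safe");
        cmd.arg("files");
        cmd.arg("sync");

        // cmd.arg("--quiet");
        // cmd.arg("--timeout=5");
        // --whole-file is needed instead of --append because otherwise concurrent usage while
        // doing readahead causes short blocks
        // cmd.arg("--whole-file");
        cmd.args(&self.args);
        match cmd.status() {
            Ok(v) => {
                if v.success() {
                    return Ok(());
                } else {
                    continue;
                }
            }
            // The process could not be launched at all; this is retried too.
            Err(_) => {}
        }
    }
    Err(Error::new(ErrorKind::Other, "safe files sync failed"))
}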

I think there must be a second safe command being attempted that I was not expecting. If I can echo the command to the console I’ll have a much better idea what is going on. Thanks.

No worries, got it:

println!("safe files sync {:?}", &self.args);

This appears to be repeating the command 10 times. Is that desired?

Can you modify this code fragment to output the command line before it is run

You can try println!("{:?}", cmd);

see: https://stackoverflow.com/questions/61154481/how-to-get-the-command-behind-stdprocesscommand-in-rust
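Both approaches in a small self-contained form: the `Debug` output of a `Command` quotes the program and each argument, while `command_line` (a hypothetical helper) joins `Command::get_program` and `Command::get_args` (available since Rust 1.57) into a plain space-separated string, without running anything:

```rust
use std::process::Command;

/// Build a readable command line from a Command without running it.
fn command_line(cmd: &Command) -> String {
    let mut parts = vec![cmd.get_program().to_string_lossy().into_owned()];
    parts.extend(cmd.get_args().map(|a| a.to_string_lossy().into_owned()));
    parts.join(" ")
}

fn main() {
    let mut cmd = Command::new("safe");
    cmd.args(["files", "sync", "-r"]);
    // Debug formatting: program and arguments, each quoted.
    println!("{:?}", cmd);
    // Plain space-separated version, e.g. for eprintln! or a log line.
    println!("{}", command_line(&cmd));
}
```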


It only repeats if the command fails, so I guess it’s crude error recovery! Thanks for the tip. I can see what’s happening now. :grin:

Not surprisingly the error is because syncer is trying to sync multiple files at once. I was just not expecting it to do that immediately after having synced recursively. I’ve missed something in the code because I didn’t see where that is happening.

Anyway, looks like in theory I can get this working and then try mounting the FS.


I’ve got a little further with a tweak that I hope means I can manage without --exclude, but I think I’ve hit a more fundamental limitation that goes a bit beyond my Rust ability.

It looks to me as if you cannot safe files sync to/from FilesContainer subpaths. Or maybe you can only sync a local directory to a FilesContainer but not the other way around. Can you confirm if these are or are not supported?

I’ve tried both with XOR URI and NRS URI and get the following errors:

Problem using xor URI:

"safe" "files" "sync" "-r" "safe://hnyynyib9mr43r7cthdodtci7kur7rgkc7ayzdkjax1j5dse451d3ft1wgbnc/data/nodes/" "/home/mrh/.syncer-test/data/nodes"
[2020-06-28T18:51:56Z ERROR safe] safe-cli error: [Error] InvalidXorUrl - Problem parsing the URL "safe:///home/mrh/.syncer-test/data/nodes": missing name

Problem using NRS URI:

"safe" "files" "sync" "--update-nrs" "-r" "safe://test1/data/nodes/" "/home/mrh/.syncer-test/data/nodes"
[2020-06-28T18:57:20Z ERROR safe] safe-cli error: [Error] InvalidXorUrl - Problem parsing the URL "safe:///home/mrh/.syncer-test/data/nodes": missing name

At this time, safe files sync only syncs local files to remote FileContainer, not vice-versa.

A rough workaround could be to download all files with safe files get <url> newdir, then rename/replace or rsync newdir to olddir.
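That two-step workaround could be scripted like this. It is a sketch under assumptions: `pull_commands` and `pull` are hypothetical helpers, the URL is a placeholder, and `--delete` (a standard rsync flag) removes local files that no longer exist remotely, so drop it if that is not wanted:

```rust
use std::process::Command;

/// 1. `safe files get <url> newdir` downloads the whole container.
/// 2. rsync the fresh copy over the existing local directory.
fn pull_commands(url: &str, newdir: &str, olddir: &str) -> Vec<Command> {
    let mut get = Command::new("safe");
    get.args(["files", "get", url, newdir]);

    let mut rsync = Command::new("rsync");
    // A trailing slash on the source copies the *contents* of newdir into olddir.
    let src = format!("{}/", newdir);
    rsync.arg("-r").arg("--delete").arg(&src).arg(olddir);
    vec![get, rsync]
}

fn pull(url: &str, newdir: &str, olddir: &str) -> std::io::Result<()> {
    for mut cmd in pull_commands(url, newdir, olddir) {
        let status = cmd.status()?;
        if !status.success() {
            return Err(std::io::Error::new(
                std::io::ErrorKind::Other,
                format!("{:?} failed with {}", cmd, status),
            ));
        }
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    let url = "safe://<container-xorurl>/data/nodes"; // placeholder URL
    for cmd in pull_commands(url, "/tmp/nodes-new", "/home/user/.syncer-test/data/nodes") {
        println!("{:?}", cmd);
    }
    // Only actually run the commands when asked to.
    if std::env::args().any(|a| a == "--run") {
        pull(url, "/tmp/nodes-new", "/home/user/.syncer-test/data/nodes")?;
    }
    Ok(())
}
```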


I’ll have a think about whether that’s feasible for Syncer, thanks.

What about sync from/to subpaths?

It might be better to spend my time adding CLI support for these features rather than crowbarring Syncer to get around them, but I’m not sure I want to spend the time it would take given other projects.

Might be a nice change though, to learn some Rust. I’ll probably look at the CLI code and think about it. Are you the go-to guy for CLI stuff? If so, how hard do you think it would be for me to have a go at these kinds of enhancements as a Rust novice? Is it a no-no? (I’ve written a lot of code in the past; C++ is the closest to Rust. I wrote a lot of that before it became a standard, so without later features.)


I looked at adding a Java-based FUSE wrapper for Safe NFS a few years back. I got basic read-only support working using the local REST client. I know it has all changed completely now, so I’ve not touched it again. Once the APIs stabilise, I’d like another look.

IMO, it is important to have a standard interface at the safe library level. The implementations for how this is extended for FUSE, caching and so forth can then be done later, by third parties, etc.
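To illustrate that layering, here is a sketch in which the library exposes a small storage trait and a third party adds caching purely on top of it. `BlobStore`, `Remote` and `Cached` are hypothetical names, not the SAFE API, and `Remote` is an in-memory stand-in that counts how often the "network" is read:

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical library-level interface for content-addressed storage.
trait BlobStore {
    fn get(&self, addr: &str) -> Option<Vec<u8>>;
    fn put(&mut self, addr: &str, data: Vec<u8>);
}

/// In-memory stand-in for the network, counting reads.
#[derive(Default)]
struct Remote {
    blobs: HashMap<String, Vec<u8>>,
    fetches: Cell<u32>,
}

impl BlobStore for Remote {
    fn get(&self, addr: &str) -> Option<Vec<u8>> {
        self.fetches.set(self.fetches.get() + 1);
        self.blobs.get(addr).cloned()
    }
    fn put(&mut self, addr: &str, data: Vec<u8>) {
        self.blobs.insert(addr.to_string(), data);
    }
}

/// A layer written only against the trait: it caches reads in memory,
/// so it can be added without touching the underlying library.
struct Cached<S: BlobStore> {
    inner: S,
    cache: RefCell<HashMap<String, Vec<u8>>>,
}

impl<S: BlobStore> Cached<S> {
    fn new(inner: S) -> Self {
        Cached { inner, cache: RefCell::new(HashMap::new()) }
    }
}

impl<S: BlobStore> BlobStore for Cached<S> {
    fn get(&self, addr: &str) -> Option<Vec<u8>> {
        let hit = self.cache.borrow().get(addr).cloned();
        if hit.is_some() {
            return hit;
        }
        let data = self.inner.get(addr)?;
        self.cache.borrow_mut().insert(addr.to_string(), data.clone());
        Some(data)
    }
    fn put(&mut self, addr: &str, data: Vec<u8>) {
        // Write-through: update the cache and the underlying store.
        self.cache.get_mut().insert(addr.to_string(), data.clone());
        self.inner.put(addr, data);
    }
}

fn main() {
    let mut remote = Remote::default();
    remote.put("chunk-1", b"hello".to_vec());
    let store = Cached::new(remote);
    for _ in 0..3 {
        store.get("chunk-1");
    }
    println!("network fetches: {}", store.inner.fetches.get());
}
```

A FUSE layer could then depend on `BlobStore` alone, whether or not a cache sits in between.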

I’m afraid I’m not up to speed on what the safe libs currently support. I hope it is moving in this direction though. Layering up functionality like this breeds a community of developers and their apps.


Option 1 update

I have syncer running and initialising its storage on SAFE using the CLI, but there are a couple of problems. These require some enhancements to the SAFE CLI that are a bit beyond my Rust skills at the moment. I’ve raised an issue for one (#589) and given a +1 to another (#512); together these should enable me to get syncer going, so if you have Rust skills feel free to have a go. I am reading up on Rust, and if I think I can do this I’ll have a go at one myself.

It would also be useful to be able to:

  • update multiple files specified on the safe files sync command line, though this is not essential, since the command can be issued once per file instead (SAFE CLI issue #600)
  • suppress console output with --quiet (SAFE CLI issue #601)

Syncer Update

I’ve paused Option 1 to investigate Option 2 while learning Rust. That may enable me either to add features to SAFE CLI to support Option 1, or to try Option 2, which is much trickier but would probably be a better solution.

I’m in the process of making a lot of notes about how Syncer works, and want to share the latest version of a diagram which summarises the syncer architecture:


I had something about this 2 years ago. We just gotta store the cache encryption key on the Safe Network:
