Farming API for a statistics app

Hi, what kind of API can we expect for extracting information from our vaults? For example, I’d like to make an app that gets a message whenever my vault gets a GET request, whenever it receives a reward, whenever it gets a new chunk to store, its percentage of storage used, etc. What are the plans to support this?


Soon you will have very easy access to the vault app and source in a way that lets you see right into it. This will help, but I am not sure you want to see all the GET requests :slight_smile: all going well anyway.

I understand we will have access to the source code and we can modify it to build a custom vault that would do that, but I’d rather not dive into the vault code; I probably won’t have the skill to pull it off anyway. So I’m trying to see what would be the best way to build analytics and statistics of a vault’s performance without having to rebuild the source code. So allow me a few more questions:

Is the vault client UI a separate application from the vault itself? If so, how do they communicate? Does the vault act as a server and the UI as a client that somehow connects to it?

Does the client UI provide statistics and analytics of a vault’s performance? Can the client UI know when the vault generates a Safecoin?

Does the vault generate log files of its activity? Something we can parse to get an idea of what’s going on inside?

Thanks in advance, can’t wait to see it all come alive!

If there’s no such API, you could insert a callback into the vault-reading app, maybe just a few lines of code.
That kind of pull request shouldn’t be too hard to get accepted.
That would then ping your app with more details about the nature of the request.
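Purely as a sketch of the pattern (in Python for brevity, even though the vault itself isn’t Python; the URL, hook name and event fields are all invented, nothing like this exists in the vault code today):

```python
import json
import threading
import time
import urllib.request

# Hypothetical stats app endpoint -- an assumption for illustration.
STATS_APP_URL = "http://localhost:8080/vault-events"

def notify_stats_app(event: dict) -> None:
    """Fire-and-forget ping so the vault's request handling never blocks."""
    def send():
        req = urllib.request.Request(
            STATS_APP_URL,
            data=json.dumps(event).encode(),
            headers={"Content-Type": "application/json"},
        )
        try:
            urllib.request.urlopen(req, timeout=1)
        except OSError:
            pass  # stats reporting must never break the vault
    threading.Thread(target=send, daemon=True).start()

# The vault's GET handler would then call something like:
notify_stats_app({"type": "get", "chunk_name": "<hash>", "ts": time.time()})
```

The vault just fires an event and forgets about it; the stats app does all the bookkeeping off to the side.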

Interesting. This would cause different vault behaviour; like putting probes in a human brain, it will affect the vault’s logic. So we would need to see how many folks wanted this. We’ve seen with visualisers that even tiny outputs will affect logic, speed, etc.

I would say this would be a special vault that would earn less than normal actually, but worth chatting about (after we get back on track with launch schedules)

Speed I understand: if there is a big volume of requests it could slow down the vault, but how would it affect logic and behaviour? It sounds like quantum physics, where it changes its behaviour the moment you try to measure it :slight_smile: Not trying to insult you, just trying to understand :wink:

No insult man, never :smiley: Check this out for a good backgrounder: Heisenbug - Wikipedia
Even tiny debug log outputs alter behaviour, debugging symbols etc., and when you have crypto you want to avoid timing attacks and a bunch of other stuff. So this is a huge area; humans always want to know everything, but sometimes what you end up knowing is not what would have happened if you had not measured, if you see what I mean. Even done asynchronously, some of this will affect performance etc. None of it is impossible, but it’s easy to make a wrong decision.

I have seen similar projects even show globes of where the users are (in a privacy network that’s not great, to say the least). It’s like our visualiser: looks great, good for early debugging of logic (to an extent), but not good to roll out. I removed a debug log output a few months back and the system went 10 times faster, so yes, there is always a price to pay. It’s just not obvious, and the instant thing people want is more info. Sometimes you just need to accept the ant turned left as opposed to right for many different reasons, and we cannot measure them.

Kinda philosophical, but there are real issues: adding stuff that feels good can very easily interfere with the system itself. I am in no way dismissing it though, don’t get me wrong; I am just saying the edge cases are sometimes not known when we see a ton of benefits. Like Agent Orange, locking aircraft cabins, DDT on mossies, etc. It’s one of my personal things, reacting too quickly to what looks good only to find out it actually caused harm. In this case, you will know how much your vault earned as your wallet will fill up, and any attempt to alter that will get penalised, so knowing ratios may or may not be a good thing. I am not sure.


Look at that, I had no idea it was an actual concept. It’s funny because I did encounter those kinds of bugs in my career. When you find the problem it’s always a big aha! moment. Thanks for the link!

So then, I guess the best way to benchmark a vault is to create a separate wallet for each vault and see how they perform compared to one another. So much for my grand carnival horse race game I was thinking about…

Thanks for the answers @dirvine.


This revelation is a bummer. :frowning:

But all is not lost. Measuring Safecoin’s generation rate per vault (wallet) is still useful. I would like to know how big vaults compare to small vaults, when everything else is equal.

For example:

(1000GB vault) vs (10 x 100GB vaults)

My hope is the big vault will earn the same or a little more because it will also accumulate archive chunks over time.

I’m thinking that maybe the vault can provide some information without falling into heisenbug territory. For example, a counter of GET requests that we can poll through a simple API. It shouldn’t be too taxing for the vault to increment a counter, and it would give us an important piece of information. With this you could poll the vault every x amount of time, check the difference in the counter and deduce its GET rate. Compare that with the number of Safecoin generated in the associated wallet over the same period and you would know its farming rate. Assuming there’s an API to check the balance of the wallet, of course.
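Roughly like this, as a sketch only (the endpoints, ports and JSON field names are inventions of mine; no such API exists yet):

```python
import json
import time
import urllib.request

# Hypothetical endpoints -- these URLs and fields are assumptions
# for illustration, not a real vault API.
VAULT_STATS_URL = "http://localhost:5483/stats"    # e.g. {"gets": 1234}
WALLET_URL = "http://localhost:5484/balance"       # e.g. {"safecoin": 42}
POLL_SECONDS = 3600  # once an hour

def fetch(url):
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

last_gets = fetch(VAULT_STATS_URL)["gets"]
last_coins = fetch(WALLET_URL)["safecoin"]
while True:
    time.sleep(POLL_SECONDS)
    gets, coins = fetch(VAULT_STATS_URL)["gets"], fetch(WALLET_URL)["safecoin"]
    delta_gets, delta_coins = gets - last_gets, coins - last_coins
    print(f"{delta_gets} GETs and {delta_coins} Safecoin in the last hour")
    if delta_gets:
        print(f"farming rate: {delta_coins / delta_gets:.6f} Safecoin per GET")
    last_gets, last_coins = gets, coins
```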

The vault could also tell us how many chunks it contains. It most likely knows.

Just thinking out loud, what do you guys think?

1 Like

That’s expensive. If you have 24 vaults with a 15-second update frequency, you’d be polling (and updating) more than once a second: 24 polls every 15 seconds is 1.6 polls per second.
An async call to the callback program I mentioned above is much cheaper.

Couldn’t you just poll each vault every hour and figure out how many GET requests it got by subtracting the last value from the current one? Why do you need to poll it that often?

You could do that, and that’s a low frequency (of course, if there were a way to change the frequency to 30 seconds, many users would do that). I expected something more aggressive, like 30 or 60 seconds.

It’d just be more efficient not to burden the s/w with these counters and create more workload on top of all the other stuff that’s going on. Also, if a wallet is for some reason down, you may need to call the API more than once per hour (or wait till the next cycle, like it’s sometimes done with SNMP pollers).
If you used a callback script or program, it would be invoked only when a GET happened, so you’d get your updates asynchronously, but only seconds after they happen (that script could be something as simple as a one-line curl script that would ping your polling server).
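For instance (Python standing in for that one-line curl script; the script name, argument and URL are all made up):

```python
# on_get.py -- hypothetical notify script the vault would invoke per GET;
# the server URL and the vault-id argument are assumptions for illustration.
import sys
import urllib.request

vault_id = sys.argv[1] if len(sys.argv) > 1 else "vault-0"
urllib.request.urlopen(f"http://stats.example/ping?vault={vault_id}", timeout=2)
```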

Yeah, I won’t pretend I know the best way to go about it. I think the general idea is that the more data they can provide us without compromising the efficiency and security of a vault, the better it will be.

My guess on this would be that it will be some time before the ten 100GB vaults stop being 10x better than the single 1000GB vault.

The network stores chunks by hash value, which is effectively random. Thus the average amount of data stored will be fairly uniform across all devices until that average exceeds a vault’s capacity. So a 1000GB vault will be storing about the same quantity as each of the 100GB vaults, until the 1000GB one exceeds 100GB of actual storage. Then it will start to exceed the farming of just one 100GB vault, on average.
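A toy simulation makes the shape of this visible (entirely my own model, with made-up simplifications: every chunk is 1GB, each chunk lands on a uniformly random vault that still has room, and farming tracks the chunks a vault holds):

```python
import random

# Toy model, not the real routing: one 1000GB vault vs ten 100GB vaults.
def simulate(total_gb):
    capacities = [1000] + [100] * 10
    stored = [0] * len(capacities)
    for _ in range(total_gb):
        # Random hash -> effectively a uniformly random non-full vault.
        candidates = [i for i, s in enumerate(stored) if s < capacities[i]]
        if not candidates:
            break  # every vault is full
        stored[random.choice(candidates)] += 1
    big, smalls = stored[0], stored[1:]
    print(f"{sum(stored)} GB on the network: big vault holds {big} GB, "
          f"average small vault holds {sum(smalls) / len(smalls):.0f} GB")

for total in (550, 1100, 1650):
    simulate(total)
```

Up to the point where the small vaults fill (around 1100GB network-wide in this model), the big vault holds, and so farms, no more than any one of them; only after that does its extra capacity start paying.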

If all were full? Good question. It will depend upon the processing-load/time, I’m guessing. If you’re running all vaults on the same machine, I guess the CPU load could get to be a performance factor if the nodes were pretty active.

I’m figuring on running as many vaults on one machine as can be handled without strain, then adding more drive space as the existing space fills up.

Of course, I’m just riffing on stuff I’m much less certain of than I might sound. Think I’m near the mark?

I would say you’re pretty close, with a good strategy to add space as the vaults fill up. We could spin our wheels till our fingers bleed, and still not really know, which is why we need to collect stats.


It can be done; we will probably have many vaults reporting info back to us (our vaults), but the cost may be less farming on those vaults if they are too slow or have their logic affected. So you can get general stats and report these to folk. I am not in favour at the moment of having that as a default in all vaults. We will probably have some of our vaults report back to the visualiser we set up, for instance, but it becomes less valuable as the network grows.

So it’s possible to collect stats for sure; we just need to be aware that nothing is free, and the more you want to know, the greater the price you need to pay (I am not saying this won’t be negligible, it may be). It may be better to have some vaults measure approximate stats on the net. The changes to the code base since testnet2 will make all this much simpler for people to achieve without requiring several PhDs and years of studying the code :slight_smile:


I agree with everything you said. TestNet3 will help us get a less blurry view of how to go about farming. Like you said, it’s best to monitor the vault “in the wild.” I hope this didn’t bother you, as there’s so much your team will be doing on TestNet3. Vault farming is just a small aspect.


FYI @fergish, the guys have been running up to 60 vaults on a single commodity PC in the office and that was prior to the significant simplification in code.


Thanks, Nick.

I was aware of that, but I also figured that such performance would change when the network is live and handling a lot more action: communication, file storage and retrieval, churn, etc.

So I figured 10 vaults on a decent PC should be okay, but we’ll see, I guess. I’m sure there will be lots of empirical testing done by people all over, from which some best practices (more or less) will be derived.

Efficiency stats would be great if gathering them doesn’t affect performance. But if it does, I think that, as a community, we’ll have other ways to bracket the data.
