Data Recycling Incentives

I frequently see data in my customers' databases from the '90s and early 2000s …

If you make software good enough that people don’t have to re-invent the wheel every 7 years, data doesn’t need to be recycled.

I ran into a real-world problem earlier this month when we deleted some decade-old data, only to realize that the offsetting entry had been posted years later, and deleting the old leg threw my customer's balances off.

People don't archive because it is too much trouble, and the risk of deleting stuff far outweighs the cost of letting it be. Hard drive space is cheap and gets cheaper all the time. Troubleshooting errant or missing data, particularly when it is decades out of anybody's memory, is very hard.

The real-world examples I am talking about happen when we know what the data is and what it is used for. Imagine how much harder it would be to make informed decisions if we didn't know either. The risks are far from trivial. The SAFE network ought to be far safer than saving data on a hard drive; having an "autodelete" for any reason makes it much less safe.

3 Likes

This would also provide a mechanism to move a chunk from archive to oblivion: take the chunk out of the archive and destroy it.

I am sure even a small recycling reward would encourage this, when/if it becomes an issue.

A vital aspect to keep in mind is that HDD storage cost halves roughly every 18 months. So we have a “natural” mechanism making it cheaper to keep old data stored.

That's true, but it would be equally true for any competitor, who would still have the edge of not storing junk.

2 Likes

Cost of hard drive space:

Year 2000 → $15/GB
Year 2005 → $0.72/GB
Year 2010 → $0.13/GB
Year 2015 → $0.03/GB*

We don't need data recycling.

*Based on a 4 TB hard disk today
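
As a rough cross-check (a back-of-the-envelope sketch of my own, not from the post above), the drop from $15/GB to $0.03/GB over 15 years implies a halving time close to the "roughly every 18 months" figure mentioned earlier:

```python
import math

# $/GB figures from the list above
cost_2000, cost_2015 = 15.00, 0.03
years = 2015 - 2000

# Assume exponential decay: cost(t) = cost(0) * 0.5 ** (t / halving_time)
factor = cost_2000 / cost_2015                       # ~500x cheaper
halving_time = years * math.log(2) / math.log(factor)
print(f"{factor:.0f}x cheaper over {years} years -> "
      f"halving time of about {halving_time:.1f} years")
# ~1.7 years, i.e. roughly the "halves every 18 months" rule of thumb
```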

2 Likes

And in 10 years, when all your disks are belong to us and SAFEcoin hits its peak, the idea of deletion will seem quaint. I wonder how much cache and RAM we'll be packing… the network's gonna be screaming :smile:

1 Like

Are there mechanisms in place to ensure that all farmers will be rewarded in proportion to their contribution to the network, and not by whether the network decides to give them new data, or ‘landfill’ data?

Why would anyone provide a landfill vault? We finally have a use for big server farms. They are the only ones able to provide "above average" storage capacity, as well as bandwidth. And if the Network does this autonomously, they really won't have a choice.

If there is no mechanism, those landfill vaults that you say will not have a choice will simply stop farming if the reward for their service doesn't cover their costs.

If the maintenance of old data becomes an issue in terms of incentives, perhaps a small 'tax' could be added by the system to each new 'put' request, and this small percentage of Safecoin per transaction could be reallocated to pay for old data maintenance. While this would reduce the number of GBs per Safecoin somewhat, it would increase the value of those GBs by securing them indefinitely, and it would provide the necessary incentive to keep even 'landfill' farmers contributing.
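
To make that idea concrete, here is a minimal sketch of such a per-PUT levy; the 2% rate, the prices, and the function name are hypothetical and not part of any SAFE specification:

```python
# Hypothetical per-PUT "archive tax" - an illustration, not an actual SAFE mechanism.
ARCHIVE_TAX = 0.02   # 2% of every PUT payment set aside (made-up figure)

archive_pool = 0.0   # Safecoin reserved for rewarding storage of old, cold data

def charge_put(put_price: float) -> float:
    """Split a PUT payment between the normal reward flow and the archive pool."""
    global archive_pool
    levy = put_price * ARCHIVE_TAX
    archive_pool += levy
    return put_price - levy          # amount left for regular farming rewards

# Example: 1000 PUTs costing 0.001 Safecoin each
for _ in range(1000):
    charge_put(0.001)
print(f"Archive pool after 1000 PUTs: {archive_pool:.4f} Safecoin")   # 0.0200
```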

What is the incentive for landfill farmers? The best solution is multiple options. Reward the farmer for storing junk data and reward the user for recycling junk data.

1 Like

The incentive is that when you go over the network average in rank, your farming rate is a bit higher. This means you get more Safecoin in compensation for one farming attempt. Think of it as extra compensation for also keeping those archival chunks that won't produce farming attempts at the same frequency.

1 Like

Yes, the resources (storage) provided affect the farming rate. The more resources you provide, the more Safecoin you'll farm per GET request. See FIG 1 below.

In theory, this is how it works…

Farms (vaults) that provide “above average” storage farm at a higher rate, while those who provide “average” or “below average” farm at a lower rate.

The largest vaults will likely accumulate “archive data” but they also farm at a higher rate per GET request.

The smallest vaults will likely keep only “active” data but they farm at a lower rate per GET request.

The end result is that one big vault or several small vaults should earn roughly the same amount of Safecoin, much like ants with different specializations: large vaults have more space to keep older data, while smaller vaults have less space and keep active data.

Technically, farmers do have a choice: create many small vaults and earn at a lower rate (possibly consuming more bandwidth from caching), or create one big vault and earn at the maximum rate. The reason I said "they don't have a choice" is that either way they would end up making roughly the same amount of Safecoin. If more farmers add larger vaults, older vaults that started as landfills will end up as normal vaults later on. The network average is a moving target and will likely increase as the Network grows.
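
As a purely illustrative toy model of that balance (every number below is invented and deliberately tuned so the two strategies come out even; nothing here reflects the actual farming algorithm):

```python
# Toy model of the "one big vault vs. several small vaults" argument above.
# Every number here is invented and tuned so the two strategies balance;
# the real farming-rate formula is not finalised (see the TestNet3 caveat below).

NETWORK_AVG_GB = 100        # assumed network-average vault size
GETS_PER_ACTIVE_GB = 10     # active chunks attract frequent GET requests
GETS_PER_ARCHIVE_GB = 2     # archive chunks are requested far less often

def farm_rate(vault_gb: float) -> float:
    """Assumed reward per GET: above-average vaults farm at a higher rate."""
    return 1.0 + 0.5 * min(vault_gb / NETWORK_AVG_GB, 3.0)

def earnings(vault_gb: float, archive_fraction: float) -> float:
    active_gb = vault_gb * (1.0 - archive_fraction)
    archive_gb = vault_gb * archive_fraction
    gets = active_gb * GETS_PER_ACTIVE_GB + archive_gb * GETS_PER_ARCHIVE_GB
    return gets * farm_rate(vault_gb)

# One 400 GB vault, half of it cold "archive" data...
big = earnings(400, archive_fraction=0.5)
# ...versus four 100 GB vaults holding only active data.
small = 4 * earnings(100, archive_fraction=0.0)
print(f"one big vault: {big:.0f}, four small vaults: {small:.0f}")  # 6000 vs 6000
```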

Don't quote me on this because it's not set in stone. We still need TestNet3 to prove it works as expected. I like the idea of "set it and forget it": average farmers wouldn't need to waste time trying to game the system when the benefits are minuscule.

EDIT: I read my explanation and it didn’t seem accurate. So I simplified it.

1 Like

I don't think it works like that, and it would also be bad for the SAFE network. A farmer must seek maximum profitability and the network must seek maximum efficiency, so there are other parameters we have to consider, especially speed and availability.

According to the systemdocs, the network measures the amount of data a node can store and compares this with the data it loses. This number, from 0 to 1, will be its ranking.
The nodes with the best ranking will be, on one hand, the various managers and, on the other, the preferred nodes for saving data, because they are the fastest. I think that is the policy we should keep, regardless of vault size.

It is the manager in charge of the rig who should take care to maximize profits, based on the sizes of the different vaults, which will change over time.


Regarding data recycling, I made a small Excel sheet to calculate the overall percentage of garbage data over the years, based on the rate of network growth and the annual rate of garbage data. Depending on these two variables, the percentage tends to different limits. Some examples:

With 50% annual network growth:
10% annual garbage → 25% final garbage
20% annual garbage → 42% final garbage
30% annual garbage → 56% final garbage
50% annual garbage → 75% final garbage
80% annual garbage → 92% final garbage

With 100% annual network growth:
10% annual garbage → 18% final garbage
20% annual garbage → 33% final garbage
30% annual garbage → 46% final garbage
50% annual garbage → 66% final garbage
80% annual garbage → 88% final garbage

Those numbers reinforce my initial idea: we don't need data recycling.

What’s the equation you are using?

Total garbage in year X = total garbage in year X-1 + g × (new data added in year X) + g × (the rest of year X-1's data, i.e. total minus garbage)

where g is the annual garbage rate, applied both to newly added data and to the previously stored data that is still useful.
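
Here is a short sketch, assuming the same garbage rate g applies to both terms, that reproduces the limits listed above:

```python
# Recursion from the post above: each year the network grows by r, and a
# fraction g of both the newly added data and of last year's remaining
# (non-garbage) data turns into garbage.

def garbage_share(r: float, g: float, years: int = 200) -> float:
    total, garbage = 1.0, 0.0
    for _ in range(years):
        new_total = total * (1 + r)
        garbage += g * (new_total - total) + g * (total - garbage)
        total = new_total
    return garbage / total

for g in (0.10, 0.20, 0.30, 0.50, 0.80):
    print(f"50% growth, {g:.0%} annual garbage -> {garbage_share(0.5, g):.0%}")
# 25%, 43%, 56%, 75%, 92% - matching the table above (42% vs 43% is rounding).
# The closed-form limit works out to g * (1 + r) / (r + g).
```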

1 Like

I'm struggling to see why they are not the same figures. I'm thinking 10% of a network of whatever size is still 10%. If we start with 100 users, then 10% is 10; if we increase the network size by 50% to 150, then 10% is 15, but it's still 10%; if we increase by 100% to 200, then 10% is 20, but still 10%. Therefore the final figures in whatever scenario should be equivalent in percentage terms… shouldn't they?
What am I missing here :smiley:
Ahhh…is it not the -1 part of the equation?
Edit: 90% junk sounds like a lot in any case, so only 10% is active after a few years?
I think you’d have to maybe factor de-duplication into the equation too…

1 Like

Old data also becomes garbage.

Well, 90% junk data means about 80% of the new data is garbage, and that data pays for its PUTs (I don't think the percentage of new garbage data is anywhere near that high).
The most important thing is that, in a growing network, the total percentage of garbage increases very little and, as long as this garbage is well distributed, it is affordable for the network.

Deduplication is irrelevant here, as we are considering garbage data that will never be used.


This is only a gross simplification. In real life the percentages will vary over time and the growth of the network will be variable.
My point of view is that, in a constantly growing network, data recycling doesn't matter. Maybe, many years from now, we will have to consider the issue, but I doubt it.

1 Like

Above there's already a comment pointing out that this equally helps any competitor, so in comparison you'd still be falling behind the competitors.

No, the numbers are made up to suit your theory. The scenarios seemingly handle garbage data only because the arbitrarily chosen growth rates of the network and of garbage data make it look that way.

  • You don’t know how fast the network will grow
  • You don't know how much garbage will be added every year
  • You assume that, because the network can survive your scenario, it doesn't have to be as efficient as the average (or, more likely, the most efficient) competitor targeting the same market
2 Likes

What is the competition and who are the competitors?

As far as I can tell no one has a generally available product or service (for anonymously storing and sharing data on the internet), and there are also several sub-markets, not just one.

For example, someone may say Storj (to mention one of the "older" projects in this area) is a competitor, while I don't think it really is.

Currently (since no one is shipping yet) my thinking is that MaidSafe would be more suitable for long-term online read-only archiving with fast access to data, while other use cases (say, read-write access, trading in files, and so on) may be easier to implement in other solutions. They may also be possible with MaidSafe, but where there's overlap, if everything else is the same, the lower-cost app will prevail.

If MaidSafe creates a solution that appeals to a large market, cost-effective competitors also have the luxury of appearing later.

It’s unwise to underestimate the potential for incremental innovation and cost cutting in what is essentially a free and unregulated market.

Neither do you, but at least I use data that matches the history of the Internet.

Your calculations, instead, correspond to a world that has never existed.

  1. There’s no “history” of MaidSafe capacity growth
  2. Your source may be as good as anyone's. By volume, I would say at least 50% of saved data is never accessed again, and that share keeps growing as our ability to collect and save data grows.
    Example:
    Dell Says 90% of Recorded Business Data Is Never Read - Slashdot

If it’s never deleted and if the growth rate of garbage one day eclipses the growth rate of MaidSafe capacity, at some point garbage data would certainly comprise more than 99% of the network capacity.
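
A quick numerical illustration of that last point, with both growth rates picked arbitrarily:

```python
# If garbage grows faster than total capacity, its share tends towards 100%.
# Both growth rates below are arbitrary, chosen only to illustrate the point.

total, garbage = 100.0, 10.0      # start with 10% garbage
CAPACITY_GROWTH = 0.50            # network capacity grows 50% per year
GARBAGE_GROWTH = 0.60             # garbage grows 60% per year

for year in range(1, 51):
    total *= 1 + CAPACITY_GROWTH
    garbage = min(garbage * (1 + GARBAGE_GROWTH), total)   # garbage cannot exceed total
    if garbage / total > 0.99:
        print(f"garbage exceeds 99% of capacity after {year} years")
        break
```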