Fuzzy logic usage


#1

I’m curious about the degree to which Fuzzy logic is used in the SAFE Network and whether it should be considered a good fit for it.

Is it being used in SAFE Network at all and if so where?

About Fuzzy logic

Simple explanation of Fuzzy logic from Wikipedia:

Fuzzy logic is a form of many-valued logic in which the truth values of variables may be any real number between 0 and 1.

I’m reading the book Fuzzy Logic (The Revolutionary Computer Technology that Is Changing Our World) and there’s an interesting quote:

Fuzzy sets … bring the reasoning used by computers closer to that used by people.

by Lotfi A. Zadeh, founder of fuzzy mathematics, fuzzy set theory, and fuzzy logic

Idea

What if we could use logic more similar to how our brains work? We are doing our best to understand how we function and to improve our technology with that kind of knowledge. Overall, we are still the creators of computers and their software, for now.

When dealing with consensus or rating node (vault) behaviour, we might be able to do better if we consider all available inputs, even those that are not strictly true or false. Knowing that a certain piece of information is 50% true and 50% false still brings some value (it doesn’t mean a 50% chance), and when we add more bits like that we get something solid.
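
To make the idea concrete, here is a minimal Rust sketch of the classic Zadeh operators for combining such partial truth values; the trust values used are made up purely for illustration and have nothing to do with the actual SAFE codebase:

```rust
// Minimal sketch of the classic fuzzy-logic (Zadeh) operators:
// truth values are any f64 in [0.0, 1.0] rather than just true/false.

fn fuzzy_and(a: f64, b: f64) -> f64 {
    a.min(b)
}

fn fuzzy_or(a: f64, b: f64) -> f64 {
    a.max(b)
}

fn fuzzy_not(a: f64) -> f64 {
    1.0 - a
}

fn main() {
    // Hypothetical inputs: how much we trust two independent reports about a node.
    let report_a = 0.5; // "50% true" - partial information, not a coin flip
    let report_b = 0.8;

    println!("A AND B = {:.2}", fuzzy_and(report_a, report_b)); // 0.50
    println!("A OR  B = {:.2}", fuzzy_or(report_a, report_b));  // 0.80
    println!("NOT A   = {:.2}", fuzzy_not(report_a));           // 0.50
}
```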

Topic’s intention

  • List occurrences of Fuzzy logic in SAFE network.
  • Collect some opinions about Fuzzy logic and its usage.
  • Get some references about Fuzzy logic usage in decentralized networks (routing), security (cryptography) and technologies used by MaidSafe / SAFE Network.
  • I will add more info myself while I’m looking into it.

Topic’s outcomes

Fuzzy logic as part of SAFE Network is present:

  • in consensus (there’s even a paragraph on the Fuzzy logic wiki page for it: Forming a consensus of inputs and fuzzy rules) … Info provided by @dirvine
    More details: consensus at the time a new event is broadcast is not binary (i.e. it’s fuzzy), however consensus eventually is binary (i.e. it’s not fuzzy) - all participants are guaranteed to reach consensus (if at least 2/3 in the group are honest) … Info provided by @mav
  • in data replication (i.e. how available is it, how many copies, is it cached, is it guaranteed) … Info provided by @dirvine
  • when vaults decide to rate limit clients (it’s not a fixed global binary limit) … Info provided by @mav
  • for resource capacity (not all clients provide the exact same resources, eg storage amount, bandwidth) … Info provided by @mav
  • for the farm rate and pricing algorithm (always falling somewhere between 0 and 1, more info in rfc-0012) … Info provided by @mav

Fuzzy logic as part of SAFE Network is not present:

  • for managing most malicious behaviour (you can see the list of behaviours that have binary detection in the Malice In PARSEC rfc) … Info provided by @mav

#2

This is not something currently used as such, although consensus is a form of that when you think about it, I mean where the decision is not binary, but a summation of agreements approaching 1 as the number of nodes guessing the result gets to 100% (i.e. 8/8 being == 1 and less than 8 being less than 1, but possibly still true, especially if the ratio is quorum or better).

Just realised that was a sentence paragraph (sorry).

So as nodes vote they form a system similar to this fuzzy logic approach. The same goes for data replication, i.e. how available is it, how many copies, is it cached, is it guaranteed (never, but closer to that than not if the copies are high enough to have historically been OK).
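
To illustrate the point (purely a toy sketch, not MaidSafe’s actual consensus code), the “summation of agreements approaching 1” could look something like this, using the group size of 8 and the 2/3 quorum mentioned in this thread:

```rust
/// Toy "fuzzy" agreement score: the fraction of the group that has voted for
/// an event so far. 8/8 == 1.0; fewer votes give a value below 1.0 that may
/// still be treated as accepted once it crosses quorum.
fn agreement(votes_for: usize, group_size: usize) -> f64 {
    votes_for as f64 / group_size as f64
}

/// Treat the fuzzy score as "accepted" once it exceeds the 2/3 quorum.
fn reached_quorum(votes_for: usize, group_size: usize) -> bool {
    agreement(votes_for, group_size) > 2.0 / 3.0
}

fn main() {
    let group_size = 8;
    for votes in 0..=group_size {
        println!(
            "{}/{} -> agreement {:.2}, quorum reached: {}",
            votes,
            group_size,
            agreement(votes, group_size),
            reached_quorum(votes, group_size)
        );
    }
}
```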

If you are studying fuzzy logic then it is worth looking at genetic algorithms / evolutionary programming for similar approaches to natural fuzzy solutions. These also have similarities, although in AI there are activation functions like tanh, sigmoid, ReLU and many more. So you can think of a perceptron, where the inputs are normalised through such functions and activation is decided to have happened or not, not because of a 1 or 0 result, but a tendency towards such.
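
A minimal sketch of such a unit, with made-up weights and inputs, just to show how the output is a graded tendency rather than a hard 0 or 1:

```rust
// Minimal single-unit ("perceptron-like") sketch: inputs are combined into a
// weighted sum and squashed by an activation function, so the output is a
// graded tendency towards 0 or 1 rather than a hard binary decision.
// All weights and inputs below are invented purely for illustration.

fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

fn relu(x: f64) -> f64 {
    x.max(0.0)
}

fn unit(inputs: &[f64], weights: &[f64], bias: f64) -> f64 {
    let weighted_sum: f64 = inputs.iter().zip(weights).map(|(i, w)| i * w).sum();
    sigmoid(weighted_sum + bias)
}

fn main() {
    let inputs = [0.9, 0.2, 0.7];
    let weights = [1.5, -0.5, 0.8];
    println!("activation = {:.3}", unit(&inputs, &weights, -0.5)); // a tendency, not a 1 or a 0
    println!("relu(-1.0) = {}, relu(2.5) = {}", relu(-1.0), relu(2.5));
}
```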

If you add on things like LSTM (Long Short-Term Memory) with genetic algorithms, reduce the absolutism of the fitness function (or deep learning’s cost function) and allow objective-less search to exist alongside that, then you may get into some very interesting areas. Well I think so anyway :slight_smile: It certainly gets more fuzzy :smiley: :smiley:

Bottom line, the world around us has almost no absolutism in terms of a 0 or a 1 when looking at complex things like humans do; only when you can remove nearly all variables can you get to that level of absolutism. It would mean reducing complexity or entropy, and that would mean travelling back to the big bang, which is unlikely. Programmers can be absolutist in their approach, causing horrific bugs. In the likes of SAFE that thinking is extremely destructive, so you do have to have a fuzzy approach, where you are working with probabilities almost continually.

So saying “this message is signed, it must have come from that client/node” is just not true; so many things could have gone wrong, but you must keep all that in mind whilst assuming it’s true for a wee while. Then add in: did the message traverse a secure route? Could it have dropped? Did it have consensus all along the routes? And much more. So from that perspective we are nearly always fuzzy, and this is why optimisation too soon is terrible; the pedantic people will assume none of the deception exists, and it always does. The world is deceptive and therefore the solutions are rarely the obvious ones and almost never absolutes, but combinations of fuzzy-logic-type approaches with very high degrees of probability of the correct answer. Then historic results, and ensuring the variables do not change too much, can give you a great result that is efficient.


#3

It may serve as an interesting tidbit, but Skype (the original P2P Skype) used to use a small neural network for something about routing decisions. I guess it must’ve been Jaan Tallinn (one of the original devs) because he’s active in AI even today but I may be wrong. I’ve been looking but I couldn’t find the piece where I read about it years ago. I clearly remember the writer mentioning that it was the most widely deployed neural network at the time, and nobody knew it was even there.

A neural network may well be a good way to decide about e.g. the transport that’s most likely optimal under the circumstances.


#4

I’m definitely willing to provide months of consistent computational power to @maidsafe for routing optimization via neural nets, but no sooner than the deployment of realtime updates. IMHO a fairly stable network that can be populated with user data and upgraded should be put into place before optimisations are worked out. Until then, clearly labeling every pre-v1.0 release as un-optimised should be enough to curb expectations. Academics and vanity fiends can wait. Please, for scotch and whiskey’s sake, no unnecessary delays. :tired_face:

The world needs to witness this thing working as a whole product even if some of its individual components are a bit sluggish at first.


#5

Surprising as it may be, it’s more about ease of programming than premature optimization.

I was thinking along the lines of training a real small network to make educated guesses for the client about choices that are hard to formalize. If the context and the result are both measurable, a neural network with a few dozen learned parameters can be both easier to maintain and more efficient than a complex handcrafted decision tree.


#6

Hi David,

Thanks for your inputs on this topic :slight_smile:

You are right that consensus is a form of Fuzzy logic. It makes sense, and there’s even a paragraph on the Fuzzy logic wiki page about it:

Since the fuzzy system output is a consensus of all of the inputs and all of the rules, fuzzy logic systems can be well behaved when input values are not available or are not trustworthy.

I agree that the same obviously also applies to data replication and Opportunistic Caching, as all the collected statistics and voting results are integrated into the decisions.

I appreciate the food for thought you gave me, I’m still processing it :wink: . And to give something back, I’ve written a couple of thoughts below.

Based on my understanding, genetic algorithms / evolutionary programming are mostly about optimization, and this is not what I had in mind when starting this topic. It may even be considered premature optimization by some people (as @Stark has already mentioned below).

I was thinking about MaidSafe bootstrap servers and their public keys hard-coded into the Client software. It feels like one of the bottlenecks when trying to have a fully decentralized autonomous system. I haven’t figured it out yet, but when thinking about it a few days ago I had a feeling that Fuzzy logic could help with it, like:

  • a.) Decentralize bootstrap servers by allowing anyone to run them. There are some problems, though. How is the Client software going to know where to find them (what are their IPs)? How to make such a bootstrap server trustworthy when anyone can run it and potentially maliciously modify its code? How to recognize a genuine SAFE network signature, and what should it look like?

  • b.) Is it possible for a new node to get into contact with another node that’s already part of the SAFE network? And can it be done within some specific time limit (if both nodes are part of the global Internet)?

Trying to simplify these tasks/questions and make them more intuitive, I started to think about how these problems are solved by humans in society. And when people process all inputs to come up with answers, their logic is much closer to fuzzy logic than to binary logic.

I must say I like activation functions and neural networks. They align well with my current knowledge of how decisions are made and how control evolved. Single-cell organisms have all their inputs coming from receptors on their membrane and there’s no brain-like system inside them (I hope I haven’t forgotten any important detail here), thus all the decisions are made by the setup of their membrane (how the receptors are activated and what actions they fire). Evolution pushed for a larger and larger membrane to encode more intelligent decision making and, as a consequence, when the maximum sustainable single-cell size was reached, multi-cell organisms arose to combine their surfaces and collaborate. The membrane is to a cell what skin is to a human. And fun fact, the human brain evolved from the skin (it can be observed during the development of the human embryo). The gray matter of the human brain is arranged in layers 2 to 4 millimeters thick, deeply folded, thus comprising a large surface.

You’ve mentioned some techniques from the AI field. SAFE Network uses some of these kinds of algorithms and techniques and intends to do so on a large number of nodes, which may lead to the creation of AI, intentionally or as a side effect. There’s the Integrated information theory (IIT), proposed by Giulio Tononi, attempting to explain what consciousness is and why it might be associated with certain physical systems: https://en.wikipedia.org/wiki/Integrated_information_theory It basically says: “the more information integration/computation you put together, the more consciousness will arise from it”. There’s a TED talk describing it briefly: “How do you explain consciousness?” by David Chalmers.


#7

This part is not quite correct. GA/EP is about search; in fact all AI is, is a search problem. The power in these is finding solutions in huge search spaces that, using normal methods (brute force for instance), would take a computer longer than the age of the universe to compute. To put this into perspective, if you try to find the pixels that represent a cat, or a pattern of speech, or even play Go, then brute force methods (absolutist) would fail dramatically as the search space is massive.

So it gets deeper: all of the previous things are more pattern matching, or confining the search space, although huge, to a subset of “intelligence”. I think of it as pattern matching. The trick is that the matcher algorithm is unlikely to be understood, never mind coded, by a human. This is why such models (or algorithm parameters) are trained. It is not intelligence in my mind.

So the bigger the problem the harder it is, as we can guess, but we are already past what humans can code, so you can see we are way past using AI to optimise; in fact it is the foundation builder, and humans were not required for, or capable of, doing that.

Then you get much deeper into this search for intelligence, by generalising the algorithms to find not only your desired thing (like a cat picture), but in fact find things you did not know you wanted. This could be the AI finding that cats with certain expressions are in pain and require help, or something along those lines. In fact it may find there are different species of cat and so on. Allow the algorithm to get more general and it may find the things a cat will eat and what will eat a cat, including bugs. Then it may find that killing off a bug will make the cat extinct, and so on. So these are not optimisations in any sense. They are just beyond mankind’s ability to code; all we can do is code something as close to a brain-type learning function as we can and let it do what it does. We can optimise this with things like ReLU in deep learning or indirect encoding in GA, but that is only the mechanics of it, not the intelligence of it.

This and many, many more papers are all in a super evolving stage right now. What a great time to be alive, as any of us might stumble across the thing that builds everything else.

tl;dr

When the number of variables increases beyond 3 or 4 (that we think are the only ones involved), or the number of unknowns is more than 1, then forms of machine learning will likely lead to much faster and more efficient solutions. If the algorithm allows continuous learning with the ability to accept or find more variables, then it is unlikely a human-coded deterministic algorithm will work, never mind compete. Well, that is increasingly my position anyway.

Also

This is very true, although apparently counter-intuitive. The more complex and efficient solutions likely have much less involvement with programming and more to do with having a flexible and simple genetic algorithm (like LSTM for instance). So programming effort is significantly reduced.


#8

I think you may also be interested in this list of metrics for network health, since a lot of these will be fuzzy. It’s a work in progress, so if you have any thoughts that would be great.

To reiterate some of the previous posts, fuzzy logic is present

  • when vaults decide to rate limit clients (it’s not a fixed global binary limit)
  • for resource capacity (not all clients provide the exact same resources, eg storage amount, bandwidth)
  • for the farm rate and pricing algorithm (always falling somewhere between 0 and 1, more info in rfc-0012)

Interestingly, fuzzy logic is not used for managing most malicious behaviour. You can see the list of behaviours that have binary detection in the Malice In PARSEC rfc.

There are some disruptive behaviours that can only be managed in a fuzzy way, eg noisy clients, data density attacks, disruptive (but not malicious) vault behaviours, detecting out-of-band coercion, etc.

Consensus at the time a new event is broadcast is not binary (i.e. it’s fuzzy), however consensus eventually is binary (i.e. it’s not fuzzy) - all participants are guaranteed to reach consensus (if at least 2/3 in the group are honest).

I think one of the most important areas that fuzzy logic will be useful will be the economic model. Network health is probably the most important fuzzy area of the network. Economics and health are almost impossible to separate so maybe we could say they’re equally important?!

Definitely. The bootstrap cache is already one part of the solution. This list of ways bitcoin handles peer discovery is also useful.


#9

Hearing what you’ve said (written), genetic algorithms / evolutionary programming can give us deeper and maybe even unexpected insights, find solutions we were not aware of, or find the right solution match for our situation.

That said, GA/EP seems to be a good addition to the SAFE Network design process. It may be too complicated or premature to include it in the software itself as yet, but it should rather be engaged to help with SAFE Network design (quite likely generating unforeseen options) and maybe even solve problems the MaidSafe team is dealing with. So a question arises here: has the MaidSafe team already engaged GA/EP for this cause?

Usage of GA/EP for the SAFE Network design process probably deserves its own topic (I’m thinking about starting one). And I like the idea of machine learning using an artificial neural network with a ReLU (Leaky ReLU) activation function, probably combined with LSTM.

I believe that, quite likely, the future of a programmer’s work will consist of training AI, much the same way a master teaches his/her apprentice. And by training I mean filling in the gaps the AI has, spotting its mistakes and fixing them, improving it step by step, working like partners, quite similar to what people do. Personally, I hope people have some inherent advantages, like intuition and emotions (our memory is context based and driven by emotions switching the context).

Using AI this way feels a little bit like using a compiler to translate our higher-abstraction expressions into lower-level CPU instructions. AI just moves the abstraction level one floor up. The analogy fits quite well when you think about it. AI following our orders can automate parts of our work, come back with suggestions and have dialogs with us. The Rust compiler does something similar: it automates a lot of checks and decision making, comes back with suggestions/warnings/errors and kind of has dialogs with us. The main difference seems to be the limit of the abstraction level a human can comprehend. AI can potentially go up and up, breaking through our limits.


#10

Let me bring this topic a bit closer to the ground because I’m not sure machine learning has to be complex or magical.

For example, let’s look at picking a particular method to connect to a particular peer.

There are a number of simple inputs, such as whether we’re behind NAT, if UPnP or NAT-PMP is working, whether a fixed TCP or UDP port is forwarded, whether certain kinds of UDP and TCP hole punching attempts worked, if we’re wired or wireless, some router fingerprinting details (for example, to be able to work around some of the stupid ones; this could be a sub-network to be honest), min/max/median times till connection for the already connected peers, and the like. I’m not sure what can be known about the peer’s situation (maybe nothing?) but those could also be inputs. Most of these are just a 0 vs. 1 choice and some (e.g. connection times) are real-valued, probably best expressed in log-space. We’re talking about maybe a dozen inputs.

Add 1-2 hidden layers to mix things up. Picking a nonlinearity is a mostly insignificant choice to be honest (ReLU is the simplest and it’s easy to train) and I can’t see a need for recurrence (LSTM, GRU, etc) because it’s easier to just add a few features to summarize what worked recently.

The output can be a simple log-softmax with one entry for each particular way to attempt the connection. If there are different varieties to pick from, that’s just a matter of interpreting the output a tad bit differently (that is, the same network structure would work just fine.)

Collect a lot of data during the Alphas, like what the latest run did. Train the network to give a high score to the methods that worked best for each particular circumstance the clients found themselves in.
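
For what it’s worth, a rough sketch of the kind of tiny network described above might look like the following; the input features, layer sizes and all-zero weights are placeholders I’ve made up, and real values would come from training on the data collected during the Alphas:

```rust
// Rough sketch of the "connection method chooser" idea: a handful of
// binary/real-valued inputs, one ReLU hidden layer, and a softmax over the
// candidate connection methods. Weights are placeholders, not learned values.

const N_IN: usize = 6;     // e.g. behind_nat, upnp_ok, port_forwarded, ...
const N_HIDDEN: usize = 8;
const N_OUT: usize = 3;    // e.g. direct TCP, UDP hole punch, relay

struct TinyNet {
    w1: [[f64; N_IN]; N_HIDDEN],
    b1: [f64; N_HIDDEN],
    w2: [[f64; N_HIDDEN]; N_OUT],
    b2: [f64; N_OUT],
}

impl TinyNet {
    fn forward(&self, x: &[f64; N_IN]) -> [f64; N_OUT] {
        // Hidden layer with ReLU.
        let mut h = [0.0; N_HIDDEN];
        for j in 0..N_HIDDEN {
            let z: f64 = (0..N_IN).map(|i| self.w1[j][i] * x[i]).sum::<f64>() + self.b1[j];
            h[j] = z.max(0.0);
        }
        // Output layer followed by softmax, giving one score per method.
        let mut out = [0.0; N_OUT];
        for k in 0..N_OUT {
            out[k] = (0..N_HIDDEN).map(|j| self.w2[k][j] * h[j]).sum::<f64>() + self.b2[k];
        }
        softmax(&mut out);
        out
    }
}

fn softmax(v: &mut [f64; N_OUT]) {
    let max = v.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let mut sum = 0.0;
    for x in v.iter_mut() {
        *x = (*x - max).exp();
        sum += *x;
    }
    for x in v.iter_mut() {
        *x /= sum;
    }
}

fn main() {
    // All-zero placeholder weights: every method gets an equal score until trained.
    let net = TinyNet {
        w1: [[0.0; N_IN]; N_HIDDEN],
        b1: [0.0; N_HIDDEN],
        w2: [[0.0; N_HIDDEN]; N_OUT],
        b2: [0.0; N_OUT],
    };
    // Hypothetical feature vector: behind NAT, UPnP ok, no forwarded port,
    // hole punch worked, wired connection, log of median connect time.
    let features = [1.0, 1.0, 0.0, 1.0, 1.0, -2.3];
    let scores = net.forward(&features);
    println!("method scores: {:?}", scores);
}
```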


#11

Hi mav,

Thanks for your well-structured response and the provided links :slight_smile: I’m adding information from the posts to the Topic’s description as the outcomes of all conversations.

I’ve checked the bootstrap cache, the bootstrapping node description on Wikipedia, the list of ways bitcoin handles peer discovery and a few articles about peer discovery, but they all say the same thing:

When you try to connect for the first time, the genesis nodes (hard-coded IP addresses) are used.

The genesis nodes are still a bottleneck of the decentralized system (at least from my perspective) and it seems no better solution is publicly known yet. So if someone hacks or legally overpowers a server with a genesis node, a user connecting to it might be connected to a totally malicious SAFE Network clone and upload all his/her data believing it is 100% safe.

I understand the term “Network health”, but I’m a little bit confused when you say “Economic model”. Do you mean like efficiency / energy (power) consumption or monetization or something else? :thinking:


#12

I suppose so but it should be easy for clients to check they’re connecting to a valid network based on the presence of certain specific chunks for that user. It would be extremely difficult for the malicious network to pull only the relevant chunks for just the target user.

Presumably complete network duplication would still be a valid attack, but extremely resource intensive.

I agree nobody has ‘solved’ the peer discovery problem, but I think it’s been demonstrated there are some extremely reliable implementations, certainly adequate for practical purposes.

Is there any specific case where you think an attacker would especially benefit from attacking the bootstrap mechanism? I mean, what’s the reason for an attacker to do this? What’s their benefit?

When I say economic model I mean the incentive structure and intended behaviours that come out of the way the reward and punishment system is designed. So part of that is the efficiency and part of it is monetization and there are lots of other parts like behaviour and decision making and authority within the network etc.

The current economic design mainly concerns itself with ensuring there’s some spare space on the network (but not too much that it’s a waste). But maybe it should care about other factors too, such as total number of vaults, average vault size, churn rate, rate of coin creation, number of accounts, etc. Or maybe not!

The current economic design is also very ‘natural’ in that it doesn’t have any magic numbers and should reach some inherent balance based on the latest predominant uses of the network. However this doesn’t mean the intended outcome is what will actually happen. Maybe the economic system lends itself to very few people having most of the power (like capitalism seems to be doing, albeit unintentionally). There’s no clear ‘definition’ of the intention of the network with respect to economics and governance, perhaps for the better, perhaps for the worse. It’s an interesting debate.

I think fuzzy logic could be useful in the economic design so that behaviour stays close to the values the participants intend. For example, there’s a spectrum between one almighty powerful user running all the vaults and seven billion democratic users running vaults. Where on that spectrum will the network end up? It’s largely a question of economics, and the model doesn’t currently specify any particular desire, it ‘just happens’. Maybe that’s the best way?! Or maybe fuzzy logic can guide the process a little. Hard to say but I’d be interested in hearing your ideas.


#13

I would tend to think of things like these as too complicated to just ‘let it happen’ as a result of choices. You run a small chance of reaching any actually desired outcome that way.

The preferable way to engineer a solution is to clearly define the problem and the desired outcome, then craft everything so as to reach it.

The problem with that is that the desired outcome must be based on rock solid ideas, and at that point you need some real scientific work to be able to tell with any acceptable level of certainty, what ideas are actually well founded in reality.

As an example: what are in reality the benefits, downsides, risks and long term effects of x percent of vaults running in datacenters?
We can’t just state what we want and head for it, without knowing that it is actually a well founded desire.

But finding out how these things interplay is immensely difficult.
How does human behaviour, trends, natural resources limitations, all and anything, influence the way things eventually tend to fall in place?

I realise that what I’m saying here is not making anything much clearer on what to do. I’m basically stating that there are conflicting factors. On one hand, we really want to be fully aware of what outcome we want, as to have any chance of designing the correct solution, on the other hand it is very hard for us to find the right level of abstraction when defining the outcome, and then to verify that that is actually a rational, pragmatic and sustainable outcome - regardless of ideologies and such.


#14

This reminds me of the private contact discovery problem, also unsolved.


#15

Well, yes, there is no solid solution, but there are considerations. For instance, nodes will beacon on port 5483 (LIVE) and respond to anyone trying to bootstrap. This is only for local networks right now and a possible security risk (i.e. internet cafe beacons by bad folk etc., to which there are fixes) but can possibly go further with multicasting.

The notion being that nodes will “find” other nodes by querying networks further from themselves. Anyway, this route is a possible mechanism to remove hard-coded points altogether, but it will likely need a substantially sized network, and on restart nodes should connect to where they were as much as possible.
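
As a purely illustrative sketch of what a LAN beacon query could look like (port 5483 is taken from the post above; the message format and everything else is invented and is not the actual crust/routing implementation):

```rust
// Hypothetical LAN bootstrap beacon query: broadcast a question on the local
// network and listen for a couple of seconds for any node willing to help.
// Only the port number (5483) comes from the thread; the rest is made up.
use std::net::UdpSocket;
use std::time::Duration;

fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:0")?;
    socket.set_broadcast(true)?;
    socket.set_read_timeout(Some(Duration::from_secs(2)))?;

    // Ask anyone on the local network who is willing to act as a bootstrap contact.
    socket.send_to(b"WHO_IS_BOOTSTRAPPING", "255.255.255.255:5483")?;

    // Collect replies until the read timeout; each reply would carry contact info.
    let mut buf = [0u8; 512];
    while let Ok((len, peer)) = socket.recv_from(&mut buf) {
        println!("possible bootstrap contact {} -> {:?}", peer, &buf[..len]);
    }
    Ok(())
}
```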


#16

Hi Keith,

I’ve read the article “The Difficulty Of Private Contact Discovery” you’ve sent. Thanks for the link :slightly_smiling_face: I have a feeling though that finding a solution for the “Private Contact Discovery” problem is not as difficult as removing genesis nodes from a decentralized system.

Here are some explanations why I think so:

  • The private contact discovery service is supposed to be available on a server (some centralized place(s)) and there’s no worry expressed about that in the article. The problem is how to query that service to find out if a contact is already registered with it without revealing the contact itself (or any other details that are not supposed to be known by the service).
  • There’s a second article released by the same company signal.org three years after the first one: “Technology preview: Private contact discovery for Signal”, and they’ve also released Private Contact Discovery Service (Beta) software for it. I haven’t read the whole article, but they say the key to the solution lies in the usage of Software Guard Extensions (SGX).

#17

Thanks for the follow up tech preview link dalibor.

Not really a solution then, more of a hacky workaround. SGX is a poor substitute for an elegant algorithm for solving the private contact discovery problem. SGX (and AMD-SP) are full of gaping holes (here are some old links from two previous posts of mine), and that was before Spectre and its monthly variants, each worse than the last, came along.


#18

Nice to see some brainstorming on this topic.

As has been mentioned, the group consensus mechanism combined with parsec offers an innate solid fuzzy logic mechanism. :wink:

I’m uncomfortable with the use of hard-coded bootstrap servers long term, but recognize that they offer one immediate and practical solution to seeding the network and getting to launch.

As for the private discovery problem, I think the easiest way to solve that one is to go the same route SAFE uses to solve a lot of other issues: randomness, huge size, and group consensus. If a bootstrap server misbehaves or is detected to be malicious, why not give it the boot? Wouldn’t work well if they were hard-coded though…

I’ve recommended this once before, but what about an IPv4 random search algorithm? Slow, yes, but it might offer some options.


#19

The old Kademlia network had some hard-coded bootstrap nodes, and an easy (for the decade!) user interface allowing a simple, quick copy-paste input of new bootstrap node lists. The user chose which publisher of bootstrap nodes to use from a wide variety of sources: forums, web pages, FTP servers…

This OOB communication of bootstrap nodes is probably the most robust, time-tested method for defeating any bootstrap-node-blocking firewall.


#20

I’ve checked out the topic about The Underlay (referenced below)

and I’ve watched the linked video Google 2.0: Why MIT scientists are building a new search engine.

I must say I like how The Underlay, as a distributed graph database of public knowledge, is supposed to represent information, not saying “here we are providing you with 100% truths”, but instead approaching it in a more fuzzy-logic way, like this:

The Underlay aggregates statements and reported observations, along with citations of who made and who published them. For example, it would not contain the bare assertion that “Sudan’s population was 39M in 2008”, but rather that “Sudan’s population was ‘provisionally’ 39M in 2008, according to the UN’s statistics division in 2011(2), referencing Sudan’s national census, as reported by its Central Bureau of Statistics, and as contested by the Southern People’s Liberation Movement.(3)”

(quote from https://underlay.mit.edu/)
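
To sketch what such a qualified statement could look like as a data structure (the field names and the confidence value are my own invention, not The Underlay’s actual schema):

```rust
// Rough, hypothetical sketch of a "qualified statement": the claim itself,
// a fuzzy-style degree of belief, and the provenance around it.
// The fields are invented for illustration, not The Underlay's data model.

struct QualifiedStatement {
    assertion: String,          // the bare claim
    qualifier: Option<String>,  // e.g. "provisionally"
    confidence: f64,            // a degree of belief in [0.0, 1.0], not a hard true/false
    asserted_by: Vec<String>,   // who made / published the statement
    contested_by: Vec<String>,  // who disputes it
}

fn main() {
    let statement = QualifiedStatement {
        assertion: "Sudan's population was 39M in 2008".to_string(),
        qualifier: Some("provisionally".to_string()),
        confidence: 0.7, // made-up value, purely for illustration
        asserted_by: vec![
            "UN statistics division (2011)".to_string(),
            "Sudan's Central Bureau of Statistics".to_string(),
        ],
        contested_by: vec!["Southern People's Liberation Movement".to_string()],
    };

    println!(
        "\"{}\" ({}), confidence {:.0}%, asserted by {:?}, contested by {:?}",
        statement.assertion,
        statement.qualifier.as_deref().unwrap_or("unqualified"),
        statement.confidence * 100.0,
        statement.asserted_by,
        statement.contested_by
    );
}
```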