Simple Human-centric Search on SAFE

I’ve been playing with the idea of basic search for our basic sites on the basic testnets.

(I’m aware of this thread, but I’m hoping to get more feedback on some implementation ideas.)

Human-centric search

So right now, while SAFE is small, indexing could be quite simple: a JSON file of sites plus meta tags (for describing content).

And I’ve made a start on a super simple React app for searching this file.

You would maintain your own index. Useful to you, though not super useful for everyone.

Now if that index could reference others you trust… well, then it could periodically ping them and you’d be combining your index with other people’s. So you get more results that way. And they can do the same.

Any ‘search site’ would combine a lot of (trusted) people’s indexes. Or you could just use your own index (which you would need to maintain).
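As a rough sketch of that combining step, assuming the simple JSON index shape shown further down this thread and a hypothetical fetchIndex(url) helper that GETs an index file from the network:

    // Minimal sketch: fold entries from trusted sub-indexes into your own.
    // `myIndex` maps entry names to { url, metadata, isIndex? } objects;
    // `fetchIndex` is a hypothetical async helper, not a real SAFE API.
    async function buildCombinedIndex(myIndex, fetchIndex) {
      const combined = { ...myIndex };
      for (const entry of Object.values(myIndex)) {
        if (!entry.isIndex) continue; // plain site entry, keep as-is
        const theirs = await fetchIndex(entry.url); // "ping" a trusted index
        for (const [name, theirEntry] of Object.entries(theirs)) {
          if (!(name in combined)) combined[name] = theirEntry; // yours wins on clashes
        }
      }
      return combined;
    }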

That, for me, is basic ‘human-centric’ search. No scraping. People maintaining their own lists, sharing them as they find it useful.

You could even have multiple search options in an app (your index, a global index, an index specific to X).

This, theoretically, would be nice for garnering results on things that people liked enough to put into their own indexes, AND you’d only be using indexes that you like and perceive value in.

Bonus: if your list is popular, you get GET rewards (with PtP). Lists could even be weighted with Project Decorum (web of trust), for example.

That is the theory behind the idea I’m toying with.


Benefits:

List Control: control your list, and which lists you use.
Algorithm Control: there’s no reason you couldn’t implement your own algorithm to get better personalized results (see the sketch just after this list).
List Rewards: with PtP, you get rewarded for good content.
SAFE: search is done via the app locally, so there’s no search history for anyone to get into.
No ads: unless you want them. I guess an app could let people bid to show you ads if you wanted; then you’d get a cut. (I guess that could be its own app.)
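To make the ‘Algorithm Control’ point concrete, here’s a minimal sketch of a swappable local ranking function (names are illustrative, not a real API):

    // Count how many query terms appear in an entry's metadata tags.
    function scoreEntry(entry, terms) {
      return terms.filter(t =>
        entry.metadata.some(tag => tag.toLowerCase().includes(t))
      ).length;
    }

    // Search a local index object and rank by the score above; swap
    // scoreEntry for your own algorithm to personalize the results.
    function search(index, query) {
      const terms = query.toLowerCase().split(/\s+/);
      return Object.entries(index)
        .filter(([, entry]) => !entry.isIndex)
        .map(([name, entry]) => ({ name, url: entry.url, score: scoreEntry(entry, terms) }))
        .filter(result => result.score > 0)
        .sort((a, b) => b.score - a.score);
    }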


Problem: Scaling

I’ve been thinking about this purely in the small ‘now’ sense, where it would probably work for this forum’s uploaders. But to be useful the idea has to scale. Periodic GET requests would help, but the real problem would be maintaining a list on a user’s computer AND THEN searching any sizeable list. (No PC can match Google’s result-crunching power.)

For a PC this would be fine for a while. But it would not work for mobile users. Not realistically.


Can anyone see any other problems / advantages here? Does it make sense at all? Anyone have any suggestions for overcoming the scaling issue?


I’m thinking of getting to grips with the current SAFE API and implementing a simple POC (it shouldn’t be too complex). But before I fire in, feedback would be appreciated!

(I’d suggest that perhaps you would still use one ‘site’ that would leverage lists as above, but would handle its computation at a larger scale to tackle larger ‘searches’. But that also feels like a cop-out. Surely there is a better / SAFEr way?)

11 Likes

I imagine that if indexes get really huge they could be broken up into smaller files, so that the client only has to fetch the parts it needs for the particular search it wants to do. I assume everything would be stored on SAFE, so there should not be any issues with running out of storage space; it seems like the main scaling concern is the amount of computing power needed to pull entire lists down from the network and merge them. I bet there are some nice tree algorithms you could use to do that sort of thing without having to touch too many nodes each time, though.
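For what it’s worth, a minimal sketch of that ‘fetch only the parts you need’ idea, assuming a hypothetical layout where the index is pre-split into one posting file per first letter, plus a fetchJson(url) helper that GETs from the network:

    // Fetch only the shards relevant to the query instead of the whole index.
    // Assumes each shard maps term -> [urls] at <baseUrl>/shards/<letter>.json
    // (a made-up layout for illustration).
    async function searchSharded(baseUrl, query, fetchJson) {
      const terms = query.toLowerCase().split(/\s+/);
      const shardNames = [...new Set(terms.map(t => t[0]))]; // one shard per first letter
      const shards = await Promise.all(
        shardNames.map(name => fetchJson(`${baseUrl}/shards/${name}.json`))
      );
      const results = new Set();
      for (const shard of shards) {
        for (const term of terms) {
          (shard[term] || []).forEach(url => results.add(url));
        }
      }
      return [...results];
    }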

1 Like

Yeh, my thoughts exactly re: space / general list management etc.

It could even be that lists are themselves very specific. If you’re into DnD, you maintain a DnD-specific list, which is pulled into other lists that compile them, for example.

Interesting point about tree algos. I don’t know what they are (front end dev, farrrr from a networking dev)… yet! I’m off to give it a google. If you’ve any good reference sites please feel free to drop them here! :smiley:

First I want to say great idea! Awesome! But I do have some thoughts on the project.

  1. If Bob shares his index with enough people, isn’t he broadcasting his surfing habits? I mean, if your list of saved lists is 70% porn, 20% cat videos and 10% random other things, is it that hard for others to tell what you’ve been up to? Even taking into account that people would collect other people’s search indexes, human beings being what they are tend to collect things that interest them. So people would tend towards lists that had lots of links of interest to them, even if they weren’t 100% of interest. Isn’t this a security issue for the user? If Alley, Bob and Collin all have 90% porn, and all the same porn that updates at the same time, is it that big of an assumption that they belong to the same webring or something?

  2. In response to this:

What if, as @hdastwb suggests, you broke the list down into parts and then outsourced the job of searching them? That is, write a helper app, or another part of the app, that would receive encrypted chunks of the search lists, decrypt them for the algorithm, search through them, then send the result, be it null or a found result, encrypted back to the paying requester. People could be paid for making their processor power available to search these lists. Much like the concept of farming safecoin, you could rent out your CPU power to people doing internet searches for a small fee. This way random people would get paid instead of a giant corporation, and your data would remain secure.
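As a very rough sketch of that flow, assuming a shared symmetric key between requester and worker and using Node’s built-in crypto (SAFE has no compute layer yet, so this is purely illustrative; note the worker still sees the plaintext while searching, as described above, so the encryption only protects the data in transit):

    const crypto = require('crypto');

    // Requester side: seal an index chunk plus the query before sending it out.
    function sealJob(key, payload) {
      const iv = crypto.randomBytes(12);
      const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
      const data = Buffer.concat([
        cipher.update(JSON.stringify(payload), 'utf8'),
        cipher.final(),
      ]);
      return { iv, data, tag: cipher.getAuthTag() };
    }

    // Worker side: decrypt, search the chunk, and seal the result for the reply.
    function runJob(key, job) {
      const decipher = crypto.createDecipheriv('aes-256-gcm', key, job.iv);
      decipher.setAuthTag(job.tag);
      const { chunk, query } = JSON.parse(
        Buffer.concat([decipher.update(job.data), decipher.final()]).toString('utf8')
      );
      const hits = chunk.filter(site =>
        site.metadata.some(tag => tag.includes(query))
      );
      return sealJob(key, { hits }); // a null result is just an empty hits array
    }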

1 Like

Yeh, I like that. I guess if SAFE decentralized computing does come, it should be possible!

Re: Point one. I guess in a way it would (although you don’t need to index everything you visit). But I think an app could publish these different indexes under new anonymous user domains, for example. So you could control which sites were in your index, which indexes you ‘own’, and which of them you would associate with yourself… Does that make sense?

Even if you give out your list, if it references other trusty lists, there’s no knowing who owns them (unless someone claims them publicly)… even if you do own them yourself. So I think that should mitigate the risk of deanonymisation.

1 Like

Hmmm. Okay, after a brief scan, I guess you could use this (simply) by giving lists ‘metadata’ themselves and using that ranking for tree searches: only pull a list in for the search if its metadata even matches… It would speed things up for sure!
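Something like this, perhaps; a sketch of pruning on index-level metadata before fetching anything (fetchIndex is hypothetical again):

    // Only pull a sub-index whose own metadata tags overlap the query terms,
    // so irrelevant branches of the tree of indexes are never fetched.
    async function gatherRelevantIndexes(rootIndex, terms, fetchIndex) {
      const relevant = [];
      for (const entry of Object.values(rootIndex)) {
        if (!entry.isIndex) continue;
        const matches = entry.metadata.some(tag => terms.includes(tag.toLowerCase()));
        if (matches) relevant.push(await fetchIndex(entry.url));
      }
      return relevant;
    }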

Where does the ‘if possible’ come in? What’s so difficult about the notion of A sends an encrypted request to B, B processes it, B sends encrypted results to A? Moreover, we already have SAFE vaults being written, so why not just examine the code and fork it for the new project if need be? I’m just saying it won’t get done until it’s actually done.

Ha. Yeh, I’m not fluent in all forms of code or different programming paradigms. It’s possible for someone to do, sure.

If you want to fire in with this, that’d be great. :smiley:

Right now I’m going to start working on a simple POC for this. Which means getting to grips with SAFE API authorization, how PUTs work and all that (step by step!). So any and all network / distributed programming stuff will have to wait, unless people want to jump on board, which would be great!

1 Like

You’re writing this in json and javascript right? If so I’ll do what I can. Just make sure to annotate your code.

Yeh, right now I’m building an Electron React/Redux app as a POC. Search / parsing and requesting files should be fine (on one machine right now). But I’m failing to get the app to auth at the moment (the /auth docs are out of date).

And I need to head out, but I’m going to cobble it together from the demoApp, so I should get something going soon enough when I have time.


On the JSON topic: an index would basically just be a JSON file, e.g.:

"MyAnonIndex":
    {
        "site1" : 
        {
            "url" : "www.thisplace.safenet"
            "metadata" : [ "string", "about", "site"]
        },
        "site2" : 
        {
            "url" : "www.thisplace.safenet"
            "metadata" : [ "string", "about", "site"]
        },
        "site3" : 
        {
            "url" : "www.thisplace.safenet"
            "metadata" : [ "string", "about", "site"]
        },
        "anotherIndex" :
        {
            "isIndex" : true,
            "url" : "www.thisIsAnIndex.safenet"
            "metadata" : [ "strings", "about", "index"]
        }
}
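And since entries flagged with isIndex point at further index files, a client could resolve them recursively. A sketch with a depth cap and cycle guard, called on the inner MyAnonIndex object and again assuming a hypothetical fetchIndex(url) helper:

    // Collect plain site entries, following isIndex entries recursively.
    // depth caps how far we descend; seen guards against index cycles.
    async function resolveIndex(index, fetchIndex, depth = 2, seen = new Set()) {
      let sites = [];
      for (const [name, entry] of Object.entries(index)) {
        if (!entry.isIndex) {
          sites.push({ name, ...entry });
        } else if (depth > 0 && !seen.has(entry.url)) {
          seen.add(entry.url);
          const sub = await fetchIndex(entry.url);
          sites = sites.concat(await resolveIndex(sub, fetchIndex, depth - 1, seen));
        }
      }
      return sites;
    }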

I can’t see this working better than, for example, StumbleUpon. While that works, and for a while I enjoyed using it, that time passed. People might tire of cataloguing sites in a way you might imagine would be fun, interesting and useful.

Even if all you require is a single mouse click… it’s in essence getting the audience to work so that the machine you’ve built ticks over… and perhaps that is where it falls down. It might work for some, but not all.

You might adapt this in some way to reward user-moderators of automated content filters, with words and specific phrases suggesting categories… but then you’re stuck with the problem of who you trust. As soon as you introduce incentive, that can be abused; without incentive, why would people contribute?

tl;dr: the test is whether it would be better than a simple search engine doing automated filtering.

Not to suggest it wouldn’t be useful initially… and hell, StumbleUpon is a big success… it’s just not the whole solution to everyone’s interest in finding content.

2 Likes

Yeh, I have that feeling also, in a way. But StumbleUpon is random per interest, not search…

If you can implement your own algorithm? Tweak it… weight your results, or trust someone to have weighted results by value (web of trust). It might prove more valuable. (Might.)
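A sketch of what that weighting might look like, with trustWeights as your own local map from index URL to a web-of-trust score (all names hypothetical):

    // Scale each result's score by how much you trust the index it came from.
    function weightResults(resultsByIndex, trustWeights) {
      return Object.entries(resultsByIndex)
        .flatMap(([indexUrl, results]) =>
          results.map(r => ({
            ...r,
            score: r.score * (trustWeights[indexUrl] || 0.1), // unknown = low trust
          }))
        )
        .sort((a, b) => b.score - a.score);
    }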

Either way, I’m not sure how else you could allow this all to work in a SAFE / decentralised fashion.

It could be that certain parties maintain a list and go to the effort (crawling) to do it… and that in turn could reap PtP rewards worth having. I’m not sure.

I think for the moment, as you suggest, it could be useful. And so I’ll be firing in here this week I think. I want to get to grips with the API anyway, so the worst outcome is that I learn something! :stuck_out_tongue:

2 Likes

There is obvious power in engaging users, but there has to be motivation… and, as suggested, money corrupts.

So, there might be real use where there is natural concern. Any breaches of the Golden Rule, for example, might engage users who want to limit access to that content… if not for all, then for those who subscribe to their way of thinking. So, for example, parents wanting to use some safesite filter might like the option to whitelist sites… and the feedback and trust would be a strong part of that, à la Mumsnet.

Finding the niche that naturally wants sites classed as interesting and not-interesting could then suggest routes to catering for interests.

Certainly there is merit in enabling users to alert others to sites on safenet in ways beyond just providing one simple long list of all sites. Machines can take a lot of the work out of that by filtering the bleeding obvious… so even now I’m grabbing the language declaration and charset, which can exclude foreign languages or prefer them. Where it gets to a point of opinion, well, then you have your opportunity… where is opinion different from machine and logic? I expect you will find religious minds attracted to an ability to play god over what is right and wrong; what should be put in heaven and what to hell. Conservatives the world over rejoice, for joshuef has an idea :smiley: … and you can make a profit to compensate for the hassle of dealing with them.
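For what it’s worth, that sort of machine pre-filtering can be as simple as a couple of regexes over the fetched page (a sketch, not robust HTML parsing):

    // Pull the declared language and charset out of raw HTML, so an index
    // builder can include, exclude or prefer pages by language.
    function detectLangAndCharset(html) {
      const lang = (html.match(/<html[^>]*\blang=["']?([a-zA-Z-]+)/i) || [])[1] || null;
      const charset = (html.match(/<meta[^>]*\bcharset=["']?([\w-]+)/i) || [])[1] || null;
      return { lang, charset };
    }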

2 Likes

Curation is a valuable task. Might as well get paid for it.

1 Like