SAFE Search App

Hi Everyone,

Inspired by the SAFE Network, I started trying to learn a bit of coding a while ago, and as a project for that purpose I’ve made a little basic search app on Alpha 2. It’s just a basic keyword search and has almost zero security, but you can add or edit your own site, and search what’s already there.

safe://search.safeindex

More interestingly though, I was keen to start a discussion about search from a slightly different point of view to the way it has been discussed before.

Generally, we tend to think of search as everything that Google does, but obviously that’s actually a set of different but interlinked functions, and I’m curious as to the possibilities of separating out those functions, in order to make a search system that is more transparent, resilient and decentralised.

The following points are some of the steps I’ve noticed that are necessary for a search system, and although I have only a very cursory knowledge of the field, sometimes I think it’s interesting to look at things in this way, if for no other reason than to keep in mind that ideal of transparency, so that idiots like myself don’t suddenly get to a point where they feel gratuitously controlled by algorithms!

  • Basic indexing is obviously a prerequisite of search, and at its most basic level the information for this is held (though not necessarily in an accessible format) in the NRS system.

  • A database of all the relevant information about sites needs to be built on top of this. The format of this data, how it is sourced, how it is stored, and who owns it, are themselves all hugely complicated and technical questions, and could easily be broken down further.

  • The search function itself, which is the bit that people first think of, is vaguely attributable to algorithms, and returns the results that whoever has written the algorithm thinks we most want to see (there's a toy sketch of this indexing-and-matching core after this list). These can be very useful, but are in themselves a huge driver of centralisation on the web.

  • Security, and deprecation of malicious sites, which may be achieved using many of the same methods as the search algorithms.
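For what it's worth, the indexing/search split my app relies on boils down to something like this toy inverted index (a sketch for illustration, not my actual code):

```ts
// Minimal inverted index: maps each keyword to the set of site
// names whose description contains it. Purely illustrative.
type SiteName = string;

class KeywordIndex {
  private index = new Map<string, Set<SiteName>>();

  // Tokenise a description and record the site under each keyword.
  add(site: SiteName, description: string): void {
    for (const word of description.toLowerCase().split(/\W+/)) {
      if (!word) continue;
      if (!this.index.has(word)) this.index.set(word, new Set());
      this.index.get(word)!.add(site);
    }
  }

  // Return sites matching every keyword in the query.
  search(query: string): SiteName[] {
    const words = query.toLowerCase().split(/\W+/).filter(Boolean);
    if (words.length === 0) return [];
    let results: Set<SiteName> | null = null;
    for (const word of words) {
      const hits = this.index.get(word) ?? new Set<SiteName>();
      results = results
        ? new Set([...results].filter((s) => hits.has(s)))
        : hits;
    }
    return [...(results ?? [])];
  }
}

// Usage:
const idx = new KeywordIndex();
idx.add("safe://search.safeindex", "a basic keyword search app");
console.log(idx.search("keyword search")); // ["safe://search.safeindex"]
```

Real search obviously needs ranking, stemming and so on, but separating "build the index" from "query the index" is the distinction I wanted to make visible.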

Interested to hear anybody's thoughts on how any of these elements could be decentralised, and whether some of them could or should be taken care of at a network level.

32 Likes

Really impressive stuff to just jump in and tackle this kind of thing, all while learning to code. Chapeau!!

10 Likes

Thanks Jim! Was intrigued by your suggestion of Maidsafe having a hand in search over on the NRS thread!

3 Likes

All of that is later down the pike… you're ahead of the game on this one :grinning:

8 Likes

I tried it out after seeing it on the dev forum and it was a very cool experience! It looks like you've left room to unlock more features, which I'm looking forward to. You really tackled this head on, huge respect!

11 Likes

Cheers Nigel,

Yeah, the admin app does a few bits that aren't included on the front-end side, and I'm looking at the best way of minimising time spent waiting to authorise and connect; some trivial changes, some much less so. I think I'll save any updates for the next testnet though; not sure when it would make most sense to put it up there again, maybe when the next generation of FFI APIs is released.

5 Likes

I am wondering @JimCollinson if the procedure for registering a Name in the NRS could also involve two other functions:

  • Optionally allow the user registering the Name to also have the Name added to a list of newly registered Names. This would allow the search system to scan the list of new Names in order to crawl those sites (a rough sketch follows this list).
    • Optional, so that people who do not wish to add their site to the searchable sites can stay out.
    • Obviously there also needs to be a file on the site to restrict crawling, as there is on the current web. Some sites do not wish to have all of their pages crawled by a search engine, and the sites opting out of the list might still be linked to from elsewhere, so they too would need the file saying to crawl nothing.
  • Reverse lookup for Names. We need this since there is no method to scan the Names in the NRS, and no method to do a reverse lookup from an XOR address. This would let the search crawler know which site an XOR link refers to.
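Something like this, very roughly; every function here (nrsRegister, appendToNewNamesList, fetchFile) is made up for illustration and none of it is a real NRS API:

```ts
// Very rough sketch of the opt-in flow at registration time.
// All declared functions below are hypothetical stand-ins.
declare function nrsRegister(name: string, targetXor: string): Promise<void>;
declare function appendToNewNamesList(entry: { name: string; registeredAt: number }): Promise<void>;
declare function fetchFile(url: string): Promise<string | null>;

async function registerName(
  name: string,
  targetXor: string,
  opts: { listPublicly: boolean }
): Promise<void> {
  await nrsRegister(name, targetXor);
  if (opts.listPublicly) {
    // Append-only public list that crawlers can scan for new sites.
    await appendToNewNamesList({ name, registeredAt: Date.now() });
  }
}

// Crawler side: honour a robots-style file before fetching anything,
// so sites can restrict crawling or forbid it entirely.
async function mayCrawl(name: string, path: string): Promise<boolean> {
  const robots = await fetchFile(`safe://${name}/robots.txt`);
  if (robots === null) return true; // no file: no restrictions
  const disallowed = robots
    .split("\n")
    .filter((line) => line.startsWith("Disallow:"))
    .map((line) => line.slice("Disallow:".length).trim())
    .filter((rule) => rule.length > 0); // empty rule = no restriction
  return !disallowed.some((rule) => path.startsWith(rule));
}
```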
3 Likes

Yeah, that’s the sort of thing I had in mind.

I was wondering if it might even be possible to return the results from a scraper/crawler to the website owner, so that they owned them (and paid for the data!) and they were then linked to by the basic list of sites. The owner could then subscribe to the scraper to update their details every day/month/hour, whatever.

This would suggest a structure with the list of sites as a central spine, the formatted info about a given site owned by the site owner (certified by the scraper), with information about popularity, records of malice etc. as a more proprietary/closed system owned by the search company, or operated in a Wiki sort of style.

3 Likes

That might be difficult, since who runs the APP to do this? I would think the better way (I am contradicting myself a little above) is to have the site owner run one of a suite of search APPs which crawls their own site, obeying any "robots" rules. Then they automatically own the search/index data, and the APP writes a link to that data into the search "database".

The APP scans the site for the owner and generates all the metadata, which the site owner owns.
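The flow I have in mind is roughly this; every name here (listSiteFiles, storeOwnedData and so on) is a hypothetical stand-in, not a real SAFE API:

```ts
// Sketch of the owner-run indexing APP. The point is only the
// ownership flow; all declared functions are made up.
declare function listSiteFiles(site: string): Promise<string[]>;
declare function readFile(site: string, path: string): Promise<string>;
declare function storeOwnedData(data: object): Promise<string>; // returns an XOR address
declare function appendToSearchDatabase(entry: { site: string; metadataXor: string }): Promise<void>;

async function indexOwnSite(site: string): Promise<void> {
  const keywords = new Set<string>();
  for (const path of await listSiteFiles(site)) {
    const text = await readFile(site, path);
    for (const word of text.toLowerCase().split(/\W+/)) {
      if (word.length > 2) keywords.add(word);
    }
  }
  // The metadata lives in the owner's own storage: they pay, they own it...
  const metadataXor = await storeOwnedData({ site, keywords: [...keywords] });
  // ...and the shared "database" only ever holds a pointer to it.
  await appendToSearchDatabase({ site, metadataXor });
}
```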

Sorry if that is what you meant above, cause if you did I didn’t read it properly

2 Likes

On the database side, why not promote public databases that search engines can use to do their thing, whether the owner is the site owner themselves, a group of people who look for particular kinds of sites (like literary works), or even a database that links to other databases.

We can leverage communities to collect, create and maintain databases that are useful to them: small, low-cost undertakings that I'm sure not many communities would mind. Heck, this forum is community run, right?

Want to exclude NSFW content? Simply use a database of databases that excludes it, or the reverse if that's your thing.

As long as we convince everyone to store the data in a particular way, that could work, right?
(Or that triple data connection thing I just can't wrap my head around; no idea how it works.)
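If the format is agreed, the composing part is simple set arithmetic. A toy sketch (all the structures here are made up for illustration):

```ts
// A "meta database" is just a list of links to other databases,
// and filtering is set subtraction. Illustration only.
interface SiteEntry {
  name: string;
  metadataXor: string;
}

type Database = SiteEntry[];

// Merge several community databases into one view...
function merge(...dbs: Database[]): Database {
  const seen = new Map<string, SiteEntry>();
  for (const db of dbs) for (const e of db) seen.set(e.name, e);
  return [...seen.values()];
}

// ...then subtract an exclusion database (e.g. one listing NSFW sites).
function exclude(db: Database, blocklist: Database): Database {
  const blocked = new Set(blocklist.map((e) => e.name));
  return db.filter((e) => !blocked.has(e.name));
}

// Usage: search over literary + music databases, minus a blocklist.
// const view = exclude(merge(literaryDb, musicDb), nsfwDb);
```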

What are your thoughts and primary problems with this idea?

2 Likes

When I’ve thought on this previously, I always imagined something similar to what you’re describing here @david-beinn @neo @isntism.

Users collating that data (bookmarks as a rating), and then optionally making that data available for others to use (which, w/ PtP, would earn them money). Add in curation / collation (some search service apps) and you could (I imagine) get a more humanised version of search (and, as you suggest @david-beinn, your choice of algo etc).

I think there’s a lot of possibilities on SAFE for this. Great to see more folk thinking around it!

8 Likes

A little bit off the main track but a bit critical nonetheless I guess…

The discussion in the other topic reminded me that names were stored in clear text in early versions. Has this been changed by now? It would be pretty simple to just use the name as the encryption key: the vault owner stores the AD containing the links under the identifier hash(name), and doesn't know the name this AD represents.
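Roughly what I mean, with Node's crypto module as a stand-in (illustrative only; not the network's actual scheme):

```ts
// The stored identifier is hash(name), and the name itself doubles
// as the encryption key, so a vault owner holding the data never
// learns which name it belongs to.
import { createHash, createCipheriv, randomBytes } from "crypto";

function storageIdFor(name: string): string {
  // Anyone who knows the name can recompute this; nobody can invert it.
  return createHash("sha256").update(name).digest("hex");
}

function encryptLinks(name: string, links: string): { iv: Buffer; data: Buffer } {
  const key = createHash("sha256").update("key:" + name).digest(); // 32 bytes
  const iv = randomBytes(16);
  const cipher = createCipheriv("aes-256-cbc", key, iv);
  return { iv, data: Buffer.concat([cipher.update(links, "utf8"), cipher.final()]) };
}

// The vault stores (storageIdFor(name), iv, data) and learns nothing
// about the name; a resolver that knows the name can find and decrypt it.
```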

2 Likes

Aha! I have seen similar potential. It must be done!

1 Like

I think we’re on the same page here. I had five minutes before leaving for work this morning so maybe didn’t phrase myself very clearly.

As I see it, running the app in each instance would only scrape and generate metadata from the given site (rather than crawling from site to site), which is why I was preferring the term scraper. Not sure if you had something different in mind in that regard.

I guess one issue here would be that writers of the scraper software might have different ideas of how the metadata should be formatted. Perhaps some early consensus linked to the RDF spec could be useful here??
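To make that concrete, I was picturing scraper output shaped something like this, borrowing schema.org vocabulary via JSON-LD (purely an example of the kind of consensus format, not a proposal):

```ts
// Hypothetical scraper output for my own app's entry, using
// schema.org terms as the shared vocabulary. Illustration only.
const siteMetadata = {
  "@context": "https://schema.org",
  "@type": "WebSite",
  name: "SAFE Search App",
  url: "safe://search.safeindex",
  description: "A basic keyword search app on Alpha 2",
  keywords: ["search", "index", "SAFE"],
};
```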

In terms of structuring this metadata so that it is essentially a distributed database that can be searched in an efficient manner though, I am way, way out of my depth!!

Yeah, this is very much the sort of thing I had in mind. Not sure of the necessity of having different communities maintain different databases though.

Agree this is an issue, and deciding what that way is could be a pretty big task in itself. Like I said above, maybe RDF could be an inspiration for this?? Not something I know very much about though.

1 Like

I like this idea; it's simple and not easy to subvert.

Just to clarify, are all the ideas you suggest here solutions on the search side, rather than how to generate an index of the basic data? I only ask because I’m not sure if there’s some alchemy that goes on in the way Google puts together the data gleaned from searchers and their chosen destinations. I assume that that knowledge is useless without the basic metadata collected from sites, but people focus so much on the algorithms generated from search data that I wonder if I’m missing something!

Hi Riddim,

Not sure I quite follow here. Are you talking about an improvement I could make to the app, or a network feature?

By AD I’m guessing you mean Appendable Data?

Are you meaning the data stored would be viewable by vault owners? I’d always thought that even unencrypted public data was protected from vault owners by self-encryption?

I’m maybe completely misunderstanding your point though.

It depends. If it's incentivised enough, enthusiasts may do such things, partly for their own joy, partly for the $$ of having The Best music index out there. (Or more likely subgenres, w/ increased specificity eg)

1 Like

Thanks everyone for the responses so far. I'm going to try and tidy up GitHub and make a readme and a project plan, but just to emphasise: the way my app works is currently very low-tech and unsustainable, and it's only there in the absence of anything else; hopefully it might give people more incentive to put things on the upcoming test networks.

In an ideal world it would be nice to plot a step by step course from here to realising some of the ideas we’re talking about, but my knowledge and skills are very limited, and I’m aware this is not a small or easy task.

Look forward to hearing more ideas, and would be great if anyone has the skills to pitch in with helping to build the basis of something a bit more sustainable!

4 Likes

Yeah RDF hasn’t revealed it’s mystical mysteries to me either, maybe someone reading this knows of some good infographics that’ll help.

As for communities managing different databases, it's certainly not needed at all; I just believe that small groups of devoted people are less likely to sell out or fail, mainly because they're devoted, which is good if it turns out that maintaining one of these databases actually costs money instead of making any.

And there will be bad ones popping up, containing content most find disgusting, which is why I'd have people maintaining collections of databases, and maybe even collections of those; I'm hoping more layers remove more of the filth.

edit:
I’m not too good with code though I generally understand how things work, so if you ever need anyone to bounce ideas back and forth with, feel free to pm, I’d love to help.

2 Likes