Safe-Search, bringing content discovery to the SAFE network

I would be curious to get your opinion, and @Shane’s, on my plan/suggestion for someone (MaidSafe? Us? We? Me?) to philanthropically dictionary-attack the network in order to set up a “public reserve” of single-word and proper-name domains, so that domain squatting is minimized and access to these common-language names is granted to everyone, forever. I suppose you can think of it as an analogy to the “public reserves” set up for plants and wildlife in order to protect endangered species from poachers. This would also give you an initial known set of seeds from which your crawler could expand outwards. I would say MaidSafe or the MaidSafe Foundation are the preferred entities to actually do this, but if they don’t, I think anyone who agrees with the general idea should band together and form a charity/foundation to get it, or something like it, implemented. I suppose the only creative alternative would be some kind of petname system that you might be able to work into SafeCMS and the Safe Crawler.

1 Like

If I’m being truly honest, I’m against any sort of attack on decentralisation.

SafeNet is being designed first and foremost for freedom and anti-censorship. If Maidsafe have control over a central domain name structure, they also have control over the service domains under it. Since Maidsafe are subject to both British and European law, as well as foreign copyright laws respected by the EU, this provides a simple method of censoring things: simply threaten to sue Maidsafe until they remove the offending service from their domain.

I think we really need to break away from this concept of only certain TLDs being valuable or usable. The (not-so) recent expansion of gTLDs by ICANN is a good example of this: these days a “.travel” domain is just as valid and discoverable as a “.com” domain.

3 Likes

That’s not how the SAFE DNS works. A public ID controls the base safe://shane, or safe://i-am.shane; anyone could have safe://shane.the-guy-who-made-safecms. And the general idea is not something that MaidSafe would be in control of, just something that would initially be set aside for public use - like a wiki page, but more natural/better/safer. But this is off-topic for this thread; I just wanted to mention it briefly.
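
To make the structure concrete, here’s a minimal sketch of that convention - the right-most label is the public ID and anything before it is a service name owned by that ID; the “www” default below is just my assumption:

```typescript
// Sketch of safe:// name decomposition, per the convention above.
// The "www" default service is an assumption for illustration.
function parseSafeUrl(url: string): { publicId: string; service: string } {
  const host = url.replace(/^safe:\/\//, "").split("/")[0];
  const labels = host.split(".");
  const publicId = labels[labels.length - 1];              // right-most label
  const service = labels.slice(0, -1).join(".") || "www";  // assumed default
  return { publicId, service };
}

// parseSafeUrl("safe://shane")                          -> public ID "shane", default service
// parseSafeUrl("safe://i-am.shane")                     -> service "i-am" on public ID "shane"
// parseSafeUrl("safe://shane.the-guy-who-made-safecms") -> service "shane" on public ID "the-guy-who-made-safecms"
```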

4 Likes

Really amazing work! It’s so great that we get more and more projects started on SAFE. One group is working on identity management, others on an app store and blog tools, and now one on search.

10 Likes

This cannot work, by design. I would have the same issues you have if this were possible. In fact, all that can be demanded is that we stop work on it, but that means prison/exile for me, as I will not stop. The teams are remote, and I would hope they’d keep going via community funds or some other mechanism; perhaps somebody would release maidsafecoin to them - I am pretty sure they would :wink:

In any case, we cannot stop anything on the network. If anybody did see something where we could, then it would be removed. We should all watch for such things, just in case.

19 Likes

Maybe we don’t even want to use a unique naming system any longer…?

There is a pretty old topic about a petname system that I like a lot, to be honest.

A petname system obviously only makes sense if you can share your namespace with your friends and connect them - if you can send another person a link that is valid on both ends…

… A large benefit would be that I would be able to visit your blog by going to safe://shane no matter how many other Shanes there are on this planet, and there couldn’t be domain squatting…
But yes, it’s very different from what people are used to seeing on the internet…
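
To sketch the idea: a petname table is a purely local map from names I choose to global, unambiguous identifiers, and links are exchanged as the global identifier, so they stay valid on both ends. A minimal sketch (the key format is a made-up stand-in):

```typescript
// Minimal petname table: local, human-chosen names mapped to global,
// unambiguous identifiers. PublicKey is a stand-in for a real address.
type PublicKey = string;

class PetnameRegistry {
  private byName = new Map<string, PublicKey>();

  // Locally bind a name of my choosing to someone's global key.
  bind(petname: string, key: PublicKey): void {
    this.byName.set(petname, key);
  }

  // Resolve my local name back to the global key when following a link.
  resolve(petname: string): PublicKey | undefined {
    return this.byName.get(petname);
  }
}

// My "shane" and your "shane" can point at different people; links travel
// as global keys, so each side renders them under their own petname.
const mine = new PetnameRegistry();
mine.bind("shane", "5f2a…"); // hypothetical key of the Shane I know
```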

Just thought I’d mention it - maybe it’s something you might think about for a few moments.

PS:

You know, we had some interesting proposals back then - I’ll just throw in another idea by @Seneca.

Pretty interesting as well - if you search the forum, we had other ideas too (but those two left the deepest impressions in my memory).

6 Likes

Since most of the work on this project is fairly heavy back-end code, the updates will most likely be fewer and further between than on Safe-CMS, but we have a little update here with some mock-ups we’ve done for the design. We’ve gone as simple as possible, without any spots for adverts or extraneous information - people will come to the app to find something and we should make that journey as quick and easy as possible.

This is the home page of the app (after the SAFE-authentication cycle is complete, but it will use the same pre-loading screen as the Safe-CMS project does):

This is the search results page - obviously you’ll get more than 4 results per page; this is purely an example:


Our next major development update will be in around 2 weeks, when we will be posting things like application diagrams, details about how the crawler will work, rate limiting, support for robots.txt and sitemap.xml, etc.

Thanks all, @AndyAlban & @Shane.

34 Likes

Great work! It’s really good to see these new apps being developed by you guys!

11 Likes

You know, rate-limiting shouldn’t be an issue in this environment (except for links leading out of safe://).

It was important in the old internet because it involved being kind to the exclusive resources of a specific server so that spider/robot requests didn’t disrupt other visitors or use up a large amount of a paid resource (bandwidth, etc) quickly.

On SAFE, crawling doesn’t impede other users’ ability to access data resources. However, you may still want to rate-limit in order to manage your own network-write costs.

1 Like

Our rate-limiting plans aren’t based on domain, but rather on total daily usage. Rather than blitzing through our crawl in 10 minutes and hitting the network in a really bandwidth-intensive way, we plan to amortise our crawls (in the early days, while the vault network is still small) across the entire 24-hour pre-index period, so the network doesn’t see any unusual data-usage spikes.
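
As a rough sketch of what that amortisation could look like (the daily budget figure below is invented for illustration, not our actual number):

```typescript
// Spread a fixed daily request budget evenly across the 24-hour
// pre-index window instead of bursting. The budget is a made-up figure.
const DAILY_BUDGET = 100_000;                           // requests/day (illustrative)
const SLOT_MS = (24 * 60 * 60 * 1000) / DAILY_BUDGET;   // ~864 ms per request

async function crawlPaced(urls: string[], fetchOne: (u: string) => Promise<void>) {
  for (const url of urls) {
    const start = Date.now();
    await fetchOne(url);
    // Sleep out the remainder of the slot so network usage stays flat.
    const remaining = SLOT_MS - (Date.now() - start);
    if (remaining > 0) await new Promise((r) => setTimeout(r, remaining));
  }
}
```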

4 Likes

Oh, that’s a good point. It wouldn’t be demands on specific servers but on the network as a whole – especially in the earlier stages, when the infrastructure is still developing. Good call.

2 Likes

@Shane @AndyAlban and any developers interested in LinkedData / RDF, see the ‘Solid Application Data Discovery’ link I just added to my post above

2 Likes

Missed this thread last week. Nice work! Great to see all the apps coming in recently :tada:

Couple of Qs:

How is the data being stored? (What sort of data structure is being used?) As @happybeing noted, is it available as a datatype/in a structure readable by anyone/any app? It would be great if we as a community could come up with some standardised data structures for search.

Do you have a typetag you’re using for the data (or plan to use one)?
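
For discussion’s sake, a standardised entry might look something like the sketch below - the typetag value and every field name are assumptions on my part, not an agreed standard:

```typescript
// Hypothetical shared search-index entry; nothing here is standardised yet.
const SEARCH_INDEX_TYPETAG = 15_001; // made-up typetag reserved for search data

interface IndexEntry {
  url: string;        // e.g. "safe://blog.shane/post-1"
  title: string;
  keywords: string[]; // extracted terms, lower-cased and de-duplicated
  crawledAt: string;  // ISO-8601 timestamp of the crawl
  crawler: string;    // public ID of whoever produced the entry
}
```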


Some other thoughts:

Toootally.

A centralized crawler/uploader/search app shouldn’t really be needed for SAFE, in my (super-idealised safe-world) opinion. Is there anything blocking users from doing some crawling and uploading data sets themselves? (E.g. using a specific tag, or using a public MD with open read/insert as you suggest - that could enable anyone to crawl and add to the data set, I think.)

This also allows users to own the data that they crawled, which is another great benefit of SAFE, IMO.
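
A sketch of that open-crawling flow, with all helpers as hypothetical stubs rather than real safe_app calls:

```typescript
// Anyone crawls a page and inserts the result into a public, open-insert
// MD under a well-known tag. All three helpers below are hypothetical.
declare function fetchPage(url: string): Promise<string>;
declare function extractEntry(url: string, html: string): { url: string; keywords: string[] };
declare function mdInsert(tag: number, key: string, value: string): Promise<void>;

const CRAWL_TAG = 15_001; // same made-up tag as sketched above

async function contributeCrawl(url: string): Promise<void> {
  const html = await fetchPage(url);
  const entry = extractEntry(url, html);
  // Keyed by URL so re-crawls of the same page become versioned updates.
  await mdInsert(CRAWL_TAG, url, JSON.stringify(entry));
}
```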

Crawling and searching don’t necessarily need to be the same application/site (at least, as @neo points out, that is the intention with APP rewards). They could very much be two separate things.

As long as the data is open/available and standardised in some way (and we should figure out just how that would be*), there’s a lot of flexibility for users/consumers/other devs to build upon it (such as search indexes).


Heh, @Shane @AndyAlban sorry for the barrage of Qs here. I’ve been thinking about search on safe / what that might mean for some time (but never got round to tidying up my dabbles :expressionless: ). Great to see work coming out in this area! I’d love to hear what you think on the above.


(* There are some threads on this: indexed data; an idea built around some older network data types, but a lot should still be applicable).

13 Likes

I’m also very interested in playing with this once you publish it, @Shane @AndyAlban - it’s really good that someone is working on these ideas already!

While reading the posts, the first thing I started wondering is: in the long term (though not necessarily in the first versions), how will I/users trust that the results are ranked effectively by relevance and not by any other interests?
As a user of a decentralised network, I wouldn’t like to (and probably won’t) use a search tool if I cannot be sure the results are not being manipulated in any form; it feels like I’m losing part of the decentralisation aspect. Thus, how can/will I be sure the crawler you run on your servers to populate the network with search indexes and ranks is not running a modified version of the known algorithm? It sounds like the execution of this algorithm should be decentralised somehow, or at least to a certain level - perhaps the search site could provide the option to run a crawler, as has been suggested above, plus the option to just publish specific sites/files to be indexed.
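
One possible answer, assuming both the index data and the ranking algorithm were published openly: any client could re-run the ranking function over the open index and compare the output with what the search app served. A naive, deterministic sketch:

```typescript
// Deterministic, recomputable ranking: given the same open index and
// query, every client gets the same scores, so manipulation is detectable.
interface Entry { url: string; keywords: string[]; }

function rank(entries: Entry[], query: string): { url: string; score: number }[] {
  const terms = query.toLowerCase().split(/\s+/);
  return entries
    .map((e) => ({
      url: e.url,
      score: terms.filter((t) => e.keywords.includes(t)).length, // naive term match
    }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score);
}
```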

22 Likes

I’d always been imagining something more personal affecting the results of search: either some custom params you can tweak, or (more super-idealised) using a web of trust to enhance the results from people you care about, or people you deem to be well informed (somewhat like the idea of liquid democracy).

A prominent scientist having more weight/followers on science topics, for example. And all they have to do is ‘like’ a site or something for that weight to be carried across into the algo.

While that makes individual page ranking unlikely, that could be left to the sites themselves. So maybe a site like Stack Overflow has a lot of stars/likes/whatevers in general, and so it’s got a good ranking for programming; our search could then pick up the site’s self-made index and search within that.
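
A rough sketch of that weighting - the names and the formula are just assumptions for illustration:

```typescript
// Boost a site's base relevance by endorsements from identities you follow.
interface Endorsement { site: string; by: string; }

function trustWeight(site: string, endorsements: Endorsement[], follows: Set<string>): number {
  const n = endorsements.filter((e) => e.site === site && follows.has(e.by)).length;
  return 1 + n; // each endorsement from someone you follow adds weight
}

// finalScore = baseRelevance * trustWeight(site, allEndorsements, myFollows)
```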

4 Likes

I will be posting (hopefully) satisfactory answers to this over the next couple of days - I’m working on client work at the moment, but I have seen this and plan to reply soon. :slight_smile:

10 Likes

See my message here in a discussion about embedding thumbnails of SAFE sites on the forum. Is the site crawler you have indexing for search also gathering thumbnails of any kind that could be used to show what a site looks like underneath its link?

@Shane @AndyAlban @bochaco @joshuef, I have also given the question about a ranking system some thought. The barrier to entry will probably be lower on the SAFE network than it is on today’s internet, because you will not need the hardware infrastructure on SAFE that you need on the internet if you want to compete with Facebook or other large entities - compare the stock market with the coin market, where the coin market has a much lower barrier to entry. A lower barrier to entry opens things up for actors that are not serious, or are trying to scam, and so on; therefore a ranking system will probably be beneficial to the SAFE network, and I think it would also be good to try to prohibit bots that will try to manipulate the ranking system. Solutions could be votes for ranking with captchas, or time spent on sites, or similar; maybe others have more or different solutions that will work out great.

3 Likes

@tobbetj, have a look at the web of trust that Project Decorum is aiming to implement.

https://www.project-decorum.com/endorsements-sharing/

3 Likes

Definitely agree with this; it was along the lines I was thinking too.

Perhaps a standard crawled-data format could be established and then created by the user, so that many search engines could read these at will? Something like a JSON-based index file.

There must already be some sort of format that crawlers output after going through a site, but I would assume this is proprietary?
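
As a straw-man (I don’t know of an open standard for crawler output), a user-generated JSON index file could look like this - every field name here is just a suggestion:

```typescript
// Straw-man JSON index file a site owner (or crawler) could publish.
const exampleIndexFile = {
  version: 1,
  site: "safe://blog.shane",
  generatedAt: "2018-03-01T12:00:00Z", // illustrative timestamp
  pages: [
    { path: "/post-1", title: "Introducing Safe-CMS", keywords: ["safe", "cms", "blog"] },
    { path: "/post-2", title: "Search on SAFE", keywords: ["search", "index"] },
  ],
};
```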

Obviously, validating that the results are honest will be a challenge, but presumably a bad search map could result in the site being omitted from searches, etc.

4 Likes