Perhaps a site owner could submit via some tool that sees them pay for addition to the index? I won’t suggest they be limited to just keywords, but perhaps if the cost is high that could be considered. The benefit would be that the site owner updates the index in line with their own knowledge of the site’s refresh timing.
The downside is that the index would only be as fresh as owners made it, so it might not work as well as a search engine that did all the work itself and diffed against its last pass. Still, I’m suggesting it because owner-pushed indexing is more like decentralized responsibility than relying on a centralized indexer that becomes a potential point of failure.
Then again, if it’s decentralized, perhaps the choice should be there for site owners to submit keywords rather than all words, empowering them to choose.
Data chains append new information to blocks free of charge. The same could be done for an integrated index. Let SAFE crawl itself and update accordingly. The overhead seems a small price to pay for greater user adoption. Deduplication could keep the index as lean as possible. Any thoughts @maidsafe ?
One advantage of having a SAFE search index early is that it can grow with the growth of the network. So instead of having to crawl billions of pages, in the beginning there will only be hundreds of pages, then thousands and then millions and so on.
The problem is how to make such search index general enough. Google Search has by now an enormously complicated page ranking algorithm that probably includes massive machine learning networks and things like that. Absolutely mindblowingly complicated stuff with perhaps millions of CPUs running millions of lines of source code.
I have an idea of building a SAFE search index by simply hashing the queries and mapping the hashes to pages. A brute force approach that is easy to implement. Unfortunately the tricky part is how to achieve efficient page ranking (plus date range search etc).
Page ranking could happen client side, I suppose. Page relevance would be handled by low-capacity, high-CPU vaults or transient low-age nodes. So the network basically checks the index for matches, batches them, and sends them to the client, which then runs the algorithm on the local machine. Being open source would mean that eventually millions of eyes could result in an efficient system, especially if SAFE grows as expected.
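As a rough illustration of the query-hashing idea above combined with client-side ranking (all names here are hypothetical sketches, not any real SAFE API), a minimal version might look like:

```python
import hashlib
from collections import defaultdict

# Brute-force query-hash index: each normalized query string is hashed,
# and the hash maps to a set of matching pages. Illustrative only.

def query_key(query: str) -> str:
    """Normalize and hash a query so equivalent queries share one index slot."""
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()

class QueryHashIndex:
    def __init__(self):
        self.slots = defaultdict(set)  # hash -> set of safe:// URLs

    def add(self, query: str, page: str):
        self.slots[query_key(query)].add(page)

    def lookup(self, query: str):
        return self.slots.get(query_key(query), set())

def rank_client_side(pages, score_fn):
    """The network returns unordered matches; ranking runs on the client."""
    return sorted(pages, key=score_fn, reverse=True)

index = QueryHashIndex()
index.add("safe network", "safe://docs/intro")
index.add("safe network", "safe://forum/start")
matches = index.lookup(" Safe   NETWORK ")  # normalization makes these equal
ranked = rank_client_side(matches, score_fn=len)  # stand-in scoring function
```

The tricky parts the post mentions (real relevance scoring, date-range search) would live in `score_fn` and in richer index entries; this only shows the hash-and-lookup skeleton.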
I think a lot of the search engine will have to be from the users themselves.
Remember that until network computing arrives, an application has to run to populate the search engine. This means either someone runs the app themselves, or is smart and writes it as a browser add-on for people who wish to help populate the search engine. Maybe reward those people with a portion of the PtD (pay the app developer) rewards. The PtD rewards are likely to be very high for the search app since a lot of people will be using it.
Only if they migrate across. The two networks use different protocols, and if Google doesn’t implement them then they can’t provide services.
The issue google will have with the SAFE network is how to make loads of money from SAFE. PTD rewards is not enough for them. And when they fear SAFE taking over the internet then they are behind the eight ball even further.
Can you elaborate please? For me this is the core question around whether this network is really any different from the current Internet for the average person. I’m not trying to be awkward, I truly am trying to figure out how things might look in 10 years time.
If my content is not indexed by the likes of Google then only people who know about it will find it, i.e. it’s on the “Dark web”.
It’s taken billions of dollars and decades of research for the Internet to be easy to navigate. Without something like Google it’d be like taking a step back 20 years into the past, where you get some semi-relevant information that you’re interested in and then start digging from there.
In order for something like Safe to succeed it’ll need a very good search engine - not necessarily as sophisticated as Google but still it’ll have to be very good. Otherwise, what’s the practical difference between me creating some non-indexed content on Safe and me sharing a link to something I’ve stored (and personally encrypted) in Dropbox?
So, a search engine is needed. The most successful will be the easiest to use. The easiest to use will be the one that has the most investment in it. The one that has the most investment will be the one that is making money. The most successful will be the most powerful and it can drive users to the content it wants to drive users towards…kind of sounds like the current Internet.
In realistic terms, what can Safe do for me that the Internet cannot? If governments want to censor content they will go to the big search engines on Safe as they do on the Internet. These companies will surely still be governed in the same way on Safe as they are on the Internet.
Lastly, I do understand the basic arguments for a truly decentralised global network. I’m just not sure that technology is necessarily going to be the answer. Imagine everyone in the world migrated to Safe tomorrow and Google too, what’s to stop Google doing exactly what it does just now, i.e. censoring results, tracking users preferences, targeted advertising, making a stupid amount of money, etc.
Also, if I personally encrypt a file and then upload it to Google Drive, Dropbox, etc., is it really any less secure than a file I store on the Safe network? If someone wants to view my file badly enough, how much harder would it be for them to brute force their way into my Safe account than to crack the encryption? Both, I’m sure, are prohibitively expensive in almost all cases.
Search could potentially also be decentralized. There would still be the question of how to finance development, perhaps that could be done through an ICO and a token used for something related to indexing and/or searching.
SAFENetwork is far more than encrypting some private files in the cloud. Yes, you can do that - or even use something like Sia - but the scope of SAFENetwork goes far further.
The good thing is, we don’t need to replace the current internet. Google could easily connect to SAFENetwork itself and crawl through safe sites with little modification. They could just present the links with the safe:// prefix to open them in a SAFENetwork browser.
I suspect many apps will do something similar. They will have a dependency on the user having a SAFENetwork account setup and then the app will use it. The benefits of this become more obvious when the app contains personal data which needs to be accessed over multiple devices. If it is very private data, even more so.
So a user doesn’t need to abandon the clear net and all their apps and routines. Some will want the full fat experience from day 1, but that won’t be the average user. Instead, SAFENetwork is likely to knit itself into the rich tapestry which makes up the current Internet.
It’s worth mentioning that myself and @AndyAlban have some plans for this space in the coming few months.
First, we’re releasing SafeCMS, to give people the ability to create content. The plan for V2.0.0 of SafeCMS is to give the app the permissions / ability to automatically append to a shared public mutabledata which contains a list of websites which want to be crawled.
After this, we’re going to give people the ability to consume content. We’re going to build a simple spider / web crawler on top of the SafeNet platform which follows links on participating websites with a permissive (or absent) robots.txt file and builds up an ordered, searchable, compressed index. These indexes will be publicly released as publicly readable mutable data which we will update once daily.
Then we plan to release an app which (on search) downloads the most relevant public search index (caches it locally) and performs the equivalent of a Google search locally before displaying relevant results which link off to other safe:// domains.
Obviously, this comes with limitations:
It requires the search index to be downloaded (which limits the number of pages we can realistically deliver without “taking the piss” from a network bandwidth perspective)
it requires us to actually pay to keep this index updated (but given the index update will only initially be performed daily, this cost will be fairly low and hopefully covered by the small income from running the SafeCMS app through developer network rewards)
It doesn’t scale particularly well if safe:// usage increases quickly - however that’s a problem for 2/3 years down the line.
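As a rough sketch of the daily crawl-and-publish cycle described above (the page contents are made up, and writing the compressed blob into a publicly readable mutable data entry is an assumption about the design, not the actual SafeCMS code):

```python
import json
import re
import zlib
from collections import defaultdict

# Build an inverted index over crawled pages, then compress it for
# publication, since clients will download the whole thing.

def build_index(pages: dict) -> dict:
    """pages: {safe_url: page_text} -> inverted index {word: [urls]}."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            index[word].add(url)
    return {word: sorted(urls) for word, urls in index.items()}

def compress_index(index: dict) -> bytes:
    """Serialize deterministically and compress the index blob."""
    return zlib.compress(json.dumps(index, sort_keys=True).encode())

# Illustrative crawl output (a real crawler would fetch safe:// pages).
pages = {
    "safe://blog/post1": "Decentralised search on the SAFE network",
    "safe://blog/post2": "Search indexes can be compressed",
}
blob = compress_index(build_index(pages))
# A daily job would then write `blob` to publicly readable mutable data.
```

Clients would decompress the blob locally and run the equivalent of a search over it, matching the "download, cache, search locally" flow described above.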
This is still in the planning stage and you’re maybe 3 months away from seeing our beta for this, but it’s something we’re committed to doing. Sometimes you might see me or @AndyAlban asking really specific questions about SafeNet’s code and wondering why that’s relevant if we’re only working on small projects - it’s because we’re thinking about the bigger picture and having more than one developer working at the same time is allowing us to create the foundation for some pretty cool projects.
Why not include an indexer in SafeCMS. Each website could build an index for itself, that could be used for internal search on that website. The index could also be downloaded by a crawler and combined with other indexes. Potentially various indexes could also be downloaded on the client side from a list of indexes containing sites tagged with some particular topics for example.
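A minimal sketch of the merge step this suggests, assuming each site publishes a simple word-to-pages mapping for its own internal search (the format is an illustrative assumption):

```python
from collections import defaultdict

# Combine per-site indexes into one index a crawler could publish.
# Each site index maps word -> list of safe:// pages on that site.

def merge_indexes(site_indexes):
    """site_indexes: iterable of {word: [pages]} dicts from individual sites."""
    combined = defaultdict(set)
    for idx in site_indexes:
        for word, pages in idx.items():
            combined[word].update(pages)
    return {word: sorted(pages) for word, pages in combined.items()}

# Hypothetical per-site indexes downloaded by the crawler.
blog = {"safe": ["safe://blog/a"], "search": ["safe://blog/b"]}
wiki = {"safe": ["safe://wiki/home"]}
combined = merge_indexes([blog, wiki])
```

The same merge could run client side over a list of topic-tagged indexes, as the post suggests.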
Mostly because I’m wary of the vendor lock-in: If I build such a custom ecosystem around SafeCMS, it may stifle creativity with regards to other projects.
I think it’s best that, though I’ll be building them both, as projects they remain separate and I try to be completely neutral. Plus, most of these tools already exist and have maturity; building a custom indexer into the CMS would require everyone building other websites (without the CMS) to also build or export the same custom files, which duplicates work and wastes everyone’s time.
The good thing about a crawler is that all of the onus is on the developer of the crawler to make it work well, everyone else just needs to think about the creative side of publishing content.
It wouldn’t be a waste of time, as that would be a means to search each of those sites. You may be right that using it might stifle things a bit though. If someone wants to try new ways of structuring an index and a crawler only accepts certain formats of pre-indexed data, that site might not be indexed.
Why does it have to be google? We need to get away from the mindset that you need a big well known company to facilitate functionality on any platform.
I can remember the days when “webcrawler” and advert ridden “Yahoo” were the major web search engines. Then came this pair who created a cool looking search page that people liked looking at and started to use their (crap) search engine that did not give good results. But it was cool to use and these were two cool guys trying to compete against the giants.
Now all these years later the world thinks google is the internet.
With SAFE we will have a number of these upstarts who have the jump on Google and who, using the PtD rewards, will be able to provide a decentralised search engine. And they will be big before Google realises it is stuck on the losing clearnet.
Also google has the disadvantage that a lot of SAFE will not be searchable. All that private data that people want private cannot be accessed by others and indeed others do not even know of that data’s existence.
SAFE is a new world where people control their data and no search engine can mine people’s personal data unless those people make it public.
So any search engine on SAFE will rely on submissions and “robots.txt”-style files that can only provide pointers to otherwise unlinked public data.
Search engines will become UI interfaces to a search protocol where the index tables are usable by anyone. This will be the basis of decentralised searching in SAFE. Look at what Decorum is doing for social media protocols.
I think people say “Google” - well I do at least - because that is the experience people have come to expect from the Internet. I too remember the old days and while I miss a lot of things I don’t miss sifting through loads of information before finding the content I was actually interested in.
It doesn’t have to be “Google” that provides this service however I argue that it does have to be delivered by a group with plenty of resources - this could be a group of geniuses who are willing to give up their time but I suspect it’s much more likely that profit would be the motivating factor.
Anyway, this is kind of beside my point. Imagine for a moment that Google did decide to come to Safe. How would my experience on Safe differ to that of the Internet of today?
I personally would use Google to search Safe. I’m guessing they could still censor their results and build a profile of me then target me with advertising, potentially sell my profile, etc.
Say something like Project Decorum became the dominant social network on Safe. What’s to stop this from following a similar pattern to many other big players, i.e. start off relatively benign and idealistic and as the money making potential grows the ideals start to crumble. Surely Decorum will be able to identify my account and be able to build a profile for me over time?
Probably won’t happen, because the primary goal of the project is to write the protocols that allow social media to have any number of UI apps which all use the protocol.
Similar for searching. Have a good read of the replies you got from people because the answers to your questions in your latest post are there.
Basically SAFE turns things on its head as far as searching is concerned.
Data is private unless made available to others, and access can be very limited too. So unless the data (including safe HTML pages) is made “public”, search engines cannot even see it.
SAFEsites will use the SAFE_DNS (Distributed Naming System) which also does not allow for crawling.
In SAFE the data is not necessarily a searchable web. In the clear web there is the DNS and registrars to allow crawling of new domains and knowing what domains are out there. SAFE DNS does not allow this.
In other words, the search engines need to be given the sites to crawl, and then they may only get to crawl small parts of such sites as blogs, since the pages could belong to many different people who only allow a closed group to access them. It’s all owned by the population and not necessarily the websites’ owners.
And like the Decorum project, I’d say a search engine protocol will need to be developed so that different search engine companies can access the shared names of sites and pages that are there to be crawled. It’s not something one company can monopolise, because they will need to rely on the public to submit their pages/sites for crawling, and the best way is to have a protocol for this set up. Maybe an MD containing addresses to crawl is then linked into the chain of crawlable sites/pages. Thus the need for a protocol to define how this is done, and most likely one of many simple apps available to insert addresses into the list according to the protocol.
Then the search engines can do their magic.
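A toy sketch of what such a submission protocol could look like, with a chain of blocks holding crawlable addresses that any search engine can walk (the structure and all names are assumptions, not a defined SAFE protocol):

```python
# Each block holds a batch of submitted safe:// addresses plus a pointer
# to the next block, forming the "chain of crawlable sites/pages".

class SubmissionBlock:
    def __init__(self):
        self.addresses = []   # safe:// sites/pages submitted for crawling
        self.next_block = None

    def submit(self, address: str):
        """A simple submission app would append via the shared protocol."""
        if address not in self.addresses:
            self.addresses.append(address)

def walk_chain(head):
    """Any search engine follows the chain to collect all submissions."""
    seen = []
    block = head
    while block is not None:
        seen.extend(block.addresses)
        block = block.next_block
    return seen

head = SubmissionBlock()
head.submit("safe://myblog")
tail = SubmissionBlock()
tail.submit("safe://myshop/catalogue")
head.next_block = tail
all_submitted = walk_chain(head)
```

Because the chain is shared and readable by anyone, no single engine can monopolise it; each does its own "magic" on top of the same submissions.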
And if you have been listening, Google search is becoming very corrupt (by Google’s own doing) and returns less useful results month by month. And the latest is that they will not return results for pages over a certain number of years old. So do not look to the likes of Google to save SAFE from inaccessibility.
And this is the reason why I say searching on SAFE will have to have a protocol for indexing developed and a whole host of search engines will rise up to suit the needs of different people.
Only if people decide to give up their anonymity and make their data public or give to the likes of a google.
For example, services which crawl sites and turn them into search meta data could be useful. Said meta data could be stored somewhere publicly accessible, allowing search engines/tools to ingest it without vendor lock-in.
Perhaps searches themselves could essentially be a message to/from search curators of this search meta data. Whether this would be integrated into a browser or other tool would be open.
Perhaps people would pay a small amount to have their search meta data created and updated. People already pay Google to add stuff at the top of the list, so I am sure there is a business case for it. A business would be daft not to create search meta data for a small fee.
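One hypothetical shape for such a per-site search meta data record, neutral enough for any engine or tool to ingest (all field names here are illustrative assumptions):

```python
import json
import time

# A publicly stored, engine-neutral meta data record for one site.

def make_meta_record(site: str, keywords, summary: str) -> str:
    record = {
        "site": site,
        "keywords": sorted(set(k.lower() for k in keywords)),  # deduplicated
        "summary": summary,
        "updated": int(time.time()),  # lets engines skip stale records
    }
    return json.dumps(record, sort_keys=True)

doc = make_meta_record(
    "safe://myblog",
    ["SAFE", "search", "safe"],
    "A blog about SAFE.",
)
```

A paid update service would simply regenerate and re-publish this record; any number of competing search tools could consume the same format.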