If the public can read it then a crawler can read it.
The crawler starts off with one SAFE site, finds the links in it, then crawls those sites. Rinse and repeat.
Just as on the clear web today, "dark" sites rarely ever have a crawler touch them, because they are not linked anywhere OR they use ports/security that crawlers cannot get past. The same holds for SAFE: unknown SAFE sites cannot be found by a crawler, OR they are encrypted to the general public.
require 'json'
require 'uri'

# extract links from a page; returns [] if the page doesn't exist
def get_links(safe, url)
  uri = URI.parse(url)                  # parse url
  service, domain = uri.host.split('.') # www.something -> service = 'www', domain = 'something'
  html = safe.dns.get_file_unauth(domain, service, 'index.html')['body'] # read safe://www.test1/
  html ? URI.extract(html) : []         # extract links if page exists
end

safe = safenet_quick

# load lists of urls (created if they don't exist)
urls_parsed   = JSON.parse(safe.sd.read_or_create('list_urls_parsed', [].to_json))
urls_unparsed = JSON.parse(safe.sd.read_or_create('list_urls_unparsed', ['safe://www.test1'].to_json))

# parses "safe://www.test1" recursively
while url = urls_unparsed.pop
  urls_unparsed += get_links(safe, url) - urls_parsed # skip already-visited pages
  urls_parsed   << url
end

# save on the network
Then you can put this script on cron and develop a website that reads “list_urls_parsed” and displays the scraped pages. Also, you can open the unparsed list to everyone so anyone can collaborate, using an Appendable Data.
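A front end over the parsed list could be as small as a page that renders one link per crawled url. A minimal sketch, with a hard-coded JSON string standing in for whatever `safe.sd.read_or_create('list_urls_parsed', …)` would return:

```ruby
require 'json'

# Render the crawled-urls list as a bare-bones HTML index page.
# The JSON string argument stands in for the 'list_urls_parsed' data
# read from the network; here it is hard-coded so the sketch runs anywhere.
def render_index(list_json)
  urls  = JSON.parse(list_json)
  items = urls.map { |u| "  <li><a href=\"#{u}\">#{u}</a></li>" }
  "<ul>\n#{items.join("\n")}\n</ul>"
end

puts render_index('["safe://www.test1", "safe://blog.test2"]')
```

A real version would also fetch and display each page body, but the listing alone already makes the crawl results browsable.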
I suppose we could also parse the words in each page, store an index of some form in an appendable/mutable, then an app can ask the index…
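That word index could look something like this. A toy in-memory sketch only: in practice each posting list would live in an appendable/mutable data item on the network, and `pages` would come from the crawler's “list_urls_parsed”; both names here are illustrative stand-ins:

```ruby
require 'set'

# Toy inverted index: word -> set of urls containing that word.
# 'pages' stands in for the scraped pages; on SAFE each posting list
# would be stored in its own appendable/mutable data item.
def build_index(pages)
  index = Hash.new { |h, k| h[k] = Set.new }
  pages.each do |url, html|
    html.downcase.scan(/[a-z0-9]+/).each { |word| index[word] << url }
  end
  index
end

# an app "asks the index" by intersecting the posting lists of the query terms
def search(index, query)
  query.downcase.split.map { |w| index[w] }.reduce(:&).to_a
end

pages = {
  'safe://www.test1'  => '<p>Hello SAFE network</p>',
  'safe://blog.test2' => '<p>Hello world</p>'
}
index = build_index(pages)
search(index, 'hello safe') # => ["safe://www.test1"]
```

The tokeniser is deliberately crude (it even indexes tag names); a real one would strip markup first.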
EDIT We need to write this one really well before some pain in the neck comes with tailored ads…
The Safe Browser could have a tick box to enable/disable contributing to the crawling effort. It would indeed help prevent non-objective selection of what is indexed or not, what results are displayed or not, and in what order… and it would be much more efficient for pages with few or no links from outside.
You would need to be very careful not to forget to disable it while you browse your super top secret agent forum, though.
I didn’t take time to verify but I’m sure there is a topic about this somewhere.
It is slightly different from the clear web in that there are no ISPs or other servers in the middle of the network; so there’s no option of a top-1-million Alexa list or similar traffic analysis… everything then comes from the client’s perspective and not from the network… at least as far as I understand it.
The only change to that would perhaps be some future Google Analytics-like data from sites using whatever that was, but those sites would already be known, and that would just be an attempt at traffic ranking.
So, the only crawl on SAFE I’ve seen is the one I’ve done, which simply has a sensible guess at urls and notes responses.
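That kind of crawl amounts to dictionary-guessing public names and noting which ones respond. A rough sketch: the `fetch` method here is a stub with a fixed list of “live” sites so it runs anywhere; a real crawl would replace it with a SAFE client call like the `get_file_unauth` used above:

```ruby
# "Live" sites for the stub below -- purely illustrative.
LIVE = ['safe://www.test1', 'safe://blog.maidsafe'].freeze

# Stand-in for a real SAFE client request; nil means no such site.
def fetch(url)
  LIVE.include?(url) ? '<html>…</html>' : nil
end

# Guess candidate safe:// urls from service and name word lists,
# and note which ones answer.
def guess_crawl(services, names)
  found = []
  services.each do |service|
    names.each do |name|
      url = "safe://#{service}.#{name}"
      found << url if fetch(url)
    end
  end
  found
end

guess_crawl(%w[www blog], %w[test1 maidsafe])
# => ["safe://www.test1", "safe://blog.maidsafe"]
```

The candidate space grows as services × names, which is exactly why this only finds sites whose owners picked guessable names.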
@happybeing: Crawling SAFE is no different to the clear web
Think so too: either other pages already known to crawlers link to the new page, or a scan of Gmail messages stumbles over links to it. You can also submit it yourself via their webform for this purpose. All other means of initial indexing derive from this?
About submitting pages yourself, according to @Tim87’s proposal it could be done like this for pages on the Safe-network:
In the thread David seems to like this approach, leading to “answer engines instead of search engines”.
Yup! When Google decides to point its crawlers at safe net, the search problem pretty much goes away. Of course, we may want alternatives, but they have won popularity on the clear web for good search results.
The company won’t do ad scans until after a message hits your inbox. That matters on behalf of non-Gmail users, who haven’t agreed to have their emails scanned under Google’s Terms of Service. Because Gmail’s ad-targeting system draws on every email a Gmail user receives, it inevitably catches some messages from non-Gmail addresses. Scans that take place before emails are available to the user are particularly sensitive, since they’re not yet part of Gmail’s inbox. In real terms, that gap lasts only a few milliseconds.
So the data can be used in any other way, as stated (in 2014):
GOOGLE HAS UPDATED its privacy terms and conditions, eroding a little more of its users’ privacy.
Our automated systems analyse your content (including emails) to provide you personally relevant product features, such as customised search results, tailored advertising, and spam and malware detection. This analysis occurs as the content is sent, received, and when it is stored.
Google does no evil… by redefining good. The small-evil-for-the-greater-good fallacy is just another symptom of conservative thought that leaches into every area, tempting those who can with more power and wealth.
More reasons we need SAFE: to help avoid those who ‘know’ best what is good for others.
All the above, plus… it’s not hard to do. Those who put up sites tend not to be trying to hide them. Naturally, I doubt I guessed them all, and I know of no surefire way to catch every site that exists.
What I meant is mostly that if Google indexes Safe (and we can expect they will), then their issue with searching Safe is resolved.
The results they serve are by design oriented toward their profit, and do not necessarily serve the common benefit (some results can purposely be omitted, or buried deep in the ranking).
So even if they solve Safe searching, we will still need to create a non-profit-oriented, decentralized search (just like we still need one for the clear web, btw).