Does Google use Google Chrome users to discover new unindexed pages?

Fijxu@programming.dev · 5 days ago

The Octonaut@mander.xyz · 7 days ago

Are you using Google’s DNS?

Pup Biru@aussie.zone · 7 days ago

DNS will only leak domains (and subdomains), not paths.
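To make that split concrete, here is a small sketch using Python's standard-library urllib.parse (the function names are made up for illustration). A DNS resolver is only ever asked about the hostname; the path is part of the HTTP request itself, which is encrypted when the site uses HTTPS.

```python
from urllib.parse import urlsplit


def dns_visible_part(url: str) -> str:
    """Return what a DNS lookup for this URL reveals: only the hostname."""
    return urlsplit(url).hostname or ""


def http_only_part(url: str) -> str:
    """Return the path (plus query string), which is never sent in a DNS query."""
    parts = urlsplit(url)
    return parts.path + (f"?{parts.query}" if parts.query else "")


if __name__ == "__main__":
    url = "https://luna.nadeko.net/Movies/Ch3k0p3t3/"
    print(dns_visible_part(url))  # luna.nadeko.net
    print(http_only_part(url))    # /Movies/Ch3k0p3t3/
```

So a DNS provider could learn that someone looked up luna.nadeko.net, but not which folder they opened.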

Fijxu@programming.dev · 7 days ago

DNS doesn't matter at all in this case.

chevy9294@monero.town · 7 days ago

100%, if you have "Safe Browsing" enabled (it is on by default). This also applies to Firefox, but I don't know whether it's enabled by default there.

Fijxu@programming.dev · 7 days ago

That makes perfect sense, since Google Chrome has Safe Browsing enabled by default and most people don't bother changing their settings.

solrize@lemmy.world · 7 days ago

I had some private pages a while back that linked to unrelated pages on other sites. I had to go somewhat crazy to stop the private URLs from leaking to the external sites through Referer headers when my users clicked on the links.

If Chrome is sending people's browser histories to Google, that is invasive.

dysprosium@lemmy.dbzer0.com · 7 days ago

So how did you stop the Referer header from doing that? I'd imagine it would be a simple setting, since it ought to be. Or was it not that straightforward?

solrize@lemmy.world · 7 days ago

It's easier now that there are headers to control it, such as Referrer-Policy. At the time I tried a lot of things, like bouncing through JavaScript that opened a new window. Results varied by browser. The simplest way was to inconvenience users a bit by supplying plain-text URLs for them to paste into the address bar instead of clickable links.
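A minimal sketch of the header-based approach, using only Python's standard-library http.server and assuming you control the response headers (PAGE, NoReferrerHandler and start_server are illustrative names, not anything from the original setup). The page is served with "Referrer-Policy: no-referrer", which tells browsers to send no Referer at all when a visitor follows a link off the page; the rel="noreferrer" attribute does the same for a single link. In a real deployment you would set the same header in your web server's configuration instead.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# A page with one outgoing link; rel="noreferrer" suppresses Referer for that link.
PAGE = (
    b'<!doctype html><html><body>'
    b'<a href="https://example.com/" rel="noreferrer">external link</a>'
    b'</body></html>'
)


class NoReferrerHandler(BaseHTTPRequestHandler):
    """Serves PAGE with a Referrer-Policy header so that browsers send no
    Referer at all when a visitor follows a link off this page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.send_header("Referrer-Policy", "no-referrer")
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the example


def start_server(port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), NoReferrerHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    srv = start_server()
    url = f"http://127.0.0.1:{srv.server_address[1]}/"
    with urllib.request.urlopen(url) as resp:
        print(resp.headers["Referrer-Policy"])  # no-referrer
    srv.shutdown()
```

"no-referrer" is the strictest value; a softer option such as "strict-origin-when-cross-origin" still sends the site's origin, but not the path, to other sites.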

bamboo@lemmy.blahaj.zone · 7 days ago

Do any of the pages in the directory link to other websites? If you link to a website that uses Google Analytics, it may see the Referer header when the person using Chrome opens the link. If Google knew your site didn't previously link to that third-party site, maybe that triggered a recrawl.

You could test this by making a page that links to CNN or another site that uses Google Analytics, then opening it in Firefox (without anything that would block Google Analytics) and clicking the link on your site to the other site. If Googlebot still checks your site within 10 seconds, you could rule out Chrome as the culprit.
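One way to read the result of such a test is to measure the gap in the server's access log between a normal browser visit to a path and the next Googlebot request for the same path. The sketch below assumes the "combined" log format that nginx and Apache use by default, and that the log lines are in chronological order; the function name and sample lines are made up for illustration. It trusts the User-Agent string, which can be spoofed; Google documents a reverse-DNS check for confirming that a request really came from Googlebot.

```python
import re
from datetime import datetime

# Matches the "combined" access-log format (an assumption: adjust the
# pattern if your server logs something else).
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_delays(lines):
    """Return (path, seconds) pairs: the gap between the first non-Googlebot
    visit to a path and the next request for that path whose User-Agent
    claims to be Googlebot. Assumes the lines are in chronological order."""
    first_visit = {}  # path -> timestamp of the first non-bot request
    delays = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines in some other format
        ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        path, agent = m["path"], m["agent"]
        # Check for Googlebot first: its User-Agent may also contain "Chrome/".
        if "Googlebot" in agent:
            if path in first_visit:
                delays.append((path, (ts - first_visit.pop(path)).total_seconds()))
        else:
            first_visit.setdefault(path, ts)
    return delays


# Made-up example lines (documentation IP ranges, hypothetical timestamps).
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:12:00:00 +0000] "GET /Movies/Ch3k0p3t3/ HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"',
    '198.51.100.7 - - [01/Jan/2025:12:00:10 +0000] "GET /Movies/Ch3k0p3t3/ HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    print(googlebot_delays(SAMPLE_LOG))  # [('/Movies/Ch3k0p3t3/', 10.0)]
```

Running it once on the log from a Chrome visit and once on the log from a Firefox visit gives two delays to compare.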

Fijxu@programming.dev · 7 days ago

Nope, it's just a file indexer that I host publicly. I don't mind sharing the URL to give more context.

The user accessed https://luna.nadeko.net/Movies/Ch3k0p3t3/ with Google Chrome.

And 10 seconds later, Googlebot scraped the folder.

Simple as that. I don't have privacy-invasive trackers on any of my webpages/services.