Since Dec 1st last year I have the following robots.txt in my webroot:
User-agent: *
Disallow: /
And still, after six weeks, I receive hits on my website from hosts such as lj612201.inktomisearch.com. This is the Yahoo search-engine spider.
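For what it's worth, the rules above really do forbid everything for every crawler. A quick sketch with Python's standard urllib.robotparser (example.com is just a placeholder domain) confirms that any compliant spider must skip the whole site:

```python
from urllib.robotparser import RobotFileParser

# Parse the same two-line robots.txt locally, no network fetch needed
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])

# A compliant crawler checks can_fetch() before requesting any URL;
# with "Disallow: /" every path is off-limits for every user agent
print(rp.can_fetch("Slurp", "http://example.com/"))            # False
print(rp.can_fetch("Googlebot", "http://example.com/any/page"))  # False
```

So if a spider keeps requesting pages, it either never re-fetched the robots.txt or is simply ignoring it.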
I find this unacceptable search-engine behaviour. For that reason I have blocked 18.104.22.168/16 on my firewall.
How hard can it be to build a search engine that fetches robots.txt regularly? Google does this, although I can still find my website when I search on Google. Does anyone know how long it takes for Google to refresh its index?