I Need To Get Thousands of Pages Out of Google
September 8th, 2009
This post is prompted by a client’s need to delete thousands of pages from Google’s search results. His NSFW web site has user profiles that were unintentionally allowed to be indexed quite some time ago. Although Google has a page for manually submitting specific URLs for removal from search results, there is apparently no batch function for submitting thousands of URLs at once.
Google does provide a function to delete entire directories, but these URLs reside in a directory that also holds other files that need to remain in the search results. Hence the problem.
I have altered the robots.txt file for the site to specify that the Googlebot should have ‘noindex’ and ‘noarchive’ applied to the file in question. The thousands of URLs are created when that one file is accessed with a query string, e.g., sitename.com/member.php?u=01234
I have also modified the member.php script so that when it is accessed by the Googlebot (or some other spider) it returns an HTTP 404 (Not Found) status, telling the crawler that the page no longer exists.
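The decision that script has to make can be sketched roughly as follows. This is only an illustration in Python, not the actual member.php code: the function name `status_for` and the list of crawler names in `CRAWLER_PATTERN` are assumptions made up for the example, and matching on the User-Agent string is just one simple way to guess that a request comes from a spider.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical list of crawler User-Agent fragments (an assumption for this
# sketch; the original post does not say which spiders are matched).
CRAWLER_PATTERN = re.compile(r"googlebot|bingbot|slurp|crawler|spider", re.IGNORECASE)


def status_for(url: str, user_agent: str) -> int:
    """Return the HTTP status to send for a request to the given URL."""
    parts = urlsplit(url)
    # Only profile URLs such as member.php?u=01234 are affected.
    is_profile = parts.path.endswith("/member.php") and "u" in parse_qs(parts.query)
    # Treat the request as a crawler if its User-Agent matches the pattern.
    is_crawler = bool(CRAWLER_PATTERN.search(user_agent or ""))
    return 404 if (is_profile and is_crawler) else 200
```

With this sketch, a Googlebot request for sitename.com/member.php?u=01234 gets a 404, while an ordinary browser requesting the same URL, or a crawler requesting any other page, still gets a 200.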
Finally, inside the HTML <HEAD> section of the page, I’ve included the following <META> tags:
<meta name="robots" content="noindex,nofollow">
<meta name="googlebot" content="noindex,nofollow,noarchive">
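To confirm that a rendered profile page really carries these tags, a small check along the following lines could be run against the page source. It is a minimal sketch using Python's standard html.parser module; the names `RobotsMetaParser` and `robots_directives` are invented for this example and are not part of the site's code.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attr.get("content") or ""
            # Split "noindex,nofollow" into a set of lowercase directives.
            self.directives[name] = {
                d.strip().lower() for d in content.split(",") if d.strip()
            }


def robots_directives(html: str) -> dict:
    """Return a dict mapping meta name to its set of robots directives."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

Feeding it the HTML of a profile page and checking that "noindex" appears in the set for both "robots" and "googlebot" is a quick way to catch a template that silently dropped the tags.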