Got a URL or group of pages hanging around Google’s search results that you want gone but just can’t seem to get rid of? Have a page that you want to keep from getting indexed? I can help.
Follow the steps below to remove a URL from Google search permanently. Remember, when a URL on your site is blocked it means that Googlebot, the web crawler software used by Google, won’t index or display your content in Google Search results. Additionally, the URL and the content on that page will be hidden from Google Search users. Why would you want to do this?
- Keep your private content/data secure
- Hide content of less value to your audience
- Keep duplicate content from being indexed
- Hide content that is only meant for internal audiences
- Keep third-party content on your website out of Google’s index
If the URL hasn’t been indexed yet – maximum security.
A. Keep private content behind a login
If you have private information such as customer data that would cause issues if indexed by Google, the easiest way to keep it out of search results is to keep it unavailable to the public unless they have a login to the site where they can view the private content.
Keeping content behind a login in a place such as an Intranet will allow folks with proper credentials to access the content, but should keep it secure from being indexed since Google can’t pass through and complete login forms.
B. Password protect sensitive files
If you have confidential or private content that you don’t want to appear in Google’s search results, storing the content in password-protected directories on your site’s server is the simplest and most effective way to block private URLs from getting crawled.
Googlebot and all other web crawlers are unable to access content in password-protected directories.
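As a concrete illustration, on an Apache server (an assumption; the exact mechanism varies by web server and host) you can password-protect a directory with an .htaccess file that points at an .htpasswd credentials file. The file path below is hypothetical:

```apacheconf
# .htaccess placed inside the directory you want to protect
AuthType Basic
AuthName "Restricted Area"
# Hypothetical path - keep the credentials file outside the web root
AuthUserFile /home/example/.htpasswd
Require valid-user
```

You would then create the credentials file with a command such as `htpasswd -c /home/example/.htpasswd username`. Once in place, any request to the directory (including from Googlebot) receives a 401 response until valid credentials are supplied.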
Note: Keep in mind that the steps below (C through E) are more like “guidelines,” and there is no guarantee that web crawlers other than Googlebot will obey them. Therefore, for maximum security, utilize steps A and B first.
If the URL hasn’t been indexed yet – normal security.
C. Meta Noindex, Nofollow
If a page is going to be public-facing and available to crawl, the best way to keep it out of Google’s index is to simply add a meta noindex, nofollow to the page’s HTML code.
This tag is a useful tool if you don’t have root access to your server, as it allows you to control access to your site on a page-by-page basis.
Add the following code to the <head> section of your page to prevent only Google’s web crawlers from indexing the page and following its links.
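The tag in question is the standard robots meta tag targeted at Google’s crawlers by name:

```html
<!-- Blocks Google's crawlers only; other bots ignore this tag -->
<meta name="googlebot" content="noindex, nofollow">
```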
To prevent most web crawlers from indexing a page, use this tag instead.
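The broader version uses the generic “robots” name, which well-behaved crawlers from all search engines recognize:

```html
<!-- Asks all compliant crawlers not to index the page or follow its links -->
<meta name="robots" content="noindex, nofollow">
```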
It should be noted that some search engine web crawlers might interpret the noindex directive differently. As a result, it is possible that your page might still appear in results from other search engines.
Note: The meta noindex, nofollow method will also allow a page that is already indexed to be dropped from the index, but it may take a bit longer than the Google Webmaster Tools URL Removal tool (see below), as it relies on crawlers coming back and finding the meta tag. For the same reason, the tag only works if crawlers can actually fetch the page – if the URL is also blocked in robots.txt, Google will never see the noindex directive.
D. Robots.txt exclusion
A robots.txt file is a text file that stops web crawler software, such as Googlebot, from crawling certain sections/pages of your site. If a specific directory or page is blocked in your robots.txt, it will not appear in Google Search results (unless it has already been indexed).
The robots.txt file is typically found at the site’s root – e.g. …/robots.txt.
If the page or directory in question is not indexed already, a simple robots.txt exclusion will keep it out of Google’s index. This can be done by adding the following code to the robots.txt file (insert the desired URL or directory you wish to block after the disallow):
User-agent: Googlebot
Disallow: /example-directory/
To avoid massive robots.txt files, I prefer to exclude directories rather than single pages, though the technique works for both.
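To illustrate the difference, here is a sketch of a robots.txt (the paths are hypothetical) that blocks a whole directory and a single page for all crawlers:

```
# Applies to all crawlers
User-agent: *
# Blocks every URL under this directory
Disallow: /internal-reports/
# Blocks just this one page
Disallow: /duplicate-page.html
```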
Warning! Be very careful how you use robots.txt exclusions as (if done improperly) it can cause major SEO health issues and may even cause your whole site to be blocked from the index.
To test which URLs Google can and cannot access on your website, go to Google Webmaster Tools > Crawl > robots.txt Tester
It’s also important to keep in mind that this method has its limitations. For example, Google might still find and index information about disallowed URLs from other places on the web.
As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the site can still appear in Google search results.
If a URL or directory has been indexed but is blocked by a robots.txt exclusion, you’ll see the following message in the SERP.
The quickest way to get the above example page out of the search results is Google’s Webmaster Tools URL Removal tool (see below). Alternatively, you can follow the steps outlined in the meta noindex, nofollow section, but that method depends on how quickly Google comes back to crawl the page, and on whether Google decides to drop it from the index right away upon finding the meta noindex, nofollow tags.
If the URL has already been indexed.
E. Google’s URL Removal Tool
If the URL is already in the index and you haven’t signed up for Google Webmaster Tools, you’ll need to sign up and get your site verified before you can remove the URL from Google search permanently.
Once you sign up, within your site’s profile you’ll want to visit Google Index > Remove URLs to submit a page removal request.
From there, you’ll have the option to simply remove the page from Google’s search results and cache, remove the page from the cache only, or remove the entire directory from Google search.
After submitting the URL removal request, it typically takes Google no more than 1-2 days to process before they remove the URL.
This step may or may not be the final one in the process, depending on whether the URL has already been indexed.
When it comes to removing a URL from Google search results, or keeping it out in the first place, there are many options available. For highly sensitive information that should never become publicly available, I highly recommend focusing on steps A and B above. For everything else, combining the methods above will give the best results.
- Google Webmaster Support: Block access to your site content.
- Google’s list of crawlers – for blocking specific crawlers.
- Web Robots Database – more information on other search engine bots/crawlers.
- About robots.txt – more information on how to use the robots.txt file.
If you have questions, feel free to let me know in the comments below.
Image credit: SmarterWatching.com