What is a Robots.txt File?
Posted January 8, 2009 in Search Engine Marketing
A Robots.txt file is a plain text file placed on a website’s server to tell search engine spiders not to crawl and index pages or sections of a website that would otherwise be publicly viewable.
What is the Purpose of a Robots.txt File?
1. Block the search engines from indexing an entire site.
2. Prevent specific sections of a website from being indexed.
3. Communicate specific indexing instructions to individual search engines.
Does My Site Need a Robots.txt File?
With the emergence of so many search engines over the past decade, Robots.txt files have become a standard protocol that search engine spiders attempt to locate and read when they first arrive at a website. Spiders are programmed to start their crawl by looking for the Robots.txt file; it tells them which pages they may index and which pages are disallowed. Even if your site has no pages that you currently want hidden from public view, including a Robots.txt file in the root folder of your site is still a recommended search engine optimization practice. It acts as an invitation to the search engines to crawl all the pages on your site.
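Because spiders look for the file in the root folder, its address can be worked out from any page on the site. Here is a minimal Python sketch of that lookup; the helper name robots_url and the example.com address are assumptions made for illustration, not part of any search engine's code.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the Robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, point the path at the root-level file,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/section/examplefile.htm"))
# prints http://www.example.com/robots.txt
```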
Why Would I Want to Exclude Spiders from My Site?
1. Your website is still in development and you do not want your unfinished work to be found by the public.
2. The information presented on your page only pertains to your specific audience and is of no interest to an outside viewer.
3. You want to exclude common pages such as Thank You and Privacy Policy pages from the search engine indexes because a web visitor gains nothing by entering the website through them.
4. You do not agree with a specific search engine’s procedures, or a known bot collects data you do not want shared, such as email addresses.
5. You use doorway pages on your site (we highly recommend not using them) and you want to exclude specific search engine spiders from viewing them as they will have a negative impact on your optimization efforts with search engines such as Google and Yahoo.
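As a rough sketch of reasons 3 and 4 above, the Python below parses a small Robots.txt file with the standard library's urllib.robotparser module and asks what each spider may fetch. The bot name EmailHarvester and the page names are hypothetical, chosen only for this example. Keep in mind that the file is only a request, and a badly behaved bot is free to ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: shut one unwanted bot out completely and keep
# two low-value pages away from every other spider.
ROBOTS_TXT = """\
User-Agent: EmailHarvester
Disallow: /

User-Agent: *
Disallow: /thank-you.htm
Disallow: /privacy-policy.htm
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("EmailHarvester", "/index.htm"))  # False
print(parser.can_fetch("Googlebot", "/index.htm"))       # True
print(parser.can_fetch("Googlebot", "/thank-you.htm"))   # False
```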
How Do I Create a Robots.txt File?
A Robots.txt file is simple to create in Notepad or any text editor using the layout instructions below. Once it is created, upload the file to the root folder of your website. In its simplest form, each entry in a Robots.txt file contains just two lines of text.
Layout for a Robots.txt file:
User-Agent: [Spider or Bot name]
Disallow: [Directory or File Name]
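If you would rather generate entries than type them by hand, the two-line layout above can be produced with a few lines of Python. This is only a sketch; the function name robots_entry is an assumption made for this example.

```python
def robots_entry(user_agent: str, disallow: str = "") -> str:
    """Format one two-line Robots.txt entry.

    An empty disallow value leaves the Disallow line blank,
    which blocks nothing for that spider.
    """
    lines = [
        f"User-Agent: {user_agent}",
        f"Disallow: {disallow}".rstrip(),  # avoid a trailing space when empty
    ]
    return "\n".join(lines) + "\n"


print(robots_entry("Googlebot", "/section/examplefile.htm"), end="")
```

To publish the result, save the returned text as robots.txt and upload it to the root folder of your site.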
Examples of Robots.txt Files:
1. Exclude a file or directory from a specific search engine spider:
   User-Agent: Googlebot
   Disallow: /section/examplefile.htm
2. Exclude a specific section of a website from every search engine spider and bot:
   User-Agent: *
   Disallow: /examplesection/
3. Allow spiders to index the entire site:
   User-Agent: *
   Disallow:
4. Disallow spiders from indexing any part of the website:
   User-Agent: *
   Disallow: /
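Before uploading a file, it can help to check that it does what you expect. The sketch below runs the four examples above through Python's standard urllib.robotparser module. The helper name allowed, the bot name OtherBot, and the test paths are assumptions made for illustration, and individual search engines may interpret a file slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if agent may fetch path under the given Robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


example_1 = "User-Agent: Googlebot\nDisallow: /section/examplefile.htm"
example_2 = "User-Agent: *\nDisallow: /examplesection/"
example_3 = "User-Agent: *\nDisallow:"
example_4 = "User-Agent: *\nDisallow: /"

print(allowed(example_1, "Googlebot", "/section/examplefile.htm"))  # False
print(allowed(example_1, "OtherBot", "/section/examplefile.htm"))   # True
print(allowed(example_2, "OtherBot", "/examplesection/page.htm"))   # False
print(allowed(example_3, "OtherBot", "/anything.htm"))              # True
print(allowed(example_4, "OtherBot", "/anything.htm"))              # False
```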