
A robots.txt file is a great tool for limiting what a search engine spider does on your site. This helpful file can block certain files and directories from being spidered, which helps keep them out of the search results your visitors see. Search engines like Google, Yahoo, Bing, etc. look at your robots.txt file first to learn which parts of your site they may crawl. If you do not have a robots.txt file, they simply follow your links and crawl every page they can reach. So it is in your best interest to have a spider-friendly robots.txt file in your arsenal of SEO tools.

In this article we will show you how to properly write a spider-friendly robots.txt file that points the major search engines at the pages you actually want listed. Remember, a search engine spiders your site by following links from page to page, so a page without links is a dead end for that path. Hopefully the spider has already found the rest of your pages through other links, but there is no reason to send it down dead ends in the first place. For example, a popup page will most likely not have any links going in or coming out, so the spider gains nothing from the popup... but with a robots.txt file you can block access to such pages so your spiders never get stuck in your web (*ha ha, I know, a lame joke*).

The first thing you want to do is go to Google and type "site:www.yourdomain.com" in the search box. This will give you a list of the files that Google has indexed. Look through the list and find the files that you don't want showing up in search results. These are the files you need to disallow in your robots.txt file. Ok, now that you have done your research with Google, you should do the same with all of the other major search engines because they do not always pick up the same content.

Writing the Robots.txt File

Ok, now on to the juicy part: writing the actual robots.txt file. Let's start out by making one that applies to all spiders. Here is an example of a simple robots.txt file.
User-agent: *
Disallow: /hiddenDirectory/
Allow: /hiddenDirectory/*.html
Disallow: /hiddenFile.php

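Let's break it down. The first line says that the rules below it apply to all user-agents. The remaining lines list the allowed and disallowed directories and files. The hiddenDirectory is disallowed for spiders, but the files inside it with a .html extension are allowed. The hiddenFile.php is disallowed as well. Note that the * wildcard in a path is understood by the major search engines such as Google and Bing, but not necessarily by every spider out there.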
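
If you want to double-check how those four lines play out, here is a minimal Python sketch of the "longest matching rule wins" approach described in the Robots Exclusion Protocol (RFC 9309). It is only an approximation for experimenting, not how any particular search engine is actually implemented, and the helper names (RULES, pattern_to_regex, is_allowed) are made up for this example. Python's built-in urllib.robotparser is not used here because it only does plain prefix matching and would not understand the * wildcard.

```python
import re

# Rules copied from the example file above, as (directive, path pattern) pairs.
RULES = [
    ("disallow", "/hiddenDirectory/"),
    ("allow", "/hiddenDirectory/*.html"),
    ("disallow", "/hiddenFile.php"),
]


def pattern_to_regex(pattern):
    """Turn a robots.txt path pattern into a regex matched from the path start."""
    anchored = pattern.endswith("$")  # a trailing $ means "path must end here"
    if anchored:
        pattern = pattern[:-1]
    # Escape everything, then turn the escaped '*' back into "any characters".
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Longest matching pattern wins; on a tie, Allow beats Disallow."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = directive == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


for path in ["/hiddenDirectory/page.html", "/hiddenDirectory/image.png",
             "/hiddenFile.php", "/index.html"]:
    print(path, "allowed" if is_allowed(path) else "blocked")
```

Under these assumptions the .html page inside hiddenDirectory comes out allowed because its Allow pattern is longer than the Disallow pattern for the directory, while the image and hiddenFile.php are blocked and everything else stays open.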

It is also good practice to include your sitemap or sitemaps in your robots.txt file. You can put this at the bottom of the file. It tells the search engines where to find a full list of your pages.
Sitemap: http://www.zero3computers.com/sitemap.xml
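
A quick way to confirm that the Sitemap line is written correctly is to parse the file yourself. The sketch below uses Python's standard urllib.robotparser module and assumes Python 3.8 or newer, since that is when the site_maps() method was added. The file contents are just a trimmed-down version of the example from this article.

```python
from urllib.robotparser import RobotFileParser

# A trimmed-down robots.txt with the Sitemap line at the bottom.
ROBOTS_TXT = """\
User-agent: *
Disallow: /hiddenFile.php

Sitemap: http://www.zero3computers.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if the file has none.
sitemaps = parser.site_maps()
print(sitemaps)
```

If the list comes back as None, the Sitemap line is most likely misspelled or missing its colon.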

You can get even more advanced by giving certain user-agents access where others are not allowed. Every rule that follows a user-agent declaration applies to that user-agent until the next declaration, so where you place each rule matters. You start such a group by adding the following:
user-agent: user-agent name

Here is a list of all of the user-agents you can use: http://www.user-agents.org/
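
To see how separate user-agent groups behave, here is a small Python sketch using the standard urllib.robotparser module. The rules are hypothetical and only for illustration: Googlebot may crawl everything except a made-up /drafts/ directory, every other spider is kept out of the whole site, and "ExampleBot" is an invented name standing in for any other spider.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for Googlebot, one catch-all group for the rest.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent, path):
    """Ask the parsed rules whether `agent` may crawl `path`."""
    return parser.can_fetch(agent, path)


print(may_fetch("Googlebot", "/index.html"))        # True
print(may_fetch("Googlebot", "/drafts/post.html"))  # False
print(may_fetch("ExampleBot", "/index.html"))       # False
```

Each spider only follows the group that names it, and falls back to the * group when no group does, which is why ExampleBot is blocked from a page that Googlebot is free to crawl.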

Well, that is about it. I hope this guide sets you on the right path to getting a great robots.txt file. If you have any questions or comments, don't be shy... leave us a comment below.
