Do you need a Robots.txt file?
If you have a small site, you are probably under the false assumption that you don't really need a robots.txt file. In fact, you may be saying to yourself, "I don't need a robots.txt file because my site is small, it's simple for the search engines to find, and since I want all pages indexed anyway, why bother?" Those were my thoughts in the beginning too, partly because I wasn't aware of what a robots.txt file was or what it could do for my site. So I'll try to give you a little insight into what a robots.txt file is, how to use one, why you need one, and some basic instructions for creating one.
Define Robots.txt File
To begin, we need to know what a web robot is and is not. A web robot, sometimes called a spider or web crawler, is a program that automatically traverses the web by retrieving pages and following the links in them. It should not be confused with your normal web browser: a browser is not a web robot, because a human being operates it manually.
The main use of a robots.txt file is to tell robots what they may crawl and what they should not crawl. This gives you a little more control over the robots, which means you can issue indexing instructions to specific search engines.
Do you really need a Robots.txt file?
Do you really need a robots.txt file even if you're not excluding any robots? It's a good idea. Why? First and foremost, it's an invitation to the search engines. In addition, some of the good bots may step away from your website if they do not find a robots.txt file in its top level, although the major search engines treat a missing file as permission to crawl everything.
Sometimes you may want to exclude some pages from the search engines' view. What type of pages?
1. Pages that are still under construction
2. Directories that you would prefer not to have indexed
3. Your whole site, in the case of robots whose sole purpose is to collect email addresses, or search engines in which you do not want your website to appear
Keep in mind that robots.txt is a request, not an enforcement mechanism: well-behaved robots honor it, but a robot built to harvest email addresses may simply ignore it.
What does a Robots.txt file look like?
The robots.txt file is a simple text file, which can be created in Notepad. It needs to be saved to the root directory of your site, that is, the directory where your home page or index page is located.
To create a simple robots.txt file that allows all robots to spider your site, enter the following:
User-agent: *
Disallow:
That's it. This will allow all robots to index all your pages.
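If you want to see how a crawler reads the allow-all file, here is a minimal sketch that feeds the two lines to Python's standard urllib.robotparser module and asks whether a couple of robots may fetch a page. The host www.example.com, the page paths, and the name "someotherbot" are placeholders I made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

# The allow-all file: an empty Disallow value excludes nothing.
ALLOW_ALL = [
    "User-agent: *",
    "Disallow:",
]

parser = RobotFileParser()
parser.parse(ALLOW_ALL)  # parse the lines directly, no network access needed

# Placeholder URLs and robot names, used only for this illustration.
URLS = (
    "http://www.example.com/",
    "http://www.example.com/private/page.html",
)

for robot in ("Googlebot", "someotherbot"):
    for url in URLS:
        print(robot, url, parser.can_fetch(robot, url))
```

Every combination should come back as allowed, because the "*" record applies to any robot and the empty Disallow line blocks no path.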
If you don't want a specific robot to have access to any of your pages, you can do the following:
User-agent: specificbadbot
Disallow: /
Here you have to name the robot, or a specific substring of its name, on the User-agent line. You also need the "/": a rule matches every path that begins with it, and every path on your site begins with "/", so it means "all directories".
For example, let's say you do not want Googlebot to index a page called "donotenter.html" and your directory is "nogoprivate". In your robots.txt file you would put:
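As a rough check of the record for a single unwanted robot, the sketch below parses the two lines with Python's standard urllib.robotparser module. The page URL is a placeholder, and "specificbadbot" is only the stand-in name used in this article. Note that this shows how Python's parser matches names; a real crawler may match its own name somewhat differently.

```python
from urllib.robotparser import RobotFileParser

# One record that shuts a single named robot out of the whole site.
rules = [
    "User-agent: specificbadbot",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)

page = "http://www.example.com/index.html"  # placeholder URL

# The named robot is refused, with or without a version suffix.
bad_bot_allowed = parser.can_fetch("specificbadbot", page)
bad_bot_v2_allowed = parser.can_fetch("specificbadbot/2.0", page)

# A robot that no record mentions is not restricted at all.
googlebot_allowed = parser.can_fetch("Googlebot", page)

print("specificbadbot allowed:", bad_bot_allowed)
print("specificbadbot/2.0 allowed:", bad_bot_v2_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

The named robot is refused in both forms, while Googlebot is allowed because the file contains no record that applies to it.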
User-agent: Googlebot
Disallow: /nogoprivate/donotenter.html
Now, if it's a complete directory you do not want indexed, you would put:
User-agent: Googlebot
Disallow: /nogoprivate/
By putting a forward slash at the beginning and at the end, you tell the search engine not to include the directory or anything inside it.
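The difference between excluding one page and excluding a whole directory can be sketched with Python's standard urllib.robotparser module. The helper name "allowed", the host www.example.com, and the extra file names "other.html" and "nogoprivate-notes.html" are my own assumptions for the illustration; only "nogoprivate" and "donotenter.html" come from the example above.

```python
from urllib.robotparser import RobotFileParser


def allowed(rule_path, url, robot="Googlebot"):
    """Return True if `robot` may fetch `url` under a single Disallow rule."""
    parser = RobotFileParser()
    parser.parse(["User-agent: " + robot, "Disallow: " + rule_path])
    return parser.can_fetch(robot, url)


SITE = "http://www.example.com"  # placeholder host

# Rule for one page: only that page is excluded.
page_rule = "/nogoprivate/donotenter.html"
print(allowed(page_rule, SITE + "/nogoprivate/donotenter.html"))
print(allowed(page_rule, SITE + "/nogoprivate/other.html"))

# Rule for the directory: everything under /nogoprivate/ is excluded.
dir_rule = "/nogoprivate/"
print(allowed(dir_rule, SITE + "/nogoprivate/other.html"))
print(allowed(dir_rule, SITE + "/index.html"))

# Without the trailing slash the rule is a plain prefix, so it also
# catches unrelated paths that merely start with the same characters.
print(allowed("/nogoprivate", SITE + "/nogoprivate-notes.html"))
```

The last call is the reason the trailing slash matters: "/nogoprivate" also blocks "/nogoprivate-notes.html", while "/nogoprivate/" only blocks what is inside the directory.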
Getting Your Code Right
If your robots.txt file is more complex, then it's always wise to do a quick check of the syntax. There are some free online robots.txt checkers you can use for this. One is the Robots Text Tester, available through Search Engine Promotion (http://www.searchenginepromotionhelp.com/m/robots-text-tester/robots-checker.php). Another is ClockWatchers (http://www.clockwatchers.com/robots_main.html), which can help you create a robots.txt file and also explains how to create a file that keeps out bad bots.
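If you would like a quick local check before trying an online tool, the following is a rough sketch of a syntax checker in Python. It is not how either of the checkers above works. The function name lint_robots_txt, the list of accepted fields, and the checks themselves are my own assumptions, and they only catch the most common slips, such as a misspelled field or a path without a leading slash.

```python
# Fields this sketch accepts; Sitemap and Crawl-delay are common extensions.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots_txt(text):
    """Return a list of (line_number, message) problems found in `text`."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, "unknown field '%s'" % field))
        elif field == "user-agent":
            seen_user_agent = True
            if not value:
                problems.append((number, "User-agent needs a robot name or *"))
        elif field in ("disallow", "allow"):
            if not seen_user_agent:
                problems.append((number, "rule appears before any User-agent line"))
            if value and not value.startswith(("/", "*")):
                problems.append((number, "path should start with '/'"))
    return problems


# A sample with two deliberate mistakes: a misspelled field on line 2
# and a path without its leading slash on line 3.
sample = "User-agent: *\nDisalow: /nogoprivate/\nDisallow: nogoprivate/\n"
for problem in lint_robots_txt(sample):
    print(problem)
```

A file that passes this check can still behave differently from what you intended, so it is worth testing the important URLs as well.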
By Vickie Scanlon

Comments (3)
Nice article. I always thought I didn't need a robots.txt, but I'm now in the middle of creating one! One point though: I checked out the 'clockwatchers' website, and it mentions that you should put the code straight into the .htaccess file as opposed to the robots.txt.
Which one is right? Or do they both do the same thing? Would I be wise to add them to both files, or just the one?
Also, I'm running 2 websites from one server. One is hosted in the root, one is in a directory. Do I need to create a separate robots.txt for both websites?
MrQwest, if you look at your site statistics, you will find that search engines request the robots.txt file only from the root folder, so creating another robots.txt file in a subfolder has no effect.
.htaccess is a server (system) file that controls how the server handles requests, responses, redirections, etc.
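To illustrate the point about the root folder, here is a small sketch in Python of where a crawler looks for robots.txt given any page address. The function name robots_txt_url and the example addresses are placeholders I made up, not anything taken from a real site.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt address a crawler would request for `page_url`."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A page of the site in the root and a page of the site in a directory.
print(robots_txt_url("http://www.example.com/index.html"))
print(robots_txt_url("http://www.example.com/secondsite/about.html"))
```

Both calls give the same address, so when two sites share one host name, the rules for the site in the directory go into the root file, with paths that start with that directory's name.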
Please see an example:
http://www.theitarticles.com/mod-rewrite-seach-engine-friendly-urls/49/
Robots.txt is designed for search engine crawlers only.
So these 2 files do different things.