



 





Why Robots.txt?
 
I am sure that a lot of you have heard of the file named robots.txt (also called a "robot exclusion file") before. But what does this file actually do? You can think of a robots.txt file as a list of rules that well-behaved search engine spiders follow when they crawl your site. A robots.txt file gives you, the Webmaster, a say in what does and does not get indexed when spiders come to your little corner of the web.


Okay, I can hear a few people asking why anyone would want to keep some things from being indexed. Isn't the goal to get indexed? Well, yes and no. There are quite a few instances when blocking spider access to certain areas or pages is almost a must. Here are several examples of what a person might want to restrict access to: temporary files or directories, presentations, information with a specific sequential order, testing directories, or cgi-bin. As you can see from just these few examples, there are definitely files that you would want to keep from being indexed. While there is a robots Meta tag available that does in essence the same thing as a robots.txt file, it is not currently 100% supported by search engines. Another drawback is that the tag needs to go on every page you do not want indexed, as opposed to one central point of control.

Writing 101
All right, I have given you a few vague examples of what might be included in such a file. There is never going to be a set list of things that should and should not be indexed; a robots.txt file needs to be tailored to your site and your content. There is, however, a very specific format that needs to be followed when creating a robots.txt file.

Step 1: First, a robots.txt file needs to be created in Unix format, that is, with Unix line endings. The reason for this is to ensure that there are no carriage returns inserted into your file. I would suggest looking at Notepad++, my personal favorite text editor due to the number of languages and formats it supports. Notepad++ can create a document directly in Unix format by selecting "Convert to Unix Format" from the "Format" menu. Other plain text editors should be able to achieve the same results; however, stay away from editors like WordPad or Microsoft Word when creating your robots.txt file. I also do not recommend using HTML editors for this task.
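If your editor cannot convert line endings for you, the same cleanup can be done with a few lines of script. The following is only a minimal sketch in Python; the function names to_unix_line_endings and write_robots_txt are made up for this illustration and are not part of any tool mentioned in this article.

```python
def to_unix_line_endings(text: str) -> str:
    """Convert Windows (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    # Replace CRLF first so a Windows line ending does not become two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def write_robots_txt(path: str, text: str) -> None:
    """Write the rules to disk without letting Python translate newlines."""
    # newline="\n" stops Python from converting "\n" to the platform default.
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        handle.write(to_unix_line_endings(text))


# Hypothetical content that was saved on Windows.
windows_text = "User-agent: *\r\nDisallow: /cgi-bin/\r\n"
unix_text = to_unix_line_endings(windows_text)
print(repr(unix_text))
```

Calling write_robots_txt("robots.txt", windows_text) would then save the cleaned-up file in the current directory.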

Step 2: Now let's begin adding some content to our file. A robots.txt file is made up of two fields. The first is the User-agent line, which specifies the spider/robot that we intend to limit or allow. An example of this would be:

User-agent: googlebot


In addition to allowing or restricting specific spiders, you can use a wildcard to target all spiders coming to your site. To do this, simply use an asterisk (*) as your User-agent. Example:

User-agent: *

Step 3: Now we will begin to disallow our desired content; either a file or a whole directory can be kept from being indexed with a robots.txt file. We will do this with the second field of our file, the Disallow: directive line. Here is an example:

Disallow: /cgi-bin/


Or for a file:

Disallow: /temp/temp.html


Moreover, you are not limited to just one Disallow per User-agent; in fact, you can get pretty granular as to what you give spiders access to. Just make sure that you give each Disallow its own line. If you leave the Disallow field empty (i.e. Disallow: ), you are giving permission for all files and directories to be indexed.
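To see how several Disallow lines and an empty Disallow behave, you can feed some rules to the robots.txt parser in Python's standard library (urllib.robotparser). This is only a sketch: the spider name "anybot" is made up, the URLs reuse the placeholder site from this article, and real spiders may interpret edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Two Disallow lines for every spider, each on its own line.
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/temp.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

blocked_dir = not parser.can_fetch("anybot", "http://www.yoursite.com/cgi-bin/script.pl")
blocked_file = not parser.can_fetch("anybot", "http://www.yoursite.com/temp/temp.html")
allowed_page = parser.can_fetch("anybot", "http://www.yoursite.com/index.html")
print(blocked_dir, blocked_file, allowed_page)  # True True True

# An empty Disallow field permits everything.
open_parser = RobotFileParser()
open_parser.parse("User-agent: *\nDisallow:\n".splitlines())
allow_all = open_parser.can_fetch("anybot", "http://www.yoursite.com/cgi-bin/script.pl")
print(allow_all)  # True
```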

One word of caution when writing your robots exclusion file: if you are not careful, you can shut off one or all spiders' access to your site completely. This happens when you prohibit access at the root level by using a slash (/). Example:

Disallow: /

If you were to use the asterisk wildcard to specify your User-agent with the above example, you would block all search engines from every part of your site.
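The difference between blocking one spider and blocking all of them can be checked with the same standard-library parser. Again, this is a sketch under assumptions: build_parser is a helper written for this illustration, and "otherbot" is a made-up spider name.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (illustrative helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Root-level Disallow for a single named spider.
block_one = build_parser("User-agent: googlebot\nDisallow: /\n")
# Root-level Disallow combined with the wildcard.
block_all = build_parser("User-agent: *\nDisallow: /\n")

url = "http://www.yoursite.com/index.html"
googlebot_blocked = not block_one.can_fetch("googlebot", url)
otherbot_allowed = block_one.can_fetch("otherbot", url)
everyone_blocked = not (
    block_all.can_fetch("googlebot", url) or block_all.can_fetch("otherbot", url)
)
print(googlebot_blocked, otherbot_allowed, everyone_blocked)  # True True True
```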

Step 4: That is all there is to creating a robots.txt file. The next step is to upload it to the root directory of your site: www.yoursite.com/. Make sure that you upload it in ASCII mode, just like all other text files, and you are done.
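Because the file always lives at the root, its address can be worked out from any page on the site. The short sketch below shows this; robots_txt_url is a name invented for this example, and the URL is the placeholder site used above.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt address for the site serving page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


location = robots_txt_url("http://www.yoursite.com/temp/temp.html")
print(location)  # http://www.yoursite.com/robots.txt
```

A robots.txt file placed in a subdirectory, such as /temp/robots.txt, would not be at the address spiders look for.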

Step 5: Writing a robots.txt file is pretty straightforward once you get comfortable with the file's configuration. Once your file is complete and uploaded, it is good practice to have it validated; you can do this through www.searchengineworld.com.
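Before handing the file to a validator, a quick local check can catch the mistakes covered in the steps above. The sketch below is a toy checker written for this article, not a replacement for a real validator: check_robots_txt is a made-up name, and it only knows the two fields discussed here, so any other field is merely reported as not covered.

```python
def check_robots_txt(text: str) -> list:
    """Return a list of human-readable problems found in robots.txt content."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.split("\n"), start=1):
        if raw.endswith("\r"):
            problems.append(f"line {number}: carriage return found")
        # Drop comments and surrounding whitespace before inspecting the line.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        field, separator, value = line.partition(":")
        field = field.strip().lower()
        if not separator:
            problems.append(f"line {number}: missing ':' after field name")
        elif field == "user-agent":
            if not value.strip():
                problems.append(f"line {number}: User-agent has no value")
            seen_user_agent = True
        elif field == "disallow":
            if not seen_user_agent:
                problems.append(f"line {number}: Disallow before any User-agent")
        else:
            problems.append(f"line {number}: field '{field}' not covered here")
    return problems


good = check_robots_txt("User-agent: *\nDisallow: /cgi-bin/\n")
bad = check_robots_txt("Disallow: /\n")
print(good)
print(bad)
```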

Notes: Aside from search-engine-specific information, you are also able to comment your robots.txt file. This is achieved by using the pound sign (#). Though you can place a comment after the Disallow field, it is not recommended. Instead, make sure that you begin your comments on a new line starting with the pound sign. Example:

# Just making a comment
User-agent: googlebot
Disallow: /cgi-bin/
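A comment on its own line does not change how the rules are read. The sketch below checks this with the parser from Python's standard library (urllib.robotparser); other spiders are assumed, not guaranteed, to treat comments the same way.

```python
from urllib.robotparser import RobotFileParser

commented = """\
# Just making a comment
User-agent: googlebot
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(commented.splitlines())

# The comment line is skipped; the rule for googlebot still applies.
cgi_blocked = not parser.can_fetch("googlebot", "http://www.yoursite.com/cgi-bin/test.cgi")
home_allowed = parser.can_fetch("googlebot", "http://www.yoursite.com/index.html")
print(cgi_blocked, home_allowed)  # True True
```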

If you are hesitant about the different steps involved in creating a robots.txt file, there are applications available that will help you through the creation process. One application that does this is RoboGen from Rietta Solutions. RoboGen provides you with an Explorer-like view that lets you browse the files and directories that you want to restrict access to and creates the robot exclusion file as you go.

In Closing
As with all things, there are going to be some drawbacks you will need to contend with. With the robots.txt file, it is the road map effect: for those who want to see what you do not want made publicly available, the file provides a prime place to begin looking. Since all robot exclusion files are named the same and are always in the same place, probing people will know where to find it.

Still, the pros outweigh the cons. By having a robots.txt file present on your site, you help keep important or private information from ending up in a search engine's cache, where it would be publicly available to a mass audience. This is what the file is there for. If, on the other hand, you have something that not only needs to be kept private but also needs to be protected, you should make sure that access is restricted through much more secure and appropriate means. Robot exclusion files were designed as a method for Webmasters to delimit the access robots have to their sites, providing robots with one central place to look when they begin the task of indexing. To this end, the file serves its purpose extremely well, and when used properly it makes the job of a Webmaster much easier.



About the Author
Matt Benya is a co-owner of Primate Studios (www.primatestudios.com), an independent development house focusing on CGI illustration, Web design and multimedia. With 20+ years of art experience and a degree in Network administration, Matt is well suited to translate your needs to the Web.
