In general we want our pages viewed by as many people as possible. However, there may be pages that we do not want viewed by the public. Examples might be pages containing personal information, pages having proprietary company information, and pages intended for viewing by particular individuals rather than by everyone.

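The following methods can help keep private pages from being found and listed by search services. None of them password-protects a page, so anyone who knows a page's URL can still open it.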

Every Folder Has an index.html File

If the URL given to a browser contains the name of an HTML page, the web server looks for that page and sends it to the browser. If the page isn't found, the browser displays the dreaded "404 Not Found" error message. If the URL names only a folder, the server looks for a page having a default name, usually index.html or index.htm. If neither page is found, many servers send back a list of all the files in the folder so the user can choose which page to display. If you have private pages in the folder, their names will appear in that list, and visitors can open the private pages simply by clicking the names.

To prevent the server from displaying the list of pages, place an index.html file in every folder. If a folder has no real content for its index file, a blank page will do.
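If you keep your site in a folder on your own computer before uploading it, a short script can find the folders that still lack an index page. The Python sketch below assumes that layout; the function names `folders_without_index` and `add_blank_index` are hypothetical, and it checks only the two default names mentioned above, although a given server may be configured to look for other names as well.

```python
import os

# Default page names from the text; a server may be configured to
# accept other names too, so extend this tuple if yours does.
INDEX_NAMES = ("index.html", "index.htm")

# The "blank page" written into folders that have no index file.
BLANK_PAGE = "<!DOCTYPE html>\n<html><head><title></title></head><body></body></html>\n"


def folders_without_index(site_root):
    """Return the folders under site_root (as paths relative to it)
    that contain neither index.html nor index.htm."""
    missing = []
    for folder, _subfolders, files in os.walk(site_root):
        if not any(name in files for name in INDEX_NAMES):
            missing.append(os.path.relpath(folder, site_root))
    return sorted(missing)


def add_blank_index(site_root):
    """Write a blank index.html into every folder that lacks an index
    page, and return the list of folders that were changed."""
    changed = folders_without_index(site_root)
    for rel in changed:
        path = os.path.join(site_root, rel, "index.html")
        with open(path, "w", encoding="utf-8") as f:
            f.write(BLANK_PAGE)
    return changed
```

Running `folders_without_index` first lets you decide which folders deserve a real index page; `add_blank_index` then covers the rest before you upload.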

Use a robots.txt File

Robots, also called spiders or crawlers, are programs "owned" by search services and spammers. Robots constantly roam the web and add the pages they find to their search databases. Before indexing a domain, most well-behaved robots look for a robots.txt file that tells them which parts of the domain they may visit. You can use a robots.txt file to tell robots not to index the pages in a particular folder.

To create a robots.txt file, use a plain-text editor, such as Notepad, and put the following lines in the file.

# Comments explaining what the file does (optional)
# Lines beginning with # are comments
# Replace /path-to-your-folder/ with the path to your folder; keep the leading slash
User-agent: *
Disallow: /path-to-your-folder/

The file must be placed in the root directory of the domain, and the domain can contain only one robots.txt file. The file affects all folders in the domain, even if there are different web sites in folders under the domain. The User-agent: line names the robots that the following rules apply to, and the * means the rules apply to all robots. Each Disallow: line names exactly one path that those robots should not visit, so add a separate Disallow: line for each additional folder you want to block. You can also add further User-agent: groups, each with its own Disallow: lines, to give different instructions to particular robots. To add a robots.txt file, you must have access to the root directory of the domain you are using. Be aware, however, that robots owned by spammers probably ignore the robots.txt file.
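Python's standard library includes a robots.txt parser, `urllib.robotparser`, that can show which paths a well-behaved robot would skip. This sketch feeds it an example file directly instead of downloading one; the folder names `/private/` and `/drafts/`, the robot name `AnyBot`, and the `www.example.com` domain are placeholders, not values taken from this page.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt contents; the two folder names are hypothetical.
ROBOTS_TXT = """\
# Keep all robots out of two folders
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""

parser = RobotFileParser()
# parse() takes the file as a list of lines; read() would instead
# download the file from the URL passed to set_url().
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/index.html", "/private/salaries.html", "/drafts/plan.html"):
    url = "https://www.example.com" + path
    allowed = parser.can_fetch("AnyBot", url)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")

# /index.html: allowed
# /private/salaries.html: blocked
# /drafts/plan.html: blocked
```

A robot that ignores robots.txt never performs a check like this, which is why the file keeps out only the robots that choose to obey it.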

For further reading about robots.txt files, go to the following sites.

	HTML 4.01 Specification (open the Table of Contents and search for the word "robot")
	Robots.txt Files

Use a robots Meta Tag

If you do not have access to the root directory of the domain you are using (for example, because you are using a free web site), you can place a robots meta tag in the head section (between <head> and </head>) of each page. Be aware, however, that not all robots honor the robots meta tag. This tag is explained in the Meta Tags page of this site.
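A commonly used form of the tag is `<meta name="robots" content="noindex, nofollow">`, which asks robots neither to index the page nor to follow its links. To audit pages you have already tagged, the Python sketch below uses the standard `html.parser` module to check whether a page's HTML asks robots not to index it; the names `RobotsMetaFinder` and `asks_not_to_index` are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # "noindex, nofollow" -> ["noindex", "nofollow"]
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def asks_not_to_index(html_text):
    """Return True if the page has a robots meta tag containing noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    finder.close()
    return "noindex" in finder.directives


page = """<html>
<head>
<title>Private notes</title>
<meta name="robots" content="noindex, nofollow">
</head>
<body>...</body>
</html>"""

print(asks_not_to_index(page))  # True
```

Self-closing tags such as `<meta ... />` are handled too, because `HTMLParser` passes them to `handle_starttag` by default.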

Don't Link to Private Pages

Most robots find pages by following links, starting from pages such as your home page. If no page links to a private page, robots have no path to it and are unlikely to index it. However, search services continually change their search strategies, and robots might begin indexing folders instead of only linked pages.
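To see which pages a link-following robot could reach, you can mimic one: start at the home page, follow every link, and list the pages it never arrives at. The Python sketch below does this over a small, hypothetical in-memory site (a dictionary mapping paths to HTML) rather than fetching real pages, and it models only links within your own site; real robots may also find a page through links on other sites.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def reachable_pages(site, start="/index.html"):
    """Follow links from start (breadth-first) and return the set of
    pages in `site` that a link-following robot could reach."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        collector = LinkCollector()
        collector.feed(site.get(page, ""))
        collector.close()
        for href in collector.links:
            # Resolve relative links and drop any "#section" part.
            target, _fragment = urldefrag(urljoin(page, href))
            if target in site and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: nothing links to the private page.
site = {
    "/index.html": '<a href="about.html">About</a>',
    "/about.html": '<a href="/index.html">Home</a>',
    "/private/notes.html": "<p>No page links here.</p>",
}

print(sorted(set(site) - reachable_pages(site)))  # ['/private/notes.html']
```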
