Web Design

NOTE: This is a collection of information and links gathered over the years that might be useful. A Safer Company LLC does not guarantee, endorse or approve any of these links or their scripts. Use these links to other websites at your own risk.

Search Engines and Robots.txt

A robots.txt file tells search engine robots which files and folders they should not crawl and index.
Note: Robots can choose to ignore the robots.txt file.

404 Errors

When a robot crawls your site and does not find a robots.txt file, it assumes that it may crawl and index the entire site. Each failed request for the missing file is also recorded as a 404 error in your server logs. To stop these unnecessary 404 errors, upload a blank or simple robots.txt file to the root directory of your domain.


Creating a robots.txt file

Create a plain-text document and save it as robots.txt in the root directory of your domain.

Each line uses the syntax:
<field>:<optionalspace><value><optionalspace>

  • Comments can be included in robots.txt files.
    • The # character starts a comment: any space before it and the rest of the line up to the line end are discarded. Lines containing only a comment are discarded completely.
  • The simplest robots.txt file uses two fields:
    • User-agent: <value>
      • The value can be the name of the robot whose access policy the record describes.
      • The value can be * (asterisk), in which case the record describes the default access policy for any robot that has not matched any of the other records. Only one such record is allowed in the "/robots.txt" file.
    • Disallow: <value>
      • The value can be a full or partial URL path that you want to block; any URL whose path starts with this value is blocked.
      • Disallowing a file or folder keeps compliant robots from crawling it, which normally keeps it out of search engine results.
      • An empty value indicates that all URLs can be retrieved.
      • At least one Disallow field must be present in a record.
  • An empty /robots.txt file means that all robots will consider themselves welcome.
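
The line syntax and comment rule above can be sketched in a few lines of Python. This is only an illustration of the rules as described on this page, not the code any real robot uses; the function name parse_robots_line is made up for this example.

```python
def parse_robots_line(line):
    """Split one robots.txt line into (field, value), or return None.

    Follows the <field>:<optionalspace><value><optionalspace> syntax:
    everything from '#' to the end of the line is a comment, and the
    optional spaces around the field and the value are dropped.
    """
    # Discard the comment, if any, together with the space before it.
    content = line.split("#", 1)[0].strip()
    if not content or ":" not in content:
        return None  # blank line, comment-only line, or not a field line
    field, value = content.split(":", 1)
    # Field names are lowercased here so "Disallow" and "disallow" compare equal.
    return field.strip().lower(), value.strip()


# Hypothetical sample lines, chosen to exercise each rule.
samples = [
    "User-agent: *",
    "Disallow: /images/ # block images",
    "# only a comment",
    "Disallow:",
]
for raw in samples:
    print(repr(raw), "->", parse_robots_line(raw))
```

Note that an empty Disallow value comes back as an empty string, which per the rules above means all URLs can be retrieved.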


Simple Robots.txt

# This will allow all robots to crawl and index all files.
User-agent: *
Disallow:


Disallowing Files and Folders

# All robots may crawl every file except the paths listed in the Disallow lines
User-agent: *
Disallow: /images/ #disallows all files in the folder /images/
Disallow: /example #disallows every path that starts with /example, such as /example.html and /example/index.html
Disallow: /product/ #disallows all files in the folder /product/ but allows the file /product.html
Disallow: /oldindex.html #this file is blocked
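
One way to check how these prefix rules behave is to feed them to Python's standard-library urllib.robotparser module. The sketch below is illustrative only: the robot name ExampleBot and the test paths are made up, and a real search engine may match paths somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the example above.
RULES = """\
User-agent: *
Disallow: /images/
Disallow: /example
Disallow: /product/
Disallow: /oldindex.html
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_allowed(path, robot="ExampleBot"):
    """Return True if the (hypothetical) robot may fetch the given path."""
    return parser.can_fetch(robot, path)


# Hypothetical paths: a Disallow value blocks every path that starts with it.
for path in [
    "/images/logo.png",     # inside /images/
    "/example.html",        # starts with /example
    "/example/index.html",  # starts with /example
    "/product/item.html",   # inside /product/
    "/product.html",        # does not start with /product/
    "/oldindex.html",       # listed explicitly
    "/index.html",          # not listed
]:
    print(path, "allowed" if is_allowed(path) else "blocked")
```

Only /product.html and /index.html come back as allowed, because neither begins with any of the listed Disallow values.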


Disallow a Robot

Disallow specific robots from crawling your site or limit which files they may access.

# This example indicates that no robots should visit this site
User-agent: *
Disallow: /

# This denies Googlebot-Image access to every file in your domain
User-agent: Googlebot-Image
Disallow: /

# This denies Googlebot-Image access only to your /images/ folder
User-agent: Googlebot-Image
Disallow: /images/
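
The three examples above are alternatives, each meant to be used as its own robots.txt file. As a rough check, each one can be loaded separately with Python's standard urllib.robotparser. The helper name load, the robot name OtherBot, and the test paths are assumptions made for this sketch.

```python
from urllib.robotparser import RobotFileParser


def load(rules):
    """Parse robots.txt content given as a string and return the parser."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


# Each variable holds one of the three alternative files shown above.
block_everyone = load("User-agent: *\nDisallow: /\n")
block_image_bot = load("User-agent: Googlebot-Image\nDisallow: /\n")
block_image_folder = load("User-agent: Googlebot-Image\nDisallow: /images/\n")

# No robot may fetch anything.
print(block_everyone.can_fetch("OtherBot", "/index.html"))

# Only Googlebot-Image is shut out; a robot with no matching record is allowed.
print(block_image_bot.can_fetch("Googlebot-Image", "/index.html"))
print(block_image_bot.can_fetch("OtherBot", "/index.html"))

# Googlebot-Image is kept out of /images/ but may fetch other paths.
print(block_image_folder.can_fetch("Googlebot-Image", "/images/photo.png"))
print(block_image_folder.can_fetch("Googlebot-Image", "/index.html"))
```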

 

Allowing Specific Robots

# cybermapper has access to all files and folders
User-agent: cybermapper
Disallow:
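
A record that allows one robot is most useful next to a default record that restricts everyone else. The sketch below adds such a default record as an assumption (it is not part of the example above) and uses Python's standard urllib.robotparser to show that the named record takes precedence for cybermapper. OtherBot and the path are made up.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
# cybermapper has access to all files and folders
User-agent: cybermapper
Disallow:

# Assumed extra record for illustration, blocking every other robot
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# The record that names the robot wins over the default (*) record.
cybermapper_ok = parser.can_fetch("cybermapper", "/reports/index.html")
other_ok = parser.can_fetch("OtherBot", "/reports/index.html")

print("cybermapper:", "allowed" if cybermapper_ok else "blocked")
print("OtherBot:", "allowed" if other_ok else "blocked")
```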


Robots.txt Validators

The robots.txt file should be validated once it has been uploaded to the root directory of your domain.
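
A quick local check can catch simple mistakes in addition to using a validator. The following is a minimal sketch that tests only the rules described on this page; the function name check_robots_txt and its messages are invented for this example, and fields it does not recognize are not necessarily invalid, because many robots support extensions that this page does not cover.

```python
def check_robots_txt(text):
    """Return a list of problems found in robots.txt content (empty if none)."""
    problems = []
    record_open = False    # a User-agent line has started a record
    disallow_seen = False  # the current record has a Disallow line
    wildcard_records = 0   # number of "User-agent: *" lines seen

    # The extra "" acts as a final blank line that closes the last record.
    for number, raw in enumerate(text.splitlines() + [""], start=1):
        if not raw.strip():
            # A blank line ends the current record.
            if record_open and not disallow_seen:
                problems.append(f"record ending before line {number} has no Disallow field")
            record_open = disallow_seen = False
            continue
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # comment-only lines are ignored
        if ":" not in line:
            problems.append(f"line {number}: expected <field>:<value>")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            record_open = True
            if value == "*":
                wildcard_records += 1
        elif field == "disallow":
            if not record_open:
                problems.append(f"line {number}: Disallow without a User-agent")
            disallow_seen = True
        else:
            problems.append(f"line {number}: field {field!r} is not checked by this sketch")

    if wildcard_records > 1:
        problems.append("more than one 'User-agent: *' record")
    return problems


print(check_robots_txt("User-agent: *\nDisallow:\n"))
print(check_robots_txt("User-agent: *\n\nUser-agent: *\nDisallow: /\n"))
```

The first call returns an empty list; the second reports a record without a Disallow field and a duplicate default record.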



Page last updated: May 31, 2012 10:30 AM


Web Site by: A Safer Company LLC