NOTE: This is a collection of information and links collected over the years that might provide useful information. A Safer Company LLC does not guarantee, endorse or approve any of these links or their scripts. Use these links to other websites at your own risk.
Search Engines and Robots.txt
A robots.txt file tells search engine robots which files and folders on your site should not be crawled and indexed.
Note: Robots can choose to ignore the robots.txt file.
404 Errors
When a robot crawls your site and does not find a robots.txt file, it assumes that it may crawl and index the entire site. Not having a robots.txt file can also create unnecessary 404 errors in your server logs. To stop these unnecessary 404 errors, upload a blank or simple robots.txt file to the root directory of your domain.
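If you are not sure whether your domain already serves a robots.txt file, a short script can check for you. The following is only a rough sketch, not part of the original material; it uses Python's standard urllib, and https://www.example.com is a placeholder for your own domain.

# Minimal sketch: report whether /robots.txt answers with HTTP 200 or is missing (404).
import urllib.error
import urllib.request

def has_robots_txt(domain):
    try:
        with urllib.request.urlopen(domain + "/robots.txt") as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # A missing file raises HTTPError 404, the same error that fills server logs.
        return False

print(has_robots_txt("https://www.example.com"))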
Creating a robots.txt file
Create a text document and save the file as robots.txt in the root directory.
The syntax is
<field>:<optionalspace><value><optionalspace>
- Comments can be included in robots.txt files.
- The # character indicates that the preceding space (if any) and the remainder of the line up to the line termination are discarded. Lines containing only a comment are discarded completely.
- The simplest robots.txt file uses two rules (see the sketch after this list):
- User-agent: <value>
- The value can be the name of the specific robot whose access policy the record describes.
- The value can be * (asterisk); the record then describes the default access policy for any robot that has not matched any of the other records. Only one such record is allowed in the /robots.txt file.
- Disallow: <value>
- The value can be a full path or a partial URL path that you want to block.
- Disallowing a specific file or folder keeps it from being crawled and indexed, so it will not show up in the search engines.
- An empty value indicates that all URLs can be retrieved.
- At least one Disallow field needs to be present in a record.
- An empty /robots.txt file means that all robots will consider themselves welcome.
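As a rough illustration of how a robot applies these two rules, the sketch below feeds a small record to Python's standard urllib.robotparser module; ExampleBot and the URLs are made-up placeholders, not part of the original material.

# Minimal sketch: how a crawler interprets a User-agent / Disallow record.
import urllib.robotparser

record = """
User-agent: *
Disallow: /images/
""".splitlines()

parser = urllib.robotparser.RobotFileParser()
parser.parse(record)

# The * record is the default policy, so it applies to ExampleBot as well.
print(parser.can_fetch("ExampleBot", "https://www.example.com/index.html"))       # True
print(parser.can_fetch("ExampleBot", "https://www.example.com/images/logo.gif"))  # False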
Simple Robots.txt
# This will allow all robots to crawl and index all files.
User-agent: *
Disallow:
Disallowing Files and Folders
# This rule allows all robots to crawl all files except those listed in the Disallow lines; a quick check of these rules follows the example.
User-agent: *
Disallow: /images/ #disallows all files in the folder /images/
Disallow: /example #disallows the file /example.html and everything in the folder /example/, such as /example/index.html
Disallow: /product/ #disallows all files in the folder /product/ but allows the file /product.html
Disallow: /oldindex.html #this file is blocked
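To see how the partial path /example differs from the folder path /product/, the sketch below (an illustration only, not from the original material) runs the rules above through Python's standard urllib.robotparser; the domain is a placeholder.

# Minimal sketch: a partial path blocks matching files and folders,
# while a path ending in / blocks only the folder's contents.
import urllib.robotparser

rules = """
User-agent: *
Disallow: /images/
Disallow: /example
Disallow: /product/
Disallow: /oldindex.html
""".splitlines()

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

base = "https://www.example.com"
for path in ("/example.html", "/example/index.html", "/product/item.html", "/product.html"):
    verdict = "allowed" if parser.can_fetch("*", base + path) else "blocked"
    print(path, "->", verdict)
# /example.html, /example/index.html and /product/item.html are blocked;
# /product.html is allowed.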
Disallowing a Robot
Disallow specific robots from crawling your site, or limit which files they may access; a quick check of agent-specific records follows these examples.
# This example indicates that no robots should visit this site
User-agent: *
Disallow: /
# This denies Googlebot-Image access to all files in your domain
User-agent: Googlebot-Image
Disallow: /
# This specifically denies Googlebot-Image access to your /images/ folder
User-agent: Googlebot-Image
Disallow: /images/
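One way to confirm that a record like this restricts only the named robot is to test two different user-agent names against it. The sketch below is illustrative only and not part of the original material; SomeOtherBot is a made-up name.

# Minimal sketch: a record naming Googlebot-Image does not affect other robots.
import urllib.robotparser

rules = """
User-agent: Googlebot-Image
Disallow: /images/
""".splitlines()

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules)

url = "https://www.example.com/images/photo.jpg"
print(parser.can_fetch("Googlebot-Image", url))  # False: its record blocks /images/
print(parser.can_fetch("SomeOtherBot", url))     # True: no record applies to it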
Allowing Specific Robots
# cybermapper has access to all files and folders
User-agent: cybermapper
Disallow:
Robots.txt Validators
The robots.txt file should be validated once it has been uploaded to the root directory of your domain.
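In addition to online validators, you can spot-check the uploaded file yourself. The sketch below is only an illustration, not part of the original material; it fetches the live robots.txt with Python's standard urllib.robotparser, and the domain and paths are placeholders.

# Minimal sketch: fetch the live robots.txt and test a few paths against it.
import urllib.robotparser

parser = urllib.robotparser.RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()  # downloads and parses the uploaded file

for path in ("/", "/images/", "/product/"):
    url = "https://www.example.com" + path
    print(path, "allowed for * :", parser.can_fetch("*", url))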