URL INTEGRITY

This section describes how to use a link checker named MOMspider.

There is nothing more frustrating to an Internet surfer than error "404", file not found. The dynamic nature of the Internet makes the elimination of this error a challenge, to say the least. This is why it is imperative that you check the validity of your links regularly, especially if your site collects pointers to other sites.

There are a number of free and commercial link checkers available on the Internet. One of the very first, and still quite useful, is MOMspider. Written by Roy Fielding in 1994, MOMspider, or "Multi-Owner Maintenance spider", traverses your WWW site and reports on broken and redirected HTTP-based links. It is written in Perl and is therefore available for just about any operating system. To get MOMspider up and running, you must:
make command. Similarly, installing MOMspider is just as easy. Just download the archive.
The most difficult part of using MOMspider is creating instruction files. An instruction file describes which sets of HTML pages should be checked and how MOMspider should report on what it finds. Below is a simple instruction file that checks the validity of the links at <URL:http://sunsite.berkeley.edu/~emorgan/morganagus/serial-list.html>
EXAMPLE
    # This is a simple MOMspider instruction file
    # intended to check for broken links in Index Morganagus
    # Eric Lease Morgan
    # 02/16/97 - first cut

    AvoidFile /home/emorgan/.momspider-avoid
    SitesFile /home/emorgan/.momspider-sites

    <Tree
    Name Index Morganagus
    TopURL http://sunsite.berkeley.edu/~emorgan/morganagus/serial-list.html
    IndexURL http://sunsite.berkeley.edu/~emorgan/morganagus/spider-report.html
    IndexFile /home/emorgan/public_html/morganagus/spider-report.html
    EmailAddress eric_morgan@ncsu.edu
    EmailBroken eric_morgan@ncsu.edu
    EmailRedirected eric_morgan@ncsu.edu
    >
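To see how an instruction file like the one above is organized, it can help to read it with a few lines of code. The following Python sketch is not MOMspider's own parser (MOMspider is written in Perl); it assumes only what the example shows: lines starting with "#" are comments, a line starting with "<" opens a task block, a lone ">" closes it, and every other line is a directive name followed by its value. The function name parse_instructions and the "_type" key are hypothetical, chosen for illustration.

```python
def parse_instructions(text):
    """Split instruction-file text into global directives and task blocks.

    Assumed layout (inferred from the example, not from MOMspider's source):
    '#' starts a comment, '<Type' opens a task block, a lone '>' closes it,
    and every other line is 'Directive value'.
    """
    global_directives = {}
    tasks = []
    current = None  # the task block being read, or None outside any block

    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("<"):
            current = {"_type": line[1:].strip()}  # e.g. "Tree"
            continue
        if line == ">":
            if current is not None:
                tasks.append(current)
            current = None
            continue
        # Everything else: directive name, then the rest of the line as value.
        parts = line.split(None, 1)
        value = parts[1] if len(parts) > 1 else ""
        target = current if current is not None else global_directives
        target[parts[0]] = value

    return global_directives, tasks


sample = """\
# intended to check for broken links in Index Morganagus
AvoidFile /home/emorgan/.momspider-avoid
SitesFile /home/emorgan/.momspider-sites

<Tree
Name Index Morganagus
TopURL http://sunsite.berkeley.edu/~emorgan/morganagus/serial-list.html
EmailAddress eric_morgan@ncsu.edu
>
"""

global_directives, tasks = parse_instructions(sample)
for task in tasks:
    print(task["_type"], task.get("Name"), "->", task.get("TopURL"))
```

Read this way, the file has two parts: global settings that apply to the whole run (AvoidFile, SitesFile) and one or more task blocks, each naming a starting URL and where its report should go.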
EXPLANATION
In a nutshell, this instruction file tells MOMspider to:
Once run, MOMspider creates a report and sends summary information to the specified email address. The message looks something like this:

    This message was automatically generated by MOMspider/1.00
    after a web traversal on Sun, 16 Feb 1997 20:08:30

    The following parts of the Index infostructure may need inspection:

    Broken Links:
      <http://timon.sir.arizona.edu/pubs/olive.html>

    For more information, see the index at
      <http://sunsite.berkeley.edu/~emorgan/morganagus/spider-report.html>

Examining the data at <URL:http://sunsite.berkeley.edu/~emorgan/morganagus/spider-report.html> then provides more detailed information.

MOMspider works; it does exactly what it was designed to do. Run regularly, it can help significantly with the integrity of your WWW server.

An alternative is to install a PURL (Persistent URL) server. See <URL:http://purl.oclc.org>. The PURL server, written and freely distributed by OCLC, is an HTTP server that maps virtual URLs (PURLs) to real URLs. It works much like the Internet names assigned to computers: the URLs you publish (PURLs) stay constant, and you update only the database that maps them to real URLs.
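The mapping idea behind a PURL server can be illustrated with a small lookup table. The Python sketch below is not OCLC's software or its API; the class name PurlTable, the path /net/morganagus, and the www.example.org address are all hypothetical. It assumes the server answers a known PURL with an HTTP redirect (status 302) to the current real URL and an unknown one with 404.

```python
class PurlTable:
    """A toy table mapping persistent paths (PURLs) to current real URLs."""

    def __init__(self):
        self._targets = {}

    def assign(self, purl_path, real_url):
        # Creating and updating a mapping are the same operation:
        # the persistent path stays fixed, only its target changes.
        self._targets[purl_path] = real_url

    def resolve(self, purl_path):
        # Return (status, location) in the manner of an HTTP redirect.
        real_url = self._targets.get(purl_path)
        if real_url is None:
            return 404, None
        return 302, real_url


table = PurlTable()
table.assign(
    "/net/morganagus",  # hypothetical persistent path
    "http://sunsite.berkeley.edu/~emorgan/morganagus/serial-list.html",
)
before = table.resolve("/net/morganagus")

# The page moves: only the table entry changes, not the published PURL.
table.assign(
    "/net/morganagus",
    "http://www.example.org/morganagus/serial-list.html",  # hypothetical new home
)
after = table.resolve("/net/morganagus")

print(before)
print(after)
```

Readers who bookmarked the persistent path never see a 404 as long as the table is kept current, which moves the maintenance burden from everyone who links to the page onto a single database entry.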
Version: 1.5
Last updated: 2004/12/23. See the release notes.
Author: Eric Lease Morgan (eric_morgan@infomotions.com)
URL: http://infomotions.com/musings/waves/