
Server maintenance

ANALYZING LOG FILES

This section outlines methods for rudimentary logfile analysis.

HTTP servers generate a lot of logfiles. Depending on your server, these files may record what was accessed, who accessed it, and with what browser. In a nutshell, there are two ways to analyse this information. The first and most popular is to apply some sort of analysis tool to your logfiles. Such tools include Wusage, Getstats, and wwwstat, but the most popular seems to be Analog.

The second approach is to import your log files into a database and then query the database to create reports. This approach is less popular and more difficult to implement, but it may give you more exact information concerning the use of your server. To make the job a bit easier, you may want to try tabulate, a Perl script that outputs tab-delimited text from "common log format" logfiles. If you use Apache, then you can configure it to create tab-delimited files automatically.
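For example, with Apache's standard mod_log_config module, a couple of directives along these lines will do the trick. (This is only a sketch; the format nickname and the log file location are made up, so adjust them to taste.)

  # Hypothetical httpd.conf excerpt: define a tab-delimited log format
  # and write entries in that format to a separate file
  LogFormat "%h\t%l\t%u\t%t\t%r\t%>s\t%b" tabbed
  CustomLog logs/access_tab.log tabbed

The resulting file can then be imported directly into just about any database or spreadsheet.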

ANALOG

Analog is an application that can analyse your log files. It runs on Unix, Windows, and Macintosh computers. It can generate HTML or plain text output. All of its options are compiled into the application, but any of them can be overridden through a configuration file or even on the command line. It's fast and it's free.

The most common structure of HTTP server logfiles is the "common logfile format." This format has the following structure:

remotehost rfc931 authuser [date] "request" status bytes
Where:
  • remotehost is the name of the computer accessing your server
  • rfc931 is the name of the remote user (usually blank)
  • authuser is the authenticated user
  • [date] is the date and time of the request
  • "request" is the request line sent to the server, including the URL
  • status is the HTTP status code returned for the request
  • bytes is the size (in bytes) of the data returned
Analog can read the "common logfile format" (as well as others) and generate reports accordingly.
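To make the format concrete, here is a minimal sketch, written in Python rather than Perl, that splits a single "common log format" line into its seven fields and prints them as tab-delimited text, much as tabulate does. (The sample entry and the regular expression are illustrations, not part of Analog or tabulate.)

  import re

  # remotehost rfc931 authuser [date] "request" status bytes
  CLF = re.compile(r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)$')

  # A made-up example entry
  line = '127.0.0.1 - - [23/Dec/2004:10:15:32 -0500] "GET /index.html HTTP/1.0" 200 1043'

  fields = CLF.match(line)
  if fields:
      # remotehost, rfc931, authuser, date, request, status, bytes
      print("\t".join(fields.groups()))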

To use Analog, first you must download and uncompress the archive from the Analog home page or any of its many mirror sites.

If you are using a Unix computer, you will have to edit the analog.h file to define the application's defaults. You must then make (compile) the application. Don't fret. It's easy. The Windows and Macintosh versions come pre-compiled and require little extra configuration.

The next step in using Analog is editing its configuration file, analog.cfg. This file tells Analog how to process your logfiles. The most important option is LOGFILE, which tells Analog the exact location of the file(s) to analyse. The next most important option is OUTFILE, which tells Analog where to save its output. You will also want to edit HOSTNAME, HOSTURL, and BASEURL so your resulting reports make sense.
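For example, a bare-bones analog.cfg might look something like this. (The paths, hostname, and URLs are placeholders, and directive syntax can vary a bit between Analog versions, so check the documentation that comes with your copy.)

  # Hypothetical analog.cfg: where the logfile lives, where the report goes,
  # and how the report identifies and links back to your server
  LOGFILE /var/log/httpd/access_log
  OUTFILE /home/www/htdocs/usage/report.html
  HOSTNAME "My Example Server"
  HOSTURL http://www.example.com/
  BASEURL http://www.example.com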

Next, run Analog. It will examine your logfile, politely report any errors, create your report, and exit. Furthermore, it will do all of this quickly!

After playing with Analog for a little while, you may want to explore fine tuning some of its myriad options, thus customizing the reports to your needs. Such options include dates, times, host name exclusion, plain text or HTML output, graphics, browser types, and more.
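For instance, a few more lines in analog.cfg along these lines limit the report to a date range, ignore hits from your own machines, and switch to plain text output. (Again, these are illustrations; directive names and date formats differ somewhat between Analog versions, so consult its documentation.)

  # Hypothetical fine-tuning additions to analog.cfg
  # Only count requests made between 1 Nov 2004 and 31 Dec 2004
  FROM 041101
  TO 041231
  # Ignore requests coming from your own organisation
  HOSTEXCLUDE *.example.com
  # Produce a plain text report instead of HTML
  OUTPUT ASCII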

Analog is worth much more than what you will pay for it.

SEE ALSO

  1. Boutell.Com, Inc., "Wusage" - "Wusage is a statistics system that helps you determine the true impact of your web server. By measuring the popularity of your documents, as well as identifying the sites that access your server most often, wusage provides valuable marketing information. Practically all organizations, whether commercial or educational or nonprofit, need solid numbers to make credible claims about the World Wide Web. Wusage fills that need." <URL:http://www.boutell.com/wusage/>

  2. Eric Lease Morgan, "tabulate" - A rudimentary Perl script that takes "common log format" logfiles and outputs tab-delimited text. <URL:http://www.lib.ncsu.edu/staff/morgan/tabulate.txt>

  3. Kevin Hughes, "Getstats" - "Getstats (formerly called getsites) is a versatile World-Wide Web server log analyzer. It takes the log file from your CERN, NCSA, Plexus, GN, MacHTTP, or UNIX Gopher server and spits back all sorts of statistics." <URL:http://www.eit.com/software/getstats/getstats.html>

  4. Roy Fielding, "wwwstat and splitlog" - "The wwwstat program will process a sequence of HTTPd common logfile format (CLF) access_log files and output a log summary in HTML format suitable for publishing on a website. The splitlog program will process a sequence of CLF (or CLF with a prefix) access_log files and split the entries into separate files according to the requested URL and/or vhost prefix." <URL:http://www.ics.uci.edu/WebSoft/wwwstat/>

  5. Stephen Turner, "Analog: A WWW Server Logfile Analysis Program" - Analog seems to be the most popular logfile analysis program. It is available for Unix, Windows, and Macintosh computers. It's fast and flexible, but just a tiny bit difficult to configure. <URL:http://www.statslab.cam.ac.uk/~sret1/analog/>

  6. W3, "Logging in W3C httpd" - This page describes the format of log files for the W3C server, and specifically the "common log format." <URL:http://www.w3.org/pub/WWW/Daemon/User/Config/Logging.html#common-logfile-format>

  7. Yahoo!, "Log Analysis Tools (Yahoo!)" - Here is a collection of logfile applications and utilities. <URL:http://www.yahoo.com/Computers_and_Internet/Software/Internet/World_Wide_Web/Servers/Log_Analysis_Tools/>



Version: 1.5
Last updated: 2004/12/23. See the release notes.
Author: Eric Lease Morgan (eric_morgan@infomotions.com)
URL: http://infomotions.com/musings/waves/