This is Part II of an essay about my water collection, specifically the process of putting it on the Web. Part I describes the whys and hows of the collection. Part III is a summary, provides opportunities for future study, and links to the source code.
Making the water available on the Web
As a librarian, I am interested in providing access to my collection(s). As a librarian who knows how to exploit computers, I am especially interested in putting my collection(s) on the Web. Unfortunately, that process is not as easy as the collecting itself, and the implementation has gone through a number of iterations along the way. When I was really into HyperCard I created a “stack” complete with pictures of my water, short descriptions, and an automatic slide-show feature that played the sound of running water in the background. (If somebody asks, I will dig up this dinosaur and make it available.) Later I created a FileMaker Pro database of the collection, but it wasn’t as cool as the HyperCard implementation.
The current implementation is more modern. It takes advantage of quite a number of technologies, including:
- a relational database
- a set of PHP scripts that do input/output against the database
- an image processor to create thumbnail images
- an XSL processor to generate a browsable Web presence
- the Google Maps API to display content on a world map
The use of each of these technologies is described in the following sections.
Relational database
Since 2002 I have been adding newly acquired waters to a relational (MySQL) database and maintaining them there. (Someday I hope to get the older waters out of those cardboard boxes and add them to the database too. Someday.) The database itself is rather simple, consisting of four tables: one for the waters, one for the collectors, a join table denoting who collected what, and a metadata table consisting of a single record describing the collection as a whole. The entity-relationship diagram illustrates the structure of the database in greater detail.
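Based on the field names that appear in the report-writing code later in this post, the schema can be sketched roughly as follows. This is only a sketch: the column types, and the columns of the metadata table in particular, are my assumptions, not the actual DDL.

```sql
-- a sketch of the four tables; column names come from the PHP code,
-- column types are assumptions
CREATE TABLE waters (
  water_id    INT AUTO_INCREMENT PRIMARY KEY,
  name        VARCHAR(255),
  lat         DOUBLE,
  lng         DOUBLE,
  year        SMALLINT,
  month       TINYINT,
  day         TINYINT,
  description TEXT,
  image       MEDIUMBLOB   -- the actual JPEG bytes live in the row
);

CREATE TABLE collectors (
  collector_id INT AUTO_INCREMENT PRIMARY KEY,
  first_name   VARCHAR(64),
  last_name    VARCHAR(64)
);

-- the join table: who collected what
CREATE TABLE items_for_collectors (
  water_id     INT,
  collector_id INT
);

-- a single record describing the collection as a whole (columns guessed)
CREATE TABLE metadata (
  title       VARCHAR(255),
  description TEXT
);
```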
Probably the most interesting technical characteristic of the database is the image field of type mediumblob in the waters table. When it comes to digital libraries and database design, one of the perennial choices is where to save your content. Saving it outside your database keeps the database small, but it forces you to maintain links to the file system or the Internet where the actual content resides. This can be an ongoing maintenance nightmare, and it side-steps rather than solves the preservation issues. On the other hand, inserting your content into the database keeps everything in one place, while “marrying” the content to your database application. Putting the content in the database also allows you to do raw database dumps, making the content more portable and easier to back up. I’ve designed digital library systems both ways, and each approach has its own strengths and weaknesses. This is one of the rarer times I’ve put the content into the database itself. Never have I solely relied on maintaining links to off-site content. Too risky. Instead I’ve more often mirrored content locally and maintained two links in the database: one to the local cache and another to the canonical website.
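The content-in-the-database strategy can be demonstrated in miniature with any relational database. Here is a hypothetical sketch using Python and an in-memory SQLite database rather than the MySQL/PHP of the actual application; the table and values are stand-ins for illustration only.

```python
import sqlite3

# a throw-away, in-memory database standing in for the waters table
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE waters (water_id INTEGER PRIMARY KEY, name TEXT, image BLOB)")

# the "content inside the database" strategy: the image bytes live in the row
image_bytes = b"\xff\xd8\xff\xe0 fake JPEG data"
db.execute("INSERT INTO waters (name, image) VALUES (?, ?)",
           ("Copenhagen canal", image_bytes))

# a raw dump of the table now carries the content with it; there are
# no external links to rot, which is the point of the trade-off above
name, image = db.execute("SELECT name, image FROM waters").fetchone()
assert image == image_bytes
print(name, len(image))
```

A database dump of this table is a complete backup of both metadata and content, which is exactly the portability argument made above.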
PHP scripts for database input/output
Sets of PHP scripts are used to create, maintain, and report against the waters database. Creating and maintaining database records is tedious but not difficult as long as you keep in mind that there are really only four things you need to do with any database: 1) create records, 2) find records, 3) edit records, and 4) delete records. All that is required is to implement each of these processes against each of the fields in each of the tables. Since PHP was designed for the Web, each of these processes is implemented as a Web page only accessible to myself. The following screen shots illustrate the appearance and functionality of the database maintenance process.
High-level menus on the right. Sub-menus and data-entry forms in the middle. Simple. One of the nice things about writing applications for oneself is the fact that you don’t have to worry about usability, just functionality.
The really exciting stuff happens when the reports are written against the database. Both of them are XML files. The first is essentially a database dump — water.xml — complete with the collection’s over-arching metadata record, each of the waters and their metadata, and a list of collectors. The heart of the report-writing process includes:
1. finding all of the records in the database
2. converting and saving each water’s image as a thumbnail
3. initializing the water record
4. finding all of the water’s collectors
5. adding each collector to the record
6. going to Step #5 for each collector
7. finishing the record
8. going to Step #2 for each water
9. saving the resulting XML to the file system
There are two hard parts to this process. The first, “MOGRIFY”, is a shelled-out hack: it calls an ImageMagick utility from the operating system to convert the content of the image field into a thumbnail image. Without this utility, saving the image from the database to the file system would be problematic. Second, the SELECT statement used to find all the collectors associated with a particular water is a bit tricky; not really too difficult, just a typical SQL join, and good practice for learning relational database design. Below is a code snippet illustrating the heart of this report-writing process:
# process every found row
while ($r = mysql_fetch_array($rows)) {

    # get, define, save, and convert the image -- needs error checking
    $image     = stripslashes($r['image']);
    $leafname  = explode(' ', $r['name']);
    $leafname  = $leafname[0] . '-' . $r['water_id'] . '.jpg';
    $original  = ORIGINALS  . '/' . $leafname;
    $thumbnail = THUMBNAILS . '/' . $leafname;
    writeReport($original, $image);
    copy($original, $thumbnail);
    system(MOGRIFY . $thumbnail);

    # initialize and build a water record
    $report .= '<water>';
    $report .= "<name water_id='$r[water_id]' lat='$r[lat]' lng='$r[lng]'>" .
               prepareString($r['name']) . '</name>';
    $report .= '<date_collected>';
    $report .= "<year>$r[year]</year>";
    $report .= "<month>$r[month]</month>";
    $report .= "<day>$r[day]</day>";
    $report .= '</date_collected>';

    # find all the COLLECTORS associated with this water, and...
    $sql = "SELECT c.*
            FROM waters AS w, collectors AS c, items_for_collectors AS i
            WHERE w.water_id = i.water_id
            AND c.collector_id = i.collector_id
            AND w.water_id = $r[water_id]
            ORDER BY c.last_name, c.first_name";
    $all_collectors = mysql_db_query($gDatabase, $sql);
    checkResults();

    # ...process each one of them
    $report .= '<collectors>';
    while ($c = mysql_fetch_array($all_collectors)) {
        $report .= "<collector collector_id='$c[collector_id]'>" .
                   "<first_name>$c[first_name]</first_name>" .
                   "<last_name>$c[last_name]</last_name></collector>";
    }
    $report .= '</collectors>';

    # finish the record
    $report .= '<description>' . stripslashes($r['description']) .
               '</description></water>';
}
The result is the following “WaterML” XML content — a complete description of a water, in this case water from Copenhagen:
<water>
<name water_id='87' lat='55.6889' lng='12.5951'>Canal
surrounding Kastellet, Copenhagen, Denmark
</name>
<date_collected>
<year>2007</year>
<month>8</month>
<day>31</day>
</date_collected>
<collectors>
<collector collector_id='5'>
<first_name>Eric</first_name>
<last_name>Morgan</last_name>
</collector>
</collectors>
<description>I had the opportunity to participate in the
Ticer Digital Library School in Tilburg, The Netherlands.
While I was there I also had the opportunity to visit the
folks at
<a href="http://indexdata.com">Index Data</a>, a company
that writes and supports open source software for libraries.
After my visit I toured around Copenhagen very quickly. I
made it to the castle (Kastellet), but my camera had run out
of batteries. The entire Tilburg, Copenhagen, Amsterdam
adventure was quite informative.
</description>
</water>
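Because WaterML is plain XML, other programs can consume it easily. As a hypothetical illustration (written in Python rather than the PHP of the application itself), the latitude, longitude, and collection date can be pulled out of the record above like so:

```python
import xml.etree.ElementTree as ET

# an abbreviated copy of the WaterML record shown above
waterml = """
<water>
  <name water_id='87' lat='55.6889' lng='12.5951'>Canal
  surrounding Kastellet, Copenhagen, Denmark</name>
  <date_collected><year>2007</year><month>8</month><day>31</day></date_collected>
</water>
"""

water = ET.fromstring(waterml)

# the geographic coordinates are attributes of the name element
name = water.find("name")
lat, lng = float(name.get("lat")), float(name.get("lng"))

# the collection date is a small sub-tree of its own
year = water.findtext("date_collected/year")

print(lat, lng, year)
```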
When I first created this version of the water collection, RSS was just coming online. Consequently I wrote an RSS feed for the waters, but then I got realistic. How many people want an RSS feed of my water? Crazy!
XSL processing
Now that the XML file has been created and the images are saved to the file system, the next step is to make a browser-based interface. This is done through an XSLT style sheet and an XSL processor called Apache2::TomKit.
Apache2::TomKit is probably the most eclectic component of my online water collection application. Designed to be a replacement for another XSL processor called AxKit, Apache2::TomKit enables the developer to create CGI-like applications, complete with HTTP GET parameters, in the form of XML/XSLT combinations. Specify the location of your XML files. Denote what XSLT files to use. Configure what XSLT processor to use. (I use LibXSLT.) Define an optional cache location. Done. The result is on-the-fly XSL transformations that work just like CGI scripts. The hard part is writing the XSLT.
The logic of my XSLT style sheet — waters.xsl — goes like this:
- Get input – There are two inputs: cmd and id. Cmd is used to denote the desired display function; id is used to denote which water to display.
- Initialize output – This is pretty standard stuff. Display XHTML head elements and start the body.
- Branch – Depending on the value of cmd, display the home page, a collectors page, all the images, all the waters, or a specific water.
- Display the content – This is done with the thorough use of XPath expressions.
- Done – Complete the XHTML with a standard footer.
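A skeletal version of that branching logic might look like the fragment below. To be clear, this is my reconstruction, not the actual waters.xsl; the parameter names come from the cmd and id inputs described above, but the template structure and XPath expressions are guesses.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- the two HTTP GET parameters, passed in by the processor -->
  <xsl:param name="cmd" select="'home'"/>
  <xsl:param name="id"/>

  <xsl:template match="/">
    <html>
      <head><title>My Water Collection</title></head>
      <body>
        <!-- branch on the value of cmd -->
        <xsl:choose>
          <xsl:when test="$cmd = 'collectors'">
            <xsl:apply-templates select="//collector"/>
          </xsl:when>
          <xsl:when test="$cmd = 'water'">
            <!-- display a single water, selected by id -->
            <xsl:apply-templates select="//water[name/@water_id = $id]"/>
          </xsl:when>
          <xsl:otherwise>
            <!-- the home page: list every water -->
            <xsl:apply-templates select="//water"/>
          </xsl:otherwise>
        </xsl:choose>
        <!-- standard footer goes here -->
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```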
Of all the XSLT style sheets I’ve written in my career, waters.xsl is definitely the most declarative in nature. This is probably because the waters.xml file is truly data-driven, as opposed to mixed content. The XSLT file is very elegant but challenging for the typical Perl or PHP hacker to quickly grasp.
Once the XML file, the XSLT style sheet, and Apache2::TomKit were integrated, I was able to design URLs such as the following:
Okay. So it’s not very REST-ful; the URLs are not very “cool”. Sue me. I originally designed this in 2002.
Waters and Google Maps
In 2006 I used my water collection to create my first mash-up. It combined latitudes and longitudes with the Google Maps API.
Inserting maps into your Web pages via the Google API is a three-step process: 1) create an XML file containing latitudes and longitudes, 2) insert a call to the Google Maps JavaScript into the head of your HTML, and 3) call the JavaScript from within the body of your HTML.
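Steps #2 and #3 can be sketched with a minimal page like the one below. This is an illustration based on the 2006-era version 2 of the Google Maps API (GMap2, GLatLng, GMarker); the API key is a placeholder, the coordinates are those of the Copenhagen water above, and a real page would load the points from the XML file of Step #1 rather than hard-coding one marker.

```html
<html>
  <head>
    <!-- step 2: load the Google Maps JavaScript (API key is a placeholder) -->
    <script src="http://maps.google.com/maps?file=api&amp;v=2&amp;key=YOUR_KEY"
            type="text/javascript"></script>
    <script type="text/javascript">
      // step 3: center a map and drop a marker for one water
      function load() {
        var map = new GMap2(document.getElementById("map"));
        map.setCenter(new GLatLng(55.6889, 12.5951), 13);  // Kastellet, Copenhagen
        map.addOverlay(new GMarker(new GLatLng(55.6889, 12.5951)));
      }
    </script>
  </head>
  <body onload="load()">
    <div id="map" style="width: 500px; height: 300px"></div>
  </body>
</html>
```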
For me, all I had to do was: 1) create new fields in my database for latitudes and longitudes, 2) go through each record in the database doing latitude and longitude data-entry, 3) write a WaterML file, 4) write an XSLT file transforming the WaterML into the XML format expected by Google Maps, 5) write a CGI script that takes latitudes and longitudes as input, 6) display a map, and 7) create links from my browser-based interface to the maps.
It may sound like a lot of steps, but it is all very logical, and taken bit by bit it is relatively easy. Consequently, I am able to display a world map complete with pointers to all of my water. Conversely, I am able to display a water record and link its location to a map. (I try to get as close to the actual collection point as possible.) The following two screen dumps illustrate the idea:
Read all the posts in this series:
- The whys and hows of the water collection
- This post
- A summary, future directions, and source code
Visit the water collection.