Archive for May, 2008

Code4Lib Journal Perl module (version .003)

Wednesday, May 28th, 2008

I hacked together a Code4Lib Journal Perl module providing read-only access to the Journal’s underlying WordPress (MySQL) database. You can download the distribution, and the following is from the distribution’s README file:

This is the README file for a Perl module called C4LJ — Code4Lib Journal

Code4Lib Journal is the refereed serial of the Code4Lib community. [1] The community desires to make the Journal’s content as widely accessible as possible. To that end, this Perl module is a read-only API against the Journal’s underlying WordPress database. Its primary purpose is to generate XML files that can be uploaded to the Directory of Open Access Journals and consequently made available through their OAI interface. [2]

Installation

To install the module you first need to have access to a WordPress (MySQL) database styled after the Journal. There is sample data in the distribution’s etc directory.

Next, you need to edit lib/C4LJ/Config.pm. Specifically, you will need to change the values of:

* $DATA_SOURCE – the DSN of your database; you will probably need to edit only the database name

* $USERNAME – the name of an account allowed to read the database

* $PASSWORD – the password of $USERNAME
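
As a rough illustration, the edited settings might look something like the sketch below. The three variable names come from this README, and the DSN follows the usual DBD::mysql form. Everything else is an assumption made for the example: the `our` declarations, the database name, the host, the account, and the password. The real Config.pm in the distribution may declare these variables differently, so change only the values in your copy.

```perl
package C4LJ::Config;

use strict;
use warnings;

# All values below are placeholders; substitute your own.
# The DSN names the driver (mysql), the database, and the host.
our $DATA_SOURCE = 'DBI:mysql:database=c4lj;host=localhost';

# An account that needs nothing more than read access to the database.
our $USERNAME = 'c4lj_reader';
our $PASSWORD = 'change-me';

1;
```

Because the module only reads from the database, an account limited to read access should be enough.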

Finally, follow the normal Perl installation procedure: perl Makefile.PL; make; make test; make install.

Usage

To use the module, call C4LJ::Article->get_articles. It returns a list of article objects, which you can then process one at a time. Something like this:

  use C4LJ::Article;
  foreach ( C4LJ::Article->get_articles ) {
    print '        ID: ' . $_->id       . "\n";
    print '     Title: ' . $_->title    . "\n";
    print '       URL: ' . $_->url      . "\n";
    print '  Abstract: ' . $_->abstract . "\n";
    print '    Author: ' . $_->author   . "\n";
    print '      Date: ' . $_->date     . "\n";
    print '     Issue: ' . $_->issue    . "\n";
    print "\n";
  }

The bin directory contains three sample applications:

1. dump-metadata.pl – the code above, basically

2. c4lj2doaj.pl – given an issue number, output XML suitable for DOAJ

3. c4lj2doaj.cgi – the same as c4lj2doaj.pl but with a Web interface

See the modules’ PODs for more detail.

License

This module is distributed under the GNU General Public License.

Notes

[1] Code4Lib Journal – http://journal.code4lib.org/
[2] DOAJ OAI information – http://www.doaj.org/doaj?func=loadTempl&templ=070509

Open Library, the movie!

Monday, May 26th, 2008

For a good time, I created a movie capturing some of the things I saw while attending the Open Library Developer’s Meeting a few months ago. Introducing, Open Library, the movie!

get-mbooks.pl

Monday, May 26th, 2008

A few months ago I wrote a program called get-mbooks.pl, and it was used to harvest MARC data from the University of Michigan’s OAI repository of public domain Google Books. You can download the program here, and what follows is the distribution’s README file:

This is the README file for a script called get-mbooks.pl

This script, get-mbooks.pl, is an OAI harvester. It makes a connection to the OAI data provider at the University of Michigan. [1] It then requests the set of public domain Google Books (mbooks:pd) using the marc21 (MARCXML) metadata schema. As the metadata is downloaded, it gets converted into MARC records in communications format through the use of the MARC::File::SAX handler.
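
To make the request described above concrete, here is a minimal sketch of the first OAI-PMH ListRecords URL that such a harvest would start from. It is not code from get-mbooks.pl, which relies on Net::OAI::Harvester for this work. The helper names escape and list_records_url are hypothetical, and the sketch uses only core Perl.

```perl
use strict;
use warnings;

# Percent-encode everything except the RFC 3986 unreserved characters.
sub escape {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Build an OAI-PMH ListRecords request URL. The metadataPrefix argument
# is required by the protocol; set is optional.
sub list_records_url {
    my ($base_url, %args) = @_;
    my @pairs = ('verb=ListRecords');
    foreach my $key (qw(metadataPrefix set)) {
        next unless defined $args{$key};
        push @pairs, $key . '=' . escape($args{$key});
    }
    return $base_url . '?' . join('&', @pairs);
}

# The base URL, set, and metadata prefix are the ones named in this README.
print list_records_url(
    'http://quod.lib.umich.edu/cgi/o/oai/oai',
    metadataPrefix => 'marc21',
    set            => 'mbooks:pd',
), "\n";
```

This builds only the opening request. A large set comes back in batches, each carrying a resumptionToken for the next request, which is one reason to leave the harvesting itself to a module.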

The magic of this script lies in MARC::File::SAX. It is a hack written by Ed Summers against the MARC::File::SAX found on CPAN. It converts the metadata sent from the provider into “real” MARC. You will need this hacked version of the module in your Perl path, and it has been saved in the lib directory of this distribution.

To get get-mbooks.pl to work you will first need Perl. Describing how to install Perl is beyond the scope of this README. Next you will need the necessary modules. Installing them is best accomplished through the use of cpan but you will need to be root. As root, run cpan and when prompted, install Net::OAI::Harvester:

$ sudo cpan
cpan> install Net::OAI::Harvester

You will also need the various MARC::Record modules:

$ sudo cpan
cpan> install MARC::Record

When you get this far, and assuming the hacked version of MARC::File::SAX is saved in the distribution’s lib directory, all you need to do next is run the program.

$ ./get-mbooks.pl

Downloading the data is not a quick process, and progress will be echoed in the terminal. At any time after you have gotten some records you can quit the program (ctrl-c) and use the Perl script marcdump to see what you have gotten (marcdump <file>).

Fun with OAI, Google Books, and MARC.

[1] http://quod.lib.umich.edu/cgi/o/oai/oai

Hello, World!

Monday, May 26th, 2008

Hello, World! It is nice to meet you.