Archive for June, 2008

eXtensible Catalog (XC): A very transparent approach

Thursday, June 26th, 2008

An article by Jennifer Bowen entitled “Metadata to support next-generation library resource discovery: Lessons from the eXtensible Catalog, Phase 1” appeared recently in the June 2008 issue of Information Technology & Libraries. [1]

The article outlines next-steps for the XC Project and enumerates a number of goals for their “‘next-generation’ library catalog” application/system:

  1. provide access to all library resources, digital and non-digital
  2. bring metadata about library resources into a more open Web environment
  3. provide an interface with new Web functionality such as Web 2.0 features and faceted browsing
  4. conduct user research to inform system development
  5. publish the XC code as open-source software

Because I am somewhat involved in the XC Project, through past meetings and as a Development Partner, the article did not contain much that was new to me, but it did elaborate on a number of points.

The system’s underlying infrastructure is a good example. Like many “next-generation” library catalog applications/systems, XC proposes to aggregate content from a wide variety of sources, normalize the data into a central store (the “hub”), index the content, and provide access to the central store or index through a number of services. This is how Primo, VUFind, and AquaBrowser operate. Many others work in a similar manner; all of these systems have more in common than they have differences. Unlike other applications/systems, XC seems to embrace a more transparent and community-driven process.

One of the things that intrigued me most was goal #2: “XC will reveal library metadata not only through its own separate interface…, but will also allow library metadata to be revealed through other Web applications.” This is definitely the way to go. A big part of librarianship is making data, information, and knowledge widely accessible. Our current systems do this very poorly. XC is moving in the right direction in this regard. Kudos.

Another thing that caught my eye was a requirement for goal #3, “The XC system will capture metadata generated by users from any one of the system’s user environments… and harvest it back into the system’s metadata services hub for processing.” This too sounds like a good idea. People are the real sources of information. Let’s figure out ways to harness the knowledge, expertise, and experiences of our users.

What is really nice about XC is the approach they are taking. It is not all about their software and their system. Instead, it is about building on the good work of others and providing direct access to their improvements. “Projects such as the eXtensible Catalog can serve as a vehicle for moving forward by providing an opportunity for libraries to experiment and to then take informed action to move the library community toward a next generation of resource discovery systems.”

I wish more librarians would be thinking about their software development processes in the manner of XC.

[1] The article is immediately available online at http://hdl.handle.net/1802/5757.

Top Tech Trends for ALA (Summer ’08)

Wednesday, June 18th, 2008

Here is a non-exhaustive list of Top Technology Trends for the American Library Association Annual Meeting (Summer, 2008). These Trends represent general directions regarding computing in libraries — short-term future directions where, from my perspective, things are or could be going. They are listed in no priority order.

  • “Bling” in your website – I hate to admit it, but it seems increasingly necessary to make sure your institution’s website is aesthetically appealing. This might seem obvious to you, but considering that we all think “content is king”, we might have to reconsider. Whether we like it or not, people do judge a book by its cover, and people do judge others on their appearance. Websites aren’t very much different. While librarians are great at organizing information bibliographically, we stink when it comes to organizing things visually. Think graphic design. Break down and hire a graphic designer, and temper their output with usability tests. We all have our various strengths and weaknesses. Graphic designers have something to offer that, in general, librarians lack.
  • Data sets – Increasingly it is not enough for the scholar or researcher to evaluate old texts or do experiments and then write an article accordingly. Instead, it is becoming important to distribute the data and information the scholar or researcher used to come to their conclusions. This data and information needs to be just as accessible as the resulting article. How will this access be sustained? How will it be described and made available? To what degree will it be important to preserve this data and/or migrate it forward in time? These sorts of questions require some thought. Libraries have experience in these regards. Get your foot in the door, and help the authors address these issues.
  • Institutional repositories – I don’t hear as much noise about institutional repositories as I used to hear. I think their lack of popularity is directly related to the problems they are designed to solve, namely, long-term access. Don’t get me wrong, long-term access is definitely a good thing, but that is a library value. In order to be compelling, institutional repositories need to solve the problems of depositors, not those of librarians. What do authors get by putting their content in an institutional repository that they don’t get elsewhere? If they supported version control, collaboration, commenting, tagging, better syndication, and possibilities for content reuse — in other words, services against the content — then institutional repositories might prove to be more popular.
  • Mobile devices – The iPhone represents a trend in mobile computing. It is both cool and “kewl” for three reasons: 1) its physical interface, complete with pinch and drag touch-screen options, makes it easy to use; you don’t need to learn how to write in its language, 2) its always-on and endlessly-accessible connectivity to the Internet makes it trivial to keep in touch, read mail, and “surf the Web”, and 3) its software interface is implemented in the form of full-blown applications, not dumbed-down text interfaces with lots of scrolling lists. Apple Computer got it right. Other companies will follow suit. Sooner or later we will all be walking around like people from the Starship Enterprise. “Beam me up, Scotty!” Consider integrating into your services the ability to text the content of library research to a telephone.
  • Net Neutrality – The Internet, by design, is intended to be neutral, but increasingly Internet Service Providers (ISPs) are twisting the term “neutrality” to mean, “If you pay a premium, then we won’t throttle your network connection.” BitTorrent is a good example. This technique exploits the Internet to make file transfers more efficient, but ISPs want to inhibit it and/or charge more for its use. Yet again, the values and morals of a larger, more established community, in this case capitalism, are influencing the Internet. Similar value changes manifested themselves when email became commonplace. Other values, such as not wasting Internet bandwidth by transferring unnecessarily large files over the ‘Net, have changed as both the technology and the number of people using the Internet have changed. Take a stand for “Net Neutrality”.
  • “Next generation” library catalogs – The profession has finally figured it out. Our integrated library systems don’t solve the problems of our users. Consequently, the idea of the “next generation” library catalog is all the rage, but don’t get too caught up in features such as Did You Mean?, faceted browse, cover art, or the ability to include a wide variety of content in a single interface. Such things are really characteristics and functions of the underlying index. They are all designed to make it easier to find, but find is not the problem to be solved. Google makes it easy to find. Really easy. We are unable to compete in that arena. Everybody can find, and we are still “drinking” from the proverbial “fire hose”. Instead, think about ways to enable the patron to use the content they find. Put the content into context. Like the institutional repositories, above, and the open access content, below, figure out ways to make the content useful. Empower the patron. Enable them to apply actions against the content, not just the index. Such things are exemplified by action verbs. Tag. Share. Review. Add. Read. Save. Delete. Annotate. Index. Syndicate. Cite. Compare forward and backward in time. Compare and contrast with other documents. Transform into other formats. Distill. Purchase. Sell. Recommend. Rate. Create flip book. Create tag cloud. Find email address of author. Discuss with colleagues. Etc. The list of services implementable by “next generation” library catalogs is as long as the list of things people do with the content they find in libraries. This is one of the greatest opportunities facing our profession.
  • Open Access Publishing – Like its sister, institutional repositories, I don’t hear as much about open access publishing as I used to hear. We all know it is a “good thing”, but like so many things that are “free”, its value is only calculated by the amount of money paid for it. “The journals from this publisher are very expensive. We had better promote them and make them readily visible on our website in order for us to get our money’s worth.” In a library setting, the value of material is not based on dollars but rather on things such as, but not limited to, usefulness, applicability, keen insight, scholarship, and timeliness. Open access content manifests these characteristics as much as traditionally published materials. Open access content can be made even more valuable if its open nature were exploited. Like the content found in institutional repositories, and like the functions of “next generation” library catalogs outlined above, the ability to provide services against open access content is almost limitless. More than any other content, open access content combined with content from things like the Open Content Alliance and Project Gutenberg can be freely collected, indexed, searched, and then put into the context of the patron. Create bibliography. Trace citation. Find similar words and phrases between articles and books. Take an active role in making open access publishing more of a reality. Don’t wait for the other guy. You are a part of the solution.
  • Social networking – Social networking is beyond a trend. It is all but a fact of the Internet. Facebook, MySpace, and LinkedIn as well as Wikipedia, YouTube, Flickr, and Delicious are probably the archetypical social networking sites. They have very little content of their own. Instead, they provide a platform for others to provide content — and then services against that content. (“Does anybody see a trend in these trends, yet?”) What these social networking sites are exploiting is a new form of the numbers game. Given a wide enough audience it is possible to find and create sets of others interested in just about any topic under the sun. These people will be passionate about their particular topic. They will be sincere, adamant, and assiduous about making sure the content is up-to-date, accurate, and thoroughly described and accessible. Put your content into these sorts of platforms in the same way the Library of Congress and the Smithsonian Institution have put some of their content into Flickr. A rising tide floats all boats. Put your boat into the water. Participate in this numbers game. It is not really about people using your library, but rather about people using the content you have made available.
  • Web Services-based APIs – xISBN and thingISBN. The Open Library API. The DLF ILS-DI Technical Recommendation. SRU and OpenSearch. OAI-PMH and now OAI-ORE. RSS and Atom. All of these things are computing techniques called Web Services Application Programmer Interfaces (APIs). They are computer-to-computer interfaces akin to things like Z39.50 of Library Land. They enable computers to unambiguously share data between themselves. A number of years ago implementing Web Services meant learning things like SOAP, WSDL, and UDDI. These things were (are) robust, well-documented, and full-featured. They are also non-trivial to learn. (OCLC’s Terminology Service embedded within Internet Explorer uses these techniques.) After that, REST became more popular. It is simpler, and it exploits the features of HTTP. The idea was (is) to send a URL to a remote computer. Get a response back as XML. Transform the response and put it to use — usually to display things on a Web page. This is the way most of these services work. (“There’s that word again!”) The latest, and increasingly popular, technique uses a data structure called JSON instead of XML as the form of the server’s response, because JSON is easier to process with JavaScript. This is very much akin to AJAX. Despite the subtle differences between each of these Web Services computing techniques, there is a fundamental commonality. Make a request. Wait. Get a response. Do something with the content — make it useful. Moreover, the returned content is devoid of display characteristics. It is just data. It is your responsibility to turn it into information. Learn to: 1) make your content accessible via Web Services, and 2) aggregate content through Web Services in order to enhance your patrons’ experience. (A small sketch of the request-and-response pattern follows this list.)
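
To make the request-and-response pattern concrete, here is a minimal sketch in Perl, the language I tend to hack in. The endpoint and the shape of the JSON response are made up for illustration; substitute the URL and keys of whatever Web service you actually use.

  #!/usr/bin/perl

  # a minimal REST-style client: make a request, wait, get a response, use it;
  # the endpoint and the JSON keys (results, title, url) are hypothetical
  use strict;
  use warnings;
  use LWP::UserAgent;
  use URI::Escape;
  use JSON;

  my $query    = 'open source software';
  my $endpoint = 'http://example.org/search?format=json&q=' . uri_escape($query);

  # make the request and wait for the response
  my $ua       = LWP::UserAgent->new;
  my $response = $ua->get($endpoint);
  die 'request failed: ' . $response->status_line unless $response->is_success;

  # the response is just data; it is our job to turn it into information
  my $data = decode_json( $response->decoded_content );
  foreach my $hit ( @{ $data->{results} } ) {
      print "$hit->{title}\t$hit->{url}\n";
  }

The same pattern holds for XML-based services such as SRU or OpenSearch; swap decode_json for an XML parser and the rest of the logic stays the same.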

Wow! Where did all of that come from?

(This posting is also available on the LITA Blog. “Lots of copies keep stuff safe.”)

Google Onebox module to search LDAP

Monday, June 16th, 2008

This posting describes a Google Search Appliance Onebox module for searching an LDAP directory.

At my work I help administer a Google Search Appliance. It is used to index the university’s website. The Appliance includes a functionality — called Onebox — allowing you to search multiple indexes and combine the results into a single Web page. It is sort of like library metasearch.

In an effort to make it easier for people to find… people, we created a Onebox module, and you can download the distribution if you so desire. It is written in Perl.

In regards to libraries and librarianship, the Onebox technique is something the techno-weenies in our profession ought to consider. Capture the user’s query. Do intelligent processing on it by enhancing it, sending it to the appropriate index, making suggestions, etc., and finally returning the results. In other words, put some smarts into the search interface. You don’t need a Google Search Appliance to do this, just control over your own hardware and software.
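Here is a minimal sketch of what I mean by putting smarts into the search interface. It is not our Onebox module; it is a toy CGI script, and the routing rules and backend URLs are made up for illustration.

  #!/usr/bin/perl

  # capture the user's query, do a bit of "intelligent" processing on it,
  # and send it to the most appropriate index; the rules and URLs are made up
  use strict;
  use warnings;
  use CGI;

  my $cgi   = CGI->new;
  my $query = $cgi->param('q') || '';

  # trivial smarts: ISBN-looking queries go to one index, "last, first" name
  # queries go to a people directory, and everything else goes to the catalog
  my $target;
  if    ( $query =~ /^\d{9}[\dX]$/i )       { $target = 'http://example.edu/isbn'    }
  elsif ( $query =~ /^[a-z]+,\s*[a-z]+$/i ) { $target = 'http://example.edu/people'  }
  else                                      { $target = 'http://example.edu/catalog' }

  print $cgi->redirect( "$target?q=" . CGI::escape($query) );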

From the distribution’s README file:

This distribution contains a number of files implementing a Google Onebox “widget”. It looks people’s names up in an LDAP directory.

The distribution contains the following files:

  • people.cgi – the raison d’être
  • people.pl – command-line version of people.cgi
  • people.png – an image of a person
  • people.xsl – XSL to convert people.cgi output to HTML
  • README – this file
  • LICENSE – the GNU Public License

The “widget” (people.cgi) is almost trivial. Read the value of the query parameter sent as a part of the GET request. Open up a connection to the LDAP server. Query the server. Loop through the results keeping only a number of them as defined by the constant UPPER. Mark up the results as Google XML. Return the XML to the HTTP client. It is then the client’s responsibility to transform the XML into an HTML (table) snippet for display. (That is what people.xsl is for.)
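For the curious, here is a minimal sketch of that flow. It is not the distributed people.cgi; the LDAP host, base DN, attribute names, and XML element names are placeholders, and the real widget marks its results up according to the Google Onebox schema and leaves the HTML rendering to people.xsl.

  #!/usr/bin/perl

  # sketch: read the query, search LDAP, keep UPPER results, return XML;
  # host, base DN, attributes, and element names below are placeholders
  use strict;
  use warnings;
  use CGI;
  use Net::LDAP;

  use constant HOST  => 'ldap.example.edu';
  use constant BASE  => 'ou=people,dc=example,dc=edu';
  use constant UPPER => 5;

  # read the value of the query parameter sent as part of the GET request
  my $cgi   = CGI->new;
  my $query = $cgi->param('query') || '';

  # open a connection to the LDAP server and query it
  my $ldap = Net::LDAP->new( HOST ) or die "can't connect: $@";
  $ldap->bind;    # anonymous bind
  my $search = $ldap->search( base => BASE, filter => "(cn=*$query*)" );

  # loop through the results, keep only UPPER of them, and mark them up as XML;
  # real code would escape XML reserved characters and LDAP filter metacharacters
  print $cgi->header( -type => 'text/xml' );
  print "<results>\n";
  my $count = 0;
  foreach my $entry ( $search->entries ) {
      last if ++$count > UPPER;
      my $name = $entry->get_value('cn')   || '';
      my $mail = $entry->get_value('mail') || '';
      print "  <person><name>$name</name><mail>$mail</mail></person>\n";
  }
  print "</results>\n";
  $ldap->unbind;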

This widget ought to work in many environments. All you really need to do is edit the values of the constants at the beginning of people.cgi.

This code is distributed under the GNU Public License.

Enjoy.

DLF ILS Discovery Interface Task Group Technical Recommendation

Thursday, June 12th, 2008

I read with great interest the DLF ILS Discovery Interface Task Group (ILS-DI) Technical Recommendation [1], and I definitely think it is a step in the right direction for making the content of library systems more accessible.

In regards to the integrated systems of libraries, the primary purpose of the Recommendations is to:

  • improve discovery and use of library resources
  • articulate a clear set of expectations for developers
  • make recommendations applicable to existing and future systems
  • ensure the recommendations are feasible
  • support interoperation and cooperation
  • be responsive to the user and developer community

To this end the Recommendations list a set of abstract functions integrated library systems “should” implement, and they enumerate a number of concrete bindings that can be used to implement these functions. Each of the twenty-five (25) functions can be grouped into one of four overall categories:

  1. data aggregation – harvest content en masse from the underlying system
  2. search – supply a query and get back a list of matching records
  3. patron services – support things like renew, hold, recall, etc.
  4. OPAC integration – provide ways to link to outside services

The Recommendations also group the functions into levels of interoperability:

  1. Level 1: basic interface – simple harvest, search, and display record
  2. Level 2: supplemental – Level 1 plus more robust harvest and search
  3. Level 3: alternative – Level 2 plus patron services
  4. Level 4: robust – Level 3 plus reserves functions and support of an explain function

After describing the things outlined above in greater detail, the Recommendations get down to business, listing each function, its parameters, and why it is recommended, and suggesting one or more “bindings” — possible ways the function can be implemented. Compared to most recommendations in my experience, this one is very easy to read, and it is definitely approachable by anybody who calls themselves a librarian. A few examples illustrate the point.

The Recommendations suggest a number of harvest functions. These functions allow a harvesting system to specify a number of date ranges and get back a list of records that have been created or edited within those ranges. These records may be bibliographic, holdings, or authority in nature. These records may be in MARC format, but it is strongly suggested they be in some flavor of XML. The search functions allow a remote application to query the system and get back a list of matching records. Like the harvest functions, records may be returned in MARC but XML is preferred. Patron functions support finding patrons, listing patron attributes, allowing patrons to place holds, recalls, or renewals on items, etc.
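For example, the harvest functions map naturally onto an OAI-PMH request. The sketch below uses a hypothetical base URL; the verb and date-range parameters are standard OAI-PMH, but the available metadata prefixes (marcxml, oai_dc, etc.) depend on the repository.

  #!/usr/bin/perl

  # ask a repository for records created or edited within a date range;
  # the base URL is hypothetical, the OAI-PMH parameters are standard
  use strict;
  use warnings;
  use LWP::Simple;

  my $base = 'http://catalog.example.edu/oai';
  my $url  = "$base?verb=ListRecords&metadataPrefix=marcxml"
           . '&from=2008-06-01&until=2008-06-30';

  my $xml = get($url) or die "harvest failed\n";
  print $xml;    # in real life, parse the XML and follow the resumptionToken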

There was one thing I especially liked about the Recommendations. Specifically, whenever possible, the bindings were based on existing protocols and “standards”. For example, they advocated the use of OAI-PMH, SRU, OpenSearch, NCIP, ISO Holdings, SIP2, MODS, MADS, and MARCXML.

From my reading, there were only two slightly off-kilter things regarding the Recommendations. First, they advocate the possible use of an additional namespace to fill in some blanks existing XML vocabularies are lacking. I suppose this was necessary in order to glue the whole thing together. Second, it took me a while to get my head around the functions supporting links to external services — the OPAC interaction functions. These functions are expected to return Web pages that are static, writable, or transformative in nature. I’ll have to think about these some more.

It is hoped that vendors of integrated library systems will support these functions natively or that they will be supported through some sort of add-on system. The eXtensible Catalog (XC) is a good example here. The use of Ex Libris’s X-Server interface is another. At the very least a number of vendors have said they would make efforts to implement Level 1 functionality. This agreement has been called the “Berkeley Accord” and includes: AquaBrowser, BiblioCommons, California Digital Library, Ex Libris, LibLime, OCLC, Polaris Library Systems, SirsiDynix, Talis, and VTLS.

Within my own sphere of hack-dom, I think I could enhance my Alex Catalogue of Electronic Texts to support these Recommendations. Create a (MyLibrary) database. Populate it with the metadata and full-text data of electronic books, open access journal articles, Open Content Alliance materials, records from Wikipedia, and photographic images of my own creation. Write reports in the form of browsable lists or feeds expected to be fed to an indexer. Add an OAI-PMH interface. Make sure the indexer is accessible via SRU. Implement a “my” page for users and enhance it to support the Recommendations. Ironically, much of this work has already been done.
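To illustrate the last of those steps, making the indexer “accessible via SRU” boils down to answering URLs like the one below. The host name is hypothetical; operation, version, recordSchema, and maximumRecords are standard SRU 1.1 parameters, and the query itself is CQL.

  #!/usr/bin/perl

  # a searchRetrieve request is just a URL; the host is hypothetical,
  # the SRU 1.1 parameters are standard, and the query is CQL
  use strict;
  use warnings;
  use LWP::Simple;
  use URI::Escape;

  my $query = 'dc.title = "walden"';
  my $url   = 'http://alex.example.org/sru?operation=searchRetrieve&version=1.1'
            . '&recordSchema=dc&maximumRecords=10&query=' . uri_escape($query);

  my $xml = get($url) or die "search failed\n";
  print $xml;    # transform with XSLT or parse for display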

In summary, and as I mentioned previously, these Recommendations are a step in the right direction. The implementation of a “next generation” library catalog is not about re-inventing a better wheel and trying to corner the market with superior or enhanced functionality. Instead it is about providing a platform for doing the work libraries do. For the most part, libraries and their functions have more things in common than they have differences. These Recommendations articulate a lot of these commonalities. Implement them, and kudos to Team DLF ILS-DI.

[1] PDF version of Recommendation – http://tinyurl.com/3lqxx2

HyperNote Pro: a text annotating HyperCard stack

Saturday, June 7th, 2008

In 1992 I wrote a HyperCard stack called HyperNote Pro.

HyperNote allowed you to annotate plain text files, and it really was a hypertext system. Import a plain text file. Click a word to see a note. Option-click a word to create a note. Shift-click a word to create an image note. Option-shift-click a word to link to another document. Use the HyperNote > New HyperNote menu option to duplicate the stack and create a new HyperNote document.

HyperCard is all but dead, and you need an older Macintosh computer to use the application. It was pretty cool. You can download it from my archives. Here is the text from the self-extracting archive:

HyperNote Pro: a text annotating stack by Eric Lease Morgan

HyperNote Pro is a HyperCard stack used to annotate text. It can also create true hypertext links between itself and other documents or applications.

Simply create a new HyperNote Pro stack, import a text file, and add pop–up notes, pictures, and/or hypertext links to the text. The resulting stack can be distributed to anybody with HyperCard 2.0 and they will be able to read or edit your notes and pictures. They will be able to link to other documents if the documents are available.

Here are some uses for HyperNote Pro. Context-sensitive help can be created for applications. News or journal articles could be imported and your opinions added. Business reports could be enhanced with graphs. Resumes could go into greater detail without overwhelming the reader. Students could turn in papers and teachers could comment on the text.

Another neat thing about HyperNote Pro is that it is self–replicating. By selecting “New HN…” and choosing a text–file, HyperNote Pro creates a copy of itself except with the text of the chosen file.

HyperNote Pro is free. It requires HyperCard 2.0 to run.

Features:

  • any size text–file can be imported
  • format the text with any available font
  • add/edit pop–up notes and/or pictures to imported text
  • add true hypertext links to any document or application
  • includes a “super find” feature
  • self–replicating
  • System 7 compatible
        \ /
       - * -      
     \ // \       Eric Lease Morgan, Systems Librarian 
    - * -|\ /     North Carolina State University
     / \ - * -    Box 7111, Room 2111
      |  |/ \     Raleigh, NC 29695-7111
      \ /| |      (919) 515-6182
     - * - |
      / \| /      
       | |/       
    ===========   America Online: EricMorgan
     \=======/    Compu$erve: 71020,2026
      \=====/     Internet: eric_morgan@ncsu.edu
       =====      The Well: emorgan

P.S. Maybe I will be able to upload this stack to TileStack as seen on Slashdot.

Steve Cisler

Friday, June 6th, 2008

This is a tribute to Steve Cisler, community builder and librarian.

Late last week I learned from Paul Jones’s blog that Steve Cisler had died. He was a mentor to me, and I’d like to tell a few stories describing the ways he assisted me in my career.

I met Steve in 1989 or so after I applied for an Apple Library of Tomorrow (ALOT) grant. The application was simple. “Send us a letter describing what you would do with a computer if you had one.” Being a circuit-rider medical librarian at the Catawba-Wateree Area Health Education Center (AHEC) in rural Lancaster, South Carolina, I outlined how I would travel from hospital to hospital facilitating searches against MEDLINE, sending requests for specific articles via ‘fax back to my home base, and having the articles ‘faxed back to the hospital the same day. Through this process I proposed to reduce my service’s turn-around time from three days to a few hours.

Those were the best two pages of text I ever wrote in my whole professional career because Apple Computer (Steve Cisler) sent me all the hardware I requested — an Apple Macintosh portable computer and printer. He then sent me more hardware and more software. It kept coming. More hardware. More software. At this same time I worked with my boss (Martha Groblewski) to get a grant from the National Library of Medicine. This grant piggy-backed on the ALOT grant, and I proceeded to write an expert system in HyperCard. It walked the user through a reference interview, constructed a MEDLINE search, dialed up PubMED, executed the search, downloaded the results, displayed them to the user, allowed the user to make selections, and finally turned-around and requested the articles for delivery via DOCLINE. I called it AskEric, about four years before the ERIC Clearinghouse used the same name for their own expert system. In my humble opinion, AskEric was very impressive, and believe it or not, the expert part of the system still works (as long as you have the proper hardware). It was also during this time when I wrote my first two library catalog applications. The first one, QuickCat, read the output of a catalog card printing program called UltraCard. Taking a clue from OCLC’s (Fred Kilgour’s) 4,2,2,1 indexing technique, it parsed the card data creating author, title, subject, and keyword indexes based on a limited number of initial characters from each word. It supported simple field searching and Boolean logic. It even supported rudimentary circulation — search results of items that had been checked-out were displayed a different color than the balance of the display. QuickCat earned me the 1991 Meckler Computers In Libraries Software Award. My second catalog application, QuickCat Mac, read MARC records and exploited HyperCard’s free-text searching functionality. Thanks goes to Walt Crawford who taught me about MARC through his book, MARC For Library Use. Thanks goes to Steve for encouraging the creativity.

Steve then came to visit. He wanted to see my operation and eat barbecue. During his visit, he brought along a video card, and I had my first digital image taken. The walk to the restaurant where we ate the barbecue was hot and humid, but he insisted on going. “When in South Carolina you eat barbecue”, he said. He was right.

It was time for the annual ALOT conference, and Steve flew me out to Apple Computer’s corporate headquarters. There I met other ALOT grantees including Jean Armor Polly (who coined the phrase “surfing the Internet”), Craig Summerhill who was doing some very interesting work indexing content using BRS, folks from OCLC who were scanning tables-of-contents and trying to do OCR against them, and people from the Smithsonian Institution who were experimenting with a new image file format called JPEG.

I outgrew the AHEC, and with the help of a letter of reference from Steve I got a systems librarian job at the North Carolina State University Libraries. My boss, John Ulmschneider, put me to work on a document delivery project jointly funded by the National Agriculture Library and an ALOT grant. “One of the reasons I hired you”, John said, “was because of your experience with a previous ALOT grant.” Our application, code named “The Scan Plan”, was a direct competitor to the fledgling application called Ariel. Our application culminated in an article called “Digitized Document Transmission Using HyperCard”, ironically available as a scanned image from the ERIC Clearinghouse (or this cached version). That year, during ALA, I remember walking through the exhibits. I met up with John and one of his peers, Bil Stahl (University of North Carolina – Charlotte). As we were talking Charles Bailey (University of Houston) of PACS Review fame joined us. Steve then walked up. Wow! I felt like I was really a part of the in crowd. They didn’t all know each other, but they knew me. Most of the people whose opinions I respected the most at that particular time were all gathered in one place.

By this time the “Web” was starting to get hot. Steve contacted me and asked, “Would you please write a book on the topic of Macintosh-based Web servers?” Less than one year, one portable computer, and one QuickTake camera later I had written Teaching a New Dog Old Tricks: A Macintosh-Based World Wide Web Starter Kit Featuring MacHTTP and Other Tools. This earned me two more trips. The first was to WebEdge, the first Macintosh WWW Developer’s Conference, where I won a hackfest award for my webcam application called “Save 25¢ or ‘Is Eric In’?” The second was back to Apple headquarters for the Ties That Bind conference where I learned about AppleSearch which (eventually) morphed into the search functionality of Mac OS X, sort of. I remember the Apple Computer software engineers approaching the Apple Computer Library staff and asking, “Librarians, you have content, right? May we have some to index?”

To me it was the Ties That Bind conference that epitomized the Steve Cisler I knew. He described there his passion for community. For sharing. For making content (and software) freely available. We discussed things like “copywrite” as opposed to copyright. It was during this conference that he pushed me into talking with a couple of Apple Computer lawyers and convincing them to allow the Tricks book to be freely published. It was during this conference that he described how we are all a part of a mosaic. Each of us is a dot. Individually we have our own significance, but put together we can create an even more significant picture. He used an acrylic painting he had recently found to literally illustrate the point, all puns intended. Since then I have used the mosaic as a part of my open source software in libraries handout. I took the things Steve said to heart. Because of Steve Cisler I have been practicing open access publishing and open source software distribution since before those phrases were coined.

A couple more years passed, and Apple Computer shut down their library. Steve lost his job, and I sort of lost track of him. I believe he did a lot of traveling, and the one time I did see him he was using a Windows computer. He didn’t like it, but he didn’t seem to like Apple either. I tried to thank him quite a number of times for the things he had done for me and my career. He shrugged off my praise and more or less said, “Pass it forward.” He then went “off the ‘Net” and did more traveling. (Maybe I got some of my traveling bug from Steve.) I believe I wrote him a letter or two. A few more years passed, and like I mentioned above, I learned he had died. Ironically, the next day I was off to Santa Clara (California) to give a workshop on XML. I believe Steve lived in Santa Clara. I thought of him as I walked around downtown.

Tears are in my eyes and my heart is in my stomach when I say, “Thank you, Steve. You gave me more than I ever gave in return.” Every once in a while younger people than I come to visit and ask questions. I am more than happy to share what I know. “Steve, I am doing my best to pass it forward.”