Attached should be simple Perl script called wordcloud.pl. Initialize it with a hash of words and associated integers. Output rudimentary HTML in the form of a word cloud. This hack was used to create the word cloud in a posting called “Last of the Mohicans and services against texts“.
Archive for August, 2008
wordcloud.pl
Monday, August 25th, 2008
Last of the Mohicans and services against texts
Monday, August 25th, 2008
Here is a word cloud representing James Fenimore Cooper’s The Last of the Mohicans; A narrative of 1757. It is a trivial example of how libraries can provide services against documents, not just the documents themselves.
scout heyward though duncan uncas little without own eyes before hawkeye indian young magua much place long time moment cora hand again after head returned among most air huron toward well few seen many found alice manner david hurons voice chief see words about know never woods great rifle here until just left soon white heard father look eye savage side yet already first whole party delawares enemy light continued warrior water within appeared low seemed turned once same dark must passed short friend back instant project around people against between enemies way form munro far feet nor
About the story
While I am not a literary scholar, I am able to read a book and write a synopsis.
Set during the French And Indian War in what was to become upper New York State, two young women are being escorted from one military camp to another. Along the way the hero, Natty Bumppo (also known by quite a number of other names, most notably “Hawkeye” or the “scout”), alerts the convoy that their guide, Magua, is treacherous. Sure enough, Magua kidnaps the women. Fights and battles ensue in a pristine and idyllic setting. Heroic deeds are accomplished by Hawkeye and the “last of the Mohicans” — Uncas. Everybody puts on disguises. In the end, good triumphs over evil but not completely.
Cooper’s style is verbose. Expressive. Flowery. On this level it was difficult to read. Too many words. In the other hand the style was consistent, provided a sort of pattern, and enabled me to read the novel with a certain rhythm.
There were a couple of things I found particularly interesting. First, the allusion to “relish“. I consider this to be a common term now-a-days, but Cooper thought it needed elaboration when used to describe food. Cooper used the word within a relatively short span of text to describe condiment as well as a feeling. Second, I wonder whether or not Cooper’s description of Indians built on existing stereotypes or created them. “Hugh!”
Services against texts
The word cloud I created is simple and rudimentary. From my perspective, it is just a graphical representation of a concordance, and a concordance has to be one of the most basic of indexes. This particular word cloud (read “concordance” or “index”) allows the reader to get a sense of a text. It puts words in context. It allows the would-be reader to get an overview of the document.
This particular implementation is not pretty, nor is it quick, but it is functional. How could libraries create other services such as these? Everybody can find and get data and information these days. What people desire is help understanding and using the documents. Providing services against texts such as word clouds (concordances) might be one example.
Crowd sourcing TEI files
Friday, August 15th, 2008
How feasible and/or practical do you think “crowd sourcing” TEI files would be?
I like writing in my books. In fact, I even have a particular system for doing it. Circled things are the subjects of sentences. Squared things are proper nouns. Underlined things connected to the circled and squared things are definitions. Moreover, my books are filled with marginalia. Comments. Questions. See alsos. I call this process ELMTGML (Eric Lease Morgan’s Truly Graphic Mark-up Language), and I find it a whole lot more useful than the use of simple highlighter pen that where all the mark-up has the same value. Florescent yellow.
I think I could easily “crosswalk” my mark-up process to TEI mark-up because there are TEI elements for many of things I highlight. Given such a thing I could mark-up texts using my favorite editor and then create stylesheets that turn on or turn off my commentary.
Suppose many classic texts were marked-up in TEI. Suppose there were stylesheets that allowed you to turn on or turn off other people’s commentary/annotations or allowed you to turn on or turn off particular people’s commentary/annotation. Wouldn’t that be interesting?
Moreover, what if some sort of tool, widget, or system were created that allowed anybody to add commentary to texts in the form of TEI mark-up. Do you think this would be feasible? Useful?
Creator: Eric Lease Morgan <eric_morgan@infomotions.com>
Date created: 2008-05-26
Date updated: 2010-05-09
URL: ./