Planet Cataloging

February 27, 2015

International Society for Knowledge Organization (ISKO) UK

Thesaurus Debate needs to move on

Surprise, surprise - last Thursday's debate on this proposition was a pushover for the opposition. To defeat any argument of the form “XXX has no place in YYY”, all you have to provide is one counter-example. Just for starters:

  • The UK Data Archive, powered by the HASSET thesaurus
  • The FAO’s AGRIS database, searchable using AGROVOC, and
  • EUROVOC, used for searching publications of the EU

by Stella Dextre Clarke at February 27, 2015 09:51 AM

February 26, 2015

Mod Librarian

5 Things Thursday: DAM, Ontologies, IA

Here are five things:

  1. What’s holding DAM back? DAMNews three part special feature.
  2. What the heck’s Information Architecture?
  3. DAM – from photo library to enterprise system.
  4. Building a historical search engine is no easy thing.
  5. Awesome presentation from WorldIA Day Seattle from Andy Fitzgerald – Desiring Ecologies.

February 26, 2015 01:04 PM

February 25, 2015

TSLL TechScans

OBS/TS name that grant contest

Posted on behalf of the OBS/TS Joint Research Grant Committee

Name that Grant Contest!

Did you know that TS and OBS offer a grant to do research? Did you know that research can be as simple as evaluating an app?

A recent survey says that many people are confused by the name of the OBS/TS Joint Research Grant. (Hint, it's not about researching joints!)

We'd like you to rename the Grant!

We’re holding a contest to find a creative, catchy name that will also accurately represent the purpose of the Grant (See below for Grant Guidelines).
  • All AALL members are eligible to enter
  • Multiple entries are allowed
  • Entries will be accepted from February 16th through March 16th
  • Winner will be determined by vote
  • The winner will receive a $50 Amazon Gift Card
  • All entrants will be entered into a raffle for a free membership to OBS-SIS or TS-SIS

Go to the Name that Grant Contest to enter*.

*You will need to register with the contest website in order to enter the contest or vote

by Jackie Magagnosc at February 25, 2015 03:21 PM

February 24, 2015

025.431: The Dewey blog

WebDewey Number Building Tool: Standard Subdivisions and Three-Digit Numbers Ending with Zero: Part 1

Note: See also previous posts (1 and 2) on using the number building tool with standard subdivisions. See also posts 1 and 2 on using the number building tool in music, and posts on using the number building tool in literature and in natural resources. The general approach to building numbers described in those posts can be applied in any discipline. See also the WebDewey training modules for the WebDewey number building tool.

Are you having problems using the WebDewey number building tool to add standard subdivisions to three-digit numbers ending in zero? If so, let’s start with the simplest case: a division number where no extra zeros are needed with standard subdivisions. (We’ll look at more complicated situations in subsequent blog posts.) Division numbers have two significant digits plus one placeholder zero, e.g., 550 Earth sciences. Classifiers are taught to drop the placeholder zero when adding a standard subdivision to a division number. We plan to teach the number building tool to drop that placeholder zero, but that has not happened yet. In the meantime, here are instructions on how to work around the problem.

Let’s consider an encyclopedia of earth sciences, e.g.: Macmillan Encyclopedia of Earth Sciences. Its first LCSH is "Earth sciences--Encyclopedias."

Browsing the Relative Index for "earth sciences" yields:

Earth sciences    550   

If we click to see the full record, we get this Hierarchy box:


The focus is 550 Earth sciences, with downward hierarchy showing some numbers built with standard subdivisions, all with one zero.  There is no indication that extra zeros are needed.  One step up in the hierarchy is an entry with the placeholder zero greyed out (550) and the heading edited for browsing purposes as found in the DDC Summaries:

550 Earth sciences & geology  

If we click to see the full record, we get this Hierarchy box:


In addition to the Hierarchy box, the full record has a Create Built Number box with the base number that we want: 55 without the placeholder zero:


If we click Start in that box, we get:



Now the Hierarchy box shows standard subdivisions:


In the Hierarchy box we click T1—03 Dictionaries, encyclopedias, concordances and get this Hierarchy box:


If we now click Add in the Create Built Number box, we get:


We have now built the number with the correct number of zeros.  If we click Save, the newly built number appears in the Hierarchy box:


We now have an opportunity to change the user term and add other user terms—but enough for now!  We have successfully built the number.

The key to success is to find the record with the placeholder zero greyed out, and click Start in that record.
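The arithmetic behind this simplest case can be sketched in a few lines of Python. This is only an illustration of the rule classifiers are taught (drop the placeholder zero, then add the standard subdivision); it is not how WebDewey itself works, and the function name `build_division_number` is hypothetical. It deliberately rejects anything other than a division number, because main classes and cases needing extra zeros follow different rules.

```python
def build_division_number(division: str, subdivision: str) -> str:
    """Add a Table 1 standard subdivision to a DDC division number.

    division    -- a three-digit division number such as "550"
    subdivision -- the Table 1 notation as digits, such as "03"
    """
    if len(division) != 3 or not division.isdigit():
        raise ValueError("expected a three-digit number")
    # A division number has two significant digits plus one placeholder zero.
    if division[2] != "0" or division[1] == "0":
        raise ValueError("not a division number (simplest case only)")
    if len(subdivision) < 2 or not subdivision.isdigit() or subdivision[0] != "0":
        raise ValueError("expected Table 1 notation such as '03'")

    base = division[:2]           # drop the placeholder zero: "550" -> "55"
    digits = base + subdivision   # "55" + "03" -> "5503"
    # The decimal point always follows the third digit.
    return digits[:3] + "." + digits[3:]


print(build_division_number("550", "03"))
```

For the encyclopedia of earth sciences, base number 55 plus notation 03 gives 550.3, with a single zero.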

Note: The WebDewey number building tool may be confused by bracketed standard subdivisions. For example, if there are bracketed entries indicating that standard subdivisions have been relocated from a double zero to a single zero (e.g., at 380 or 730), you will need to use Edit local to get the single zero.

by Juli at February 24, 2015 11:09 PM


Testing date parsing by fuzzing

Fuzz testing, or fuzzing, is a way of stress testing services by sending them potentially unexpected input data. I remember being very impressed by one of the early descriptions of testing software this way (Miller, Barton P., Louis Fredriksen, and Bryan So. 1990. "An empirical study of the reliability of UNIX utilities". Communications of the ACM. 33 (12): 32-44), but had never tried the technique.

Recently, however, Jenny Toves spent some time extending VIAF's date parsing software to handle dates associated with people in WorldCat.  As you might imagine, passing through a hundred million new date strings found some holes in the software.  While we can't guarantee that the parsing always gives the right answer, we would like to be as sure as we can that it won't blow up and cause an exception.

So, I looked into fuzzing.  Rather than sending random strings to the software, the normal techniques now used tend to generate them based on a specification or by fuzzing existing test cases.  Although we do have something close to a specification based on the regular expressions the code uses, I decided to try making changes to the date strings we have that are derived from VIAF dates.

Most frameworks for fuzzing are quite loosely coupled: typically they pass the fuzzed strings to a separate process that is being tested. Rather than do that, I read in each of the strings, did some simple transformations on it, and called the date parsing routine to see if it would cause an exception. Here's what I did for each test string, typically for as many times as the string was long. At each step the parsing routine is called:

  • Shuffle the string ('abc' might get replaced by 'acb')
  • Change the integer value of each character up or down (e.g. 'b' would get replaced by 'a' and then by 'c')
  • Change each character to a random Unicode character

For our 384K test strings this resulted in 1.9M fuzzed strings. This took about an hour to run on my desktop machine.
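A minimal Python sketch of this kind of in-process fuzzing follows. It is not the VIAF code: the function names are hypothetical, the parser is passed in as an argument, and the exact number of variants per string is an assumption based on the description above.

```python
import random


def shuffle_string(s, rng):
    """Return the characters of s in a random order."""
    chars = list(s)
    rng.shuffle(chars)
    return "".join(chars)


def nudge_char(s, i, delta):
    """Move the code point of s[i] up or down by delta ('b' -> 'a' or 'c')."""
    cp = ord(s[i]) + delta
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return s  # leave the string alone rather than build an invalid character
    return s[:i] + chr(cp) + s[i + 1:]


def randomize_char(s, i, rng):
    """Replace s[i] with a random Unicode character (surrogates excluded)."""
    cp = rng.randrange(0x110000)
    while 0xD800 <= cp <= 0xDFFF:
        cp = rng.randrange(0x110000)
    return s[:i] + chr(cp) + s[i + 1:]


def fuzz_variants(s, rng):
    """Yield mutated copies of s, one round per character position."""
    for i in range(len(s)):
        yield shuffle_string(s, rng)
        yield nudge_char(s, i, -1)
        yield nudge_char(s, i, +1)
        yield randomize_char(s, i, rng)


def run_fuzz(parse, test_strings, rng_seed=0):
    """Call parse on every variant and collect the inputs that raise."""
    rng = random.Random(rng_seed)
    failures = []
    for original in test_strings:
        for candidate in fuzz_variants(original, rng):
            try:
                parse(candidate)
            except Exception as exc:  # any exception counts as a hole
                failures.append((candidate, repr(exc)))
    return failures
```

Seeding the random number generator keeps a run reproducible, so a string that triggers an exception can be found again after the parser is fixed.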

While the testing didn't find all the bugs we knew about in the code, it did manage to tickle a couple of holes in it, so I think the rather minimal time taken (less than a day) was worth it, given the confidence it gives us that the code won't blow up on strange input.

The date parsing code in GitHub will be updated soon.  Jenny is adding support for Thai dates (different calendar) and generally improving things.

Possibly the reason I thought of trying fuzzing was an amazing post on lcamtuf's blog, Pulling JPEGs out of thin air. By instrumenting some JPEG software so that his fuzzing software could follow code paths at the assembly level, he was able to create byte strings representing valid JPEG images by sending in fuzzed strings, a truly remarkable achievement. My feeling on reading it was very similar to my reaction on reading the original UNIX testing article cited earlier.



by Thom at February 24, 2015 03:45 PM

Terry's Worklog

MarcEdit 6 Update

A new version of MarcEdit has been made available.  The update includes the following changes:

  • Bug Fix: Export Tab Delimited Records: When working with control data, if a position is requested that doesn’t exist, the process crashes.  This behavior has been changed so that a missing position results in a blank delimited field (as is the case if a field or field/subfield isn’t present).
  • Bug Fix: Task List — Corrected a couple reported issues related to display and editing of tasks.
  • Enhancement: RDA Helper — Abbreviations have been updated so that users can select the fields in which abbreviation expansion occurs.
  • Enhancement: Linked Data Tool — I’ve vastly improved the process by which items are linked. 
  • Enhancement: Improved VIAF Linking — thanks to Ralph LeVan for pointing me in the right direction to get more precise matching.
  • Enhancement: Linked Data Tool — I’ve added the ability to select the index from VIAF to link to.  By default, LC (NACO) is selected.
  • Enhancement: Task Lists — Added the Linked Data Tool to the Task Lists
  • Enhancement: MarcEditor — Added the Linked Data Tool as a new function.
  • Improvements: Validate ISBNs — Added some performance enhancements and finished working on some code that should make it easier to begin checking remote services to see if an ISBN is not just valid (structurally) but actually assigned.
  • Enhancement: Linked Data Component — I’ve separated out the linked data logic into a new MarcEdit component.  This is being done so that I can work on exposing the API for anyone interested in using it.
  • Informational: Current version of MarcEdit has been tested against MONO 3.12.0 for Linux and Mac.
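To make the "valid (structurally)" distinction in the ISBN item concrete, here is a small Python sketch of the standard ISBN-10 and ISBN-13 check-digit tests. It is not MarcEdit's code, and the function names are hypothetical. A structurally valid ISBN only has a correct check digit; whether it has actually been assigned can only be learned from a remote service, which is not shown here.

```python
DIGITS = "0123456789"


def _clean(isbn):
    """Strip hyphens and spaces and normalize a trailing 'x' to 'X'."""
    return isbn.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(isbn):
    """Weights 10 down to 1; the weighted sum must be divisible by 11."""
    s = _clean(isbn)
    if len(s) != 10:
        return False
    if any(ch not in DIGITS for ch in s[:9]) or s[9] not in DIGITS + "X":
        return False
    total = sum((10 - i) * (10 if ch == "X" else int(ch))
                for i, ch in enumerate(s))
    return total % 11 == 0


def is_valid_isbn13(isbn):
    """Alternating weights 1 and 3; the weighted sum must be divisible by 10."""
    s = _clean(isbn)
    if len(s) != 13 or any(ch not in DIGITS for ch in s):
        return False
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(s))
    return total % 10 == 0


print(is_valid_isbn10("0-306-40615-2"), is_valid_isbn13("978-0-306-40615-7"))
```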

Linked Data Tool Improvements:

A couple of specific notes of interest around the linked data tool.  First, over the past few weeks, I’ve been collecting instances where the LC service and VIAF have been returning results that were not optimal.  On the VIAF side, some of that was related to the indexes being queried; some of it relates to how queries are made and executed.  I’ve done a fair bit of work and added some additional data checks to ensure that links occur correctly.  At the same time, there is one known issue that I wasn’t able to correct while working with the LC service, and that is around deprecated headings. The service currently provides no information within any metadata it returns that relates a deprecated item to the current preferred heading.  This is something I’m waiting for LC to correct.

To improve the Linked Data Tool, I’ve added the ability to query by specific index.  By default, the tool queries LC (NACO), but users can select from a wide range of vocabularies (including querying all the vocabularies at once).  The new screen for the Linked Data tool looks like the following:


In addition to the changes to the Linked Data Tool – I’ve also integrated the Linked Data Tool with the MarcEditor:


And within the Task Manager:


The idea behind these improvements is to allow users the ability to integrate data linking into normal cataloging workflows – or at least start testing how these changes might impact local workflows.


You can download the current version by using MarcEdit’s automatic update within the Help menu, or by going to: and downloading the current version.


by reeset at February 24, 2015 05:38 AM

February 21, 2015

First Thus

ACAT Doctor Who (Fictitious character)–strange authority record

Posting to Autocat

On 1/29/2015 8:17 PM, McDonald, Stephen wrote:
> I agree with your general point, concerning the problem when common usage differs from the formal or official name. However, in the specific case of Dr. Who, people _do_ know the character as The Doctor. It is only those who are not actually familiar with the show who think Doctor Who refers to the character’s name rather than the name of the show. Wikipedia, fan websites, and other popular sources use the name The Doctor.

When we discuss this in terms of the world of linked data, the character of “The Doctor/Doctor Who” becomes almost overwhelming. If we look at dbpedia (soon to be overtaken by Wikidata?) we see where the “owl:sameAs” property (that is, other forms of the term) contains:

  • 닥터_(닥터_후)
  • الدكتور_(دكتور_هو)
  • Доктор_(Доктор_Кой)
  • הדוקטור
  • Доктор_(Доктор_Кто)
  • Доктор_(Доктор_Хто)
  • 博士_(異世奇人)

Here is his Wikipedia entry in Russian. In the world of linked data, there will be no “official forms” but rather a number of variants (owl:sameAs) and the most important part will be the link: and then the systems people can style things how the library wishes.
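As a rough illustration of that idea, the Python sketch below treats the link as the identity of the entity and lets each system choose which variant to display. The URI, the language tags attached to each form, and the function name `choose_label` are assumptions made for the example; only the variant forms themselves come from the list above.

```python
# Hypothetical identifier; a real system would use the entity's actual URI.
ENTITY_URI = "http://example.org/entity/the-doctor"

# Variant forms keyed by language tag (tags assumed for illustration).
VARIANTS = {
    "ko": "닥터 (닥터 후)",
    "bg": "Доктор (Доктор Кой)",
    "ru": "Доктор (Доктор Кто)",
    "uk": "Доктор (Доктор Хто)",
    "zh": "博士 (異世奇人)",
}


def choose_label(variants, preferred_languages):
    """Return the first variant matching the library's language preferences.

    No form is the "official" one: if none of the preferred languages is
    available, any variant will do, because the link identifies the entity.
    """
    for lang in preferred_languages:
        if lang in variants:
            return variants[lang]
    return next(iter(variants.values()))


print(ENTITY_URI, choose_label(VARIANTS, ["ru"]))
```

A library serving Korean readers would pass `["ko"]` and see a different heading for the same link, with no change to the underlying data.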

Also in Wikipedia, there are an incredible number of links to various aspects of “The Doctor/Doctor Who” including his different transformations and individual episodes, plus spinoffs:

In the larger world, the Doctor Who character takes on many names, just as there is no single “correct” name for the city of “Rome” or “Vienna”. There are many. It depends on where you come from. There is no “correct” form in this sense.


by James Weinheimer at February 21, 2015 10:57 AM

ACAT Genre term for untrue nonfiction

Posting to Autocat

On 1/22/2015 12:37 AM, J. McRee Elrod wrote:
> James said:
>> It seems to me that the catalog *as a whole* handles this rather well right now …
> Yes there are resources outside the items themselves which identify hoaxes. The questions are, do the records for the items themselves need such indication, and if so, how does that impact on cataloguer neutrality, and who decides what is untrue?

That is not the idea I wished to convey. We need to see the catalog as a whole–as our users do–and not focus only on individual records. In the case of the “Protocols” the catalog is supposed to bring all the materials together, both the different versions of the Protocols and the items about the Protocols. At one time, it did that job rather well.

We are lucky that we can see how this was supposed to work by looking in Princeton’s scanned card catalog. (DISCLOSURE: I am not pining for “the old days” here. I am demonstrating a power that has been lost)

If we go to the first card of the Protocols:, and browse the cards, we see the different versions of the text in different languages. As we continue along, we come to items that have the Protocols as a subject: (We know this because the subjects were typed in ALL CAPS and they may have been in red ink, too) As we browse those cards, we immediately become aware that there is some kind of controversy.

This was one example of how the catalog was supposed to work, but it, along with many other capabilities, was lost when keyword searching was introduced. It is true that keyword searching brought in many capabilities that were impossible before, but it should be recognized that it lost many as well.

In our catalogs today, there is no “browse” function that brings subjects and titles together in this very powerful and provocative way. I am absolutely not saying that the solution is to bring back such a browse, because that method was for physical catalogs and is 100% obsolete today. But our time and efforts would be better spent figuring out how to recreate that power for a new environment, instead of the tedious recoding of zillions of records of what we deem to be “true” or “false” today, and that we know will change over time. As a profession, we don’t want to go there. Simply bringing similar things together can be both powerful and highly provocative.

Mac, I know you understand this, but you are one of the few. I believe these sorts of basic powers of the catalog have been forgotten for a long time now.


by James Weinheimer at February 21, 2015 08:59 AM

February 20, 2015

First Thus

ACAT Genre term for untrue nonfiction

Posting to Autocat

On 1/21/2015 7:40 PM, J. McRee Elrod wrote:
> The word “Mythomane” has the meaning we are looking for, but would be unknown to most patrons I suspect.
> “Hoaxes” is perhaps the best suggestion so far.
> Deciding what is a hoax and what is simply inaccurate would be difficult. Rarely do people witnessing the same event agree about what they saw. Also, would such a judgement breach our neutrality policy?

It seems to me that the catalog as a whole handles this rather well right now. If we search for “The Protocols of the Wise Men of Zion” (the uniform title) in WorldCat as a subject (which is what we are talking about) we find:

The first individual records are (out of many):
1) The history of a lie, “The protocols of the wise men of Zion” : a study. by Herman Bernstein; John Retcliffe, Sir

2) The plot : the secret story of the Protocols of the Elders of Zion by Will Eisner; Umberto Eco

3) Warrant for genocide; the myth of the Jewish world-conspiracy and the Protocols of the elders of Zion by Norman Cohn

4) A lie and a libel : the history of the Protocols of the Elders of Zion by B W Segel; Richard S …

Something similar already exists in Google. When I search for this boy’s book “The boy who came back from heaven”, I see (at least):

1) The boy who didn’t come back from heaven: inside a bestseller’s ‘deception’

2) The Boy Who Came Back From Heaven, Or Not?

3) What If Heaven Is Not For Real?

4) and then there is the Wikipedia page that says in the second sentence, “The book, published by Tyndale House Publishers in 2010, lists Alex’s father Kevin Malarkey as an author along with Alex, though in November 2012 Alex described the book as “1 of the most deceptive books ever.””

It seems to me that this is more than adequate.

The catalog can be a very powerful tool so long as it is used correctly. After all, that was how it was designed to work and our predecessors were pretty clever people. And as other tools come along–as they are now–we can use them so that the catalog can become even more powerful. The task ahead of us is to make the power of these tools more obvious to the user. Already, the information and the technology exist to do it all.

In this case, the boy’s surname is a tip-off as well. (Yes, that is a joke!)

I hesitate to change the fundamental role of catalogers and their records. I think that somebody, somewhere has to play the role of “unbiased arbiter”. Otherwise, if we are to be the arbiters of truth and falsehood, I fear we may be going down the road to create our own, modernized form of the “Index of Forbidden Books”.


by James Weinheimer at February 20, 2015 10:24 PM

February 19, 2015

Mod Librarian

5 Things Thursday: DAMNY, Hulton Archive, Linked Data

Here are five more exciting things:

  1. Agenda announced for DAMNY. Super early bird discount ends tomorrow!
  2. Really cool video about the photos in the Hulton Archive. See a real card catalog and learn about wacky research requests.
  3. And the Hulton Archive landing page on Getty Images.
  4. Trying to understand Linked Data like I am? These tutorials are helpful.
  5. What is an ontology versus a controlled…

February 19, 2015 01:13 PM

February 14, 2015

Metadata Matters (Diane Hillmann)

The Jane-athon Report

I’ve been back from Chicago for just over a week now, but still reflecting on a very successful Jane-athon pre-conference the Friday before Midwinter. And the good news is that our participant survey responses agree with the “successful” part, plus contain a lot of food for thought going forward. More about that later …

There was a lot of buzz in the Jane-athon room that day, primarily from the enthusiastic participants, working together at tables, definitely having the fun we promised. Afterwards, the buzz came from those who wished they’d been there (many on Twitter #Janeathon) and others that wanted us to promise to do it again. Rest assured–we’re planning on another one in San Francisco at ALA Annual, but it will probably be somewhat different because by then we’ll have a better support infrastructure and will be able to be more concrete about the question of ‘what do you do with the data once you have it?’ If you’re particularly interested in that question, keep an eye on the site, where new resources and improvements will be announced.

Rballs? What the heck are those? Originally they were meant to be ‘RIMMF-balls’, but then we started talking about ‘resource-balls’, and other such wanderings. The ‘ball’ part was suggested by ‘tar-balls’ and ‘mudballs’ (mudball was a term of derision in the old MARBI days, but Jon and I started using it more generally when we were working on aggregated records in NSDL).

So, how did we come up with such a crazy idea as a Jane-athon anyway? The idea came from Deborah Fritz, who’d been teaching about RDA for some time, plus working with her husband Richard on the RIMMF (RDA In Many Metadata Formats) tool, which is designed to allow creation of RDA data and export to RDF. The tool was upgraded to version 3 for the Jane-athon, and Deborah added some tutorials so that Jane-athon participants could get some practice with RIMMF beforehand (she also did online sessions for team leaders and coaches).

Deborah and I had discussed many times the frustration we shared with the ‘sage on the stage’ model of training, which left attendees to such events unhappy with the limitations of that model. They wanted something concrete–they usually said–something they could get their teeth into. Something that would help them visualize RDA out of the context of MARC. The Jane-athon idea promised to do just that.

I had done a prototype session of the Jane-athon with some librarians from the University of Hawaii (Nancy Sack did a great job organizing everything, even though a dodgy plane made me a day late to the party!) We got some very useful evaluations from that group, and those contributed to the success of the official Chicago debut.

So a crazy idea, bolstered by a lot of work and a whole lot of organizational effort, actually happened, and was even better than we’d dared to hope. There was a certain chaos on the day, which most people accepted with equanimity, and an awful lot of learning of the best kind. The event couldn’t have happened without Deborah and Richard Fritz, Gordon Dunsire, and Jon Phipps, each of whom had a part to play. Jamie Hennelly from ALA Publishing was instrumental in making the event happen, despite his reservations about herding the organizer cats.

And, as the cherry on top: After the five organizers finished their celebratory dinner later in the evening after the Jane-athon, we were all out on the sidewalk looking for cabs. A long black limousine pulled up, and asked us if we wanted a ride. Needless to say, we did, and soon pulled up in style in front of the Hyatt Regency on Wacker. Sadly, there was no one we knew at the front of the hotel, but many looked askance at the somewhat scruffy mob who piled out of the limo, no doubt wondering who the heck we were.

What’s up next? We think we’re on the path of a new data sharing paradigm, and we’ll run with that for the next few months, and maybe riff on that in San Francisco. Stay tuned! And do download a copy of RIMMF and play–there are rballs to look at and use for your purposes.

P.S. A report of the evaluation survey will be on RDA-L sometime next week.

by Diane Hillmann at February 14, 2015 07:43 PM

February 13, 2015

TSLL TechScans

PCC news

Phase 3A of RDA changes to name authority records was completed in December 2014.

The announcement contains details on the changes and has links to other information of interest to NACO authority file users.

The Library of Congress’ Policy and Standards Division has also prepared a posting describing the changes:

The bulk of these changes seem to have affected music headings and were necessary for phase 3B, scheduled for April 2015.

Additionally, the PCC recently released a Vision, Mission and Strategic Directions document outlining its vision and direction for the period January 2015-December 2017.

by Jackie Magagnosc at February 13, 2015 09:04 PM

Common ground: Exploring compatibilities between the linked data models of the Library of Congress and OCLC
Godby, Carol Jean and Denenberg, Ray, Common ground: exploring compatibilities between the linked data models of the Library of Congress and OCLC, January 2015.

This white paper, jointly issued by the Library of Congress and OCLC Research, documents the areas of alignment and difference between OCLC's project and the Library of Congress' BIBFRAME initiative. The paper concludes with some recommendations for closer alignment of the two linked data projects.

The paper has been widely recommended and provides a worthwhile perspective as we think about future directions for our bibliographic data.

by Jackie Magagnosc at February 13, 2015 09:02 PM

Managing Metadata

Next step

I’m happy to announce that I’ve accepted a position at The UCI Libraries as Head, E-Research & Digital Scholarship Services. Today is my last day at Caltech. I will resume my professional blogging at, where I’ve maintained a presence since 1998. My long-time host recently decided to close shop, so I’m in the process of migrating nearly 20 years of server cruft. Everything should be ready to launch by March, when I begin at UCI.

Thanks for the connection here during my 7 years at Caltech. I look forward to continuing our conversations.


by laura at February 13, 2015 06:20 PM