Planet Cataloging

November 20, 2014

First Thus

the joy of advanced cataloging

Posting to RadCat

On 11/20/2014 2:33 PM, Snow, Karen wrote:

I love Mann’s essay as well. It’s a good thing that I have all of my beginning cataloging students read that very essay and write a discussion post about it! As part of that assignment, I have them complete a search in I-Share that is similar to the one Mann talks about in “Peloponnesian War…” and discuss their search results. I tell them to pretend that they are college students looking for works on the therapeutic use of storytelling, and they must search I-Share using a combination of “storytelling,” “therapy,” “therapeutic,” “story telling,” etc. Then they must find the authorized LCSH for the topic and search again using it (“Narrative therapy”). Even those students who currently work in libraries say that the article and exercise are very eye-opening.

Even though I have had my debates with Thomas Mann, I do like much of his writing, and that includes this essay of his. But I question precisely where the problem is. Mann shows how complicated and difficult it is to use a catalog, but he goes on to lay out very clearly that if you use it right, it can do a lot for you. I wholeheartedly agree with him, but I’m afraid that it is becoming almost irrelevant for most people. Why?

Because the way catalogs work is based on methods that almost nobody uses anymore. The methods are just too alien to 21st-century people: browsing by alphabetical order is truly obsolete in the era of keyword searching, relevance ranking, SQL, Lucene, “intuitive search,” and so on. But most especially, and most weirdly for people today: in a catalog, people are not supposed to search for the information they want; rather, they are supposed to search for how someone else (aka “the cataloger”) has decided to describe the information they want. That is completely different, and it is what Mann’s article on the Peloponnesian War is actually all about: finding terms that would never have occurred to the searcher in a thousand years, and using those terms to find the information they want.

I think it all made much more sense 25 years ago, when everyone was handling physical cards arranged in a card catalog and couldn’t just take the cards out and rearrange them however they liked. To do something like that would have been *inconceivable*, but in our catalogs today, we do it all the time! So, back then it was pretty clear that you had to find the right grouping(s) that somebody else had already arranged, e.g. Mann’s example of someone who wants to know about tributes during the Peloponnesian War needs the heading “Finance, Public–Greece–Athens.” Who in the world could ever think of that?

Although the need to do so was rather clear back then, I believe that this way of thinking is too strange an idea for people to grasp today. When we try to teach young people to do this, we look like trudging old dinosaurs.

I think it is obvious that the catalog needs to change how it functions, and Mann’s article is an excellent example of that (although I don’t believe that is what he intended). In my opinion, cataloging and catalog records do not need to change all that much though, because for catalogs to work even in the new environments, records must still be based on the overriding rule of consistency. If you dump consistency, whatever is left might be called a listing, an inventory, an account, and so on, but it cannot be called a catalog.

And yet, if we expect that, when a member of the public wants information, it is their job to learn to follow Mann’s odyssey as laid out in his paper on searching the Peloponnesian War, then it’s game over! People won’t stand for that today and will turn (or have already turned) to other tools.

That’s why I asked in my earlier message: “Does it [advanced cataloging] mean cataloging within the current library-focused world of AACR2/RDA/LCSH/LCC/LCNAF/MARC21/FRBR or does it mean something else?” I know *lots* of people who would say that working with those tools is anything but advanced. I want to emphasize that I disagree with such a notion, but it is clear to me that the catalog must work much differently than it does now.

No matter how differently the future catalog may work, catalog records will still need to be consistent, although, unfortunately, that seems to be changing.

For these reasons, I would say that in an advanced cataloging class it would be absolutely vital to show how important consistency is, and how difficult it is to achieve, both in theory and in practice. And if you drop consistency, you must see the consequences very clearly. Also, people should become aware of how various developments are threatening that consistency and what can be done about it. (I discussed this in my latest podcast, Metadata Creation–down and dirty. I just had to get in that plug!)


by James Weinheimer at November 20, 2014 04:21 PM

Mod Librarian

5 Things Thursday: DAM LA, David Riecks, Taxonomy, Linked Data

Hello,

Here are yet another five things:

  1. Advanced Metadata “Snackinar” recording featuring David Riecks.
  2. Slideshare on Learning W3C Linked Data.
  3. Should we still care about Dublin Core?
  4. DAM LA is happening now. Watch this spot for interesting things…
  5. How to show related posts by taxonomy in WordPress. I should give this a shot…


November 20, 2014 01:12 PM

November 19, 2014

TSLL TechScans

PCC BIBFRAME web page

Paul Frank, along with the PCC Secretariat, has created a new webpage, BIBFRAME and the PCC, to help librarians learn about the BIBFRAME initiative and understand the development of a future bibliographic ecosystem. The creators hope that this page will function as a central source for information, documentation, and updates on the PCC's involvement with BIBFRAME.

Of particular interest is a short paper, authored by Paul Frank, entitled BIBFRAME: Why? What? Who? describing the basics of BIBFRAME and why it is being developed.

by noreply@blogger.com (Jackie Magagnosc) at November 19, 2014 08:37 PM

November 18, 2014

First Thus

ACAT Qualifying filmed stage productions

Posting to Autocat

On 18/11/2014 2.56, Thomas, Kirsti wrote:

Several productions of Richard II have been done in the last few years, so I think it’s important to distinguish this “2013 RSC production starring David Tennant” from other versions like the “2012 BBC Two production starring Ben Whishaw” or the “2011/2012 Donmar Warehouse production starring Eddie Redmayne” or even the “1978 BBC Shakespeare production starring Derek Jacobi.” Our users are typically looking for specific versions by specific directors or with specific actors, so I come down on the side of providing a qualified uniform title. I guess the new RDA term for that is “Authorized Access Point Representing an Expression” ;)

If I understand correctly, this seems to equate individual performances with expressions. That would be something new, I think. For instance, in music, if someone wants a copy of Beethoven’s 5th Symphony, they search for

Beethoven, Ludwig van, 1770-1827. Symphonies, no. 5, op. 67, C minor.

and then they select what they want from the different records. But you cannot further specify *within the heading* that you want one conducted by von Karajan or by Bernstein. So catalogers do not create headings with specific conductors, such as:
Beethoven, Ludwig van, 1770-1827. Symphonies, no. 5, op. 67, C minor. Toscanini, Arturo, 1867-1957.

or with specific orchestras:
Beethoven, Ludwig van, 1770-1827. Symphonies, no. 5, op. 67, C minor. NBC Symphony Orchestra.

and we certainly do not create something like:
Beethoven, Ludwig van, 1770-1827. Symphonies, no. 5, op. 67, C minor. Toscanini, Arturo, 1867-1957. NBC Symphony Orchestra. Carnegie Hall, March 22, 1952.

All of that information goes into the *record*, but not into the heading. The catalog itself is supposed to provide that access, but people actually have to do some work. In a traditional catalog, such as the LC catalog, we can see how it has always worked: a search for the uniform title of Beethoven’s Fifth returns many records, and people are still expected to examine each one to choose which they want.

Of course, it’s easier with keyword searches. Even a search for the specific performance works in WorldCat:
Beethoven, Ludwig van, 1770-1827. Symphonies, no. 5, op. 67, C minor. Toscanini, Arturo, 1867-1957. NBC Symphony Orchestra. Carnegie Hall, March 22, 1952

Things also change in faceted catalogs. Here is a search in WorldCat for just the uniform title.

Today, with the facets, we can see the different conductors (Furtwangler, Toscanini, etc.), or we can limit by date (i.e. date of production, not date of performance). Facets can be made from any field in the record. This means it would be easy enough to make corporate bodies display in the facets, so that you could limit by NBC Symphony Orchestra (if it has been put into the record), even though catalogs for some reason do not do this now. Everything can be changed or improved in almost any way someone would want.
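The point that facets can be made from any field in the record can be sketched in a few lines of Python. The record fields below are illustrative assumptions, not real MARC tags or any particular system’s schema:

```python
from collections import Counter

# Hypothetical, simplified catalog records. In a real system these
# would come from MARC fields; the field names here are invented.
records = [
    {"uniform_title": "Symphonies, no. 5, op. 67, C minor",
     "conductor": "Toscanini, Arturo, 1867-1957",
     "corporate_body": "NBC Symphony Orchestra", "date": 1952},
    {"uniform_title": "Symphonies, no. 5, op. 67, C minor",
     "conductor": "Furtwangler, Wilhelm, 1886-1954",
     "corporate_body": "Berlin Philharmonic", "date": 1943},
    {"uniform_title": "Symphonies, no. 5, op. 67, C minor",
     "conductor": "Toscanini, Arturo, 1867-1957",
     "corporate_body": "NBC Symphony Orchestra", "date": 1939},
]

def build_facet(records, field):
    """Tally the values of one field across a result set."""
    return Counter(r[field] for r in records if field in r)

# Any field can become a facet, including corporate bodies.
print(build_facet(records, "conductor"))
print(build_facet(records, "corporate_body"))
```

A discovery layer does essentially this at scale: count the values of whatever fields you choose to expose, and offer the counts as limits on the result set.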

On the other hand, a translation of a libretto of an opera warrants a new expression, e.g.

Wagner, Richard, 1813-1883. Ring des Nibelungen. Libretto. English

Changing the idea of the expression for works of the performing arts (theater, film, music, etc.), so that an expression is determined not only by the author and the piece of music (Beethoven, 5th Symphony) but also by the performer(s) and perhaps even by the individual performance, is an interesting idea. I know that when searching music on YouTube for, say, a Rolling Stones song, I want the Rolling Stones and not something recorded by little Johnny’s garage band. Or I may want something *very specific*, such as the Stones’ “Under My Thumb,” but not just any version. I want a specific performance: the one at Altamont in 1969, where the Hell’s Angels killed a spectator and after which many things in society changed. That was a historic and important performance, not just any performance by the Stones.

It works in Google for the actual performance! https://www.google.it/search?q=rolling+stones+under+my+thumb+altamont
(It ends just before the violence. Not a very good performance, but they were all obviously very unhappy.)

For catalogers to give that kind of access through formal headings would be quite a bit more work than what we do now. Prudence dictates that we first determine whether the extra effort is warranted (and sustainable!), especially when it can be demonstrated that people can find these materials right now in other ways. I think we should just let the catalog work its magic, so that when we search by keyword we find what we want (as happens now), or put IT to work improving the facets. That would be a lot cheaper and easier than adding zillions of new “expressions”.


by James Weinheimer at November 18, 2014 08:37 PM

Coyle's InFormation

Classes in RDF

RDF allows one to define class relationships for things and concepts. The RDFS 1.1 primer describes classes succinctly:
Resources may be divided into groups called classes. The members of a class are known as instances of the class. Classes are themselves resources. They are often identified by IRIs and may be described using RDF properties. The rdf:type property may be used to state that a resource is an instance of a class.
This seems simple, but it is in fact one of the primary areas of confusion about RDF.

If you are not a programmer, you probably think of classes in terms of taxonomies -- genus, species, sub-species, etc. If you are a librarian you might think of classes in terms of classification, like Library of Congress or the Dewey Decimal System. In these, the class defines certain characteristics of the members of the class. Thus, with two classes, Pets and Veterinary science, you can have:
Pets
- dogs
- cats

Veterinary science
- dogs
- cats
In each of those, dogs and cats have different meaning because the class provides a context: either as pets, or information about them as treated in veterinary science.

For those familiar with XML: XML has similar functionality because it makes use of the nesting of data elements. In XML you can create something like this:
<drink>
    <lemonade>
        <price>$2.50</price>
        <amount>20</amount>
    </lemonade>
    <pop>
        <price>$1.50</price>
        <amount>10</amount>
    </pop>
</drink>
and it is clear which price goes with which type of drink, and that the bits directly under the <drink> level are all drinks, because that's what <drink> tells you.

Now you have to forget all of this in order to understand RDF, because RDF classes do not work like this at all. In RDF, the "classness" is not expressed hierarchically, with a class defining the elements that are subordinate to it. Instead it works in the opposite way: the descriptive elements in RDF (called "properties") are the ones that define the class of the thing being described. Properties carry the class information through a characteristic called the "domain" of the property. The domain of the property is a class, and when you use that property to describe something, you are saying that the "something" is an instance of that class. It's like building the taxonomy from the bottom up.

This only makes sense through examples. Here are a few:
1. "has child" is of domain "Parent".

If I say "X - has child - 'Fred'" then I have also said that X is a Parent because every thing that has a child is a Parent.

2. "has Worktitle" is of domain "Work"

If I say "Y - has Worktitle - 'Der Zauberberg'" then I have also said that Y is a Work because every thing that has a Worktitle is a Work.

In essence, X or Y is an identifier for something that is of unknown characteristics until it is described. What you say about X or Y is what defines it, and the classes put it in context. This may seem odd, but if you think of it in terms of descriptive metadata, your metadata describes the "thing in hand"; the "thing in hand" doesn't describe your metadata. 
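The bottom-up inference described above can be sketched in a few lines of Python. This is a toy model, not a real RDF toolkit: the property and class names come straight from the examples, and a real application would use IRIs and an RDF library rather than bare strings.

```python
# The vocabulary: each property's domain is a class.
DOMAINS = {
    "has child": "Parent",
    "has Worktitle": "Work",
}

def infer_classes(triples):
    """Derive each subject's classes from the domains of the
    properties used to describe it (bottom-up, as in RDF)."""
    classes = {}
    for subject, prop, _obj in triples:
        if prop in DOMAINS:
            classes.setdefault(subject, set()).add(DOMAINS[prop])
    return classes

triples = [
    ("X", "has child", "Fred"),
    ("Y", "has Worktitle", "Der Zauberberg"),
]
print(infer_classes(triples))  # {'X': {'Parent'}, 'Y': {'Work'}}
```

Note that nothing about X or Y is declared in advance: saying "X - has child - 'Fred'" is what makes X a Parent.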

Like in real life, any "thing" can have more than one context and therefore more than one class. X, the Parent, can also be an Employee (in the context of her work), a Driver (to the Department of Motor Vehicles), a Patient (to her doctor's office). The same identified entity can be an instance of any number of classes.
"has child" has domain "Parent"
"has licence" has domain "Driver"
"has doctor" has domain "Patient"

X - has child - "Fred"  = X is a Parent 
X - has license - "234566"  = X is a Driver
X - has doctor - URI:765876 = X is a Patient
Classes are defined in your RDF vocabulary, as are the domains of properties. The above statements require an application to look at the definition of the property in the vocabulary to determine whether it has a domain, and then to treat the subject, X, as an instance of the class described as the domain of the property. There is another way to provide the class as context in RDF: you can declare it explicitly in your instance data, rather than, or in addition to, having the class characteristics inherent in your descriptive properties when you create your metadata. The term used for this, based on the RDF standard, is "type," in that you are assigning a type to the "thing." For example, you could say:
X - is type - Parent
X - has child - "Fred"
This can be the same class as you would discern from the properties, or it could be an additional class. It is often used to simplify the programming needs of those working in RDF because it means the program does not have to query the vocabulary to determine the class of X. You see this, for example, in BIBFRAME data. The second line in this example gives two classes for this entity:
<http://bibframe.org/resources/FkP1398705387/8929207instance22>
a bf:Instance, bf:Monograph .

One thing that classes do not do, however, is to prevent your "thing" from being assigned the "wrong class." You can, however, define your vocabulary to make "wrong classes" apparent. To do this you define certain classes as disjoint, for example a class of "dead" would logically be disjoint from a class of "alive." Disjoint means that the same thing cannot be of both classes, either through the direct declaration of "type" or through the assignment of properties. Let's do an example:
"residence" has domain "Alive"
"cemetery plot location" has domain "Dead"
"Alive" is disjoint "Dead" (you can't be both alive and dead)

X - is type - "Alive"                                         (X is of class "Alive")
X - cemetery plot location - URI:9494747      (X is of class "Dead")
Nothing stops you from creating this contradiction, but some applications that try to use the data will be stumped because you've created something that, in RDF-speak, is logically inconsistent. What happens next is determined by how your application has been programmed to deal with such things. In some cases, the inconsistency will mean that you cannot fulfill the task the application was attempting. If you reach a decision point where "if Alive do A, if Dead do B" then your application may be stumped and unable to go on.
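The whole chain, classes from explicit typing, classes from property domains, and a disjointness check, can be put together in a short Python sketch. Again, this is a toy model with the illustrative names from the example above, not a real reasoner:

```python
# Toy vocabulary: property domains and disjoint class pairs.
DOMAINS = {
    "residence": "Alive",
    "cemetery plot location": "Dead",
}
DISJOINT = {frozenset({"Alive", "Dead"})}  # can't be both

def classes_of(subject, triples):
    """Collect a subject's classes from explicit typing and from
    the domains of the properties used to describe it."""
    classes = set()
    for s, prop, obj in triples:
        if s != subject:
            continue
        if prop == "is type":
            classes.add(obj)              # declared explicitly
        elif prop in DOMAINS:
            classes.add(DOMAINS[prop])    # inferred from the domain
    return classes

def inconsistent(subject, triples):
    """True if the subject's classes include a disjoint pair."""
    cls = classes_of(subject, triples)
    return any(pair <= cls for pair in DISJOINT)

triples = [
    ("X", "is type", "Alive"),
    ("X", "cemetery plot location", "URI:9494747"),
]
print(classes_of("X", triples))    # {'Alive', 'Dead'} (order may vary)
print(inconsistent("X", triples))  # True
```

As the text says, nothing prevents you from asserting the contradictory triples; it is only when an application checks the vocabulary that the logical inconsistency surfaces.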

All of this is to be kept in mind for the next blog post, which talks about the effect of class definitions on bibliographic data in RDF.

by Karen Coyle (noreply@blogger.com) at November 18, 2014 10:39 AM

November 17, 2014

Resource Description & Access (RDA)

RDA Blog Reaches 200,000 Pageviews

Hi all, I am pleased to announce that RDA Blog has crossed the 200,000-pageview mark. It is interesting to note that the first hundred thousand pageviews came in three years, but it took just eight months to reach the second hundred thousand.
Thanks all for your love, support, and suggestions. Please post your feedback and comments on the RDA Blog Guest Book. Select remarks will be posted on the RDA Blog Testimonials page.



INTRODUCTION TO RDA BLOG:


RDA Blog is a blog on Resource Description and Access (RDA), a new library cataloging standard that provides instructions and guidelines on formulating data for resource description and discovery. RDA is organized around the Functional Requirements for Bibliographic Records (FRBR) and is intended to replace the Anglo-American Cataloguing Rules (AACR2) for use by libraries and other cultural organizations. This blog lists descriptions of, and links to, resources on Resource Description & Access (RDA). It is an attempt to bring together in one place all the useful and important information, rules, references, news, and links on Resource Description and Access, FRBR, FRAD, FRSAD, MARC standards, AACR2, BIBFRAME, and other items related to current developments and trends in library cataloging practice.

RDA BLOG HIGHLIGHTS IN 1-MINUTE VIDEO PRESENTATION

by Salman Haider (noreply@blogger.com) at November 17, 2014 09:56 PM

First Thus

ACAT RDA Training for Reference Services?

On 16/11/2014 22.25, Callie Blackmer wrote:

This is an assignment for my Cataloging and Classification course so any thoughts would be greatly appreciated:

What is your understanding of the RDA cataloging standards? I came across an article by Teressa M. Keenan (permalink below) in which she discusses how reference services in the library are affected by the shift to RDA standards and why it is important for reference librarians to understand how RDA works, so that they are better equipped to direct patrons through the catalog. Do you agree with Keenan? Is it important for reference services that non-cataloger librarians learn cataloging standards?

My own thoughts on this are, first: of course professionals in any field need to learn their tools and keep current with changes. Whether you are a dentist, a doctor, a lawyer, a faculty member, a mechanic, or a butcher, your field is changing, and each professional must keep up with those changes, whether they agree with them or not. Librarians, be they in cataloging or in reference, are no different. They must all stay current on what is going on in their field.

Nevertheless, I would say that cataloging has gotten such a bad rap, especially in the last few decades, that it is very difficult for lots of non-catalogers to see the importance of any changes. There has always been a divide between reference and cataloging, but from my experience it has gotten more serious. I have heard several reference librarians say that the problem *is* the catalog, and when you add people in systems departments, there are even more challenges. Lots of people know about stories such as “Thinking the unthinkable: a library without a catalogue: Reconsidering the future of discovery tools for Utrecht University library” (http://libereurope.eu/news/thinking-the-unthinkable-a-library-without-a-catalogue-reconsidering-the-future-of-discovery-tools-for-utrecht-university-library/) and “Giving up on discovery” (http://taiga-forum.org/giving-up-on-discovery/). For all different kinds of reasons, I think that it will be difficult to convince many non-catalogers that the changes in cataloging rules are going to have any major impact on the day-to-day activities of users.

Let’s take a very normal example that happens quite literally every day: someone wants an article they have found cited somewhere. How do you find it? The traditional method says: when you find the citation, note down the name of the journal. (If you don’t have the name of the journal, it is practically impossible to find the article.) Then you go to the catalog and look for the name of the journal to see if the library has it. If it does, you look at the holdings to see if the library has the exact issue you want. If you can’t find it or have problems, you ask a reference librarian. There were always problems with that: complicated records (maybe you are looking at an earlier or later name of the title; maybe a wrong form of the title was cited; key titles always confused people), terribly complicated holdings statements, and so on.

What is the best way of doing it today? It is completely different, and you don’t even have to use the catalog. To take an example from the article above, “Giving up on discovery” (http://taiga-forum.org/giving-up-on-discovery/), the first comment is by Peter Murray, who says, “I also recommend looking back at David W. Lewis’ A Strategy for Academic Libraries in the First Quarter of the 21st Century”. While he gives the citation and a link here, lots of others do not.

Now, however, all you have to do is highlight the name and title “David W. Lewis A Strategy for Academic Libraries in the First Quarter of the 21st Century” (you don’t need the journal title), right-click, and search Google automatically, and you get some great results (at least I do). The very first is the actual article, and the second is something perhaps even more important: David W. Lewis’ Google Scholar page, where you can see more of his writings, plus (very important!) that this article has been cited 101 times, and I can click through to those citing articles right now! The later articles may be even more important to me than the original one.

What does the searcher need to know? Mechanical skills: select text and right-click. I can’t imagine anything much easier, and there is no comparison with the older methods. It’s also nice if users know that it is possible to add different search engines, and how to do it. Of course, this method doesn’t work all the time, but it works a lot of the time, and it will work more and more often as more materials come online. I think it should be one of the first methods tried. If it fails, OK: try something else.

Compare this to users searching the library catalog for the individual article. Either they won’t find it (because journal articles are not in there) or there is the “single search box” syndrome, which mashes everything together and has its own problems. (I have discussed this at length in an earlier post http://blog.jweinheimer.net/2014/10/consistency-was-conflicting-instructions-in-bib-formats-about-etds-being-state-government-publications.html)

What I am trying to say with all of this is that while it is very important for reference librarians to keep up with changes in cataloging so they can use them in their practice, the opposite is just as true: catalogers should be learning about and adapting to changes among the users, and this is best done through communication with the reference librarians. The world of research is changing in fundamental ways, and so is the once-overwhelming importance of the catalog. The catalog is still immensely important, but it too must adapt to the new realities.

I am sure we are only at the very beginnings of the changes in catalogs–and not all of them will necessarily be for the better.


by James Weinheimer at November 17, 2014 02:24 PM

November 16, 2014

The Feral Cataloger

cbtarsala

Note: As part of a marketing campaign for my proposed classification textbook, I prepared this introduction to BISAC for cataloging students. The original plan was to give it out as a freebie at ALA in Las Vegas to promote the book. Sadly, neither Las Vegas nor the book happened. I am posting it here for the greater public good.

Foreword

Confession: I love DDC. Before I started to research BISAC I wasn’t very impressed with it. Now I have a healthy respect for it. It’s a good scheme for what it does, but I’m afraid that many people who are promoting it are doing so for commercial purposes.

At the end of this section you will be asked to answer the key question: evaluate BISAC as a classification using the process in this chapter. [note: not included in this excerpt]  That’s what the professional should do.

General Background on BISAC

Ditch Dewey! Undo Dewey! Again and again you will read news reports about librarians who are replacing Dewey Decimal Classification in their collections, all the while making awful rhymes and puns as they do (do-ey!) it. At conferences anti-Dewey advocates will sometimes pitch their alternate systems. More often they will promote a system called BISAC. BISAC stands for “Book Industry Standards and Communications.” It is the subject category system used in bookstores. Because BISAC has become more mainstream in the past decade, you might someday work at a library that will debate whether to use it or not.

BISAC is a list of subject headings that are used to express the topical content of books. In a formal information science context, you would call them “descriptors.” There are over 3000 BISAC subject headings available, and they are arranged under fifty-one major headings. Only the major headings have scope notes and usage information.

The BISAC subject headings are hierarchical strings. Here is an example: PETS/Dogs/Breeds. PETS is the major heading, and it is the hierarchical relationship in the string that classifies the concept. The hierarchy is limited to two or three subdivisions below the major heading. PETS/Dogs/Breeds is the most specific level for Dogs. There are no subject headings for particular dog breeds.

Take a look at all the subject headings under the major heading PETS.

PET000000          PETS / General

PETS / Amphibians see Reptiles, Amphibians & Terrariums

PETS / Aquarium see Fish & Aquariums
PET002000          PETS / Birds
PET003000          PETS / Cats / General
PET003010          PETS / Cats / Breeds

PETS / Cooking for Pets see COOKING / Pet Food

PET004000          PETS / Dogs / General
PET004010           PETS / Dogs / Breeds
PET004020          PETS / Dogs / Training
PET010000          PETS / Essays & Narratives
PET005000          PETS / Fish & Aquariums
PET012000          PETS / Food & Nutrition *
PET006000          PETS / Horses
PET011000           PETS / Rabbits, Mice, Hamsters, Guinea Pigs, etc.
PET008000          PETS / Reference
PET009000          PETS / Reptiles, Amphibians & Terrariums

Here are some things to notice:
• The subject headings are arranged alphabetically under each major heading.[1]
• If a subject heading has subdivisions, there is always a heading ending with “/ General.” Therefore, you see PETS / Birds covering all books about birds, but PETS / Dogs / General for the books about dogs that are not about Breeds and Training.
• Each descriptor has a unique code number, but the code notation is not expressive.
• The code starts with three letters to represent the major heading followed by a six-digit number.
• The numbers of the codes are not related to the alphabetical order of the subject headings. However, they do express the hierarchical level of the descriptor. Compare the code for PETS / Dogs / General and PETS / Dogs / Breeds.
• An asterisk marks a newly-added subject heading.
• There are two kinds of cross-references. One kind leads you to another subject heading under the same major heading. Another kind sends you to a different major heading.

(The information above is from the BISAC Subject Headings FAQ at http://www.bisg.org. Accessed 1/8/14.)
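The observations above about the code notation can be captured in a short Python sketch. The subdivision rule used here (same three-letter prefix, with the parent’s significant digits prefixing the child’s) is my own inference from the example codes such as PET004000 and PET004010, not a BISG specification:

```python
import re

def parse_bisac(code):
    """Split a BISAC code into its three-letter major-heading
    prefix and its six-digit number."""
    m = re.fullmatch(r"([A-Z]{3})(\d{6})", code)
    if not m:
        raise ValueError(f"not a BISAC code: {code}")
    return m.group(1), m.group(2)

def is_subdivision_of(child, parent):
    """PET004010 is a subdivision of PET004000: same prefix, and
    the parent's nonzero digits begin the child's digit string.
    This rule is inferred from the examples, not official."""
    cp, cd = parse_bisac(child)
    pp, pd = parse_bisac(parent)
    return cp == pp and cd != pd and cd.startswith(pd.rstrip("0"))

print(parse_bisac("PET004010"))                     # ('PET', '004010')
print(is_subdivision_of("PET004010", "PET004000"))  # True
print(is_subdivision_of("PET010000", "PET004000"))  # False
```

This is why the codes are described as non-expressive overall but still hierarchical at the subdivision level: the alphabetical order of headings is not recoverable from the numbers, but the parent/child relationship is.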

BISAC in Its Native Habitat

BISAC comes from the Book Industry Study Group’s Subject Codes Committee. The Committee updates BISAC every year, and you can view the current edition online at the BISG website. American and Canadian publishers assign the subject headings as part of a complete metadata record that is used to market the book.

As happens with any living classification scheme, the annual update of the descriptors shows that the scheme is getting more detailed and expansive. BISG guidelines ask publishers to go through the change list every year and update their categories to the most current. If you are using BISAC as a shelf-arrangement tool, this is something you must monitor and respond to in order to keep your browsing categories up to date.

BISAC also offers “extensions” that target specific audiences. There are “Merchandising Themes” for groups of people, events, holidays and topics. Examples of Merchandising Themes are CULTURAL HERITAGE / Asian / Korean or EVENT / Back to School or HOLIDAY / St. Patrick’s Day or TOPICAL / Boy’s Interest. BISG has recently developed an extension for Regional Themes, and it is discussing a new extension for Common Core. Some of these extensions will have relevance for libraries, but currently only the regular subject headings are included in library catalog records.

Another example of BISAC’s growth is the committee’s development of a “Regional Themes” classification, which lets publishers add a seven-digit hierarchical code to the record to specify the location about which the work is written. The enumerated codes were only assigned to places that have “more than 100 titles” about them, so you will not find many codes in the seventh position, which represents borough/neighborhood/district. At present they are used only for parts of New York City and Los Angeles.

Example: 4.0.1.6.3.1.1 = Beverly Hills, read broadest to most specific: 4 = North America, 0 = [undefined part of the higher-level area, in this case the continent], 1 = USA, 6 = Western & Pacific States, 3 = California, 1 = Los Angeles, 1 = Beverly Hills.

It may be interesting for you to learn how a major retailer like Amazon uses BISAC. Self-publishers of books on Amazon are told to choose “up to 2 categories” from BISAC. In certain sub-categories (Romance; Science Fiction & Fantasy; Children’s; Teen & Young Adult; Mystery, Thriller & Suspense; Comics & Graphic Novels; Literature & Fiction; and Erotica), Amazon requires “search keywords” to be added. These are Amazon-specific descriptors.

For example, if you choose the BISAC subject heading FICTION/Romance/Paranormal/Witches & Wizards, you must supply at least one of the following keywords: witch, wizard, warlock, druid, shaman. Amazon also has some BISAC-like subject headings of its own. Romance/Sports is an example, and it also requires you to choose one of the following additional keywords: sport, hockey, soccer, baseball, basketball, football, olympics, climbing, lacrosse, nascar, surfing, boxing, martial arts, golf. In Amazon advanced search you can search for the keywords and add the BISAC subject heading from a drop-down list. This gives you additional power for genre fiction searches. To test it, I did an Amazon search and came up with over one hundred golf romances!

In the future you may see publishers using multiple subject classifications in their metadata records, because BISAC is only one of many available to them. Outside of North America, English-language publishers use a different classification called Book Industry Communication (BIC). There are also other national book industry schemes, and there is a new, multilingual, international subject classification called Thema that was released at the Frankfurt book fair in September 2013. Even though North American publishers will continue to use BISAC, it is important for you to remember that BISAC exists in a landscape of international marketing of books.

What to Expect from BISAC Metadata

BISG’s Metadata Committee gives publishers instructions about how to apply the subject headings in its manual, Product Metadata Best Practices. Everything in the human-crafted element comes from the publisher (an editor or a “marketing department associate”). If these people are following the guidelines in BISG’s best practices, here is what you can expect when accepting downstream subject headings from them.

There will be a “main subject.” Beyond that, BISG recommends “no more than three,” a number confirmed by information from the large publisher Random House (Andrea Bachofen, via Random House’s Random Notes). There are guidelines that encourage the most specific fit: editors are warned not to add general headings alongside specific ones, and not to assign conflicting exclusive classifications. In particular, a book cannot carry both juvenile-audience and non-juvenile headings. Publishers should map their in-house categories to BISAC, and most bookstores map their floor plans to BISAC on the other end. With this consistency in place, the headings can be used to identify category best-sellers.

You should remember that there is an absolutely unbridgeable divide between juvenile and adult subject headings in BISAC. As a classifier you must choose one or the other, if comparable headings exist; you cannot add both. The strings and the codes are completely different. JNF015000 JUVENILE NONFICTION/Crafts & Hobbies should not be assigned together with CRA043000 CRAFTS & HOBBIES/Crafts for Children or CRA023000 CRAFTS & HOBBIES/Origami.
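As one way to picture these guidelines, here is a minimal sketch of a checker. My reading of the rules (one main subject, no more than three headings in total, and no mixing of juvenile and adult codes, detected here by the JUV/JNF code prefixes) is an interpretation, and the prefix test is a simplification.

```python
# Minimal sketch of checking a set of BISAC codes against the
# guidelines described above. The prefix-based juvenile test and the
# "no more than three" reading are simplifying assumptions.
def check_bisac_assignment(codes):
    """Return a list of guideline violations (empty if none found)."""
    problems = []
    if not codes:
        problems.append("no main subject assigned")
    if len(codes) > 3:
        problems.append("more than three subjects assigned")
    juvenile = {c for c in codes if c.startswith(("JUV", "JNF"))}
    if juvenile and juvenile != set(codes):
        problems.append("juvenile and adult headings mixed")
    return problems

# Mixing a juvenile and an adult heading violates the divide:
print(check_bisac_assignment(["JNF015000", "CRA043000"]))
```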

There is a weird heading called “Non-classifiable.” Non-classifiable is used only for blank books, those decoratively bound things that people buy to use as notebooks or journals.

Where can you get BISAC subject headings for the works in your library?

• Off the Book: Publishers assign BISAC codes to their products according to their own internal standards, and BISG encourages them to put the headings near the bar code in an easy-to-spot location for bookstore owners as they arrange their stock.
• Increasingly, catalog records may include the codes or the descriptors. The Library of Congress started adding them routinely to Electronic CIP. WebDewey includes BISAC headings as an access method to DDC numbers, so you can switch back and forth at some hierarchical levels.

BISAC in MARC Records

Publishers do not use MARC to encode their metadata. The publishing standard for metadata encoding is ONIX, which must be crosswalked into MARC databases. Some libraries take ONIX records directly from the publisher to load into their catalogs, especially for e-books, which don’t have readily available copy in the library source databases. For new books with cataloging-in-publication metadata, however, the Library of Congress crosswalks BISAC into MARC records.

BISAC and other bookseller codes are added to MARC21 records in field 084. You will not see any of the subject headings in a record when you load it into your local catalog, because the heading is coded as a classification. However, it is possible to generate the headings on display using the information in the $2 subfield (source of data) and a list of the BISAC codes. (You must agree to the EULA to include BISAC in your catalog.) A growing percentage of the records in WorldCat have 084 fields: 40 million out of 311 million in the database, though the field is used for any additional classification scheme beyond DDC or LCC. When 084 is used for BISAC, it will typically be in new releases that carry downstream ONIX metadata from major publishers.
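Generating such a display could look roughly like the following sketch, assuming a locally licensed lookup table of BISAC codes to descriptors. “bisacsh” is the MARC source code for BISAC subject headings; the simplified field structure is mine, and the PET003000 code is an assumption (NAT019000 appears in the example later in this post).

```python
# Sketch of generating a display heading from a MARC 084 field,
# assuming a locally licensed table of BISAC codes to descriptors.
# The field is modeled as a plain dict of subfields; NAT019000 comes
# from the example later in this post, PET003000 is an assumed code.
BISAC_DESCRIPTORS = {
    "NAT019000": "NATURE / Animals / Mammals",
    "PET003000": "PETS / Cats / General",
}

def display_heading(field_084):
    """Return a descriptor only when subfield $2 names BISAC
    ("bisacsh") as the source of the classification number."""
    if field_084.get("2") != "bisacsh":
        return None
    return BISAC_DESCRIPTORS.get(field_084.get("a"))

print(display_heading({"a": "NAT019000", "2": "bisacsh"}))
```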

The 084 field is a small but mighty field that is easy to overlook because it is not easy to decode by reading. LC/PCC records do seem to add it when the metadata is readily available, but it seems unlikely that major agencies are assigning BISAC as a routine part of their cataloging workflow if it must be generated locally or researched on Amazon or some other database. Some libraries strip BISAC as part of their copy cataloging processes, so the stored MARC record contains only the library classification field. This is something you must investigate and troubleshoot if you want to switch to BISAC.

Review the ideas.

Discussion question: Is BISAC a formal classification system, as defined in the previous section? How does it rate on the evaluation?

Working with BISAC: For the following book, evaluate the assignment of its BISAC subjects using the BISAC subject heading list online and information about correct assignment included in this post. Are they correctly or incorrectly chosen by the publisher? If incorrect, what would be better choices?

Cat Sense: How the New Feline Science Can Make You a Better Friend to Your Pet by biologist John Bradshaw presents scientific information about domestic cats to a popular audience.
Here are the Library of Congress Subject Headings found in its catalog record:
• Cats—Behavior.
• Cats—Psychology.
• Human-animal relationships.
• Cat owners.

Here are the BISAC subject headings found in its catalog record:
• PETS / Cats / General
• SCIENCE / Life Sciences / Zoology / Mammals
• SCIENCE / Life Sciences / General

Answer: The last two are wrong. SCIENCE is used for works aimed at professionals, and you would never use the general heading if you have a more specific one because the more general is implicit.
NAT019000 NATURE/Animals/Mammals would be preferred as the second subject heading for this work.

Bradshaw’s other book is The Behaviour of the Domestic Cat, 2nd ed. (John W. S. Bradshaw, Rachel Casey, and Sarah Brown; Wallingford: CABI, 2012). It is aimed at his fellow scientists/zoologists/anthrozoologists. Both of Bradshaw’s books are classed the same in DDC and LCC: 636.8 in DDC (cats); SF446.5 in LCC (Animal culture — Pets — Cats — Behavior). Neither DDC nor LCC distinguishes between academic and popular treatments of subjects.

In the example above, also note the difference between the BISAC and LCSH strings. LCSH does not include higher levels of a hierarchy in its headings. With LCSH you search for terms directly and specifically, but cannot see a term’s place in the overall structure of the vocabulary without access to a cross-reference structure. These are issues you want to consider when you include both vocabularies as subject headings in your catalog records.

I end with the same advice that I started with: If your library is seriously considering BISAC as a replacement for traditional library classification and subject access, you must evaluate it critically and carefully, because it is not a simple, universal substitute for library-specific classification.


by Cheryl Boettcher Tarsala at November 16, 2014 06:43 PM

Slide17

It was gratifying to have a thousand people register for my free webinar on cuttering, and then to see over five hundred of them log in when I was presenting. It shows the continuing need for continuing education in traditional cataloging knowledge. Or maybe it just shows that lots of people attend when free webinars are offered. No matter.

Here’s a summary of what I covered:

Are you curious about Cutters? Maybe a little confused? This free webinar will reveal how Cutter’s alphanumeric book numbering systems work. You will learn how to recognize different types of cutter numbers and how to construct them for yourself.

  • Principles of alphanumeric numbering systems
  • Types of Cuttering
    • Cutter Two-Figure, Three-Figure and Cutter-Sanborn
    • Cutters for Library of Congress Classification
  • How to Use the Cutter-Sanborn Table
  • Different Uses for Cuttering in Library of Congress Classification
  • Basic Use of the LC Cutter Table

If you weren’t able to attend the webinar, here’s a link to the session recording. The slides are available at SlideShare. And here’s the resource sheet with all the links that I refer to in the presentation.

Thanks to ALA Editions for hosting it.


by Cheryl Boettcher Tarsala at November 16, 2014 03:21 PM

November 13, 2014

Mod Librarian

5 Things Thursday: More DAM, Portland Art Museum, NYPL, Viewshare

Here are 5 more things:

  1. Criteria for shopping for an appropriate digital asset management system.
  2. NYPL explores “The Networked Catalog.”
  3. Learn about new features in Viewshare visualization software.
  4. How digital asset management helps museums.
  5. A case study on the Portland Art Museum and Extensis Portfolio.

View On WordPress

November 13, 2014 01:16 PM

November 10, 2014

First Thus

Metadata Creation–Down and Dirty (Updated)

Metadata Creation:
Down and Dirty

James L. Weinheimer

[This was originally written in 1999 and was cited in several places at that time. That text is now in the Internet Archive but without the images, so I am putting it on my blog, replacing the images more or less as I remember them, making some edits for additional clarity, and adding an update at the end. I cannot remember one image at all, so I have deleted the reference in this version.]


How is metadata created? How should it be created? Who should do it? Who WANTS to do it?

These are not the usual questions that draw spirited discussion. While there are many articles and discussions about different metadata formats, there is very little discussion about how to go about making the metadata itself, i.e. the information that goes inside the metadata format:

<META NAME="Creator,Title,Format,etc." CONTENT="The part that hasn't been discussed very much.">

Traditional metadata creation has been practiced for millennia under various names, and the actual procedures have changed tremendously. In its basics, however, it has always remained the same. Its fundamental purpose is: to bring similar items together. The idea of bringing similar items together ensures that a user does not have to look in dozens–or hundreds–of different places for the same thing. These items can be similar in all sorts of ways according to the needs of the collection: it may be items by the same author, or items using the same methodology, the same printers, the same colors, or anything at all.

This may sound simple enough but as happens so often, the practice of a simple rule results in incredible complexity.

Let’s start with a very simple example: deciding where to place a physical item in some sort of subject arrangement. Since it is a single physical item it can go only into a single place in the arrangement. Let us further suppose that this collection and its arrangement has been around for a long time and that many people have worked on it and used it. For our purposes, this subject arrangement is the Library of Congress Classification (it has been around for about 100 years) and its outlines are reproduced below.
[Image: subjects]
Let us assume that I have a new item to add to the collection. The topic of my item is somewhat complex: let us say it is a book of photos of naval fighter planes used during WWII in the Pacific, and it could go into several places in the arrangement. It could go under photography, WWII, airplanes, military aviation, naval studies, or, if the photos were taken by a single person, even under the photographer. I have a good understanding of the classification and feel that my item belongs best in a certain place, naval science:
[Image: subjects1]
Is this correct? After all, this is where I truly believe it should go. But if I just put the item where I believe it should go, I may be making a serious mistake. Why? We must always remember that the purpose of all of this work is to put similar items together. Therefore, I must look to see whether there are similar items in the collection and where they are.

In this case, I find 25 similar items and discover that they have always been placed into a different area, under WWII. While I realize this area is also a possibility, I do not think it is nearly as good as mine. (This happens all the time.)
[Image: subjects2a]
Now I am faced with a dilemma. I actually disagree with the way the items have always been treated. What am I to do?

The answer may be surprising. If I decide that it is just a difference of opinion and simply put the item where I think it goes and forget it, it’s obvious that people will have to look in two different places to find all the similar items. How are people supposed to know that?
[Image: subjects2a]
Also, when the collection inevitably gets a similar item in the future, that librarian may think that these items belong in still another area. It turns out that if everybody did whatever they wanted, similar items inevitably would be scattered all over the place. This is not how information retrieval is supposed to work. Therefore, if I am resolved to put my item where I believe it should go, I must take the 25 similar items that were already cataloged, and rework them in some way so that they will all stay together.
[Image: subjects4]
Now, what have I accomplished? Not only have I created a lot of additional work for myself and other library staff (which makes my library colleagues angry), I’ve also made the users angry because I have removed the items from the areas they were accustomed to finding them. They knew how to find things before, but now I have forced them to go searching all over again.

In a few years’ time, a new item on a similar topic will inevitably arrive and that librarian may decide that where I have placed the books is wrong. He or she will move the books yet again—back to where they were originally or someplace different—thereby sending people wandering around again. This can go on forever, causing lots of extra work for everybody and only making things harder to find. The only practical solution is to swallow what I believe to be “correct” and place my item where everyone has been accustomed to finding such things, in this case, under WWII with the similar materials.
[Image: subjects1a]
So, have I done something that is actually incorrect? No, because what is “correct” in terms of information access is different from other senses of “correct.” If I really believe my item is basically different from the others already there, then that would be another situation, but if I think it is the same, I have no real choice. “Correct” in the sense we are discussing here is: bringing similar items together. This is also called consistency.

The only time I will change what has already been done is when I have found a true error, which rarely happens. For instance in one collection I worked in, I found the works of D.H. Lawrence and T.E. Lawrence had been mixed together. I moved the items by T.E. Lawrence. Most of the time however, it amounts merely to a difference of opinion.

The natural response at this point is: “What does all this have to do with other sorts of items? Surely you’re talking about books on shelves here, but links to websites can be placed in more than one point on the line. In fact, the entire concept of lines is irrelevant, too, with the added possibilities we have with computers. Books can only go in one place, unless you want to buy extra copies.”

To answer this, let us continue to imagine a digital article that is placed somewhere on the web. A person has written something you violently disagree with; everything the fellow says is wrong. You disagree so much that you write a withering response and also place it on the web.

Now the task is to ensure that when people find one article, they find the other one.

You can easily guarantee that when people find your article, they will know about the first article merely by adding a link from your article to the other one, e.g.

To read this fellow’s ridiculous article, click HERE.

Things are rather more difficult in reverse, however. How can you ensure that when people find his page, they will find your page? That is what you really want, after all. One thing you could try would be to contact the fellow and politely ask if he would make a link to your article–which he probably wouldn’t do.

Let’s further assume that this fellow has added metadata to his paper and that you have examined it. You disagree with every word he has chosen and decide to use other, much better words in your metadata.

His metadata:

<META NAME="Keywords" CONTENT="Inferior words">

Your metadata:

<META NAME="Keywords" CONTENT="Other, much better words">

What have you succeeded in doing? You have just guaranteed that when people search the metadata used in his page, no one will find your page. Just as before, people would have to look in two different ways for the same things. How can people know that?

Remember that the purpose is to bring similar things together. So, even though you disagree with every word the fellow says, and you are free to write whatever you want in your article, if you want people to know about your article when they find his article, you must use the same metadata, even though it irks you, just as much as it did when we put the book in the “wrong” place.

Someday, advanced search engines may somehow allow searchers to find the URL in your paper and bring your article together with his. But let us further suppose that someone else disagrees with both articles and has written still another page on the same topic. This author is fed up with both of you and decides to ignore your writings completely. This fellow certainly wants searchers to find his article when they find yours. Neither of your URLs appears in his page, so that cannot work. But what sort of metadata should this author use?

It becomes clear: the same metadata as you two have used.

In this way, we can see that standardized terminology is a natural outgrowth of the primary task of bringing things together. It works on the Web just as it does in other media. The task is to describe similar items in a consistent fashion. [This will be discussed further in the Update]

More often than not, it turns out that a topic is expressed in more than one word, or that the subject of an article encompasses more than a single topic and can be expressed in various ways. For example, an item may be about the Aesthetic movement as it was seen in the architecture of England during the late 1800s. Any of the aspects of this subject may be handled in very special ways. Obviously, the tasks of doing this analysis and relating it to the collection can become highly complex.

In some of the earliest years of information retrieval, various goals were laid down that determined what would constitute a successful–or unsuccessful–information structure. Among other things, an information retrieval system must allow a user to find what the collection has by its authors, titles, and subjects. [This will be discussed further in the Update]

It is important to note that this doesn’t mean users should be able to find just a few works by a certain author; they should be able to find everything in the collection by that author. Therefore, when users find the author Dostoyevsky, Fyodor, they should find everything in the collection by this author, no matter how his form of name appears on any item. Or when they search the Aesthetic movement, they should find everything. The forms of the words used in the metadata are based on bringing all works on a topic under a single form of name. Many times this form is based on the first item entered into the collection–a practice that should be more understandable now.

In reality, this is a tremendous undertaking and it doesn’t work perfectly. For instance, the idea of “everything in the collection” should not be taken literally but comparatively. In practice for libraries, this means 20% of any single item. So, if there is a paragraph about the Aesthetic movement in a 600-page book, it will not be included in the metadata, but if 120 pages of the book were about the Aesthetic movement, it would be included. If the book is 100 pages, only 20 pages need to be devoted to the Aesthetic movement. Nevertheless, the goal is not to enable people merely to find something in the collection on a specific topic (which is a relatively simple undertaking), but to find all the things in the collection, within the 20% rule and some other guidelines. The two goals are completely different.
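The 20% guideline can be reduced to a one-line check; treating it as a strict numeric cutoff, as this sketch does, is of course a simplification of cataloger judgment.

```python
# The 20% rule from the text as a simple check. Treating the
# guideline as a hard numeric cutoff is a simplification.
def warrants_subject_entry(topic_pages, total_pages):
    """A topic earns a subject entry when it fills at least
    20% of the item, measured here in pages."""
    return topic_pages / total_pages >= 0.20

print(warrants_subject_entry(120, 600))  # True: 120 of 600 pages
print(warrants_subject_entry(1, 600))    # False: a single paragraph
```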

The choice of the words themselves for the metadata is far less important for information access than the fact that similar items are brought together. Why? Because long experience has shown that no matter which word someone chooses, it is inevitable that others will come up with terms they believe to be “better”. The term an expert uses will very often be different from the one a novice uses; neither is correct or incorrect.

One of the traditional tasks of information retrieval is to forego the “correctness” or “incorrectness” of a term (so far as this is possible) and concentrate efforts on helping people find the term used for bringing the items together. This is done through a system of Use: and Related: cross-references.

What is the “correct” name of Geneva, Switzerland? It depends on where you come from. It can be: Geneva, Genf, Ginevra, Jih-nei-wa, Ginebra, Cheneba, Geneua, or of course, Genève, along with lots of other possibilities. None of these forms is incorrect and no one should be faulted for searching under any of these forms, but they need help to find the one that has been chosen. Therefore, there are cross-references. e.g.

find Cheneba ==> Use: Geneva.

Additionally, someone may not be aware that Geneva was an independent republic from 1536 to 1798 and there are also items in the collection under another form. At these times, a cross-reference can come in very handy. e.g.

find Geneva ==> See also: Republic of Geneva.

For all of this to work, metadata creators must add a new cross-reference whenever they discover a new term for a concept that is already in the collection. For example, if a librarian adds an item that uses the term “Cenevre” for Geneva, they must add a new cross-reference:

find Cenevre ==> Use: Geneva.
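The Use: and See also: apparatus can be sketched as two small tables, one mapping variant forms to the authorized form and one mapping authorized forms to related ones. The data structures and the resolve() function here are illustrative assumptions.

```python
# Sketch of the cross-reference apparatus described above. The two
# tables and the resolve() function are illustrative assumptions.
USE = {          # variant form -> authorized form
    "Cheneba": "Geneva",
    "Genf": "Geneva",
    "Cenevre": "Geneva",
}
SEE_ALSO = {     # authorized form -> related authorized forms
    "Geneva": ["Republic of Geneva"],
}

def resolve(term):
    """Follow a Use: reference to the authorized form, then collect
    any See also: references hanging off that form."""
    authorized = USE.get(term, term)
    return authorized, SEE_ALSO.get(authorized, [])

print(resolve("Cheneba"))  # ('Geneva', ['Republic of Geneva'])
```

Note that a term with no Use: entry resolves to itself, which is how an authorized form like “Geneva” still picks up its See also: references.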

In the last twenty years or so, the introduction of computers has allowed users to search databases by separate words in the record, called keyword searching. This can even be done with entire texts. From the beginning of study into such searches, the problems of bypassing standardized terminology were clear. One question was obvious: how do searchers know whether a record (or text) containing certain words has anything to do with their topic? This problem was compounded by another dilemma: what is the best way to order keyword results for the user? Attempts have included ordering by date of publication, by latest date of input into the collection, and by location of the word, among others. [This will be discussed further in the Update]

The problems with keyword searches are many. There are homonyms: “fossil”, e.g., is a term for the stratified remains of past life, a company that makes watches, a pejorative term for an old person, and so on. Also, the results of keyword searches are almost always much larger than those of traditional searches, and users can find themselves sorting through masses of irrelevant material. Nor is there even the possibility of knowing whether a searcher is retrieving all the information on a topic, even limited to 20% of an item.

Many users also make incorrect assumptions about their searches. They tend to believe that when they make a keyword search, e.g. “World War II”, they are searching for the concept of the Second World War, when they are actually searching for three words scattered in various ways throughout a text. If they would search the standardized terminology, they would indeed be searching concepts, but with keyword searching it cannot be assumed.

Relevance is one of the latest attempts to order the results of keyword searches and has become the most popular method of searching the Internet. The results are ranked by the number of times a term is used, how it is used in a text, how often other articles cite it, or other ways, depending on what a specific database considers to be “relevant”. The results of relevancy searches can be excellent, but all of us have been mystified by some items at the top of the relevancy rankings that have nothing at all to do with what we want, while other sites that are much more relevant show up much farther down the list. [This will be discussed further in the Update]
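The simplest of these signals, raw term frequency, can be illustrated in a few lines. Real engines combine many more signals (term position, citation counts, link structure), so this is a toy illustration, not how any particular engine works.

```python
# Toy relevance ranking by raw term frequency, the simplest of the
# signals mentioned above. Real engines combine many more signals.
def rank_by_term_frequency(query, documents):
    """Order documents by how many times the query term occurs."""
    return sorted(
        documents,
        key=lambda doc: doc.lower().count(query.lower()),
        reverse=True,
    )

docs = ["cats and more cats", "one cat", "no felines here"]
print(rank_by_term_frequency("cat", docs))
```

Even this toy shows the article’s point: a document that never uses the searcher’s word ranks last no matter how relevant it actually is.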

Relevance ranking also tacitly assumes that authors in the past have used similar terms in similar mathematical correlations and that they continue to follow these same criteria today–something that is highly dubious, at the very least.

Traditional information specialists welcome the increased power of keyword searching, including relevance ranking, and they have joined in the task of discovering new and better methods to improve it. But keyword searching has not eliminated the need for authorized forms and consistent analysis, although those methods are changing. It has turned out that one of the most powerful uses of keyword searching is to simplify the user’s task of finding the standardized terms used in the metadata records.

Traditional information specialists look at the present problems of information retrieval from a unique vantage point: how can we bring similar things together in a way that is useful to the user, and how can we make sure that we are retrieving everything from a search, i.e. when we search for the history of World War I, how can we guarantee that we are getting everything on WWI and not just a few random items? If we are getting just a few items, then which ones are we getting? If we can’t answer these questions, what is the goal of information retrieval today? Are the traditional goals of information retrieval even relevant in today’s environment? Is it that people no longer want to find items by their authors, titles, and subjects? Or is it a different problem altogether?

In the “free and open” world of the Internet, there are no accepted standards for metadata content at the moment, and little is being discussed. There are no authorized forms, and there is no way that one person can “correct” the metadata embedded in another’s site (which is a frightening thought!).

The examples given here are very simple and barely scratch the surface of traditional metadata creation. As we have seen, experts traditionally have had the authority to change any metadata they wish, but even this is more complicated than it appears at first glance, and can entail lots of work with associated frustrations for the users. Traditional information retrieval may not have all the answers, but it can pose some very good questions. Information retrieval specialists have unique knowledge and experience which should be of tremendous help when we tackle the problem of finding items on the Internet.

Update

When I re-read this document I was surprised that it needs relatively few updates. One example is the antiquated coding. Today, the coding would be changed into RDF triples, XML, microformats or something else. The basic idea remains the same however.

Aside from this, the major changes lie primarily in three areas: Linked Data, Search and the Single Search Box.

Linked Data

With linked data, we are not dealing with much that is fundamentally different from what we have already discussed. Exactly the same things happen as described before, except that people must assign not the same words (text) but the same URIs. For instance, if there is an item about cats, the text can be in many words: cat, gatto, Katz, Кот, मांजर, قط or any of the words seen in the owl:sameAs section of the page http://dbpedia.org/page/Cat. This is the page for the URI, which is one part of the linked data network.

How would this work? In a correctly designed search system where everything is based on URIs, someone could enter a search for “gatto”; in the background, the system would translate this word into the URI http://dbpedia.org/page/Cat, then search for where this URI has been used elsewhere, and in this way find قط and Katz and cat and so on, wherever the URI http://dbpedia.org/page/Cat has been included. In this way, we see that the traditional method of adding standardized vocabulary still holds, except that it has turned into adding the same URIs.
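A sketch of that translation step follows, with illustrative lookup tables. The labels, item records, and function are assumptions; only the DBpedia URI comes from the text.

```python
# Sketch of URI-based searching as described above: the query word is
# translated to a shared URI, then items carrying that URI are found.
# The tables and item records here are illustrative assumptions.
LABEL_TO_URI = {
    "cat": "http://dbpedia.org/page/Cat",
    "gatto": "http://dbpedia.org/page/Cat",
    "Katz": "http://dbpedia.org/page/Cat",
}
ITEMS = [
    {"title": "Il gatto domestico",
     "uris": ["http://dbpedia.org/page/Cat"]},
    {"title": "Dogs of the world",
     "uris": ["http://dbpedia.org/page/Dog"]},
]

def search(word):
    """Find every item tagged with the URI behind the search word,
    whatever language the word or the item happens to be in."""
    uri = LABEL_TO_URI.get(word)
    return [item["title"] for item in ITEMS if uri in item["uris"]]

print(search("gatto"))  # ['Il gatto domestico']
```

Searching “Katz” returns the same result, because both labels resolve to the same URI; that is the whole point of the scheme.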

To compare this with the earlier example of different people writing articles on the same topics, disagreeing violently with everything the others wrote, yet having to use the same metadata keywords if others were to find all of their writings: the same thing happens here, except that everyone must use the same URIs. In a similar fashion, if people decide not to use the same URIs, they guarantee that when someone finds one item on a topic, they cannot find the others.

The result is that in a linked data universe there are no longer “headings” in the sense that librarians traditionally think of them. The “heading” for cat becomes http://dbpedia.org/page/Cat, while all text forms associated with it become more similar to cross-references. The power of modern systems allows the human display of the “URI heading” to be almost anything: a single bit of text (cats), multiple bits of text (where the words cats, felines, and kittens can display at the same time), an image, a sound (a cat’s meow), or a video. The display could even be customized by each person, so that one person might see text while another sees an image.

Nevertheless, linked data could solve some important parts of the problem: once a system of linked data is in place, people would no longer all have to search the same text. But somebody, somewhere would still have to add the same URIs to the items, just as earlier somebody had to add the same text to them. And as we have seen, there are many problems with that.

An additional problem arises however: there are several systems of URIs. Which is someone supposed to use?
http://dbpedia.org/page/Cat
http://id.loc.gov/authorities/subjects/sh85021262
http://aims.fao.org/aos/agrovoc/c_1390.html
http://vocab.getty.edu/aat/300265960
https://www.freebase.com/m/01yrx

Is the solution to link these systems together? Some are trying to do it, but that is not so easy either. What will be the final product for the searcher, and how can it all be made coherent to a human? Will any of it really be useful? Nobody knows. It is also difficult to say how long it will take to build such a system; even once the system is done, it remains to be seen whether and how many people will be willing to implement it, and the usefulness of the final product for end users also needs to be demonstrated.

Linked data has great promise to provide practical results to the majority of people, but a lot of work remains to be done and there are many unanswered questions.

Search

Search is probably the area where the most important changes have taken place since I wrote this article in 1999; it has changed almost completely since that time. Larry Page and Sergey Brin had begun their Google project at Stanford only a year before, so obviously much of what I wrote needs reconsideration.

At the end of the 1990s, the search engines were AltaVista, WebCrawler, and others like them, and the results obtained from them were inevitably the subject of some very funny jokes. It wasn’t until Google came up with PageRank that something substantively more useful came about, and there have been many developments since that time.

I made a podcast about search, and everything there still holds true. Today, the growth of semantic technologies is changing the very concept of searching. These technologies are based primarily on so-called "big data" about you: where you go, what you look at on the web, what is in your emails, who you talk to on your phone, where those people go, what is in their emails, any social interactions all of you may have, what you have bought over the web, and so on. In fact, many expect searching as we have traditionally known it to diminish as other methods take over.

This is what Tim Berners-Lee intended with his call for creating “intelligent agents” by building “The Semantic Web.” I discussed this in my podcast and it is now coming true.

For instance, there is something called "intuitive search," where the system, based on your activities, which will be monitored in increasingly detailed ways, predicts what you want even before you know you want it yourself. Many have experienced this already. Perhaps you are in another city; it is time to eat, and you find a message on your smartphone with an ad for a nearby restaurant that has gotten high reviews, perhaps from friends of yours, or from their friends.

In the future, when the web turns into the "Internet of Things" and your refrigerator and almost everything else is hooked into the web, you could be coming home from work and get a message from your refrigerator saying that there is no milk, so you should stop and get some. With wearable technology, our very bodies can be monitored constantly, so that we can be told to take a medication we forgot. Or perhaps the bathroom scale decides I weigh too much today, confers with the bracelet on my wrist that monitors my blood pressure, and communicates this to my Google glasses, which recognize a doughnut in my hand, and I get some sort of message telling me to get some exercise and put that doughnut down.

Some may find such a future wonderful; others may find it horrifying. But no matter what we think, this is what many very powerful companies are planning for, and it is the thrust behind the idea of "intuitive search." It is clear that for intuitive search to work at all, a very powerful system must know an awful lot about you. Some may consider this an invasion of privacy, but no matter: it is what many organizations are planning for, and it is why many believe that "traditional search," such as we see in Google today, will gradually disappear. (For more on this, see "What's Ahead For Paid Search?" from Search Engine Land, and "Google Hummingbird: A Sophisticated, Intuitive Search Tool" from the Syracuse University School of Information Studies.)

While most of these developments are focused on business and the social world, they will of course have a major impact on what users expect from libraries. Already, I have discovered that the idea of searching for authors or titles or subjects is being forgotten by many young people; they think only in terms of keywords. Even the notion that searching for information can actually be hard work is difficult for many to grasp when, in other spheres, they can find a new app or reviews for a nearby restaurant in just a few seconds. When they have trouble finding information for a class paper, they often think the problem lies not with themselves but with the systems, especially library systems.

Users who have grown up on Google and Yahoo searches have already changed their expectations and opinions of libraries and their catalogs from what earlier generations expected. It is difficult to imagine how those expectations and opinions will change when "intuitive search" really gets going, but change they most definitely will.

My own opinion is that library methods still provide important “added value” found nowhere else and should be retained, but if libraries do not follow these new developments very closely and institute methods to adapt to them, our traditional methods will look more and more obsolete and antiquated.

Single Search Box

The single search box is one of the library world's answers to the point mentioned above, that the "… idea that searching for information can actually be hard work is difficult for many to grasp…" To make things easier, libraries have instituted a "single search box" that searches many databases and different kinds of tools at once. This can be done in a few ways. One option is to convert records from these other databases, which are based on other rules or no rules at all, into MARC21 (the library's format) and load them into the local library catalog, so that when people search the library's catalog, they are searching "everything."
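The conversion option can be pictured as a crosswalk from one metadata scheme to another. The sketch below maps a few flat, Dublin Core-style fields to MARC21 tags; the mapping and the record are invented for illustration, and real crosswalks must also handle indicators, subfields, punctuation and much more.

```python
# Toy crosswalk: simple descriptive fields -> MARC21 tag numbers.
# This mapping is illustrative only.
DC_TO_MARC = {
    "title":   "245",  # Title statement
    "creator": "100",  # Main entry, personal name
    "subject": "650",  # Subject added entry, topical term
}

def to_marc_fields(record):
    """Turn a flat metadata record into sorted (tag, value) pairs,
    keeping repeated fields such as multiple subjects."""
    fields = []
    for key, value in record.items():
        tag = DC_TO_MARC.get(key)
        if tag is None:
            continue  # unmapped fields are silently dropped here
        values = value if isinstance(value, list) else [value]
        fields.extend((tag, v) for v in values)
    return sorted(fields)
```

Notice how much is lost in even this tiny example: anything without a mapping simply disappears, which is one reason converted records sit so uneasily next to records made under cataloging rules.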

Or a library can institute a type of federated search: a system that queries all of the sites simultaneously, brings back the results, and puts them together in a single result set. (For a demo, see this search for the term metadata, which searches library catalogs, Wikipedia, OpenCourseWare and other resources all at the same time; the list can be expanded. These resources do not follow, and cannot follow, the basic rule to "bring similar items together.") No records need to be downloaded and converted into the local catalog; everything happens on the fly. Those who search such a system experience no real difference from the first option, and the final product will obviously have problems with "keeping similar things together," as I have described.
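A toy federated search makes the mechanics, and the "similar things" problem, visible. The sources and records below are invented stubs standing in for live databases; deduplication by exact title catches literal duplicates but misses variant headings entirely.

```python
# Sketch of federated search: query each stub source, merge results,
# and deduplicate only on exact (case-insensitive) title matches.

def search_catalog(term):
    # Stands in for a library catalog query.
    return [{"title": "Metadata fundamentals", "source": "catalog"}]

def search_wiki(term):
    # Stands in for a Wikipedia-style source.
    return [{"title": "Metadata", "source": "wiki"},
            {"title": "Metadata fundamentals", "source": "wiki"}]

def federated_search(term, sources):
    """Merge hits from every source into one result list."""
    seen, merged = set(), []
    for source in sources:
        for hit in source(term):
            key = hit["title"].casefold()
            if key not in seen:
                seen.add(key)
                merged.append(hit)
    return merged
```

If one source says "Thucydides" and another says "Peloponnesian War," nothing in a merge like this brings them together; that job was done, in the old closed catalog, by catalogers applying the same headings everywhere.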

I shall refer to a post I made recently that discusses the problem:

“… I will take a backseat to no one concerning the importance of consistency. It is one of the reasons why I have been against a lot of changes with RDA.

BUT (there is always a “but”), the fact is: we are living in transitional times. At one time–and not that long ago, just 10 or 15 years ago–the library catalog was a world apart. It was a closed system. A “closed system” meant that nothing went into it without cataloger controls, and when the catalog records went out into the wider world, they went into a similar, controlled union catalog, such as OCLC, RLIN, etc.

The unavoidable fact is, that world has almost disappeared already and the cataloging community must accept it. The cataloging goal of making our records into “linked data” means that our records can literally be sliced and diced and will wind up anywhere–not only in union catalogs that follow the same rules, not only in other library catalogs that may follow other rules, but quite literally anywhere. That is what linked data is all about and it has many, many consequences, not least of all for our “consistency”.

Plus there is a push for libraries to create a “single search box” so that users who search the library’s catalogs, databases, full-text silos and who knows what else, can search them all at once. Again, the world takes on a new shape because these other resources have non-cataloger, non-RDA, non-ISBD, non-any-rules-at-all created metadata, or no metadata at all: just full-text searched by algorithms. Those resources are some of the most popular materials libraries have, or have ever had. They are expanding at such an incredible rate that they would sink entire flotillas of catalogers working 24 hours a day. The very idea of “consistency” in this environment begins to lose its meaning.

For example, if a normal catalog department can add, let’s say, 70,000 AACR2/RDA records to their catalog per year, but the IT department is adding hundreds of thousands or even millions of records that follow no perceptible rules at all from the databases the library is paying lots of money for (this is happening in many libraries right now), then in just a few years, the records from non-library sources will clearly overwhelm the records from catalogers. That is a mathematical certainty.
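A back-of-the-envelope calculation shows why this is a mathematical certainty. The numbers here are invented for illustration: a catalog starting with 1,000,000 cataloger-made records, 70,000 new cataloged records a year, and 500,000 vendor records loaded a year.

```python
# Illustrative arithmetic only: how fast cataloger-made records become
# a minority once large vendor loads begin. All figures are assumptions.
def cataloger_share(years, base=1_000_000,
                    catalogued_per_year=70_000, vendor_per_year=500_000):
    catalogued = base + catalogued_per_year * years
    total = catalogued + vendor_per_year * years
    return catalogued / total

for y in (0, 5, 10):
    print(f"after {y:2d} years, cataloger-made share: {cataloger_share(y):.0%}")
```

With these assumptions the cataloger-made share falls from 100% to roughly a quarter of the catalog within a decade, and it keeps falling as long as the vendor loads continue.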

Even without data dumps into the catalog itself, instituting the single search box will result in exactly the same thing from the searcher’s perspective: records of all types will be mashed together, where there will be far more non-library-created records than library-created records.

So the logical question about consistency is: Consistency over what, exactly? It is hard to escape the conclusion that it is consistency over an ever diminishing percentage of what is available to our users.

Yet I still believe very strongly in consistency, but it must be reconsidered in the world of 21st-century linked data and the abolition of separate "data silos". It is all coming together, and both the cataloging community and the library community seem to want this. I want it too.

The idea and purpose of consistency will change. It must change, or it will disappear. Is it at all realistic to think that these other non-library databases will implement RDA [that is, the current library rules for metadata]? Hahaha! But if a huge percentage of a catalog follows no rules at all, how can we say that consistency is so important? If consistency is to mean something today and in the future, it will have to be reconsidered. What are its costs and benefits?

I consider these to be existential questions for the future of cataloging. I don’t see these issues being discussed in the cataloging community, but I have no doubt whatsoever they are being discussed in the offices of library administrators, whether they use words such as “consistency” or not.”

Conclusions

To sum up how my own ideas have changed: today I am much less certain than I was 15 years ago about the superiority of library methods. I still believe in the library goals of providing resources that have been selected by professionals according to open, professional standards. Those resources should then be described and organized in professionally standardized ways, with all standards governed by the good of the users and aiming to be as objective as possible. I do not consider those goals obsolete in any way, and they are completely different from the goals of other projects such as "the semantic web" or "intuitive search."

Traditional library methods, however, are completely different. As this article shows, methods used in other types of information processing are developing at an incredible rate, and those methods are specifically designed to appeal to the public. These developments simply must have consequences for libraries, but it is difficult to predict exactly what those consequences might be. For instance, searching by last name [comma] first name, which everyone did automatically not that long ago, is for all intents and purposes gone. As mentioned before, the simple concept of searching by author, by title, or especially by subject is being forgotten. This doesn't mean that people would not like these options if they knew about them, understood them, and found them easy to use, but simply getting to that point will be a huge undertaking.

Nevertheless, I think it would be a worthwhile and a noble endeavor.


by James Weinheimer at November 10, 2014 09:44 PM

Mod Librarian

Special Edition: Taxonomy Bootcamp

Oh, to be a fly on the wall at Taxonomy Bootcamp…

The next best thing is a link to all the fabulous and thought-provoking presentations. Here you are with some highlights:

  • Taxonomy Fundamentals Workshop with John Horodyski
  • Talking About Taxonomies with Gary Carlson
  • Taxonomy and Search with Mindy Carner
  • Taxonomies in the Arts with David Clarke

Enjoy!


November 10, 2014 01:03 PM

November 07, 2014

First Thus

ACAT Should libraries use user generated tags in their catalogs?

Posting to Autocat

On 11/7/2014 5:17 AM, Leah Leger wrote:

Hi I am an MLIS student at the University of Denver and for a cataloging class I have to read and come up with a question from the article and get feedback from the field.

My questions are should libraries include user generated tags in their OPAC, should they be from websites like Librarything or created by users, and should they be searchable in a libraries’ OPAC? What are your thoughts?

I have studied and written about this a bit. I used to enjoy searching Amazon for controversial books and seeing the tags people assigned. Mostly they were insults of various kinds. One of my postings, concerning Sarah Palin's book "Going Rogue," is here: http://blog.jweinheimer.net/2009/11/fw-sarah-palin.html

In it, I mentioned the tags and gave links to various Amazon tag pages (amazon.uk, .can, etc.), where I only summarized them and then linked to the pages themselves. Unfortunately, a few years ago Amazon stopped letting the public see the tags assigned by others, although you can still see your own (I guess; I don't do it).

While the tags do not exist in the current Amazon pages, I discovered that the Internet Archive has arrived to save the day once again! The links need to be changed but you can still see them:
https://web.archive.org/web/20091203235650/http://www.amazon.com/Going-Rogue-American-Sarah-Palin/dp/0061939897 where the page exists, and you can see all the tags (from Oct. 4, 2009) at https://web.archive.org/web/20091004003025/http://www.amazon.com/Going-Rogue-American-Sarah-Palin/dp/tags-on-product/0061939897.

I think these tags explain pretty clearly why Amazon stopped it. Many thanks to the Internet Archive!

There was another perspective in part of a talk by Clay Shirky, "Authority in an Age of Open Access" (http://blog.jweinheimer.net/2012/11/authority-in-age-of-open-access-analysis.html), where he discussed tags used in a web project. It was interesting, and I gave my own analysis as a cataloger. The discussion that took place on NGC4LIB was also noteworthy.

Customers have used Amazon reviews in unexpected ways as well. You can see an example at http://www.amazon.com/Maisto-Fresh-Metal-Tailwinds-Endurance/dp/B004JFMOGK/ref=cm_rdp_product, a child's toy version of a Predator aircraft. The reviews are really something.

It seems to me that when libraries have implemented tags, either nobody uses them (as so often happens) or they turn into variations of spam. A certain "critical mass" is needed before they start to take off, but even in WorldCat I have seen relatively few of them.

Maybe someone from OCLC could provide some percentages of records with tags, and perhaps a breakdown by material type, such as children's books.


by James Weinheimer at November 07, 2014 02:13 PM