by Arlene G. Taylor
Presentation for "Authority Control: Why It
College of the Holy Cross, Worcester, MA
November 1, 1999
Twenty years ago at the 1979 LITA Institute on Authority Control, which I think was the first U.S. library conference on authority control, Mary Madden speculated on the reasons that vendors were not providing authority control: first, the high cost involved in setting up an authority control system, and second, the fact that librarians seemed only remotely interested in the matter. (Madden, 1982) I remember thinking this very strange, at the time, because my first job out of library school in the mid-1960s had been as a descriptive cataloger at the Library of Congress, and at LC authority control was central to the cataloging process. There, the first thing one did after jotting down some description of the book (we were only doing books in those days!) was to note all names associated with the work and head for the official catalog to see if there was an authority card for each of the names. If there wasn't, we created one. Upon leaving LC and learning that other libraries didn't have authority cards, I was quite frustrated. It seemed such an obviously needed tool. But then most libraries were using LC cataloging as much as possible; so unbeknownst to many of them, they essentially had authority control!
Three years later, in 1982, when I was doing research for a paper to be presented at Regional Institutes on Authorities (sponsored by ALA's then RTSD, now ALCTS), I asked vendors what they offered in the way of authority control. They didn't know what I was talking about. But by the summer of 1983 representatives of most of the same vendors did have some plans for incorporating authority control. So what do we have to show for our sixteen years of automated authority control? The answer depends very much on which library one is in and which vendor one is using. In 1985 I wrote with Margaret Maxwell and Carolyn Frost that bibliographic files and authority files could be non-integrated, partially integrated, or wholly integrated. (Taylor, Maxwell, Frost, 1985) The only thing that has changed is the proportions. There are still non-integrated, partially integrated, and wholly integrated systems, but there are more wholly integrated systems now than there were then. (By wholly integrated I mean that a search activates the references in the authority file along with the bibliographic records it retrieves, and/or that a name or subject search drops the user into the appropriate authority file before any titles of works are shown.) We concluded in that 1985 article:
It should be remembered, however, that regardless of how integrated the system, a human is ultimately responsible for deciding whether a particular entry in an existing authority record represents the same person, place, corporate body, or title as the one to be represented on the bibliographic record being entered. We have not yet automated the ability to think! (Taylor, Maxwell, Frost, 1985, p. 202)
This has not changed, although shortly I will discuss ways in which computers do outshine humans when it comes to doing tedious things and "remembering" details.
I spent six weeks of September and October 1999 at the Bodleian Library at Oxford University. While there, I asked for a demonstration of their integrated authority control. I was quite impressed with the system they have negotiated with Geac (thanks largely to Wilma Minty, Head of Cataloging Support Services). As cataloging proceeds, each name, uniform title, and subject heading can be checked against the authority file simply by clicking in the field. If there is a name match but the cataloger is not certain it's the same person, the bibliographic records to which the name has been assigned can be called up with a click, without a search of a separate file. If there is a match with a different form of the name, the authority form can be copied automatically into the bibliographic record. If the name is not there, the system automatically creates a minimal authority record using information from the bibliographic record at hand. It seems to me much more efficient, not to mention effective, for a cataloger to do this kind of authority checking with the item in hand than to have names and subjects checked against authority files later, when the cataloged item is long gone. This is the kind of authority control that we hoped for in the 1980s but were able to have in only a few places (e.g., WLN libraries).
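The checking sequence just described (match and copy the authorized form, or else create a minimal record) can be sketched in a few lines. This is a hypothetical illustration, not the Bodleian's Geac implementation: the class and method names are invented, a "match" here is only an exact string match on the authorized form or a recorded variant, and the sample heading is for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    authorized: str                                        # established form of the heading
    variants: set[str] = field(default_factory=set)        # forms that act as references
    minimal: bool = False                                  # True for system-made skeleton records
    linked_bibs: list[str] = field(default_factory=list)   # bib records using this heading


class AuthorityFile:
    def __init__(self) -> None:
        self.records: list[AuthorityRecord] = []

    def lookup(self, heading: str) -> AuthorityRecord | None:
        """Find a record whose authorized form or one of its variants equals the heading."""
        for rec in self.records:
            if heading == rec.authorized or heading in rec.variants:
                return rec
        return None

    def bibs_for(self, heading: str) -> list[str]:
        """Records already using a matched name, to help decide if it is the same person."""
        rec = self.lookup(heading)
        return list(rec.linked_bibs) if rec else []

    def check_heading(self, heading: str, bib_id: str) -> str:
        """Return the form to store in bibliographic record bib_id."""
        rec = self.lookup(heading)
        if rec is None:
            # No match: build a minimal record from the heading on the item in hand.
            rec = AuthorityRecord(authorized=heading, minimal=True)
            self.records.append(rec)
        rec.linked_bibs.append(bib_id)
        # A match on a variant still yields the authorized form.
        return rec.authorized


if __name__ == "__main__":
    af = AuthorityFile()
    af.records.append(AuthorityRecord("Twain, Mark, 1835-1910",
                                      {"Clemens, Samuel Langhorne, 1835-1910"}))
    print(af.check_heading("Clemens, Samuel Langhorne, 1835-1910", "b1"))  # Twain, Mark, 1835-1910
    print(af.check_heading("Smith, Jane", "b2"), af.lookup("Smith, Jane").minimal)  # Smith, Jane True
```

Marking system-made records as `minimal` keeps them distinguishable from records that have had full authority work, which matters for the "utility" approach discussed further on.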
Ten years ago, in 1989, I wrote about the research that had been done on authority control in the preceding seven years. A major concern that ran through that research was the amount of disk space required for integrated authority control (and the resulting slowness of response time). Would you believe that I wrote that paper on an old CP/M machine that had 64 kilobytes of memory? It wasn't until 1991 that I had a machine with 40 megabytes of storage, while my current machine has 12 gigabytes. Happily, we are no longer constrained by storage space.
Much of the research of the 1980s was aimed at discovering just how much authority control was needed. Was it really necessary to have unique access points for names? What kind of references were needed? Could works be brought under authority control in some way? Could authority control for subject headings be automated? Were local authority files necessary, or could national ones suffice? At the core of most of these questions were the limits of the technology of the day. (Taylor, 1989)
In the 1990s, with the technology problems resolved in favor of virtually unlimited space, research has turned to other questions. Two studies looked at the effect of authority control, or its absence, on retrieval and found that retrieval is greater when authority records are added to a catalog. (Bangalore, 1995; Wilkes and Nelson, 1995) Personal name authority control research seems to have run its course, but it had one last report in 1992, when Borgman and Siegfried reviewed personal name matching algorithms and programs that use phonetic and pattern-matching techniques. (Borgman and Siegfried, 1992) Corporate name research, by contrast, has received its first major attention in the OCLC Office of Research. (O'Neill, 1999a) That work showed that while minor variations can be very important in distinguishing personal names, such minor variations are not so important in corporate names. Therefore, elaborate matching algorithms can be developed that automatically correct a corporate name a human has entered incorrectly. The computer can flip inverted forms, put in or take out articles, and correct typos. Such a system is incorporated into CORC (OCLC's Cooperative Online Resource Catalog). Here is where computers outshine humans. A computer doesn't mind looking for all the possible ways a corporate name can be arranged, and it doesn't forget to check every single possibility.
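The source does not describe OCLC's actual algorithm, so what follows is only a sketch of the three operations it names, under stated assumptions: inverted forms are undone by rotating the name at each comma, English articles are dropped, and typos are tolerated by allowing an edit distance of one (Levenshtein distance, a swapped-in choice). It tries every arrangement of both names, which is the exhaustive checking credited to computers above. Function names and sample names are hypothetical. Given the finding that minor variations matter in personal names, matching this loose would suit corporate names only.

```python
from __future__ import annotations

ARTICLES = {"the", "a", "an"}  # assumed: English articles only


def key(form: str) -> str:
    """Lowercase, drop periods and commas, and remove articles."""
    words = form.lower().replace(".", "").replace(",", " ").split()
    return " ".join(w for w in words if w not in ARTICLES)


def arrangements(name: str) -> set[str]:
    """Keys for the name as entered and for each rotation at a comma (un-inverting it)."""
    parts = [p.strip() for p in name.split(",")]
    forms = {name}
    for i in range(1, len(parts)):
        forms.add(" ".join(parts[i:] + parts[:i]))
    return {key(f) for f in forms}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: each insertion, deletion, or substitution costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def best_match(entered: str, authorized: list[str], max_typos: int = 1) -> str | None:
    """Try every arrangement of both names; return the closest authorized form if close enough."""
    best, best_dist = None, max_typos + 1
    for form in authorized:
        for k1 in arrangements(entered):
            for k2 in arrangements(form):
                d = edit_distance(k1, k2)
                if d < best_dist:
                    best, best_dist = form, d
    return best


if __name__ == "__main__":
    authority_forms = ["University of Chicago", "Smithsonian Institution"]
    print(best_match("Chicago, University of", authority_forms))      # University of Chicago
    print(best_match("The Smithsonian Instituion", authority_forms))  # Smithsonian Institution
    print(best_match("Museum of Modern Art", authority_forms))        # None
```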
Three researchers have been addressing the authority control of works in the 1990s. (Vellucci, 1990; Vellucci, 1997; Smiraglia and Leazer, 1999) Four studies have looked at administrative issues with an eye to the cost of doing the authority work, and found that it is indeed expensive and that human intervention is often required. (Calhoun and Oskins, 1992; Pappas, 1996; Chan and Vizine-Goetz, 1997; Greever, 1997) Younger, motivated by a need to cut the cost of authority control, suggested that "utility" be considered the standard for doing authority work. By utility she meant that authority control is needed only for the names of people of accomplishment who are themselves subjects of inquiry. She suggested that if someone is looking for a work on "aerodynamics" by Smith, they will find it whether the name is Mike Smith or Michael Smith. (Younger, 1995) In a sense this is what is happening in the Bodleian Library system I described earlier, where names not already in the authority file get a skeleton record without necessarily having any authority work done. It is also happening with the addition of coded contents notes to records, which give access to names in contents notes without putting them under authority control. I will come back to this idea later.
Several studies have dealt with subject authority control. (Drabenstott, 1991; Taylor, 1995; Wilkes and Nelson, 1995; Chan and Vizine-Goetz, 1997; Miller, 1997; CannCasciato, 1999; Drabenstott, Simcox, and Fenton, 1999) This is interesting, since a number of folks in the 1980s seemed to be heading toward the idea that subjects were passé and keyword searching would be just fine, thank you. Drabenstott and Chan and Vizine-Goetz looked at the content of subject headings and subdivisions. Wilkes and Nelson looked at subject searching in online catalogs, and Drabenstott, Simcox, and Fenton looked at the general lack of understanding of subject strings by end-users. Research by Miller and Taylor dealt with aspects of form/genre access, which I'm including as subject authority control because it is so closely coupled with it. CannCasciato looked at the need to go back to find already cataloged items that are on a subject that has been newly established, noting that this is done at LC, and MARC records are redistributed, but libraries having the record already in their catalogs do not benefit from the redistributed records. (CannCasciato, 1999)
Also in the 1990s, though, the technological revolution that made authority control really viable (plenty of storage space) also brought the development of the World Wide Web, browsers, and search engines. These have attempted (not too successfully, as it turns out) to bypass the concept of authority control completely.
So that's where it's been. Now, where is it going? In 1989 I suggested that the file design proposed by Gorman in the late 1970s would be the next step. In this design, physical items would be represented in unique records, unencumbered by access points, but linked to each authority record that represents one of the persons, corporate bodies, works, and subjects associated with that physical item. I thought that actually might happen in the 1990s. (Taylor, 1989, p. 51) So much for my powers of forecasting!
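A rough sketch of that linked design, under these assumptions: item records hold only descriptive data plus the IDs of authority records, and headings are assembled through the links when needed. The class and field names are hypothetical, not Gorman's, and the sample headings are illustrative. The point the sketch shows is that a single change to an authority record is reflected for every linked item without touching any item record.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Authority:
    auth_id: str
    kind: str       # "person", "corporate body", "work", or "subject"
    heading: str    # the only place the heading text is stored


@dataclass
class Item:
    item_id: str
    description: str                                 # descriptive data, no access points
    links: list[str] = field(default_factory=list)   # IDs of linked authority records


class LinkedCatalog:
    def __init__(self) -> None:
        self.authorities: dict[str, Authority] = {}
        self.items: dict[str, Item] = {}

    def headings_for(self, item_id: str) -> list[str]:
        """Access points are assembled from the links at display time."""
        return [self.authorities[a].heading for a in self.items[item_id].links]

    def items_for(self, auth_id: str) -> list[str]:
        """Everything linked to one person, body, work, or subject."""
        return [i.item_id for i in self.items.values() if auth_id in i.links]

    def change_heading(self, auth_id: str, new_heading: str) -> None:
        # One edit to the authority record changes the heading for every linked item.
        self.authorities[auth_id].heading = new_heading


if __name__ == "__main__":
    cat = LinkedCatalog()
    cat.authorities["s1"] = Authority("s1", "subject", "Aeroplanes")
    cat.items["i1"] = Item("i1", "A book about flight", ["s1"])
    cat.items["i2"] = Item("i2", "Another book about flight", ["s1"])
    cat.change_heading("s1", "Airplanes")
    print(cat.headings_for("i1"), cat.headings_for("i2"))  # ['Airplanes'] ['Airplanes']
```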
I'm going to play it safer this time, though, and talk about some directions that are already in the works, but still in early stages. First, I believe we are moving in the direction of international access control. The idea behind access control is that an entity can be known by more than one name. An individual is an entity but may be called different names by different people or at different times in life. A subject concept usually remains the same concept even when the way of expressing it changes over time. In the international realm who is to say which name in which language is the correct one for names of famous non-living persons who have name representations in many languages and scripts (e.g., Aristotle, Confucius)?
Barbara Tillett has written extensively and has worked internationally toward the goal of international access control. In a December 1998 article, she identified several of LC's cooperative programs, such as NACO and SACO, that have expanded internationally. (Tillett, 1998) She reminds us that the concept of international authority control really began with the 1961 Paris Principles. That is, we have been trying to figure this out for almost 40 years. The difficulty with the current UBC (Universal Bibliographic Control) principles is that they call for each country to forgo its own national conventions and its users' needs in order to accept the form established by each nation for its own national authors. While this is a noble goal, our users are not inclined to be so noble in their searching. With international access control, we could all use the same authority records, but we could choose default forms to display to our users. The non-default forms would act as references. Each form used could be coded to indicate which country had created or added which form.
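The shared-record idea can be sketched as a single record carrying every form, each tagged with the country that contributed it, from which each library picks its own default display form while the remaining forms serve as references. The class, the two-letter country codes, and the fallback rule are assumptions for illustration; the name forms are the usual English, German, French, and Italian renderings, not copied from any national authority file.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SharedAuthority:
    # (country code, form) pairs; the code records which country created or added the form.
    forms: list[tuple[str, str]]

    def display_form(self, preferred: list[str]) -> str:
        """Default form for a library: the first form from its preferred countries, in order."""
        for country in preferred:
            for code, form in self.forms:
                if code == country:
                    return form
        return self.forms[0][1]  # assumed fallback: the first form recorded

    def references(self, preferred: list[str]) -> set[str]:
        """Every other form acts as a reference to the library's default."""
        default = self.display_form(preferred)
        return {form for _, form in self.forms if form != default}


if __name__ == "__main__":
    aristotle = SharedAuthority([("US", "Aristotle"), ("DE", "Aristoteles"),
                                 ("FR", "Aristote"), ("IT", "Aristotele")])
    print(aristotle.display_form(["DE"]))        # Aristoteles
    print(aristotle.display_form(["GB", "US"]))  # Aristotle
```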
One recommendation from an IFLA group is that the record ID number from one country's authority record be added to the authority record of another country, linking the two countries' records for the same entity. The Z39.50 protocol could be used to search and display these linked national authority records. Use of UNIMARC authority records is also being explored. International access control is catching on now because the Internet makes the wealth of online shared resource files easy to reach from anywhere in the world. (See also Danskin, 1998)
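If each national record simply carries the IDs of its counterparts, a system still has to gather all the records for one entity, even when record A links to B and B links to C but A does not link to C. One way to do that, assumed here rather than taken from the IFLA recommendation, is a union-find (disjoint-set) pass over the links. The record IDs below are hypothetical placeholders made of a country prefix and a number.

```python
from __future__ import annotations


def cluster_linked_records(links: dict[str, list[str]]) -> list[set[str]]:
    """Group authority record IDs that are linked directly or through other records."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    for rec_id, linked in links.items():
        find(rec_id)                       # register records that have no links
        for other in linked:
            union(rec_id, other)

    clusters: dict[str, set[str]] = {}
    for rec_id in list(parent):
        clusters.setdefault(find(rec_id), set()).add(rec_id)
    return list(clusters.values())


if __name__ == "__main__":
    links = {
        "US:0001": ["DE:0777"],   # hypothetical IDs: country prefix + record number
        "DE:0777": ["FR:0042"],
        "IT:0900": [],
    }
    for cluster in cluster_linked_records(links):
        print(sorted(cluster))
```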
A second place I believe authority control may be going has to do with subject access to Internet materials. A recent ALA Subject Analysis Subcommittee has looked at what is needed for subject access in metadata for Internet resources. Among the Subcommittee's recommendations are that a subject system be simple and easy to apply and comprehend, that it be intuitive, that it be scalable from simple to sophisticated, that it be logical, and that it be appropriate to the subject discipline involved. The Subcommittee considers a mixture of keywords and controlled vocabulary to be the most viable approach. In response, Ed O'Neill in the OCLC Office of Research has been working on the development of FAST (Faceted Application of Subject Terminology). (O'Neill, 1999b)
When OCLC's WorldCat records were compared with LCSH authority records, it was found that only three percent of total subject strings were established in the authority file. Another 26 percent were used by LC, although not established because of free-floating subdivisions, geographic subdivisions, etc. The remaining 71 percent were not used by LC and were not established. These figures do not bode well for authority control of subject headings.
FAST is based on LCSH. It is intended to be a post-coordinated faceted vocabulary. It is designed to be used in online environments by people with minimal training and experience. This is still a research project which, if eventually found viable, will be implemented in the CORC environment. There are eight facets: Topical, Geographic, Form, Period, Personal names, Corporate names, Conference/Meeting names, and Uniform titles. Instead of creating strings of headings with subdivisions, each facet will stand alone. There is a theoretical precision loss, as shown in the following example:
LCSH:
   Gold mines and mining $z California
   Silver mines and mining $z Colorado
FAST:
   Gold mines and mining
   Silver mines and mining
   California
   Colorado
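To make the precision loss concrete, here is a toy sketch that splits the LCSH strings in the example into facets and then runs the same query against both versions of the record. It assumes that only $z, $y, and $v subdivisions are split off (into Geographic, Period, and Form facets) while $x topical subdivisions stay with their heading; that mapping, the function names, and the exact-string matching are simplifications for illustration, not FAST's actual conversion rules.

```python
from __future__ import annotations

# Assumed mapping for subdivisions that become separate facets; $x stays with its topic.
FACET_FOR = {"z": "Geographic", "y": "Period", "v": "Form"}


def to_fast(lcsh: str) -> set[tuple[str, str]]:
    """Split one LCSH string such as 'Gold mines and mining $z California' into facets."""
    head, *subs = lcsh.split("$")
    topical = [head.strip()]
    facets: set[tuple[str, str]] = set()
    for sub in subs:
        code, term = sub[0], sub[1:].strip()
        if code == "x":
            topical.append(term)            # keep topical subdivisions attached
        else:
            facets.add((FACET_FOR[code], term))
    facets.add(("Topical", " -- ".join(topical)))
    return facets


LCSH_RECORD = ["Gold mines and mining $z California",
               "Silver mines and mining $z Colorado"]
FAST_RECORD = set().union(*(to_fast(h) for h in LCSH_RECORD))


def lcsh_matches(record: list[str], topic: str, place: str) -> bool:
    # Pre-coordinated: topic and place must sit in the same heading string.
    return f"{topic} $z {place}" in record


def fast_matches(record: set[tuple[str, str]], topic: str, place: str) -> bool:
    # Post-coordinated: each facet is matched on its own.
    return ("Topical", topic) in record and ("Geographic", place) in record


if __name__ == "__main__":
    query = ("Gold mines and mining", "Colorado")
    print(lcsh_matches(LCSH_RECORD, *query))   # False
    print(fast_matches(FAST_RECORD, *query))   # True: the theoretical precision loss
```

A search for gold mining in Colorado retrieves the faceted record even though the item pairs gold with California and silver with Colorado; the pre-coordinated strings keep those pairings.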
O'Neill predicts that it can be 100 percent under authority control, because each facet will be matched by an established authority record. As much as I like the idea of having LCSH strings available for display so that users have the serendipity of seeing a combination they didn't know existed, I must admit that 100 percent authority control is very appealing!
Some rules that are being tried:
LCSH and FAST can co-exist. When a heading from LCSH is entered into the system, FAST headings will be automatically generated and both sets of headings will be retained in the record.
A fascinating bit of research Ed told me about involved developing an algorithm to convert form subdivisions retrospectively from $x to $v. They used a sample of headings that included 50,000 explicitly coded $v subdivisions. In the OCLC Office of Research, they first changed every $v to $x. Then they ran their algorithm to recode form subdivisions as $v. All but 350 matched the original coding. They then went over the 350 non-matching subject strings with folks at LC. It turned out that in those 350 the algorithm was right 83 percent of the time, and LC's coding was right 17 percent of the time. That is, for 83 percent of those 350 headings, LC's catalogers had erred in coding in the first place -- another place where computers can outshine humans!
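The evaluation design (strip the coding, recode, compare with the original) can be sketched as follows. The recoding rule is invented for illustration, since the source does not describe OCLC's algorithm: a trailing run of $x subdivisions whose terms appear in a small, hypothetical list of form terms is recoded as $v, stopping at the first non-form subdivision so that a topical use such as "Chemistry $x Bibliography $x Methodology" is left alone. For scale, if the 350 disagreements are counted against the 50,000 coded subdivisions, they amount to 350 / 50,000 = 0.7 percent; 83 percent of 350 is roughly 290 headings miscoded by catalogers, and 17 percent is roughly 60 where the algorithm was wrong.

```python
from __future__ import annotations

# Hypothetical, tiny list of terms treated as form subdivisions.
FORM_TERMS = {"Bibliography", "Indexes", "Juvenile literature", "Maps", "Periodicals"}


def strip_v(heading: str) -> str:
    """Simulate uncoded data, as in the experiment: every $v becomes $x."""
    return heading.replace("$v", "$x")


def recode_form(heading: str) -> str:
    """Recode the trailing run of form-like $x subdivisions as $v."""
    head, *rest = heading.split(" $")
    subs = [(s[0], s[1:].strip()) for s in rest]
    # Form subdivisions normally come last, so walk back from the end and stop
    # at the first subdivision that is not a recognized form term.
    for i in range(len(subs) - 1, -1, -1):
        code, term = subs[i]
        if code in ("x", "v") and term in FORM_TERMS:
            subs[i] = ("v", term)
        else:
            break
    return " $".join([head] + [f"{code} {term}" for code, term in subs])


def disagreements(sample: list[str]) -> list[str]:
    """Headings whose recoded form differs from how they were originally coded."""
    return [h for h in sample if recode_form(strip_v(h)) != h]


if __name__ == "__main__":
    sample = [
        "Birds $v Juvenile literature",
        "Chemistry $x Bibliography $x Methodology",
        "Education $z France $v Periodicals",
        "Cats $x Periodicals",   # invented miscoding: a form term coded $x
    ]
    print(disagreements(sample))   # ['Cats $x Periodicals']
```

As in the real study, a disagreement only flags a heading for human review; it does not by itself say whether the algorithm or the original coding is wrong.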
The third place I want authority control to go, but don't know if it will, is one of my soapbox issues. And since I have a captive audience... It's the "rule of 3." I honestly don't know how we can say we have authority control over names in our catalogs, thus implying that everything by a person will be found under the authorized form of the person's name, when we have a rule that doesn't even allow us to enter some people's names in a bibliographic record, let alone construct their names in the authority-controlled form. In this age of increasing numbers of works by multiple authors, I do hope that rule gets changed soon.
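What the rule does to access points can be sketched briefly. This assumes the AACR2 provisions for works of shared responsibility where no one is indicated as principally responsible (rule 21.6C): with two or three persons, entry under the first named and added entries for the others; with more than three, entry under title and an added entry for the first named only. Everything else in AACR2 (principal responsibility, editors, optional extra added entries) is ignored, and the function names are hypothetical.

```python
from __future__ import annotations


def access_points(names: list[str]) -> tuple[str | None, list[str]]:
    """Return (main entry, added entries); a main entry of None means entry under title.

    Simplified: shared responsibility with no principal responsibility indicated.
    """
    if len(names) <= 3:
        main = names[0] if names else None
        return main, names[1:]
    # More than three: enter under title, added entry for the first named only.
    return None, names[:1]


def names_without_access(names: list[str]) -> list[str]:
    """Names that never get an access point, and so never meet the authority file."""
    main, added = access_points(names)
    covered = set(added) | ({main} if main else set())
    return [n for n in names if n not in covered]


if __name__ == "__main__":
    print(names_without_access(["Author A", "Author B", "Author C"]))              # []
    print(names_without_access(["Author A", "Author B", "Author C", "Author D"]))  # ['Author B', 'Author C', 'Author D']
```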
Finally, we should talk about what's going to happen with authority control of names on the Internet. While Jennifer Younger's idea of "utility" as the basis for providing authority control over names has not exactly caught on in library catalogs, it might be a way to get a handle on names on the WWW. It appears that most Internet searching involves subjects rather than names. As we begin to discover the names that users do want to find on the web, we can begin to provide authority control for those names. There is quite a lot of a kind of authority control for names of authors out there now, not necessarily created by librarians. I am talking about websites for a particular person, such as Margaret Atwood or Louisa May Alcott, where the creators of the sites have scoured the net for every possible reference and have provided links. If we were to provide a bibliographic record in our catalogs for each such site, we would have effective authority control for those names on the web, based on the idea of "utility."
So where has authority control been and where is it going? Well, it has been slowly catching on for some forty years, with major fast-breaking activity happening in the last decade. It has gone through periods of being pooh-poohed by people who did not understand it and who thought that computers would save us from it forever. But such a good idea as authority control doesn't stay down long. Folks have finally realized that computers have brought a kind of chaos that can only be dealt with through metadata and controlled vocabulary. We could have told them that!!
Bangalore, Nirmala S. 1995. "Authority Files in Online Catalogs Revisited." Cataloging & Classification Quarterly 20, no. 3: 75-94.
Bangalore, Nirmala S., and Chandra G. Prabha. 1998. "Authority Work in Copy (Derived) Cataloging: A Case Study." Technical Services Quarterly 15, no. 4: 39-56.
Borgman, Christine L., and Susan L. Siegfried. 1992. "Getty's Synoname and Its Cousins: A Survey of Applications of Personal Name-Matching Algorithms." Journal of the American Society for Information Science 43, no. 7 (August): 459-476.
Calhoun, Karen, and Mike Oskins. 1992. "Rates and Types of Changes to LC Authority Files." Information Technology and Libraries 11, no. 2 (June): 132-136.
CannCasciato, Daniel. 1999. "Retrospective Application of Subject Headings, Part 1." Library Philosophy and Practice 2, no. 1 (Fall). Available: http://www.uidaho.edu/~Embolin/cann-c1.htm
Chan, Lois Mai, and Diane Vizine-Goetz. 1997. "Errors and Obsolete Elements in Assigned Library of Congress Subject Headings: Implications for Subject Cataloging and Subject Authority Control." Library Resources & Technical Services 41, no. 4 (October): 295-322.
Danskin, Alan. 1998. "International Initiatives in Authority Control." Library Review 47, no. 4: 200-205.
Drabenstott, Karen Markey. 1991. "Determining the Content of Machine-Readable Subdivision Records." Annual Review of OCLC Research (July 1990-June 1991): 40-43.
Drabenstott, Karen M., Schelle Simcox, and Eileen G. Fenton. 1999. "End-User Understanding of Subject Headings in Library Catalogs." Library Resources & Technical Services 43, no. 3 (July): 140-160.
Greever, Karen E. 1997. "A Comparison of Pre- and Post-Cataloging Authority Control." Library Resources & Technical Services 41, no. 1 (January): 39-49.
Madden, Mary A. 1982. "Is This Somehow Connected? The Vendor Perspective." In Authority Control: The Key to Tomorrow's Catalog, ed. Mary W. Ghikas, 85-94. Phoenix, Ariz.: Oryx Press.
Miller, David. 1997. "Identical in Appearance but not in Actuality: Headings Shared by a Subject-Access and a Form/Genre Access Authority List." Library Resources & Technical Services 41, no. 3 (July): 190-204.
O'Neill, Ed. 1999a. "Authority Control for the Internet." OCLC Newsletter (May/June): 34-35.
O'Neill, Ed. 1999b. Telephone communication with author, October 22.
Pappas, Evan. 1996. "An Analysis of Eight RLIN-Members' Authority-Controlled Access Points for Purposes of Speeding Copy Cataloging Work Flow." Cataloging & Classification Quarterly 22, no. 1: 29-47.
Prabha, Chandra G. 1991. "Authority Control Practice in Libraries." Annual Review of OCLC Research (July 1990-June 1991): 3-5.
Smiraglia, Richard P., and Gregory H. Leazer. 1999. "Derivative Bibliographic Relationships: The Work Relationship in a Global Bibliographic Database." Journal of the American Society for Information Science 50, no. 6 (May): 493-504.
Taylor, Arlene G. 1989. "Research and Theoretical Considerations in Authority Control." Cataloging & Classification Quarterly 9, no. 3: 29-56.
Taylor, Arlene G. 1995. "How Many Subdivisions Represent the Form of an Item? Results of a Research Study." Available: http://www.pitt.edu/~agtaylor/ala/subfldv.htm
Taylor, Arlene G., Margaret F. Maxwell, and Carolyn O. Frost. 1985. "Network and Vendor Authority Systems." Library Resources & Technical Services 29, no. 2 (April/June): 195-205.
Tillett, Barbara B. 1998. "International Shared Resource Records for Controlled Access." ALCTS Newsletter Online 10, no. 1 (December). Available: http://www.ala.org/alcts/alcts_news/v10n1/gateway.html
Vellucci, Sherry L. 1990. "Uniform Titles as Linking Devices." Cataloging & Classification Quarterly 12, no. 1: 35-62.
Vellucci, Sherry L. 1997. "Bibliographic Relationships." Paper written for the International Conference on the Principles and Future Development of AACR, Toronto, Canada, October 23-25, 1997. Available: http://www.nlc-bnc.ca/jsc/confpap.htm
Wilkes, Adeline, and Antoinette Nelson. 1995. "Subject Searching in Two Online Catalogs: Authority Control vs. Non-Authority Control." Cataloging & Classification Quarterly 20, no. 4: 57-79.
Younger, Jennifer A. 1995. "After Cutter: Authority Control in the Twenty-first Century." Library Resources & Technical Services 39, no. 2 (April): 133-141.