SYLLABUS
 
for
LIBSCI 2002 Retrieving Information

Fall 2005

DATE(S)

TOPICS

September 6-11

Introduction and Course Overview: What is information retrieval? Why is information retrieval important? What services are formed on the basis of information retrieval? How is the network environment changing information retrieval?

September 12-18

Data and Information Structures

September 19-25

Basic Concepts and Models of Information Retrieval, including Boolean Searching; Controlled Vocabulary Searching; Free-Text Searching

September 26-October 2

Recall, Precision, and Relevance

October 3-9

Selection of Databases and Search Engines; Development of Search Strategies: General Concepts and Heuristics; Bibliographic Databases; Full-text Databases

October 10-16, 17-23

Searching the World Wide Web: Harvesting; Differing Characteristics of Web Search Engines and Sorting Procedures; Experimental Search and Sorting Algorithms; Linking as a Basis for Understanding the Structure of and Searching the Web

October 21-23

On-Campus Weekend

October 24-30, October 31-November 6

Searching DIALOG: Using DIALOGWeb and Classic (command-line) DIALOG services (DIALOG's DIALINDEX for database selection; searcher assistance commands in DIALOG; DIALOG sources for database selection; using DIALOG Bluesheets, especially as sources of controlled vocabulary, basic index, prefix-coded fields, type formats, limiting, cost information, etc.)

November 7-13, 14-20

Searching Lexis-Nexis: General overview of Lexis-Nexis features, especially commands for entering, combining, and truncating search terms, and strategies for searching full-text databases; System commands for NEXIS segment searching; additional Boolean and proximity operators; limiting; printing search results; Freestyle searching in NEXIS

November 21-27

Thanksgiving Recess

November 28-December 4

Searching Citation Indexes; Non-Textual (Image and Audio) IR

December 5-11

Evaluating Search Results; Understanding the Information Retrieval Process from the User's Perspective

BASIC COURSE POLICIES & PROCEDURES

Course Goals

What we call the "information environment" has been growing at an exponential rate since the Industrial Revolution, and, as Lyman and Varian have documented in their "How Much Information?" series, the production of digital information is accelerating that growth. We also know from other sources that, when viewed in terms of its content (or relationships based on content), the information environment is growing increasingly complex, and that its complexities exceed the capacity of any or all of the intellectual technologies that have been devised for retrieving relevant information. In LIBSCI 2002 Retrieving Information, the locus of inquiry is conceptual, in the sense that the course will endeavour to examine the key ideas underlying information retrieval as both process and problem area. Practical aspects of information retrieval will also be considered, particularly as they apply to the use of commercial database services (such as DIALOG and Lexis-Nexis) and key Web search services.

Conduct of the Course

The conduct of the course is based on the notion of mastery learning, which means that the focus of teaching and learning is on establishing a detailed and useful understanding of key concepts and issues through the readings, weekly discussion topics, assignments, and online meetings.

Assignments

Grades will be determined on the basis of 5 assignments, cumulatively accounting for 50 percent of the final grade, with participation in the weekly exercises accounting for the balance of the final grade. (Each of the assignments must be submitted via the Digital Dropbox. No other method of submission will be accepted.)

For the weeks in which there is an exercise, the results of the exercise will be due on the Sunday of the week at issue. The assignments, including the dates on which each assignment is due, are listed in the ASSIGNMENTS folder, as well as below.

Discussion Groups

Students enrolled in the course will be assigned to a discussion group. The discussion group will serve two main purposes: first, it will be the medium for weekly discussion of current readings and topics; and, second, the discussion group will be the organizational point for weekly sessions with the instructor via Blackboard's Office Hours lightweight chat application and Gizmo, a VOIP service with conference capabilities. (Participation in the weekly group conferences is not obligatory, but it is strongly encouraged.) The schedule for the discussion groups will be announced during the week of September 5-11.

The virtual sessions will make use of the lightweight chat software that is provided as part of CourseWeb under the COMMUNICATION | COLLABORATION tab. (The sessions conducted through the use of lightweight chat software require the presence of a Java Runtime Environment (JRE) on the user's computer. Windows XP and Macintosh OS X provide native support; for students lacking such support, JREs may be downloaded free of charge from Sun Microsystems or Apple Computer.) Also note that these sessions will be recorded and archived, so that they may be consulted later. Students should register with the Gizmo Project and download the appropriate version of the Gizmo client software. Gizmo requires an Internet connection that provides a local IP address (e.g., a connection acquired via PPP, DSL, cable modem service, or Ethernet), together with a microphone, audio headset, or USB telephone, and/or speakers. The Gizmo registration process will result in the assignment of a Gizmo user name, and installation of the client software will enable users to acquire a SIP telephone number by dialing **. (See below.) Gizmo user names and SIP telephone numbers should then be posted to the Gizmo folder. The folder is located on CourseWeb under ORGANIZATIONS | DISCUSSION BOARD.

   

Access to Commercial Databases

During the course of the term, students will be called upon to use commercial database services, including DIALOG and Lexis-Nexis. Each student will be assigned a password for DIALOG and Lexis-Nexis, respectively, and will be advised of the conditions under which access to the systems has been provided. Passwords will be distributed during the week of September 11-17.

Grading

LIBSCI 2002 is a core requirement of the MLIS degree program. Satisfying this core requirement requires a letter grade of B or better.

The table below illustrates how quality points correlate to letter grades. Students who fail to earn a B or better will be obligated to retake the course.

G-Grades: For this course, a G-Grade will be granted by permission of the instructor only. The G-Grade allows two additional terms to complete course work.

Letter Grade    Points Required
A+              98-100
A               93-97
A-              90-92
B+              87-89
B               83-86

Academic Integrity

Academic Integrity: Students in this course will be expected to comply with the University of Pittsburgh's Policy on Academic Integrity. Any student suspected of violating this obligation for any reason during the semester will be required to participate in the procedural process, initiated at the instructor level, as outlined in the University Guidelines on Academic Integrity. This may include, but is not limited to, the confiscation of the examination of any individual suspected of violating University Policy. Furthermore, no student may bring any unauthorized materials to an exam, including dictionaries and programmable calculators.

Special Student Services

Disabilities: If you have a disability that requires special testing accommodations or other classroom modifications, you need to notify both the instructor and Disability Resources and Services no later than the second week of the term. You may be asked to provide documentation of your disability to determine the appropriateness of accommodations. To notify Disability Resources and Services, call 648-7890 (Voice or TTD) to schedule an appointment. The Office is located in 216 William Pitt Union.

REQUIRED BOOKS*
for LIBSCI 2002 Retrieving Information
Fall 2005

DIALOG Lab Workbook. DIALOG, 2005.

Google Hacks, by Tara Calishain and Rael Dornfest. 2nd edition. O’Reilly, 2004. (Available via Safari Tech Books Online Database, University Library System.)

Information Retrieval, by C.J. van Rijsbergen. 2nd edition. Butterworths, 1979.  (Individual chapters available at ftp://mingus.exp.sis.pitt.edu/lis2002/libsci2002_books/IR/.)

Information Retrieval Interaction, by Peter Ingwersen. Taylor Graham, 1992. (Individual chapters available at ftp://mingus.exp.sis.pitt.edu/lis2002/libsci2002_books/IRI/.)

LexisNexis Academic User Guide, n.d.; Learning Lexis-Nexis, n.d.

Online Retrieval: a Dialogue of Theory and Practice, by Geraldene Walker and Joseph Janes. 2nd edition. Libraries Unlimited, 1999.

Proceedings of the 1998 Conference on the History and Heritage of Science Information Systems. Edited by Mary Ellen Bowden, Trudi Bellardo Hahn and Robert V. Williams. Published by Information Today for the American Society for Information Science and the Chemical Heritage Foundation, 1999.

Web Search Garage, by Tara Calishain. Prentice-Hall, 2004. (Available via Safari Tech Books Online Database, University Library System.)

* With the exception of Online Retrieval: a Dialogue of Theory and Practice, which must be purchased, the books listed above are available in PDF format in the BOOKS section of the course.

PRINCIPAL ASSIGNMENTS

Assignment 1
______________

Due September 19

According to Wikipedia, circa 8/20/04:

"Information retrieval (IR) is the art and science of searching for information in documents, searching for documents themselves, searching for metadata which describes documents, or searching within databases, whether relational stand alone databases or hypertext networked databases such as the Internet or intranets, for text, sound, images or data. There is a common confusion, however, between data, document, information, and text retrieval, and each of these have their own bodies of literature, theory, praxis and technologies.

IR is a broad interdisciplinary field, that draws on many other disciplines. Indeed, because it is so broad, it is normally poorly understood, being approached typically from only one perspective or another. It stands at the junction of many established fields, and draws upon cognitive psychology, information architecture, information design, human information behaviour, linguistics, semiotics, information science, computer science and librarianship.

Automated information retrieval (IR) systems were originally used to manage information explosion in scientific literature in the last few decades. Many universities and public libraries use IR systems to provide access to books, journals, and other documents. IR systems are often related to object and query. Queries are formal statements of information needs that are put to an IR system by the user. An object is an entity which keeps or stores information in a database. User queries are matched to documents stored in a database. A document is, therefore, a data object. Often the documents themselves are not kept or stored directly in the IR system, but are instead represented in the system by document surrogates."

What are the key problems of information retrieval from both theoretical and practical perspectives? Your response should be specific, relatively brief, and documented.

Assignment 2
____________

Due October 17

Relevance is a fundamental, though not completely understood, concept for documentation, information science, and information retrieval. During the 1960s, a movement emerged that identified relevance as an evaluative tool for resolving problems associated with measuring the effectiveness of automated information systems. Various definitions of relevance set the tone for ongoing research in the field of information science, among the most influential being the notions that relevance is:

  • a measure of information conveyed by a document relative to a query; and/or
  • the criterion used to quantify the phenomenon involved when individuals (users) judge the relationship, utility, importance, degree of match, fit, proximity, appropriateness, closeness, pertinence, value or bearing of documents or document representations to an information requirement, need, question, statement, description of research, treatment, etc.

How has each of the notions cited above influenced subsequent research concerned with the nature of the concept of relevance and its applications? To what extent has the understanding of relevance -- from both conceptual and practical perspectives -- changed since the 1960s?

Please prepare a paper of 1000 words in response to the questions posed above. (Your task in preparing this paper should necessarily entail identifying key ideas, key authors, and key papers and providing appropriate analyses in your narrative.)

Assignment 3
_____________

Due October 31

Undiscovered public knowledge addresses bodies of information that are similar but distinct or not normally connected, i.e., not connected bibliographically. (The search for such knowledge is a not-uncommon motivation for the use of information retrieval systems, particularly as certain forms of research become more multi-disciplinary in orientation; yet, there is controversy within the IR community about the extent and relative importance of undiscovered public knowledge.) Swanson and Smalheiser call undiscovered public knowledge a set of complementary literatures that can reveal useful information of scientific interest that would not be apparent in either set individually. Swanson noted that knowledge may be public but undiscovered if the independent segments are never combined for evaluation. Davies described five forms of undiscovered public knowledge: (1) hidden refutation/qualification of a hypothesis; (2) undrawn conclusions from two or more premises; (3) the cumulative evidence of weak, independent experiments; (4) solutions to analogous problems; and, (5) hidden correlations. Later Davies added a sixth category, novel classifications, a morphological analysis of complex systems with a large number of variables. He described methods for this type of knowledge discovery including Swanson's trial-and-error strategy, citation path analysis, and hypertext browsing.

In a report of 500-750 words, your task is to assess the credibility and applicability of these ideas, primarily through an examination of three works:

Jackson, Larry S. Supercomputing Detection of Swanson's Relationship Between Raynaud's Disease and Dietary Fish Oil. GSLIS technical report number ISRN UIUCLIS--2002/2+UPK, University of Illinois, 2002;

Spasser, Mark. The Enacted Fate of Undiscovered Public Knowledge. Journal of the American Society for Information Science 48 (1997): 707-717; and

Swanson, Don R., Neil R. Smalheiser, and A. Bookstein. Information Discovery from Complementary Literatures: Categorizing Viruses as Potential Weapons. Journal of the American Society for Information Science and Technology 52 (2001): 797-812.
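
As a rough illustration of the idea of complementary literatures described above, the following Python sketch looks for terms shared by two bodies of literature that do not cite one another. It is a toy under stated assumptions, not the method used by Swanson, Smalheiser, or Jackson: the function name linking_terms, the article identifiers, and the index terms are all hypothetical and were invented for this example.

```python
def linking_terms(literature_a, literature_c):
    """Return the index terms that occur in both literatures.

    Each literature is a dict mapping an article identifier to the set of
    index terms assigned to that article. Terms found in both sets are
    candidate connections between two otherwise separate literatures.
    """
    terms_a = set().union(*literature_a.values())
    terms_c = set().union(*literature_c.values())
    return sorted(terms_a & terms_c)


# Hypothetical toy data: article IDs and index terms are illustrative only.
fish_oil = {
    "A1": {"fish oil", "blood viscosity"},
    "A2": {"fish oil", "platelet aggregation"},
}
raynaud = {
    "C1": {"raynaud's disease", "blood viscosity"},
    "C2": {"raynaud's disease", "vasoconstriction"},
}

print(linking_terms(fish_oil, raynaud))
```

In this toy data no single article mentions both topics, so the shared term is the only visible bridge between them. Whether such a bridge is meaningful still requires human judgment, which is part of what the assignment asks you to assess.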

Assignment 4
__________________

Due November 21

Part 1: Using DIALOG and the World Wide Web as your primary sources, assess the current state of professional and scholarly publishing, particularly as it applies to periodicals. While the assessment should cover all relevant aspects and issues, substantial attention should be focused on the economics of publishing for professional and scholarly audiences. The resulting report should be at least 1000 words in length.

Part 2: Using DIALOG as your primary source, prepare an assessment of HIV-AIDS as an area of basic biomedical research, clinical medicine, public health, and socioeconomic effect, respectively. Please identify the key authors, key papers, and key journals in each area, and specify the basis for your determinations. Also identify the major sources of funding for research, education, and clinical care. The resulting report should be at least 1500 words in length.

Assignment 5
______________

Due December 12

Using Lexis-Nexis as the principal source (but incorporating information from other sources as necessary), please prepare detailed analyses on one of the following topics. In each instance, you must explain step-by-step how you prepared the answer, and the rationale for each step in your effort to locate relevant data. In addition, you should document any "wasted" steps or unsuccessful strategies. Be sure to supply full bibliographic citations, including the Lexis-Nexis files and accession information (see the Chicago Manual of Style or APA for citation style).

In this instance, "analysis" means to compare and contrast information from different sources and provide a synthesis. (Do not quote extensively from your sources to meet the word count.) Each analysis should be 500-750 words in length, excluding bibliographic citations.

The topics are:

  • International trade in microprocessors from 1996-2005
  • Profit, loss, and growth in the U.S. publishing industry since 1998
  • Current status of telecommunications legislation before the U.S. Congress during 2004-2005
  • Current status and short-term prospects of satellite (or DAB) radio
  • Identify and assess the effectiveness of current, legally-mandated efforts to curtail aerosol pollution in the U.S.
  • U.S. diplomatic initiatives involving France during the last twelve months
  • Defense spending among NATO members since September 11, 2001
  • Comparative financial requirements of obtaining and operating a fast-food franchise
  • Current priorities in biomedical research in the U.S. and the United Kingdom
  • Impact of Linux on the market for computer software in the U.S., Canada, and Europe
  • Monetary policies of the European Union since 2000
  • Consumer spending in the U.S. during the most recent Christmas season
  • Levels of employment, unemployment, and wages in the U.S., Canada, and the United Kingdom during 2003
  • SUV (sport utility vehicle) safety
  • Impact of the DVD format on the sale of pre-recorded videos in the U.S. since 2000

WEEKLY ASSIGNMENTS & EXERCISES

A reminder: Be specific, and always cite your sources.

September 12-18

In many libraries, the number of "traditional" reference transactions is declining. Yet, the forms of reference service offered specifically for users in networked environments -- typically email and/or chat -- do not generate significant volumes of traffic at most institutions. Why?

September 19-25

Evaluate the University Library System's WebFeat-based service, Zoom, and, in the process, address the more general advantages and disadvantages of federated searching.

September 26-October 2

How are precision and recall related? Under what operational circumstances is recall more important than precision? Precision more important than recall? Is the relationship between precision and recall influenced by the size and nature of the databases that are queried?
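
For reference when answering, the standard set-based definitions of the two measures can be sketched in Python as follows. This is a minimal illustration, not part of the assigned readings: the function name precision_recall and the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Compute set-based precision and recall for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 5 relevant in the database,
# 2 of the retrieved documents are relevant.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d2", "d4", "d5", "d6", "d7"]

p, r = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.40
```

Note that recall requires knowing every relevant document in the database, which is rarely possible in practice for large collections; this bears directly on the last question above.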

October 3-9

How do the Open Directory Project and Yahoo compare in terms of functions, features, and editorial policies?

October 10-16

How may RSS feeds be used to assist searchers in performing more effectively and efficiently? How about social bookmarking tools, such as CiteULike, Connotea, Furl, etc.?

October 17-23

Two questions: First, how do Web search services such as AltaVista, Google, Vivisimo, etc., harvest the data that forms the basis for their respective services? What are the principal limitations of these harvesting methods? Second, how do Web-based metasearch engines integrate result sets? Of the methods employed, which one is most reliable? Least reliable?

October 24-30

Complete Sections 1-2 from the DIALOG Lab Workbook

October 31-November 6

Complete Sections 3-4 from the DIALOG Lab Workbook

November 7-13

Complete the Lexis-Nexis 7.2 Tutorial for Business Research (MIT, 2002)

November 14-20

What are the advantages and disadvantages of using citations as a basis for the formulation of search strategies?

November 21-27

No Assignment (Thanksgiving Recess)

November 28-December 4

Two questions: First, what are the outward characteristics of a successful search? Of a cost-effective search? Second, it is generally argued, but particularly by librarians, that the quality of reference and online information services provided by libraries justifies the costs of mounting and maintaining such services. (The conventional wisdom is that accurate, timely, and relevant information is of significant benefit, saving business, government agencies, and other types of organizations time and money through increased efficiency, improved productivity, and rapid deployment of innovations, as well as supporting teaching and learning.) What evidence supports this position? How would you rate this evidence in terms of its objectivity and persuasiveness?

WEEKLY REQUIRED READING ASSIGNMENTS

Week

Reading(s)

September 6-11

SciInfoSys, pp. 3-136; What Do People Want from Information Retrieval, by W. Bruce Croft. D-Lib Magazine, November 1995; The Seven Ages of Information Retrieval, by Michael Lesk. Bellcore, 1995; How Much Information? (2003), by Peter Lyman and Hal Varian. University of California, 2003.

September 12-18

SciInfoSys, pp. 156-192, 223-250; Data Structures and Number Systems, by Brian Brown; Information as Thing, by Michael Buckland. JASIS 42 (June 1991): 351-360; IR, Preface, Chapters 1-2.

September 19-25

IR, Chapters 3-4; IRI, Chapters 1-3; OR, Chapters 1, 2, 4-5; Papers by Blair in Basic Concepts and Models of Information Retrieval

September 26-October 2

IR, Chapters 5-6; IRI, Chapters 4-6; All items in Relevance, Precision, and Recall Folder

October 3-9

IRI, Chapters 7-8; Papers by Spink, Wilson, et al., in folder entitled Understanding the Information Retrieval Process from the User's Perspective

October 10-16

GH, Chapters 1-2, 8; OR, Chapters 6-8; All Items in Set 1 of Materials in Searching the World Wide Web Folder

October 17-23

OR, Chapters 9-11; All Items in Sets 2 and 3 of Materials in Searching the World Wide Web Folder; WSG, Chapters 1-13

October 24-30

DIALOG Lab Workbook, Chapters 1-5; DialogWeb Command Search Tutorial

October 31-November 6

DIALOG Lab Workbook, Chapters 6-10; DialogWeb Guided Search Tutorial; Review Successful Searching on DIALOG

November 7-13

DIALOG Advanced Searching Techniques: Part 3; Advanced; DIALOG Advanced Searching Techniques, Part 4: Power Searching Techniques (or from DIALOG Resources folder under BOOKS)

November 14-20

Lexis-Nexis Academic User Guide; Lexis-Nexis Health Care Research Guide; Lexis-Nexis Corporate Research Guide; Lexis-Nexis International Research Guide; Lexis-Nexis Tutorial for Business Research

November 21-27

No Assignment

November 28-December 4

IR, Chapters 7-8; OR, Chapters 12-14; Introduction to Citation Indexing and Citation Indexes in Searching Citation Indexes Folder; WSG, Chapters 17-21.

December 5-11

All Items in Evaluating Search Results Folder

N.b., Proceedings of the 1998 Conference on the History and Heritage of Science Information Systems is cited above as SciInfoSys; Information Retrieval, by C.J. van Rijsbergen, is cited as IR; Information Retrieval Interaction, by Peter Ingwersen, is cited as IRI; Online Retrieval: a Dialogue of Theory and Practice, by Geraldene Walker and Joseph Janes, is cited as OR; Web Search Garage is cited as WSG; and Google Hacks is cited as GH.