Developing organized information displays for voluminous works: a study of user clustering behavior

https://doi.org/10.1016/S0306-4573(00)00048-0

Abstract

This paper investigates the ways in which people group or categorize documents associated with a voluminous work to guide the construction of organized displays for information retrieval systems (IRSs). Fifty participants completed an unconstrained sorting task in which they were asked to sort into groups 47 documents associated with the voluminous work A Christmas Carol, by Charles Dickens. Participants were asked to group documents based on how similar they were to each other and such that the groups would help them to remember how to find them at a later time. Data collected from the sorting task were summarized using cluster analysis, employed to discover common groupings created by participants. Groupings discovered frequently shared physical format, language, and audience attributes.

Introduction

The purpose of the research presented in this paper is to discover the ways in which people group or categorize documents associated with a voluminous work such as Shakespeare's Hamlet, to guide the construction of organized displays in information retrieval systems (IRSs). The research emphasizes an area of information seeking seldom examined in information retrieval research, namely, queries for known items. Two types of query are common in IRSs: queries for subjects or topics, and queries for documents already known to the searcher, or known-item queries. Much of the emphasis in research and development of IRSs has been on subject queries. Subject queries pose serious problems to users and to systems designers, so it makes sense that research has focused on them. Known-item queries, on the other hand, are frequently assumed to be unproblematic because they involve formal search terms, i.e., author names and titles. However, queries for known items, like subject queries, present a variety of significant challenges for retrieval and interface design.

One of the challenges that user interfaces for IRSs must meet with respect to known-item searching is how to display large groups of documents, or records representing documents, sharing various types of bibliographic relationships (see Tillett, 1991 for a taxonomy of these relationships). An example of a group of documents sharing such relationships includes: the motion picture Gone With the Wind; the typescript screenplay used to produce it; a book of trivia about the making of the movie; and the original book by Margaret Mitchell, upon which the screenplay and movie are based. This group of documents is representative of a type of document group that will, in this paper, be called a voluminous work or work set. A voluminous work is a large group of documents sharing a variety of relationships that evolve out of and are linked to a common originator document. In the case of Gone With the Wind, the common originator document is the original written text by Margaret Mitchell. Thus, the Koran, Stephen Hawking's A Brief History of Time, Charles Dickens' A Christmas Carol, or any of the individual works of Shakespeare would fit the definition of voluminous work. Voluminous works may generate hundreds or thousands of related documents, including editions, translations, works of criticism, and adaptations for other audiences or into other mediums. For example, although published only in 1988, Hawking's A Brief History of Time has already generated over 50 editions, translations, video-recordings, and other related works.

Section snippets

Rationale for the study

Records for documents associated with voluminous works appear frequently in bibliographic databases such as online catalogs. Even relatively small databases may contain large numbers of records representing these works. In the Internet environment, this problem has been identified as the “versions” problem (e.g., Leazer and Smiraglia, 1996; Levy, 1995). Works that embody many bibliographic relationships are an important focus for research. Because they are popular, they are likely to be sought

Review of relevant research

One of the most serious obstacles to the representation and display of information contained in an IRS is information overload. Information overload plagued system design even in the manual environment. Findings from a major study of card catalogs in the 1950s, for example, showed that search failures tended to increase as catalog size increased (Jackson, 1958, p. 19). In this study it was noted that search failures occurred more often in searches for known authors and works than they did in

Research design

The underlying assumption of this research was that if IRSs reflect the ways in which people themselves organize documents, they would be more responsive to users' information needs. The specific goal of the research was to begin to understand how people organize documents that comprise a particular voluminous work set. The study addressed two questions: What are the common groups that are created by people when they categorize documents related to a voluminous work? and, What are the

Cluster analysis results

Hierarchical cluster analysis was used to determine common groupings of documents based on a comparison of the composition of all the groups created by participants in the study. Data collected on which documents appeared together in groups for each participant were compiled, and the cluster analysis calculated the frequency with which any two documents were placed in the same group by all of the study participants. Clusters were formed one step at a time, representing those documents grouped
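The co-occurrence counting described in this section can be illustrated with a small sketch. It is not the study's actual procedure: the document labels and the three participants' sorts are invented for illustration, the function names are hypothetical, and average linkage is an assumption, since the linkage method is not stated in the text available here.

```python
from collections import Counter
from itertools import combinations

def co_occurrence(sorts):
    """Count how many participants placed each pair of documents in the same group."""
    counts = Counter()
    for groups in sorts:                      # one participant's complete sort
        for group in groups:                  # one pile made by that participant
            for a, b in combinations(sorted(group), 2):
                counts[(a, b)] += 1
    return counts

def average_link(c1, c2, counts):
    """Mean co-occurrence count over all cross-cluster document pairs (assumed linkage)."""
    total = sum(counts[tuple(sorted((a, b)))] for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def agglomerate(docs, counts):
    """Merge the two most similar clusters one step at a time; return the merge history."""
    clusters = [frozenset([d]) for d in sorted(docs)]
    history = []
    while len(clusters) > 1:
        best = max(combinations(clusters, 2),
                   key=lambda pair: average_link(pair[0], pair[1], counts))
        similarity = average_link(best[0], best[1], counts)
        merged = best[0] | best[1]
        clusters = [c for c in clusters if c not in best] + [merged]
        history.append((sorted(merged), similarity))
    return history

# Hypothetical data: three participants sorting four invented documents.
sorts = [
    [{"book_en", "book_fr"}, {"video", "audio"}],
    [{"book_en", "book_fr"}, {"video"}, {"audio"}],
    [{"book_en", "book_fr", "audio"}, {"video"}],
]
docs = {"book_en", "book_fr", "video", "audio"}

counts = co_occurrence(sorts)
history = agglomerate(docs, counts)
for members, similarity in history:
    print(members, similarity)
```

Pairs that many participants grouped together merge early in the history, so the first merges point to the most widely shared groupings. A different linkage rule (single or complete, for instance) could change the later merges.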

Discussion

The most frequently appearing attributes discovered in the cluster analysis include: physical format, language, and audience; other attributes that appear are: content age or integrity, physical characteristics, pictorial elements and usage. Dominant attributes, that is, ones used by a majority of study participants in their written descriptions, discovered in the qualitative study, include: physical format, language, content description, audience, pictorial elements, and usage (Carlyle, 1999).

Implications for catalog and other information displays

Library cataloging records already contain indicators representing many of the attributes identified in this study. In fact, card catalogs featured arrangements that grouped cards based on several of the attributes discovered in the study, including language and content age or integrity. If one takes differences in physical format, audience, and usage to indicate significant changes in text, which they often do, then card catalog arrangements also reflected these attributes. Library
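As a rough illustration of a display arranged by such attributes, the sketch below groups records by physical format and then language. The records, field names, and the `grouped_display` function are all hypothetical; they are not drawn from the study's 47 documents or from any real catalog schema.

```python
from itertools import groupby

# Invented records for illustration only; field names are assumptions.
records = [
    {"title": "A Christmas Carol (1843 text)", "format": "book", "language": "English"},
    {"title": "Un chant de Noël", "format": "book", "language": "French"},
    {"title": "A Christmas Carol (film)", "format": "video", "language": "English"},
    {"title": "A Christmas Carol (illustrated)", "format": "book", "language": "English"},
]

def grouped_display(records, keys=("format", "language")):
    """Return titles grouped under a tuple of attribute values, in sorted group order."""
    key = lambda record: tuple(record[k] for k in keys)
    ordered = sorted(records, key=key)        # stable sort keeps input order within a group
    return {group: [r["title"] for r in members]
            for group, members in groupby(ordered, key=key)}

display = grouped_display(records)
for group, titles in display.items():
    print(" / ".join(group))
    for title in titles:
        print("    " + title)
```

Changing `keys` to, say, `("language",)` regroups the same records under a different attribute, which is one way a display could offer alternatives to a single long list.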

Future research

This study represents a first step toward improving displays for voluminous works in IRSs. Future research could begin by investigating a variety of different types of voluminous work. The “typical” composition of a set of documents related to a particular voluminous work is unknown; moreover, it is likely that the composition of work sets varies a good deal from one work to another. Thus, the notion of a “typical” work set may be inappropriate. For example, documents related to the Koran,

Conclusion

Known-item searches, far from being uninteresting and non-problematic, pose a myriad of fascinating challenges to IRS interface designers. In addition, solutions to problems presented by these searches may lead to exciting innovations in the structure and quality of information system displays. Incorporation of alternatives to the long-list retrieval model in our IRSs has the potential to enhance the information environment of users by increasing their ability to identify documents of interest

Acknowledgements

A grant from Kent State University funded this research. An earlier version of this paper won the 2000 OCLC/ALISE Research Paper Award. Research assistants Rebecca Albrecht and Melanie Rapp participated in the data collection. I would also like to acknowledge the following for their invaluable assistance along the way: Julie Gedeon, Rick Rubin, Jan Winchell, and Sue Gong from Kent State University; Raya Fidel, Terry Brooks, Dean Billheimer, Peter F. Cragmile, and David Poole from the University

References (43)

  • Cooke, N. J. (1994). Varieties of knowledge elicitation techniques. International Journal of Human–Computer Studies.
  • Hayhoe, D. (1990). Sorting-based menu categories. International Journal of Man–Machine Studies.
  • Lewis, S. (1991). Cluster analysis as a technique to guide interface design. International Journal of Man–Machine Studies.
  • Miller, G. A. (1969). A psychological method to investigate verbal concepts. Journal of Mathematical Psychology.
  • Aldenderfer, M. S., & Blashfield, R. K. (1984). Cluster analysis. Quantitative applications in the social sciences,...
  • Carlyle, A. (1999). User categorisation of works: toward improved organisation of online catalogue displays. Journal of Documentation.
  • Carlyle, A. (1997). Fulfilling the second objective in the online catalog: schemes for organizing author and work records into usable displays. Library Resources and Technical Services.
  • Carlyle, A. (1997b). The role of classification in the creation of author and work displays in online catalogues. In...
  • Carlyle, A. (1996). Ordering author and work records: an evaluation of collocation in online catalog displays. Journal of the American Society for Information Science.
  • Carlyle, A., & Summerlin, J. (2000). Transforming catalog displays: record clustering for works of fiction. In C....
  • Dunn-Rankin, P. (1983). Scaling methods.
  • Faiks, A., et al. (2000). Gaining user insight: a case study illustrating the card sort technique. College and Research Libraries.
  • Fidel, R. (1994). User-centered indexing. Journal of the American Society for Information Science.
  • Hearst, M. A., & Pedersen, J. O. (1996). Reexamining the cluster hypothesis: scatter/gather on retrieval results. In...
  • Jackson, S. L. (1958). In V. Vostecky (Ed.), Catalog use study. Chicago: American Library...
  • Jörgensen, C. (1995). Image attributes: an investigation. Ph.D. Dissertation, Syracuse...
  • Karat, J., Atwood, M. E., Dray, S. M., Rantzer, M., & Wixon, D. R. (1996). User centered design: quality or quackery?...
  • Kinnucan, M. T. (1992). Fisheye views as an aid to subject access in online catalogues. Canadian Journal of Information Science.
  • Larson, R. R. (1991). Classification clustering, probabilistic information retrieval and the online catalog. Library Quarterly.
  • Leazer, G. H., & Smiraglia, R. A. (1996). Toward the bibliographic control of works: derivative bibliographic...
  • Leung, Y. K., et al. (1994). A review and taxonomy of distortion-oriented presentation techniques. ACM Transactions on Computer–Human Interaction.