Developing organized information displays for voluminous works: a study of user clustering behavior

doi:10.1016/S0306-4573(00)00048-0

Information Processing & Management

Volume 37, Issue 5, September 2001, Pages 677-699


Abstract

This paper investigates the ways in which people group or categorize documents associated with a voluminous work, in order to guide the construction of organized displays for information retrieval systems (IRSs). Fifty participants completed an unconstrained sorting task in which they sorted 47 documents associated with the voluminous work A Christmas Carol, by Charles Dickens, into groups. Participants were asked to group documents according to how similar they were to one another, and in a way that would help them remember how to find the documents later. The sorting data were summarized with cluster analysis to discover groupings common across participants. The groupings discovered frequently shared physical format, language, and audience attributes.

Introduction

The purpose of the research presented in this paper is to discover the ways in which people group or categorize documents associated with a voluminous work such as Shakespeare's Hamlet, to guide the construction of organized displays in information retrieval systems (IRSs). The research emphasizes an area of information seeking seldom examined in information retrieval research, namely, queries for known items. Two types of query are common in IRSs: queries for subjects or topics, and queries for documents already known to the searcher, or known-item queries. Much of the emphasis in research and development of IRSs has been on subject queries. Subject queries pose serious problems to users and to systems designers, so it makes sense that research has focused on them. Known-item queries, on the other hand, are frequently assumed to be unproblematic because they involve formal search terms, i.e., author names and titles. However, queries for known items, like subject queries, present a variety of significant challenges for retrieval and interface design.

One of the challenges that user interfaces for IRSs must meet with respect to known-item searching is how to display large groups of documents, or records representing documents, that share various types of bibliographic relationships (see Tillett, 1991, for a taxonomy of these relationships). One such group consists of the motion picture Gone With the Wind; the typescript screenplay used to produce it; a book of trivia about the making of the movie; and the original book by Margaret Mitchell, upon which the screenplay and movie are based. This paper calls a document group of this kind a voluminous work or work set.¹ A voluminous work is a large group of documents sharing a variety of relationships that evolve out of, and are linked to, a common originator document. In the case of Gone With the Wind, the common originator document is the original written text by Margaret Mitchell. By this definition, the Koran, Stephen Hawking's A Brief History of Time, Charles Dickens' A Christmas Carol, or any of the individual works of Shakespeare would qualify as a voluminous work. Voluminous works may generate hundreds or thousands of related documents, including editions, translations, works of criticism, and adaptations for other audiences or into other media. For example, although it was published only in 1988, Hawking's A Brief History of Time has already generated over 50 editions, translations, video recordings, and other related works.
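The definition of a work set lends itself to a simple graph model. The sketch below is an illustration, not part of the study: it assumes each relationship is recorded as a hypothetical (derived document, source document) pair, and it gathers every document reachable from the originator by breadth-first traversal, using the Gone With the Wind example above. The names `RELATIONS` and `work_set` are hypothetical.

```python
from collections import deque

# Hypothetical relationship records: (derived document, source document).
# Each pair says the first document evolves out of the second.
RELATIONS = [
    ("screenplay (typescript)", "Gone With the Wind (novel)"),
    ("motion picture", "screenplay (typescript)"),
    ("trivia book about the film", "motion picture"),
]


def work_set(originator, relations):
    """Collect every document linked, directly or indirectly, to the originator.

    Documents are returned in breadth-first order, starting with the originator.
    """
    derived_from = {}
    for derived, source in relations:
        derived_from.setdefault(source, []).append(derived)

    members = [originator]
    seen = {originator}
    queue = deque([originator])
    while queue:
        doc = queue.popleft()
        for child in derived_from.get(doc, []):
            if child not in seen:  # guard against cycles and duplicate links
                seen.add(child)
                members.append(child)
                queue.append(child)
    return members
```

Calling `work_set("Gone With the Wind (novel)", RELATIONS)` returns the novel, the screenplay, the motion picture, and the trivia book, in that order; starting from a derived document returns only that document and what evolves out of it.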


Rationale for the study

Records for documents associated with voluminous works appear frequently in bibliographic databases such as online catalogs. Even relatively small databases may contain large numbers of records representing these works. In the Internet environment, this problem has been identified as the “versions” problem (e.g., Leazer and Smiraglia, 1996; Levy, 1995). Works that embody many bibliographic relationships are an important focus for research. Because they are popular, they are likely to be sought

Review of relevant research

One of the most serious obstacles to the representation and display of information contained in an IRS is information overload. Information overload plagued system design even in the manual environment. Findings from a major study of card catalogs in the 1950s, for example, showed that search failures tended to increase as catalog size increased (Jackson, 1958, p. 19). The study also noted that search failures occurred more often in searches for known authors and works than they did in

Research design

The underlying assumption of this research was that if IRSs reflect the ways in which people themselves organize documents, they would be more responsive to users' information needs. The specific goal of the research was to begin to understand how people organize documents that comprise a particular voluminous work set. The study addressed two questions: What are the common groups that are created by people when they categorize documents related to a voluminous work? and, What are the

Cluster analysis results

Hierarchical cluster analysis was used to determine common groupings of documents based on a comparison of the composition of all the groups created by participants in the study. Data collected on which documents appeared together in groups for each participant were compiled, and the cluster analysis calculated the frequency with which any two documents were placed in the same group by all of the study participants. Clusters were formed one step at a time, representing those documents grouped
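The pairwise-frequency step and the step-by-step merging described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's procedure: the snippet does not say which linkage rule was used, so the sketch assumes average linkage on raw co-occurrence counts, and the input format, function names, and four-document example are hypothetical.

```python
from itertools import combinations


def co_occurrence(sorts):
    """Count how many participants placed each pair of documents in the same group.

    `sorts` holds one entry per participant; each entry is a list of groups,
    and each group is a set of document identifiers.
    """
    counts = {}
    for groups in sorts:
        for group in groups:
            # Sorting makes each pair key order-independent: (a, b) with a < b.
            for a, b in combinations(sorted(group), 2):
                counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


def pair_count(counts, a, b):
    """Look up a co-occurrence count regardless of argument order."""
    return counts.get((a, b) if a < b else (b, a), 0)


def agglomerate(documents, counts):
    """Average-linkage agglomerative clustering on co-occurrence counts (assumed linkage).

    Starts with one cluster per document and, at each step, merges the two
    clusters with the highest mean pairwise count. Returns the merge history
    as (cluster_a, cluster_b, mean_count) tuples.
    """
    clusters = [frozenset([d]) for d in documents]
    merges = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            total = sum(pair_count(counts, a, b) for a in clusters[i] for b in clusters[j])
            mean = total / (len(clusters[i]) * len(clusters[j]))
            # Strict ">" keeps the first pair found when scores tie.
            if best is None or mean > best[0]:
                best = (mean, i, j)
        mean, i, j = best
        merges.append((clusters[i], clusters[j], mean))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical sorts by three participants over four documents A to D.
SORTS = [
    [{"A", "B"}, {"C", "D"}],
    [{"A", "B", "C"}, {"D"}],
    [{"A", "B"}, {"C"}, {"D"}],
]
COUNTS = co_occurrence(SORTS)
MERGES = agglomerate(["A", "B", "C", "D"], COUNTS)
```

Here A and B, grouped together by all three participants, merge first; C and D follow; the two pairs join last. The brute-force search is roughly cubic in the number of documents, which is negligible at 47 documents; for larger sets, a statistics package such as SciPy's `scipy.cluster.hierarchy.linkage` would be the usual choice.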

Discussion

The attributes appearing most frequently in the cluster analysis include physical format, language, and audience; other attributes that appear include content age or integrity, physical characteristics, pictorial elements, and usage. The dominant attributes discovered in the qualitative study, that is, those used by a majority of study participants in their written descriptions, include physical format, language, content description, audience, pictorial elements, and usage (Carlyle, 1999).
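The majority criterion used to label an attribute as dominant can be stated precisely in a few lines. The following sketch assumes that each participant's written descriptions have already been coded into a set of attribute names, and it takes "majority" to mean more than half of participants; the coding, participant IDs, data, and the function name `dominant_attributes` are hypothetical.

```python
from collections import Counter


def dominant_attributes(descriptions):
    """Return, alphabetically, the attributes used by more than half of the participants.

    `descriptions` maps a participant ID to the attributes coded from that
    participant's written group descriptions.
    """
    n = len(descriptions)
    # set() ensures each participant counts at most once per attribute.
    counts = Counter(attr for attrs in descriptions.values() for attr in set(attrs))
    return sorted(attr for attr, c in counts.items() if c > n / 2)


# Hypothetical coded descriptions for three participants.
DESCRIPTIONS = {
    "p1": {"physical format", "language"},
    "p2": {"physical format", "audience"},
    "p3": {"physical format", "language", "usage"},
}
```

With these data, physical format (3 of 3) and language (2 of 3) pass the threshold, while audience and usage (1 of 3 each) do not.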

Implications for catalog and other information displays

Library cataloging records already contain indicators representing many of the attributes identified in this study. In fact, card catalogs featured arrangements that grouped cards based on several of the attributes discovered in the study, including language and content age or integrity. If one takes differences in physical format, audience, and usage to indicate significant changes in text, which they often do, then card catalog arrangements also reflected these attributes. Library
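As an illustration of how such attributes might organize a display, the sketch below nests the records of a work set under format, then language, then audience, instead of presenting one long list. The record fields, the attribute order, the sample records, and the function names are hypothetical assumptions, not the display proposed in the study or any catalog's actual data model.

```python
from collections import defaultdict

# Hypothetical catalog records for one work set; the field names are
# illustrative and not taken from any particular cataloging standard.
RECORDS = [
    {"title": "A Christmas Carol", "format": "book", "language": "English", "audience": "adult"},
    {"title": "Un chant de Noël", "format": "book", "language": "French", "audience": "adult"},
    {"title": "A Christmas Carol (abridged)", "format": "book", "language": "English", "audience": "juvenile"},
    {"title": "A Christmas Carol", "format": "videorecording", "language": "English", "audience": "adult"},
]


def group_records(records, keys):
    """Nest records under successive attribute values; leaves are sorted title lists."""
    if not keys:
        return sorted(r["title"] for r in records)
    first, rest = keys[0], keys[1:]
    buckets = defaultdict(list)
    for r in records:
        buckets[r[first]].append(r)
    # Headings at each level appear in alphabetical order.
    return {value: group_records(buckets[value], rest) for value in sorted(buckets)}


def render(tree, depth=0):
    """Yield indented display lines for the nested grouping."""
    if isinstance(tree, list):
        for title in tree:
            yield "  " * depth + title
        return
    for heading, subtree in tree.items():
        yield "  " * depth + heading
        yield from render(subtree, depth + 1)


DISPLAY = group_records(RECORDS, ["format", "language", "audience"])
for line in render(DISPLAY):
    print(line)
```

Changing the `keys` list reorders the display hierarchy, which makes it easy to compare alternative arrangements of the same work set.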

Future research

This study represents a first step toward improving displays for voluminous works in IRSs. Future research could begin by investigating a variety of types of voluminous work. The “typical” composition of a set of documents related to a particular voluminous work is unknown; moreover, it is likely that the composition of work sets varies a good deal from one work to another. Thus, the notion of a “typical” work set may be inappropriate. For example, documents related to the Koran,

Conclusion

Known-item searches, far from being uninteresting and non-problematic, pose a myriad of fascinating challenges to IRS interface designers. In addition, solutions to problems presented by these searches may lead to exciting innovations in the structure and quality of information system displays. Incorporation of alternatives to the long-list retrieval model in our IRSs has the potential to enhance the information environment of users by increasing their ability to identify documents of interest

Acknowledgements

A grant from Kent State University funded this research. An earlier version of this paper won the 2000 OCLC/ALISE Research Paper Award. Research assistants Rebecca Albrecht and Melanie Rapp participated in the data collection. I would also like to acknowledge the following for their invaluable assistance along the way: Julie Gedeon, Rick Rubin, Jan Winchell, and Sue Gong from Kent State University; Raya Fidel, Terry Brooks, Dean Billheimer, Peter F. Cragmile, and David Poole from the University

References (43)

Cooke, N. J. (1994). Varieties of knowledge elicitation techniques. International Journal of Human–Computer Studies.

Hayhoe, D. (1990). Sorting-based menu categories. International Journal of Man–Machine Studies.

Lewis, S. (1991). Cluster analysis as a technique to guide interface design. International Journal of Man–Machine Studies.

Miller, G. A. (1969). A psychological method to investigate verbal concepts. Journal of Mathematical Psychology.

Aldenderfer, M. S., & Blashfield, R. K. (1984). Cluster analysis. Quantitative applications in the social sciences,...

Carlyle, A. (1999). User categorisation of works: toward improved organisation of online catalogue displays. Journal of Documentation.

Carlyle, A. (1997). Fulfilling the second objective in the online catalog: schemes for organizing author and work records into usable displays. Library Resources and Technical Services.

Carlyle, A. (1997b). The role of classification in the creation of author and work displays in online catalogues. In...

Carlyle, A. (1996). Ordering author and work records: an evaluation of collocation in online catalog displays. Journal of the American Society for Information Science.

Carlyle, A., & Summerlin, J. (2000). Transforming catalog displays: record clustering for works of fiction. In C....

Dunn-Rankin, P. (1983). Scaling methods.

Faiks, A., et al. (2000). Gaining user insight: a case study illustrating the card sort technique. College and Research Libraries.

Fidel, R. (1994). User-centered indexing. Journal of the American Society for Information Science.

Hearst, M. A., & Pedersen, J. O. (1996). Reexamining the cluster hypothesis: scatter/gather on retrieval results. In...

Jackson, S. L. (1958). In V. Vostecky (Ed.), Catalog use study. Chicago: American Library...

Jörgensen, C. (1995). Image attributes: an investigation. Ph.D. Dissertation, Syracuse...

Karat, J., Atwood, M. E., Dray, S. M., Rantzer, M., & Wixon, D. R. (1996). User centered design: quality or quackery?...

Kinnucan, M. T. (1992). Fisheye views as an aid to subject access in online catalogues. Canadian Journal of Information Science.

Larson, R. R. (1991). Classification clustering, probabilistic information retrieval and the online catalog. Library Quarterly.

Leazer, G. H., & Smiraglia, R. A. (1996). Toward the bibliographic control of works: derivative bibliographic...

Leung, Y. K., et al. (1994). A review and taxonomy of distortion-oriented presentation techniques. ACM Transactions on Computer–Human Interaction.

