Developing organized information displays for voluminous works: a study of user clustering behavior
Introduction
The purpose of the research presented in this paper is to discover the ways in which people group or categorize documents associated with a voluminous work such as Shakespeare's Hamlet, to guide the construction of organized displays in information retrieval systems (IRSs). The research emphasizes an area of information seeking seldom examined in information retrieval research, namely, queries for known items. Two types of query are common in IRSs: queries for subjects or topics, and queries for documents already known to the searcher, or known-item queries. Much of the emphasis in research and development of IRSs has been on subject queries. Subject queries pose serious problems for users and for systems designers, so it makes sense that research has focused on them. Known-item queries, on the other hand, are frequently assumed to be unproblematic because they involve formal search terms, i.e., author names and titles. However, queries for known items, like subject queries, present a variety of significant challenges for retrieval and interface design.
One of the challenges that user interfaces for IRSs must meet with respect to known-item searching is how to display large groups of documents, or records representing documents, sharing various types of bibliographic relationships (see Tillett, 1991 for a taxonomy of these relationships). An example of a group of documents sharing such relationships includes: the motion picture Gone With the Wind; the typescript screenplay used to produce it; a book of trivia about the making of the movie; and the original book by Margaret Mitchell, upon which the screenplay and movie are based. This group of documents is representative of a type of document group that will, in this paper, be called a voluminous work or work set. A voluminous work is a large group of documents sharing a variety of relationships that evolve out of and are linked to a common originator document. In the case of Gone With the Wind, the common originator document is the original written text by Margaret Mitchell. Thus, the Koran, Stephen Hawking's A Brief History of Time, Charles Dickens' A Christmas Carol, or any of the individual works of Shakespeare would fit the definition of voluminous work. Voluminous works may generate hundreds or thousands of related documents, including editions, translations, works of criticism, and adaptations for other audiences or into other mediums. For example, although published only in 1988, Hawking's A Brief History of Time has already generated over 50 editions, translations, video-recordings, and other related works.
Section snippets
Rationale for the study
Records for documents associated with voluminous works appear frequently in bibliographic databases such as online catalogs. Even relatively small databases may contain large numbers of records representing these works. In the Internet environment, this problem has been identified as the “versions” problem (e.g., Leazer and Smiraglia, 1996; Levy, 1995). Works that embody many bibliographic relationships are an important focus for research. Because they are popular, they are likely to be sought
Review of relevant research
One of the most serious obstacles to the representation and display of information contained in an IRS is information overload. Information overload plagued system design even in the manual environment. Findings from a major study of card catalogs in the 1950s, for example, showed that search failures tended to increase as catalog size increased (Jackson, 1958, p. 19). The study noted that search failures occurred more often in searches for known authors and works than they did in
Research design
The underlying assumption of this research was that IRSs would be more responsive to users' information needs if they reflected the ways in which people themselves organize documents. The specific goal of the research was to begin to understand how people organize the documents that comprise a particular voluminous work set. The study addressed two questions: What common groups do people create when they categorize documents related to a voluminous work? And what are the
Cluster analysis results
Hierarchical cluster analysis was used to determine common groupings of documents based on a comparison of the composition of all the groups created by participants in the study. Data collected on which documents appeared together in groups for each participant were compiled, and the cluster analysis calculated the frequency with which any two documents were placed in the same group by all of the study participants. Clusters were formed one step at a time, representing those documents grouped
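The co-occurrence counting and stepwise merging described above can be illustrated with a minimal Python sketch. It is not the study's actual procedure: the document labels, the sample sorts, the function names, and the choice of average linkage are all assumptions made for illustration, since the linkage method is not stated in the text shown here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sorts):
    """Count how many participants placed each pair of documents in the same group."""
    counts = Counter()
    for groups in sorts:              # one participant's complete sort
        for group in groups:          # one group made by that participant
            for a, b in combinations(sorted(group), 2):
                counts[(a, b)] += 1
    return counts


def average_linkage(documents, counts):
    """Merge clusters one step at a time, most frequently co-grouped first.

    Average linkage is an assumption; the similarity of two clusters is the
    mean co-occurrence count over all cross-cluster document pairs.
    Returns the merge history as (cluster_a, cluster_b, similarity) tuples.
    """
    def similarity(c1, c2):
        total = sum(counts[tuple(sorted((a, b)))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [frozenset([d]) for d in sorted(documents)]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the highest similarity (first one wins ties).
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: similarity(clusters[p[0]], clusters[p[1]]),
        )
        merges.append(
            (sorted(clusters[i]), sorted(clusters[j]), similarity(clusters[i], clusters[j]))
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical sorts by three participants (not data from the study).
sample_sorts = [
    [{"film", "video"}, {"english_text", "french_translation"}],
    [{"film", "video"}, {"english_text"}, {"french_translation"}],
    [{"film", "video", "english_text"}, {"french_translation"}],
]
sample_docs = {"film", "video", "english_text", "french_translation"}

for left, right, score in average_linkage(sample_docs, cooccurrence_counts(sample_sorts)):
    print(left, "+", right, "similarity", score)
```

In this sketch, pairs that every participant grouped together merge first, so the early merges correspond to the groupings on which participants agreed most strongly.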
Discussion
The attributes that appeared most frequently in the cluster analysis were physical format, language, and audience; content age or integrity, physical characteristics, pictorial elements, and usage also appeared. The dominant attributes found in the qualitative study, that is, those used by a majority of study participants in their written descriptions, were physical format, language, content description, audience, pictorial elements, and usage (Carlyle, 1999).
Implications for catalog and other information displays
Library cataloging records already contain indicators representing many of the attributes identified in this study. In fact, card catalogs featured arrangements that grouped cards based on several of the attributes discovered in the study, including language and content age or integrity. If one takes differences in physical format, audience, and usage to indicate significant changes in text, which they often do, then card catalog arrangements also reflected these attributes. Library
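As a rough illustration of how attribute indicators already present in records could drive a grouped display, the following Python sketch nests records by language and then by physical format. The record fields (`title`, `language`, `format`), the sample records, and the function name are hypothetical; they do not correspond to any particular cataloging standard or to the study's data.

```python
from collections import defaultdict


def group_records(records, attributes):
    """Nest records by each attribute in turn; leaves are lists of titles."""
    if not attributes:
        return [record["title"] for record in records]
    first, rest = attributes[0], attributes[1:]
    buckets = defaultdict(list)
    for record in records:
        # Records lacking the attribute are collected under "unknown".
        buckets[record.get(first, "unknown")].append(record)
    return {value: group_records(members, rest) for value, members in sorted(buckets.items())}


# Hypothetical records for one work set (illustrative only).
sample_records = [
    {"title": "Hamlet (annotated edition)", "language": "English", "format": "book"},
    {"title": "Hamlet (film adaptation)", "language": "English", "format": "videorecording"},
    {"title": "Hamlet (French translation)", "language": "French", "format": "book"},
]

display = group_records(sample_records, ["language", "format"])
for language, formats in display.items():
    print(language)
    for physical_format, titles in formats.items():
        print("  " + physical_format + ": " + "; ".join(titles))
```

Changing the order of the attribute list changes the nesting, so the same records could be shown format-first or language-first depending on which attribute a display treats as primary.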
Future research
This study represents a first step toward improving displays for voluminous works in IRSs. Future research could begin by investigating a variety of different types of voluminous work. The “typical” composition of a set of documents related to a particular voluminous work is unknown; moreover, it is likely that the composition of work sets varies a good deal from one work to another. Thus, the notion of a “typical” work set may be inappropriate. For example, documents related to the Koran,
Conclusion
Known-item searches, far from being uninteresting and non-problematic, pose a myriad of fascinating challenges to IRS interface designers. In addition, solutions to problems presented by these searches may lead to exciting innovations in the structure and quality of information system displays. Incorporation of alternatives to the long-list retrieval model in our IRSs has the potential to enhance the information environment of users by increasing their ability to identify documents of interest
Acknowledgements
A grant from Kent State University funded this research. An earlier version of this paper won the 2000 OCLC/ALISE Research Paper Award. Research assistants Rebecca Albrecht and Melanie Rapp participated in the data collection. I would also like to acknowledge the following for their invaluable assistance along the way: Julie Gedeon, Rick Rubin, Jan Winchell, and Sue Gong from Kent State University; Raya Fidel, Terry Brooks, Dean Billheimer, Peter F. Cragmile, and David Poole from the University
References (43)
Varieties of knowledge elicitation techniques. International Journal of Human–Computer Studies (1994)
Sorting-based menu categories. International Journal of Man–Machine Studies (1990)
Cluster analysis as a technique to guide interface design. International Journal of Man–Machine Studies (1991)
A psychological method to investigate verbal concepts. Journal of Mathematical Psychology (1969)
Aldenderfer, M. S., & Blashfield, R. K. (1984). Cluster analysis. Quantitative applications in the social sciences,...
User categorisation of works: toward improved organisation of online catalogue displays. Journal of Documentation (1999)
Fulfilling the second objective in the online catalog: schemes for organizing author and work records into usable displays. Library Resources and Technical Services (1997)
Carlyle, A. (1997b). The role of classification in the creation of author and work displays in online catalogues. In...
Ordering author and work records: an evaluation of collocation in online catalog displays. Journal of the American Society for Information Science (1996)
Carlyle, A., & Summerlin, J. (2000). Transforming catalog displays: record clustering for works of fiction. In C....
Scaling methods
Gaining user insight: a case study illustrating the card sort technique. College and Research Libraries
User-centered indexing. Journal of the American Society for Information Science
Fisheye views as an aid to subject access in online catalogues. Canadian Journal of Information Science
Classification clustering, probabilistic information retrieval and the online catalog. Library Quarterly
A review and taxonomy of distortion-oriented presentation techniques. ACM Transactions on Computer–Human Interaction