Browsing Hierarchy Construction by Minimum Evolution

Author:
Hui Yang

Department of Computer Science, Georgetown University, Washington DC, USA


ACM Transactions on Information Systems, Volume 33, Issue 3, Article No. 13, pp. 1–33. https://doi.org/10.1145/2714574

Published: 23 March 2015

Abstract

Hierarchies serve as browsing tools to access information in document collections. This article explores techniques to derive browsing hierarchies that can be used as an information map for task-based search. It proposes a novel minimum-evolution hierarchy construction framework that directly learns semantic distances from training data and from users to construct hierarchies. The aim is to produce globally optimized hierarchical structures by incorporating user-generated task specifications into the general learning framework. Both an automatic version of the framework and an interactive version are presented. A comparison with state-of-the-art systems and a user study jointly demonstrate that the proposed framework is highly effective.

Index Terms

Browsing Hierarchy Construction by Minimum Evolution
1. Information systems
  1. Information storage systems
    1. Record storage systems

Published in
ACM Transactions on Information Systems Volume 33, Issue 3
March 2015
184 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/2737814
Editor:
Maarten de Rijke
University of Amsterdam, The Netherlands
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 March 2015
- Accepted: 1 January 2015
- Revised: 1 November 2014
- Received: 1 October 2013
Author Tags
Browsing hierarchy construction
complex search
information organization
minimum evolution
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 4
- Total Downloads: 206
- Downloads (Last 12 months): 3
- Downloads (Last 6 weeks): 1