Abstract
Hierarchies serve as browsing tools to access information in document collections. This article explores techniques to derive browsing hierarchies that can be used as an information map for task-based search. It proposes a novel minimum-evolution hierarchy construction framework that directly learns semantic distances from training data and from users to construct hierarchies. The aim is to produce globally optimized hierarchical structures by incorporating user-generated task specifications into the general learning framework. Both an automatic version of the framework and an interactive version are presented. A comparison with state-of-the-art systems and a user study jointly demonstrate that the proposed framework is highly effective.
Index Terms
- Browsing Hierarchy Construction by Minimum Evolution