research-article

Browsing Hierarchy Construction by Minimum Evolution

Published: 23 March 2015

Abstract

Hierarchies serve as browsing tools to access information in document collections. This article explores techniques to derive browsing hierarchies that can be used as an information map for task-based search. It proposes a novel minimum-evolution hierarchy construction framework that directly learns semantic distances from training data and from users to construct hierarchies. The aim is to produce globally optimized hierarchical structures by incorporating user-generated task specifications into the general learning framework. Both an automatic version of the framework and an interactive version are presented. A comparison with state-of-the-art systems and a user study jointly demonstrate that the proposed framework is highly effective.

    • Published in

      ACM Transactions on Information Systems, Volume 33, Issue 3
      March 2015
      184 pages
      ISSN:1046-8188
      EISSN:1558-2868
      DOI:10.1145/2737814

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 23 March 2015
      • Accepted: 1 January 2015
      • Revised: 1 November 2014
      • Received: 1 October 2013

      Qualifiers

      • research-article
      • Research
      • Refereed
