ABSTRACT
The design and architecture of MIaS (Math Indexer and Searcher), a system for mathematics retrieval, are presented, and the design decisions are discussed. We argue for an approach based on Presentation MathML that uses the similarity of math subformulae. The system was implemented as a math-aware search engine based on the state-of-the-art system Apache Lucene.
Scalability was checked against a collection of more than 400,000 arXiv documents containing 158 million mathematical formulae. Almost three billion MathML subformulae were indexed using a Solr-compatible Lucene.
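
To make the notion of indexing subformulae concrete, the following is a minimal sketch, not the MIaS implementation (which is built on Apache Lucene). It assumes that a subformula is simply the subtree rooted at each element of a Presentation MathML tree, and it records the nesting depth of each subtree. The function name `subformulae` and the example formula are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def subformulae(mathml: str) -> list[tuple[int, str]]:
    """Return (depth, serialized subtree) for every element of a MathML tree.

    Assumption for illustration: each element subtree counts as one
    subformula that could be stored as a separate index term.
    """
    root = ET.fromstring(mathml)
    found: list[tuple[int, str]] = []

    def walk(node: ET.Element, depth: int) -> None:
        # Serialize the subtree rooted at this node; strip any trailing
        # whitespace that the serializer carries over from the input.
        found.append((depth, ET.tostring(node, encoding="unicode").strip()))
        for child in node:
            walk(child, depth + 1)

    walk(root, 0)
    return found


# Presentation MathML for a^2 + b (hypothetical example input).
EXAMPLE = "<mrow><msup><mi>a</mi><mn>2</mn></msup><mo>+</mo><mi>b</mi></mrow>"
terms = subformulae(EXAMPLE)

for depth, term in terms:
    print(depth, term)
```

Even this small formula yields six candidate terms, which gives a rough sense of why the number of indexed subformulae grows much faster than the number of formulae.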