Indexing and Searching Mathematics in Digital Libraries -- Architecture, Design and Scalability Issues

Authors

SOJKA Petr, LÍŠKA Martin

Year of publication 2011
Type Article in Proceedings
Conference Intelligent Computer Mathematics, Lecture Notes in Computer Science, 2011, Volume 6824/2011
MU Faculty or unit

Faculty of Informatics

DOI http://dx.doi.org/10.1007/978-3-642-22673-1_16
Field Informatics
Keywords math indexing and retrieval; mathematical digital libraries; information systems; information retrieval; mathematical content search; document ranking of mathematical papers; math text mining; MIaS; WebMIaS
Description This paper surveys approaches and systems for searching mathematical formulae in mathematical corpora and on the web. The design and architecture of our MIaS (Math Indexer and Searcher) system are presented, and our design decisions are discussed in detail. An approach based on Presentation MathML that uses the similarity of math subformulae is proposed and verified by implementing it as a math-aware search engine built on the state-of-the-art system Apache Lucene. Scalability was tested on 324,000 real scientific documents from the arXiv archive, containing 112 million mathematical formulae. More than two billion MathML subformulae were indexed using our Solr-compatible Lucene extension.
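The central idea of the description, indexing every subformula of a Presentation MathML expression so that partial matches can be found, can be illustrated with a short Python sketch. This is a simplified illustration under our own assumptions and not MIaS's actual processing pipeline, which is described in the paper itself. The function names `canonical` and `subformulae` and the bracketed string encoding are hypothetical. The sketch only shows how each element subtree of a formula becomes a separate indexable unit.

```python
import xml.etree.ElementTree as ET


def canonical(elem):
    """Serialize an element subtree to a compact string (hypothetical encoding).

    Namespaces are stripped from tag names and surrounding whitespace in the
    element text is ignored, so equal subtrees always yield equal strings.
    """
    tag = elem.tag.split('}')[-1]  # drop "{namespace}" prefix if present
    text = (elem.text or '').strip()
    children = ''.join(canonical(child) for child in elem)
    return f"{tag}[{text}{children}]"


def subformulae(mathml):
    """Return one canonical string per element subtree, in document order.

    Each string stands for one subformula that a math-aware index could store
    as a separate token, so a query can match part of a larger formula.
    """
    root = ET.fromstring(mathml)
    # Element.iter() visits the root and every descendant element.
    return [canonical(e) for e in root.iter()]


if __name__ == "__main__":
    # a^2 + b^2 in Presentation MathML
    formula = ("<math><mrow><msup><mi>a</mi><mn>2</mn></msup><mo>+</mo>"
               "<msup><mi>b</mi><mn>2</mn></msup></mrow></math>")
    for part in subformulae(formula):
        print(part)
```

Even this small expression for a² + b² yields nine indexed units, from the whole `math` element down to single tokens such as `mn[2]`. This growth matches the figures reported above: more than two billion subformulae from 112 million formulae is an average of roughly 18 or more subformulae per formula.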