DocumentCode :
3627917
Title :
Mathematical Extension of Full Text Search Engine Indexer
Author :
Jozef Misutka;Leo Galambos
Author_Institution :
Department of Software Engineering, Charles University in Prague, Ke Karlovu 3, 121 16 Prague, Czech Republic. Email: jmisutka@gmail.com
fYear :
2008
Firstpage :
1
Lastpage :
6
Abstract :
The world of mathematical knowledge on the WWW has grown enormously. Despite the clear importance of a mathematical search engine, this research field was neglected until very recently. Although currently available full text search engines can be used on these documents too, they are deficient in almost all cases: they cannot handle structured mathematical text or mathematical operations. Many of the problems stem from the nature of mathematics itself. By applying axioms and equivalent transformations, and by using different notations, each formula can be expressed in numerous ways. Ambiguous searches like "sin" or "a" would return documents containing the sine function as well as the English noun sin, or documents containing the variable a as well as the indefinite article a. Moreover, mathematical operators and special notation cannot be expressed in the query languages of these engines. In this work, we address these issues and present a technique for indexing real-world scientific documents containing mathematical notation by exploiting the current state of the art in full text search engines. Our approach has several advantages over existing solutions. It is primarily intended for documents on the WWW, which are mostly semantically poor, and it offers an extensible level of mathematical awareness that also supports similarity searches. Furthermore, it is designed as an extension, so any full text search engine can easily adopt it. Experiments over two real-world document sets showed that performance is highly dependent on several features of the mathematical search engine.
Keywords :
"Search engines","World Wide Web","Indexing","Database languages","Mathematics","Pattern matching","Markup languages","Software engineering","Fault tolerance","HTML"
Publisher :
IEEE
Conference_Titel :
Information and Communication Technologies: From Theory to Applications, 2008. ICTTA 2008. 3rd International Conference on
Print_ISBN :
978-1-4244-1751-3
Type :
conf
DOI :
10.1109/ICTTA.2008.4530006
Filename :
4530006