Title :
Feature extraction and clustering-based retrieval for mathematical formulas
Author :
Ma, Kai ; Hui, Siu Cheung ; Chang, Kuiyu
Author_Institution :
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
Abstract :
Mathematical formulas or expressions are essential for presenting scientific knowledge in research documents in academic areas such as physics and mathematics. Searching for related mathematical formulas is an important but challenging problem because formulas contain both structural and semantic information, and this information is hidden inside their mathematical expressions. To support effective formula search, the structural and semantic features must be extracted faithfully from the mathematical presentation of the formulas. In this paper, we propose an effective approach for formula feature extraction. To evaluate the proposed approach, the extracted features are tested with three popular clustering algorithms for formula retrieval, namely K-means, Self-Organizing Map (SOM), and Agglomerative Hierarchical Clustering (AHC). The performance of the clustering-based retrieval is measured on a dataset of 881 formulas, and promising results have been achieved.
Keywords :
feature extraction; information retrieval; mathematics computing; pattern clustering; self-organising feature maps; K-means clustering algorithms; agglomerative hierarchical clustering; clustering-based retrieval; mathematical formulas; self organizing map clustering algorithms; semantic feature extraction; semantic information; structural feature extraction; structural information; Automatic testing; Clustering algorithms; Data mining; Feature extraction; Information retrieval; Knowledge engineering; Mathematics; Organizing; Physics computing; Search engines; clustering; feature extraction; formula search; information retrieval
Conference_Title :
Software Engineering and Data Mining (SEDM), 2010 2nd International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-7324-3
Electronic_ISBN :
978-89-88678-22-0