DocumentCode :
2483983
Title :
Combining content and structure similarity for XML document classification using composite SVM kernels
Author :
Ghosh, Saptarshi ; Mitra, Pabitra
Author_Institution :
Comput. Sci. & Eng., IIT Kharagpur, Kharagpur
fYear :
2008
fDate :
8-11 Dec. 2008
Firstpage :
1
Lastpage :
4
Abstract :
Combining structure and content features is necessary for effective retrieval and classification of XML documents. Composite kernels provide a way to fuse content and structure information. In this paper, we demonstrate that a linear combination of simple, low-cost kernels, such as cosine similarity on terms and on selective paths, provides good classification performance. We also propose a corpus-driven, entropy-based heuristic for determining the optimal combination weights. Classification experiments performed on the INEX 1.3 XML corpus demonstrate that the composite kernel classifier achieves significantly better performance than more complex and time-consuming approaches.
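The linear combination of cosine kernels described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the document representation (a dict with `terms` and `paths` counts), the function names, and the single convex weight `weight` are all assumptions made for illustration, and the entropy-based weight heuristic is not reproduced here because the abstract does not specify it.

```python
import math
from collections import Counter


def cosine_kernel(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(count * b.get(key, 0) for key, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def composite_kernel(doc_x, doc_y, weight):
    """Assumed form: weight * K_content + (1 - weight) * K_structure."""
    k_content = cosine_kernel(doc_x["terms"], doc_y["terms"])
    k_structure = cosine_kernel(doc_x["paths"], doc_y["paths"])
    return weight * k_content + (1.0 - weight) * k_structure


# Hypothetical toy documents: term counts plus counts of root-to-node tag paths.
doc_a = {
    "terms": Counter(["kernel", "svm", "xml"]),
    "paths": Counter(["/article/title", "/article/body"]),
}
doc_b = {
    "terms": Counter(["kernel", "xml", "tree"]),
    "paths": Counter(["/article/title", "/article/abstract"]),
}

similarity = composite_kernel(doc_a, doc_b, weight=0.5)
```

Because a convex combination of positive semidefinite kernels is itself positive semidefinite, a Gram matrix built from `composite_kernel` could in principle be passed to any SVM implementation that accepts a precomputed kernel.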
Keywords :
XML; classification; entropy; information retrieval; support vector machines; INEX 1.3 XML corpus; XML document classification; XML document retrieval; composite SVM kernel classifier; corpus-driven entropy-based heuristic; Classification tree analysis; Content based retrieval; Fourier transforms; HTML; Indexing; Kernel; Support vector machine classification
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition, 2008. ICPR 2008. 19th International Conference on
Conference_Location :
Tampa, FL
ISSN :
1051-4651
Print_ISBN :
978-1-4244-2174-9
Electronic_ISBN :
1051-4651
Type :
conf
DOI :
10.1109/ICPR.2008.4761539
Filename :
4761539