DocumentCode
2483983
Title
Combining content and structure similarity for XML document classification using composite SVM kernels
Author
Ghosh, Saptarshi ; Mitra, Pabitra
Author_Institution
Comput. Sci. & Eng., IIT Kharagpur, Kharagpur
fYear
2008
fDate
8-11 Dec. 2008
Firstpage
1
Lastpage
4
Abstract
Combining structure and content features is necessary for effective retrieval and classification of XML documents. Composite kernels provide a way to fuse content and structure information. In this paper, we demonstrate that a linear combination of simple, low-cost kernels, such as cosine similarity on terms and on selective paths, provides good classification performance. We also propose a corpus-driven, entropy-based heuristic for determining the optimal combination weights. Classification experiments performed on the INEX 1.3 XML corpus demonstrate that the composite kernel classifier achieves significantly better performance than complex and time-consuming approaches.
Keywords
XML; classification; entropy; information retrieval; support vector machines; INEX 1.3 XML corpus; XML document classification; XML document retrieval; composite SVM kernel classifier; corpus-driven entropy-based heuristic; Classification tree analysis; Content based retrieval; Fourier transforms; HTML; Indexing; Kernel; Support vector machine classification
fLanguage
English
Publisher
IEEE
Conference_Titel
Pattern Recognition, 2008. ICPR 2008. 19th International Conference on
Conference_Location
Tampa, FL
ISSN
1051-4651
Print_ISBN
978-1-4244-2174-9
Electronic_ISBN
1051-4651
Type
conf
DOI
10.1109/ICPR.2008.4761539
Filename
4761539