DocumentCode :
3278291
Title :
Text classification using word sequence kernel methods
Author :
Trindade, Luis A. ; Wang, Hui ; Blackburn, William ; Rooney, Niall
Author_Institution :
Fac. of Comput. & Eng., Univ. of Ulster, Newtownabbey, UK
Volume :
4
fYear :
2011
fDate :
10-13 July 2011
Firstpage :
1532
Lastpage :
1537
Abstract :
This paper presents a comparative study of two sequence kernels for text classification: the all common subsequences kernel and the sequence kernel. We consider several variations of the two kernels (kernels based on individual features, linear combinations of individual kernels, and kernels with a factored representation of features) and evaluate them in text classification by employing them as similarity functions in a support vector machine. A sentence is represented as a sequence of words along with their lemmas and part-of-speech tags. Experiments show that the sequence kernel has a clear advantage over all common subsequences. Since the main difference between the two kernels is that the sequence kernel takes the frequency of words (objects) into account whereas all common subsequences does not, we conclude that word frequency is an important factor in the successful application of kernels to text classification.
Keywords :
natural language processing; pattern classification; support vector machines; text analysis; SVM; factored representation; linear combination; part-of-speech tags; support vector machine; text classification; word sequence; word sequence kernel methods; Accuracy; Cybernetics; Kernel; Machine learning; Motion pictures; Support vector machines; Text categorization; Information retrieval; Kernel methods; Natural language processing; Text classification
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Cybernetics (ICMLC), 2011 International Conference on
Conference_Location :
Guilin
ISSN :
2160-133X
Print_ISBN :
978-1-4577-0305-8
Type :
conf
DOI :
10.1109/ICMLC.2011.6016983
Filename :
6016983