DocumentCode :
3078848
Title :
Part-of-speech enhanced context recognition
Author :
Madsen, Rasmus Elsborg ; Larsen, Jan ; Hansen, Lars Kai
Author_Institution :
Dept. of Mathematical Modeling, Tech. Univ. Denmark, Lyngby
fYear :
2004
fDate :
Sept. 29 2004-Oct. 1 2004
Firstpage :
635
Lastpage :
643
Abstract :
Language independent 'bag-of-words' representations are surprisingly effective for text classification. In this communication our aim is to elucidate the synergy between language independent features and simple language model features. We consider term tag features estimated by a so-called part-of-speech tagger. The feature sets are combined in an early binding design with an optimized binding coefficient that allows weighting of the relative variance contributions of the participating feature sets. With the combined features, documents are classified using a latent semantic indexing representation and a probabilistic neural network classifier. Three medium size data-sets are analyzed and we find consistent synergy between the term and natural language features in all three sets for a range of training set sizes. The most significant enhancement is found for small text databases where high recognition rates are possible.
Keywords :
neural nets; optimisation; text analysis; language independent feature; language model feature; latent semantic indexing representation; natural language feature; optimized binding coefficient; part-of-speech enhanced context recognition; part-of-speech tagger; probabilistic neural network classifier; text classification; text database; Electronic mail; Humans; Indexing; Internet; Large scale integration; Learning systems; Machine learning; Mathematical model; Spatial databases; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning for Signal Processing, 2004. Proceedings of the 2004 14th IEEE Signal Processing Society Workshop
Conference_Location :
Sao Luis
ISSN :
1551-2541
Print_ISBN :
0-7803-8608-4
Type :
conf
DOI :
10.1109/MLSP.2004.1423027
Filename :
1423027