DocumentCode :
3530929
Title :
Part-of-speech histograms for genre classification of text
Author :
Feldman, S. ; Marin, M.A. ; Ostendorf, M. ; Gupta, M.R.
Author_Institution :
Dept. of Electr. Eng., Univ. of Washington, Seattle, WA
fYear :
2009
fDate :
19-24 April 2009
Firstpage :
4781
Lastpage :
4784
Abstract :
This work addresses the problem of classifying the genre of text, which is useful for a variety of language processing problems. We propose statistics of POS histograms as classification features, coupled with a quadratic discriminant classifier. In experiments on six different text and speech genres, we demonstrate enhanced performance compared to standard techniques using word frequency count features and POS trigram features. Experiments on genres that were not seen in training show intuitive overlaps with the training classes.
Keywords :
classification; statistical analysis; text analysis; genre text classification; natural language processing problem; part-of-speech histogram; quadratic discriminant classifier; Ethanol; Frequency; Histograms; Natural language processing; Natural languages; Speech enhancement; Speech recognition; Statistics; Testing; Text categorization; genre; text classification; web-filtering;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
Conference_Location :
Taipei
ISSN :
1520-6149
Print_ISBN :
978-1-4244-2353-8
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2009.4960700
Filename :
4960700