DocumentCode
2422656
Title
A Comparison of Stylometric and Lexical Features for Web Genre Classification and Emotion Classification in Blogs
Author
Lex, Elisabeth ; Juffinger, Andreas ; Granitzer, Michael
Author_Institution
Know-Center GmbH, Graz, Austria
fYear
2010
fDate
Aug. 30 2010-Sept. 3 2010
Firstpage
10
Lastpage
14
Abstract
In the blogosphere, the amount of digital content is expanding, which poses new challenges for search engines. As information needs change, automatic methods are needed to help blog search users filter information by different facets. In our work, we aim to support blog search with genre and facet information. Since we focus on the news genre, our approach is to classify blogs into news versus rest. We also assess the emotionality facet in news-related blogs to enable users to identify people's feelings towards specific events. We evaluate the performance of text classifiers with lexical and stylometric features to determine the best-performing combination for our tasks. Our experiments on a subset of the TREC Blogs08 dataset reveal that classifiers trained on lexical features perform consistently better than classifiers trained on the best stylometric features.
Keywords
Internet; Web sites; classification; data mining; information retrieval; search engines; text analysis; TREC Blogs08 dataset; Web genre classification; blog search; blogosphere; document classification; emotion classification; emotionality facet; lexical features; news genre; stylometric features; text classifiers; Accuracy; Blogs; Classification algorithms; Feature extraction; Mutual information; Support vector machines; Training; Features
fLanguage
English
Publisher
IEEE
Conference_Titel
Database and Expert Systems Applications (DEXA), 2010 Workshop on
Conference_Location
Bilbao
ISSN
1529-4188
Print_ISBN
978-1-4244-8049-4
Type
conf
DOI
10.1109/DEXA.2010.24
Filename
5591976