DocumentCode
3745837
Title
Genre Classification on German Novels
Author
Lena Hettinger;Martin Becker;Isabella Reger;Fotis Jannidis;Andreas Hotho
fYear
2015
Firstpage
249
Lastpage
253
Abstract
The study of German literature is mostly based on literary canons, i.e., small sets of specifically chosen documents. In particular, the history of novels has been characterized using a set of only 100 to 250 works. In this paper, we address genre classification on a large set of novels, using machine learning methods to achieve a better understanding of the genre of novels. To this end, we explore how different types of features affect the performance of different classification algorithms. We employ commonly used stylometric features and evaluate two types of features not yet applied to genre classification, namely topic-based features and features based on social network graphs and character interaction. We build features on a data set of close to 1700 novels either written in or translated into German. Even though topics are often considered orthogonal to genres, we find that topic-based features in combination with support vector machines achieve the best results. Overall, we successfully apply new feature types to genre classification of novels and give directions for further research in this area.
Keywords
"Feature extraction","Social network services","Context","Web pages","Error analysis","Data mining","Electronic mail"
Publisher
ieee
Conference_Titel
Database and Expert Systems Applications (DEXA), 2015 26th International Workshop on
ISSN
1529-4188
Print_ISBN
978-1-4673-7581-8
Electronic_ISBN
2378-3915
Type
conf
DOI
10.1109/DEXA.2015.62
Filename
7406301