• DocumentCode
    3745837
  • Title
    Genre Classification on German Novels
  • Author
    Lena Hettinger;Martin Becker;Isabella Reger;Fotis Jannidis;Andreas Hotho
  • fYear
    2015
  • Firstpage
    249
  • Lastpage
    253
  • Abstract
    The study of German literature is mostly based on literary canons, i.e., small sets of specifically chosen documents. In particular, the history of novels has been characterized using a set of only 100 to 250 works. In this paper we address the issue of genre classification in the context of a large set of novels using machine learning methods in order to achieve a better understanding of the genre of novels. To this end, we explore how different types of features affect the performance of different classification algorithms. We employ commonly used stylometric features, and evaluate two types of features not yet applied to genre classification, namely topic-based features and features based on social network graphs and character interaction. We build features on a data set of close to 1700 novels either written in or translated into German. Even though topics are often considered orthogonal to genres, we find that topic-based features in combination with support vector machines achieve the best results. Overall, we successfully apply new feature types for genre classification in the context of novels and give directions for further research in this area.
  • Keywords
    "Feature extraction","Social network services","Context","Web pages","Error analysis","Data mining","Electronic mail"
  • Publisher
    IEEE
  • Conference_Title
    Database and Expert Systems Applications (DEXA), 2015 26th International Workshop on
  • ISSN
    1529-4188
  • Print_ISBN
    978-1-4673-7581-8
  • Electronic_ISBN
    2378-3915
  • Type
    conf
  • DOI
    10.1109/DEXA.2015.62
  • Filename
    7406301