  • DocumentCode
    2707429
  • Title
    Compression and stylometry for author identification
  • Author
    Pavelec, D. ; Oliveira, L.S. ; Justino, E. ; Neto, F. D Nobre ; Batista, L.V.
  • Author_Institution
    Pontificia Univ. Catolica do Parana, Curitiba, Brazil
  • fYear
    2009
  • fDate
    14-19 June 2009
  • Firstpage
    2445
  • Lastpage
    2450
  • Abstract
    In this paper we compare two different paradigms for author identification. The first one is based on compression algorithms where the entire process of defining and extracting features and training a classifier is avoided. The second paradigm, on the other hand, takes into account the classical pattern recognition framework, where linguistic features proposed by forensic experts are used to train a Support Vector Machine classifier. Comprehensive experiments performed on a database composed of 20 writers show that both strategies achieve similar performance but with an interesting degree of complementarity demonstrated through the confusion matrices. Advantages and drawbacks of both paradigms are also discussed.
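    To illustrate the compression paradigm described in the abstract (no feature definition, no classifier training), here is a minimal sketch. It assumes zlib as the compressor and the normalized compression distance (NCD) with a nearest-reference rule; both are stand-ins chosen for illustration, not necessarily the compressor or decision rule used in the paper. The names `compressed_size`, `ncd`, and `identify_author` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x and y share structure."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def identify_author(questioned: str, samples: dict[str, str]) -> str:
    """Return the candidate whose reference text is closest to `questioned`.

    Assumption: one reference text per author and a nearest-neighbour rule.
    No features are extracted and nothing is trained; the compressor alone
    measures how much the two texts have in common.
    """
    q = questioned.encode("utf-8")
    return min(samples, key=lambda author: ncd(q, samples[author].encode("utf-8")))


# Usage with toy texts (hypothetical data, not from the paper's corpus).
samples = {
    "author_a": "the quick brown fox jumps over the lazy dog. " * 20,
    "author_b": "lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 20,
}
questioned = "the quick brown fox jumps over the lazy dog. " * 5
print(identify_author(questioned, samples))  # prints author_a
```

    Concatenating a questioned text with a reference text compresses well only when the compressor can reuse patterns from one in the other; the sketch relies on that effect in place of an explicit stylometric feature set.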
  • Keywords
    classification; data compression; feature extraction; learning (artificial intelligence); support vector machines; author identification; compression algorithm; forensic expert; linguistic feature extraction; pattern recognition; stylometry; support vector machine classifier training; Compression algorithms; Data mining; Feature extraction; Forensics; Frequency; Neural networks; Pattern recognition; Spatial databases; Support vector machine classification; Support vector machines; Author identification; Compression; Stylometry;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks, 2009. IJCNN 2009. International Joint Conference on
  • Conference_Location
    Atlanta, GA
  • ISSN
    1098-7576
  • Print_ISBN
    978-1-4244-3548-7
  • Electronic_ISBN
    1098-7576
  • Type
    conf
  • DOI
    10.1109/IJCNN.2009.5178675
  • Filename
    5178675