  • DocumentCode
    3055511
  • Title
    Authorship attribution
  • Author
    Bozkurt, Ilker Nadi; Baghoglu, O.; Uyar, Erkan
  • Author_Institution
    Bilkent Univ., Ankara
  • fYear
    2007
  • fDate
    7-9 Nov. 2007
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Authorship attribution is the process of determining the writer of a document. Many classification techniques have been applied to this problem in the literature. In this paper we explore information retrieval methods, such as the tf-idf representation with support vector machines, as well as parametric and nonparametric methods with supervised and unsupervised (clustering) classification techniques for authorship attribution. We performed various experiments with articles gathered from the Turkish newspaper Milliyet, applying different classifiers to different features extracted from these texts, and combined the results to improve our success rates. We identified which classifiers give satisfactory results on which feature sets. According to the experiments, the success rates change dramatically across combinations; the best among them are the support vector classifier with bag of words and the Gaussian classifier with function words.
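    To make the tf-idf bag-of-words representation mentioned in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes whitespace tokenization and tf-idf weights computed as raw term count times log(N / df). In place of the support vector machine the paper uses, it substitutes a much simpler nearest-centroid classifier with cosine similarity. The function names (tfidf_vectors, attribute) and the toy documents are hypothetical illustrations.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization (real Turkish text needs more care).
    return text.lower().split()


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Bag-of-words tf-idf: raw term count times log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df: Counter = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per document
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: n * math.log(n_docs / df[t]) for t, n in tf.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def attribute(train_docs: list[str], train_authors: list[str], query: str) -> str:
    """Hypothetical nearest-centroid attribution (a stand-in for the paper's SVM)."""
    # Vectorize training documents and the query together so they share idf weights.
    vecs = tfidf_vectors(train_docs + [query])
    train_vecs, q = vecs[:-1], vecs[-1]
    counts = Counter(train_authors)
    centroids: dict[str, dict[str, float]] = {}
    for vec, author in zip(train_vecs, train_authors):
        centroid = centroids.setdefault(author, {})
        for t, w in vec.items():
            # Accumulate the mean tf-idf vector of each author's documents.
            centroid[t] = centroid.get(t, 0.0) + w / counts[author]
    # Pick the author whose centroid is most similar to the query.
    return max(centroids, key=lambda a: cosine(centroids[a], q))


train = ["the market rose sharply today", "the market fell again",
         "the team won the match", "the coach praised the team"]
authors = ["A", "A", "B", "B"]
print(attribute(train, authors, "the team lost the match"))
```

    A term that occurs in every document, such as "the" here, gets an idf of log(1) = 0, so it carries no weight. This is the usual motivation for tf-idf over raw counts. Note that the abstract reports function words as informative in a separate Gaussian-classifier setting, which this sketch does not model.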
  • Keywords
    feature extraction; information retrieval; pattern classification; pattern clustering; support vector machines; text analysis; authorship attribution; clustering technique; document writer determination; feature extraction; information retrieval method; nonparametric method; support vector machine; text classification; unsupervised classification technique; Computer science; Data mining; Feature extraction; Information retrieval; Internet; Plagiarism; Support vector machine classification; Support vector machines; Text categorization; Writing; Authorship attribution; classifier feature relationship; feature reduction; parametric nonparametric classifiers; text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Information Sciences, 2007. ISCIS 2007. 22nd International Symposium on
  • Conference_Location
    Ankara
  • Print_ISBN
    978-1-4244-1363-8
  • Electronic_ISBN
    978-1-4244-1364-5
  • Type
    conf
  • DOI
    10.1109/ISCIS.2007.4456854
  • Filename
    4456854