• DocumentCode
    3414689
  • Title
    Authorship Identification for Online Text
  • Author
    Tan, Richmond Hong Rui; Tsai, Flora S.
  • Author_Institution
    Sch. of Electr. & Electron. Eng., Nanyang Technol. Univ., Singapore, Singapore
  • fYear
    2010
  • fDate
    20-22 Oct. 2010
  • Firstpage
    155
  • Lastpage
    162
  • Abstract
    Authorship identification for online text such as blogs and e-books is a challenging problem because these documents contain relatively little text. Identification is therefore much harder than for longer documents such as books and reports. This paper investigates which features are suitable for such texts and how they affect classifier accuracy. Syntactic features were found to perform well on large data sets, whereas lexical features perform well on small data sets. The results can be used to customize and further improve authorship detection techniques according to the characteristics of the writing samples.
  • Keywords
    data mining; feature extraction; pattern classification; text analysis; authorship detection technique; authorship identification; classifier accuracy; e-books; lexical feature; online text; syntactic feature; writing sample; Accuracy; Blogs; Databases; Syntactics; Vocabulary; Writing; authorship attribution; authorship detection; blog; classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cyberworlds (CW), 2010 International Conference on
  • Conference_Location
    Singapore
  • Print_ISBN
    978-1-4244-8301-3
  • Electronic_ISBN
    978-0-7695-4215-7
  • Type
    conf
  • DOI
    10.1109/CW.2010.50
  • Filename
    5656486