• DocumentCode
    2515057
  • Title
    Authorship attribution for Chinese text based on sentence rhythm features
  • Author
    Wang, Shaokang ; Yan, Baoping
  • Author_Institution
    Comput. Network Inf. Center, Chinese Acad. of Sci., Beijing, China
  • fYear
    2010
  • fDate
    28-30 Nov. 2010
  • Firstpage
    61
  • Lastpage
    64
  • Abstract
    Authorship attribution, i.e., identifying the author of a disputed piece of text, is an important problem because of growing concern over copyright violations. Although various authorship attribution algorithms have been proposed for articles, they fail in several situations. This paper proposes a new authorship attribution algorithm for Chinese text that uses the sentence rhythm features of articles. In our algorithm, a rhythm feature matrix is proposed to describe the sentence rhythm of Chinese text. To determine the similarity of rhythm feature matrices, we compare two definitions of similarity, based on Euclidean distance and on an improved Kullback-Leibler divergence, respectively. Experimental results show that our algorithm achieves a success rate of 80%.
  • Keywords
    copyright; literature; text analysis; Chinese text; Euclidean distance; Kullback-Leibler divergence; authorship attribution; authorship attribution algorithm; copyright violations; rhythm feature matrix; sentence rhythm features; Algorithm design and analysis; Databases; Measurement; Probability distribution; Rhythm; Software algorithms; Writing; multi-dimensional matrix; rhythm feature; text similarity
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Computing and Telecommunications (YC-ICT), 2010 IEEE Youth Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-8883-4
  • Type
    conf
  • DOI
    10.1109/YCICT.2010.5713152
  • Filename
    5713152
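
The abstract compares two similarity definitions over rhythm feature matrices, one based on Euclidean distance and one on an improved Kullback-Leibler divergence. This record does not say how the matrix is built or what the improvement is, so the following is only a minimal sketch under stated assumptions: each matrix is taken to be a non-negative count matrix of the same shape, and a symmetrized, additively smoothed KL divergence stands in for the paper's unspecified variant. All function names and the `eps` parameter are hypothetical.

```python
import math


def normalize(matrix):
    """Flatten a non-negative count matrix into a probability distribution."""
    flat = [float(v) for row in matrix for v in row]
    total = sum(flat)
    if total <= 0:
        raise ValueError("matrix must contain at least one positive count")
    return [v / total for v in flat]


def _paired(a, b):
    """Normalize both matrices and check that they have the same size."""
    p, q = normalize(a), normalize(b)
    if len(p) != len(q):
        raise ValueError("matrices must have the same shape")
    return p, q


def euclidean_distance(a, b):
    """Euclidean distance between the two normalized matrices."""
    p, q = _paired(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def smoothed_symmetric_kl(a, b, eps=1e-6):
    """Symmetrized KL divergence with additive smoothing.

    Assumption: this is a common stand-in, not the paper's improved variant.
    Smoothing keeps the logarithm finite when a cell is zero.
    """
    p, q = _paired(a, b)
    p = [x + eps for x in p]
    q = [y + eps for y in q]
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [y / sq for y in q]
    kl_pq = sum(x * math.log(x / y) for x, y in zip(p, q))
    kl_qp = sum(y * math.log(y / x) for x, y in zip(p, q))
    return 0.5 * (kl_pq + kl_qp)


def attribute(disputed, candidates, distance):
    """Return the candidate author whose matrix is closest to the disputed one."""
    return min(candidates, key=lambda name: distance(disputed, candidates[name]))


# Toy data, invented purely for illustration.
disputed = [[4, 1], [1, 4]]
candidates = {"A": [[8, 2], [2, 8]], "B": [[1, 4], [4, 1]]}
best_by_euclid = attribute(disputed, candidates, euclidean_distance)
best_by_kl = attribute(disputed, candidates, smoothed_symmetric_kl)
```

Normalizing before comparison makes both measures independent of text length, which is why a disputed text can be matched to a candidate whose counts are proportional but larger.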