• Title of article

    Stylometric analyses using Dirichlet process mixture models

  • Author/Authors

    Gill, Paramjit S. and Swartz, Tim B.

  • Issue Information
    Journal with serial number, year 2011
  • Pages
    10
  • From page
    3665
  • To page
    3674
  • Abstract
    Stylometry refers to the statistical analysis of literary style of authors based on the characteristics of expression in their writings. We propose an approach to stylometry based on a Bayesian Dirichlet process mixture model using multinomial word frequency data. The parameters of the multinomial distribution of word frequency data are the “word prints” of the author. Our approach is based on model-based clustering of the vectors of probability values of the multinomial distribution. The resultant clusters identify different writing styles that assist in author attribution for disputed works in a corpus. As a test case, the methodology is applied to the problem of authorship attribution involving the Federalist papers. Our results are consistent with previous stylometric analyses of these papers.
  • Keywords
    Clustering, Dirichlet process priors, Multinomial distribution, Federalist papers, Bayesian methods, Disputed authorship, Computational linguistics
  • Journal title
    Journal of Statistical Planning and Inference
  • Serial Year
    2011
  • Record number

    2221646
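The abstract describes clustering each document's multinomial word-probability vector (its "word print") with a Dirichlet process mixture. The following is a minimal Python sketch of that general idea, not the authors' implementation. It assumes a symmetric Dirichlet(beta) base measure, a concentration parameter alpha, and a collapsed Gibbs sampler in its Chinese restaurant process form. The function names (log_predictive, dp_cluster, word_prints), the parameter defaults, and the toy word counts are hypothetical illustrations and are not taken from the paper or the Federalist data.

```python
import math
import random


def log_predictive(x, cluster_counts, beta):
    """Log Dirichlet-multinomial predictive of word-count vector x given the summed
    counts of a cluster, under an assumed symmetric Dirichlet(beta) base measure.
    The multinomial coefficient is dropped because it is identical for every cluster."""
    V = len(x)
    n_x, n_c = sum(x), sum(cluster_counts)
    lp = math.lgamma(V * beta + n_c) - math.lgamma(V * beta + n_c + n_x)
    for xv, cv in zip(x, cluster_counts):
        lp += math.lgamma(beta + cv + xv) - math.lgamma(beta + cv)
    return lp


def dp_cluster(docs, alpha=1.0, beta=0.5, sweeps=100, seed=0):
    """Hypothetical collapsed Gibbs sampler for a DP mixture of multinomials
    (Chinese restaurant process form). Returns the last sampled cluster labels."""
    rng = random.Random(seed)
    V = len(docs[0])
    z = [0] * len(docs)                                   # start: one shared cluster
    sizes = {0: len(docs)}
    sums = {0: [sum(d[v] for d in docs) for v in range(V)]}
    next_label = 1
    for _ in range(sweeps):
        for i, x in enumerate(docs):
            k = z[i]                                      # take document i out of its cluster
            sizes[k] -= 1
            sums[k] = [s - xv for s, xv in zip(sums[k], x)]
            if sizes[k] == 0:
                del sizes[k], sums[k]
            labels = list(sizes)
            # existing cluster: size * predictive; new cluster: alpha * prior predictive
            logw = [math.log(sizes[c]) + log_predictive(x, sums[c], beta) for c in labels]
            labels.append(None)
            logw.append(math.log(alpha) + log_predictive(x, [0] * V, beta))
            top = max(logw)                               # subtract max for numerical stability
            choice = rng.choices(labels, weights=[math.exp(w - top) for w in logw])[0]
            if choice is None:                            # open a new cluster
                choice, next_label = next_label, next_label + 1
                sizes[choice], sums[choice] = 0, [0] * V
            z[i] = choice
            sizes[choice] += 1
            sums[choice] = [s + xv for s, xv in zip(sums[choice], x)]
    return z


def word_prints(docs, z, beta=0.5):
    """Posterior-mean multinomial probabilities ("word prints") for each cluster."""
    V = len(docs[0])
    prints = {}
    for k in set(z):
        counts = [sum(d[v] for d, zk in zip(docs, z) if zk == k) for v in range(V)]
        prints[k] = [(beta + c) / (V * beta + sum(counts)) for c in counts]
    return prints


# Hypothetical toy counts over six marker words (NOT data from the paper).
docs = [
    [300, 1, 100, 99, 50, 50], [299, 2, 100, 100, 49, 50], [301, 0, 99, 100, 50, 50],
    [1, 300, 99, 100, 50, 50], [0, 301, 100, 100, 50, 49], [2, 299, 100, 99, 50, 50],
]
labels = dp_cluster(docs)
print(labels)
print(word_prints(docs, labels))
```

Because the assumed base measure is conjugate to the multinomial, each document's probability vector is integrated out and only cluster labels are sampled. With well-separated toy counts like these, the two groups of three documents should land in two clusters; in practice one would summarize many posterior samples rather than rely on the last one.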