• DocumentCode
    1832106
  • Title
    Author attribution on streaming data
  • Author
    Seker, Sadi Evren ; Al-Naami, Khaled ; Khan, Latifur
  • Author_Institution
    Comput. Sci. Dept., Univ. of Texas at Dallas, Dallas, TX, USA
  • fYear
    2013
  • fDate
    14-16 Aug. 2013
  • Firstpage
    497
  • Lastpage
    503
  • Abstract
    The occurrence of novel authors in a streaming data source, such as evolving social media, has not been addressed until now. Existing author attribution techniques deal with datasets in which the total number of authors does not change between the training and the testing of the classifiers. This study focuses on the question, “What happens if new authors are added to the system over time?” It also addresses the problem that some authors may disappear over time or reappear after a while. Stream mining approaches are proposed to solve the problem. The test scenarios are built on the existing IMDB62 data set, which is already widely used by author attribution algorithms. We used our own shuffling algorithms to create the effect of novel authors. Before stream mining, POS tagging and TF-IDF methods are applied for feature extraction. We also apply a bi-tag approach, in which two consecutive tags are treated as a new feature. With the novel techniques proposed for the first time in this paper, the success rate of authorship attribution on streaming text data increased from 35% to 61%.
  • Keywords
    data mining; text analysis; IMDB62 data set; POS tagging approaches; TF-IDF methods; author attribution algorithms; authorship attribution; bi-tag approach; feature extraction; shuffling algorithms; stream mining; streaming data source; streaming text data; Databases; Motion pictures; Natural language processing; Tagging; Writing; POS Tagging; author recognition; big data; text mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Reuse and Integration (IRI), 2013 IEEE 14th International Conference on
  • Conference_Location
    San Francisco, CA
  • Type
    conf
  • DOI
    10.1109/IRI.2013.6642511
  • Filename
    6642511
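
The abstract describes a bi-tag feature, in which two consecutive POS tags are combined into one feature, weighted with TF-IDF. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the text has already been POS-tagged; the function names `bi_tags` and `tf_idf`, the sample tag sequences, and the particular TF-IDF variant (relative frequency times log of inverse document frequency) are all illustrative assumptions that do not come from the paper.

```python
import math
from collections import Counter


def bi_tags(tags):
    """Join each pair of consecutive POS tags into a single feature string."""
    return [f"{a}_{b}" for a, b in zip(tags, tags[1:])]


def tf_idf(docs):
    """Return one {feature: weight} dict per document.

    tf = count / document length, idf = log(N / document frequency).
    This is one common TF-IDF variant, chosen here for illustration only.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for features in docs:
        doc_freq.update(set(features))  # count each feature once per document

    weights = []
    for features in docs:
        counts = Counter(features)
        total = len(features) or 1  # avoid division by zero on empty input
        weights.append({
            f: (c / total) * math.log(n_docs / doc_freq[f])
            for f, c in counts.items()
        })
    return weights


# Hypothetical, hand-written tag sequences standing in for tagger output.
tagged_reviews = [
    ["DT", "NN", "VBZ", "JJ"],
    ["DT", "NN", "VBD", "RB", "JJ"],
]
features = [bi_tags(t) for t in tagged_reviews]
weights = tf_idf(features)
```

In this sketch a bi-tag that occurs in every document, such as `DT_NN` above, receives a weight of zero, while bi-tags specific to one document receive a positive weight. The resulting per-document weight dictionaries could then be fed to a classifier of the reader's choice.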