DocumentCode :
1832106
Title :
Author attribution on streaming data
Author :
Seker, Sadi Evren ; Al-Naami, Khaled ; Khan, Latifur
Author_Institution :
Comput. Sci. Dept., Univ. of Texas at Dallas, Dallas, TX, USA
fYear :
2013
fDate :
14-16 Aug. 2013
Firstpage :
497
Lastpage :
503
Abstract :
Novel authors appearing in a streaming data source, such as evolving social media, is a problem that has not been addressed until now. Existing author attribution techniques deal with data sets in which the total number of authors does not change between the training and testing of the classifiers. This study asks what happens when new authors are added to the system over time. It also addresses the cases in which some authors disappear over time or reappear after a while. Stream mining approaches are proposed to solve the problem. The test scenarios are built on the existing IMDB62 data set, which is already widely used by author attribution algorithms. We used our own shuffling algorithms to create the effect of novel authors. Before stream mining, POS tagging and TF-IDF methods are applied for feature extraction. We also apply a bi-tag approach, in which two consecutive tags are treated as a single new feature. With the novel techniques proposed for the first time in this paper, the success rate of authorship attribution on streaming text data increased from 35% to 61%.
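The bi-tag and TF-IDF steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `bitag_features` and `tfidf`, the sample tag sequences, and the particular TF-IDF variant (tf = count / document length, idf = log(N / df)) are all assumptions made for illustration, and the documents are assumed to be POS-tagged already.

```python
import math
from collections import Counter


def bitag_features(tags):
    """Join each pair of consecutive POS tags into one bi-tag token."""
    return [f"{a}_{b}" for a, b in zip(tags, tags[1:])]


def tfidf(docs):
    """Compute TF-IDF weights for a list of token lists.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each token once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero on empty documents
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical pre-tagged documents (Penn Treebank style tags).
tagged_docs = [
    ["DT", "NN", "VBZ", "JJ"],
    ["DT", "NN", "VBD", "RB"],
    ["PRP", "VBP", "DT", "NN"],
]
bitag_docs = [bitag_features(tags) for tags in tagged_docs]
weights = tfidf(bitag_docs)
```

In this sketch a bi-tag that occurs in every document receives weight zero, so only tag pairs that distinguish one document from the others contribute to the feature vector.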
Keywords :
data mining; text analysis; IMDB62 data set; POS tagging approaches; TF-IDF methods; author attribution algorithms; authorship attribution; bi-tag approach; feature extraction; shuffling algorithms; stream mining; streaming data source; streaming text data; Data mining; Databases; Feature extraction; Motion pictures; Natural language processing; Tagging; Writing; POS Tagging; author recognition; authorship attribution; big data; data mining; natural language processing; text mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Reuse and Integration (IRI), 2013 IEEE 14th International Conference on
Conference_Location :
San Francisco, CA
Type :
conf
DOI :
10.1109/IRI.2013.6642511
Filename :
6642511