DocumentCode :
3414689
Title :
Authorship Identification for Online Text
Author :
Tan, Richmond Hong Rui ; Tsai, Flora S.
Author_Institution :
Sch. of Electr. & Electron. Eng., Nanyang Technol. Univ., Singapore, Singapore
fYear :
2010
fDate :
20-22 Oct. 2010
Firstpage :
155
Lastpage :
162
Abstract :
Authorship identification for online text such as blogs and e-books is a challenging problem because these documents contain relatively little content. Identification is therefore much harder than for other documents such as books and reports. The paper investigates which features are suitable for such texts and how they affect classifier accuracy. Syntactic features are found to be good for large data sets, whereas lexical features are good for small data sets. The results can be used to customize and further improve authorship detection techniques according to the characteristics of the writing samples.
Keywords :
data mining; feature extraction; pattern classification; text analysis; authorship detection technique; authorship identification; classifier accuracy; e-books; lexical feature; online text; syntactic feature; writing sample; Accuracy; Blogs; Databases; Feature extraction; Syntactics; Vocabulary; Writing; authorship attribution; authorship detection; authorship identification; blog; classification; data mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cyberworlds (CW), 2010 International Conference on
Conference_Location :
Singapore
Print_ISBN :
978-1-4244-8301-3
Electronic_ISBN :
978-0-7695-4215-7
Type :
conf
DOI :
10.1109/CW.2010.50
Filename :
5656486