DocumentCode
3414689
Title
Authorship Identification for Online Text
Author
Tan, Richmond Hong Rui ; Tsai, Flora S.
Author_Institution
Sch. of Electr. & Electron. Eng., Nanyang Technol. Univ., Singapore, Singapore
fYear
2010
fDate
20-22 Oct. 2010
Firstpage
155
Lastpage
162
Abstract
Authorship identification for online text such as blogs and e-books is a challenging problem because these documents contain relatively little content. Identification is therefore much harder than for other documents such as books and reports. The paper investigates which features are suitable for such texts and the classifier accuracy they yield. Syntactic features are found to work well for large data sets, whereas lexical features work well for small data sets. The results can be used to customize and further improve authorship detection techniques according to the characteristics of the writing samples.
Keywords
data mining; feature extraction; pattern classification; text analysis; authorship detection technique; authorship identification; classifier accuracy; e-books; lexical feature; online text; syntactic feature; writing sample; Accuracy; Blogs; Databases; Feature extraction; Syntactics; Vocabulary; Writing; authorship attribution; authorship detection; blog; classification
fLanguage
English
Publisher
ieee
Conference_Titel
Cyberworlds (CW), 2010 International Conference on
Conference_Location
Singapore
Print_ISBN
978-1-4244-8301-3
Electronic_ISBN
978-0-7695-4215-7
Type
conf
DOI
10.1109/CW.2010.50
Filename
5656486