DocumentCode
2632671
Title
Sequential Pattern Mining for Chinese E-mail Authorship Identification
Author
Ma, Jianbin ; Li, Ying ; Teng, Guifa ; Wang, Fang ; Zhao, Yang
Author_Institution
Sch. of Inf. Sci. & Technol., Agric. Univ. of Hebei, Baoding
fYear
2008
fDate
18-20 June 2008
Firstpage
73
Lastpage
73
Abstract
With the rapid growth of computer technology and the popularization of the Internet, e-mail has become an economical and convenient form of communication. However, various crimes and civil actions involving e-mail documents have emerged, harming people's lives and social stability. The authorship of criminal e-mails therefore has to be identified automatically for the purposes of computer forensics. Appropriate feature extraction and selection methods are essential to solving this problem. Unlike English and other Indo-European languages, Chinese text has no natural delimiter between words, so word segmentation is a major problem in Chinese text processing. This paper therefore describes sequential pattern feature mining methods that do not require word segmentation. The support vector machine algorithm was adopted as the classification algorithm. Experiments on limited samples gave satisfactory results, indicating that the sequential pattern feature mining methods were effective.
Keywords
computer crime; feature extraction; pattern classification; support vector machines; text analysis; unsolicited e-mail; word processing; Chinese e-mail authorship identification; Chinese text processing; computer forensic; feature extraction; feature selection; sequential pattern mining; support vector machine algorithm; Data mining; Electronic mail; Feature extraction; Forensics; Information science; Internet; Natural languages; Postal services; Sequences; Support vector machines
fLanguage
English
Publisher
IEEE
Conference_Titel
Innovative Computing Information and Control, 2008. ICICIC '08. 3rd International Conference on
Conference_Location
Dalian, Liaoning
Print_ISBN
978-0-7695-3161-8
Electronic_ISBN
978-0-7695-3161-8
Type
conf
DOI
10.1109/ICICIC.2008.489
Filename
4603262