DocumentCode :
2632671
Title :
Sequential Pattern Mining for Chinese E-mail Authorship Identification
Author :
Ma, Jianbin ; Li, Ying ; Teng, Guifa ; Wang, Fang ; Zhao, Yang
Author_Institution :
Sch. of Inf. Sci. & Technol., Agric. Univ. of Hebei, Baoding
fYear :
2008
fDate :
18-20 June 2008
Firstpage :
73
Lastpage :
73
Abstract :
With the rapid growth of computer technology and the popularization of the Internet, e-mail has become an economical and convenient form of communication. However, e-mail documents are also increasingly involved in various crimes and civil actions that harm people's lives and social stability. The authorship of criminal e-mails therefore needs to be identified automatically for computer forensics. Solving this problem requires appropriate feature extraction and selection methods. Unlike English and other Indo-European languages, Chinese text has no natural delimiter between words, so word segmentation is a major problem in Chinese text processing. This paper therefore describes sequential pattern feature mining methods that do not require word segmentation. A support vector machine (SVM) is used as the classification algorithm. Experiments on a limited set of samples gave satisfactory results, indicating that the sequential pattern feature mining methods are effective.
Keywords :
computer crime; feature extraction; pattern classification; support vector machines; text analysis; unsolicited e-mail; word processing; Chinese e-mail authorship identification; Chinese text processing; computer forensic; feature selection; sequential pattern mining; support vector machine algorithm; Data mining; Electronic mail; Feature extraction; Forensics; Information science; Internet; Natural languages; Postal services; Sequences; Support vector machines;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Innovative Computing Information and Control, 2008. ICICIC '08. 3rd International Conference on
Conference_Location :
Dalian, Liaoning
Print_ISBN :
978-0-7695-3161-8
Electronic_ISBN :
978-0-7695-3161-8
Type :
conf
DOI :
10.1109/ICICIC.2008.489
Filename :
4603262