Title of article :
A Framework for Authorship Identification of
Online Messages: Writing-Style Features and
Classification Techniques
Author/Authors :
Rong Zheng, Jiexun Li, Hsinchun Chen, Zan Huang
Issue Information :
Monthly, serial issue number, year 2006
Abstract :
With the rapid proliferation of Internet technologies and
applications, misuse of online messages for inappropriate
or illegal purposes has become a major concern for
society. The anonymous nature of online-message distribution
makes identity tracing a critical problem. We
developed a framework for authorship identification of
online messages to address the identity-tracing problem.
In this framework, four types of writing-style features
(lexical, syntactic, structural, and content-specific
features) are extracted and inductive learning algorithms
are used to build feature-based classification models to
identify authorship of online messages. To examine this
framework, we conducted experiments on English and
Chinese online-newsgroup messages. We compared the
discriminating power of the four types of features and
of three classification techniques: decision trees, backpropagation
neural networks, and support vector
machines. The experimental results showed that the proposed
approach was able to identify authors of online
messages with satisfactory accuracy of 70 to 95%. All
four types of message features contributed to discriminating
authors of online messages. Support vector
machines outperformed the other two classification
techniques in our experiments. The high performance
we achieved for both the English and Chinese datasets
showed the potential of this approach in a multiple-language
context.
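
The framework described above (extract writing-style features, then train a classifier on them) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the abstract does not list the individual features, so the word lists `FUNCTION_WORDS` and `CONTENT_WORDS` and the helper names `extract_features`, `train`, and `predict` are hypothetical. A simple nearest-centroid classifier stands in for the decision trees, backpropagation neural networks, and support vector machines compared in the paper, and the tokenizer handles English text only.

```python
import math
import re
from collections import defaultdict

# Hypothetical word lists; the abstract does not enumerate the actual features.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "that", "i", "you")
CONTENT_WORDS = ("sale", "price")


def extract_features(message):
    """Map one message to a numeric vector covering the four feature types."""
    words = re.findall(r"[a-z']+", message.lower())
    n_words = max(len(words), 1)
    n_chars = max(len(message), 1)
    lines = message.splitlines() or [""]
    lexical = [
        sum(len(w) for w in words) / n_words,  # average word length
        len(set(words)) / n_words,             # vocabulary richness
    ]
    # Syntactic: function-word and punctuation frequencies.
    syntactic = [words.count(w) / n_words for w in FUNCTION_WORDS]
    syntactic += [message.count(p) / n_chars for p in ",.!?"]
    structural = [
        len(lines),                                             # number of lines
        sum(1 for ln in lines if not ln.strip()) / len(lines),  # blank-line share
    ]
    content = [words.count(w) / n_words for w in CONTENT_WORDS]
    return lexical + syntactic + structural + content


def train(messages, authors):
    """Standardize the features and compute one centroid per author."""
    vectors = [extract_features(m) for m in messages]
    dims = range(len(vectors[0]))
    means = [sum(v[d] for v in vectors) / len(vectors) for d in dims]
    # A zero standard deviation is replaced by 1.0 to avoid division by zero.
    stds = [
        math.sqrt(sum((v[d] - means[d]) ** 2 for v in vectors) / len(vectors)) or 1.0
        for d in dims
    ]
    grouped = defaultdict(list)
    for vec, author in zip(vectors, authors):
        grouped[author].append([(x - m) / s for x, m, s in zip(vec, means, stds)])
    centroids = {
        author: [sum(col) / len(vecs) for col in zip(*vecs)]
        for author, vecs in grouped.items()
    }
    return means, stds, centroids


def predict(model, message):
    """Return the author whose centroid is closest to the message."""
    means, stds, centroids = model
    z = [(x - m) / s for x, m, s in zip(extract_features(message), means, stds)]
    return min(centroids, key=lambda author: math.dist(z, centroids[author]))
```

A call such as `predict(train(messages, authors), new_message)` returns the label of the closest author. Standardizing each feature keeps raw counts such as the number of lines from dominating the distance; a learned model such as a support vector machine would replace the centroid step in a closer reproduction of the study.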
Journal title :
Journal of the American Society for Information Science and Technology