Title :
Applying authorship analysis to extremist-group Web forum messages
Author :
Abbasi, Ahmed ; Chen, Hsinchun
Author_Institution :
Dept. of Manage. Inf. Syst., Arizona Univ., Tucson, AZ, USA
Abstract :
The speed, ubiquity, and potential anonymity of Internet media - email, Web sites, and Internet forums - make them ideal communication channels for militant groups and terrorist organizations. Analyzing Web content has therefore become increasingly important to the intelligence and security agencies that monitor these groups. Authorship analysis can assist this activity by automatically extracting linguistic features from online messages and evaluating stylistic details for patterns of terrorist communication. However, authorship analysis techniques are rooted in work with literary texts, which differ significantly from online communication. To explore these problems, we modified an existing framework for analyzing online authorship and applied it to Arabic and English Web forum messages associated with known extremist groups. We developed a special multilingual model - the set of algorithms and related features - to identify Arabic messages, gearing this model toward the language's unique characteristics. Furthermore, we incorporated a complex message extraction component to allow the use of a more comprehensive set of features tailored specifically toward online messages. Evaluating the linguistic features of Web messages and comparing them to known writing styles offers the intelligence community a tool for identifying patterns of terrorist communication.
Keywords :
Internet; authoring systems; feature extraction; linguistics; natural languages; social aspects of automation; terrorism; Arabic Web forum messages; English Web forum messages; Internet forums; Web content analysis; Web sites; automatic linguistic feature extraction; email; extremist group Web forum messages; message extraction component; multilingual model; online authorship analysis; pattern identification tool; terrorist communication; Algorithm design and analysis; Communication channels; Discussion forums; Monitoring; Pattern analysis; Security; Vocabulary; Writing; Web forum postings; Web mining; authorship analysis; multilingual; text analysis
Journal_Title :
Intelligent Systems, IEEE
DOI :
10.1109/MIS.2005.81