DocumentCode
2872749
Title
A framework for multi-features based Web harmful information identification
Author
Tian, Xiao-Ping ; Geng, Guang-Gang ; Li, Hong-Tao
Author_Institution
Center of Inf. & Network Technol., Beijing Normal Univ., Beijing, China
Volume
11
fYear
2010
fDate
22-24 Oct. 2010
Abstract
In recent years, the spread of harmful information such as pornography, phishing and violence has seriously disturbed the order of the Web, caused a series of adverse effects, and especially affected young people's physical and mental health. Statistical learning based harmful information detection methods, the current research focus, have shown their superiority in adapting easily to newly developed harmful techniques. Feature selection is one of the key factors that influence the development of Web harmful information detection systems. This paper describes a novel framework for recognizing harmful Web pages. In this framework, multi-modal features are extracted, and each modal feature shows a different aspect of the spam information. Based on these features, we give a feature fusion strategy. Considering the distribution of normal and harmful websites, we investigate the use of an ensemble under-sampling classification strategy to exploit the inherent imbalance of labels in this classification problem.
Keywords
Internet; Web sites; classification; computer crime; feature extraction; statistical analysis; Web harmful information identification; World Wide Web; feature fusion strategy; harmful Web pages; harmful Web sites; harmful information detection methods; mental health; multimodal feature extraction; normal Web sites; phishing; physical health; pornography; spam information; statistical learning; under-sampling classification strategy; violence; Data mining; Feature extraction; Internet; Modeling; Training; Unsolicited electronic mail; Web pages;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Application and System Modeling (ICCASM), 2010 International Conference on
Conference_Location
Taiyuan
Print_ISBN
978-1-4244-7235-2
Electronic_ISBN
978-1-4244-7237-6
Type
conf
DOI
10.1109/ICCASM.2010.5623130
Filename
5623130