• DocumentCode
    2872749
  • Title

    A framework for multi-features based Web harmful information identification

  • Author

    Tian, Xiao-Ping ; Geng, Guang-Gang ; Li, Hong-Tao

  • Author_Institution
    Center of Inf. & Network Technol., Beijing Normal Univ., Beijing, China
  • Volume
    11
  • fYear
    2010
  • fDate
    22-24 Oct. 2010
  • Abstract
    In recent years, the spread of harmful information such as pornography, phishing and violence has seriously disturbed the order of the Web, caused a series of adverse effects, and especially affected young people's physical and mental health. Statistical learning based harmful information detection methods, the current research focus, have shown their superiority in adapting easily to newly developed harmful techniques. Feature selection is one of the key factors that influence the development of Web harmful information detection systems. This paper describes a novel framework for recognizing harmful Web pages. In this framework, multi-modal features are extracted, and each modal feature captures a different aspect of the spam information. Based on these features, we give a feature fusion strategy. Considering the distribution of normal and harmful websites, we investigate the use of an ensemble under-sampling classification strategy to exploit the inherent imbalance of labels in this classification problem.
  • Keywords
    Internet; Web sites; classification; computer crime; feature extraction; statistical analysis; Web harmful information identification; World Wide Web; feature fusion strategy; harmful Web pages; harmful Web sites; harmful information detection methods; mental health; multimodal feature extraction; normal Web sites; phishing; physical health; pornography; spam information; statistical learning; under-sampling classification strategy; violence; Data mining; Feature extraction; Internet; Modeling; Training; Unsolicited electronic mail; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Application and System Modeling (ICCASM), 2010 International Conference on
  • Conference_Location
    Taiyuan
  • Print_ISBN
    978-1-4244-7235-2
  • Electronic_ISBN
    978-1-4244-7237-6
  • Type

    conf

  • DOI
    10.1109/ICCASM.2010.5623130
  • Filename
    5623130