• DocumentCode
    2539470
  • Title

    A Keyword Based Strategy for Spam Topic Discovery from the Internet

  • Author

    Qiu, Yongqin ; Xu, Yan ; Li, Dan ; Li, Hengxun

  • Author_Institution
    Beijing Language & Culture Univ., Beijing, China
  • fYear
    2010
  • fDate
    13-15 Dec. 2010
  • Firstpage
    260
  • Lastpage
    263
  • Abstract
    The increasing volume of spam has become a serious threat not only to the Internet but also to society. However, discovering spam on the Internet effectively and efficiently is a great challenge. Content-based filtering is one of the mainstream methods for addressing the problem. This paper proposes a content-based spam topic detection strategy built on keyword extraction. In particular, spam topics are detected using a multi-feature topic model with clue keywords, which integrates the corresponding features of News, BBS and Blog sources. We obtain a minimum cost of 0.282 on the TDT4 evaluation corpus and a satisfaction of 93.3% with the golaxy public opinion monitoring system of ICT, which is more effective than the traditional method. The experiments show that this algorithm is effective for spam topic detection.
  • Keywords
    Internet; information filtering; unsolicited e-mail; word processing; content based spam topic detection; content-based filtering; golaxy public opinion monitoring system; keyword extraction; keywords based spam topic detection; spam topic discovery; Data mining; Feature extraction; Filtering; Information services; Unsolicited electronic mail; Web sites; anti-spam; information security; keywords extraction; spam topic detection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Genetic and Evolutionary Computing (ICGEC), 2010 Fourth International Conference on
  • Conference_Location
    Shenzhen
  • Print_ISBN
    978-1-4244-8891-9
  • Electronic_ISBN
    978-0-7695-4281-2
  • Type

    conf

  • DOI
    10.1109/ICGEC.2010.71
  • Filename
    5715419