  • DocumentCode
    2282877
  • Title
    A Utility-Based Web Content Sensitivity Mining Approach
  • Author
    Wang, Cheng; Liu, Ying; Jian, Liheng; Zhang, Peng
  • Author_Institution
    Agilent Technol. Co. Ltd., Beijing
  • Volume
    3
  • fYear
    2008
  • fDate
    9-12 Dec. 2008
  • Firstpage
    428
  • Lastpage
    431
  • Abstract
    Abnormal remarks on the World Wide Web, such as violence, threats, and superstition, may disturb social order and public morality. Most traditional methods filter a page whenever it contains a keyword from a predefined blacklist. Such methods cannot provide a quantitative measure of how sensitive the content is. In this paper, we propose a utility-based Web content sensitivity mining approach, in which utility is the measure of how sensitive a page is. This allows Internet regulators to take different actions according to different sensitivity values. We apply our approach to a real-world Web dataset, where it identified a number of sensitive Web pages that traditional frequency-based methods failed to find. By varying the sensitivity values of the keywords, different sets of high-sensitivity keywords were discovered.
  • Keywords
    Internet; content management; data mining; utility theory; Internet regulator; sensitive value; utility-based Web content sensitivity mining approach; Data security; Databases; Frequency; Information filters; Intelligent agent; Itemsets; Monitoring; Regulators; Web pages; Web audit; Web content mining; public opinion monitoring; utility mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT '08. IEEE/WIC/ACM International Conference on
  • Conference_Location
    Sydney, NSW
  • Print_ISBN
    978-0-7695-3496-1
  • Type
    conf
  • DOI
    10.1109/WIIAT.2008.203
  • Filename
    4740814
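
The contrast drawn in the abstract, between a yes/no blacklist filter and a quantitative utility score, can be sketched as follows. This is a minimal illustration and not the authors' algorithm: it assumes the common utility-mining convention that an item's utility is its count multiplied by an externally assigned weight. The keyword weights, the thresholds, and the names `SENSITIVITY`, `page_utility`, and `classify` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical sensitivity value per keyword (illustrative numbers only).
SENSITIVITY = {"threat": 5.0, "violence": 3.0, "superstition": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def blacklist_hit(text, blacklist=SENSITIVITY):
    """Traditional filter: flag the page if any blacklisted keyword occurs."""
    return any(tok in blacklist for tok in tokenize(text))


def page_utility(text, sensitivity=SENSITIVITY):
    """Assumed scoring: sum of (keyword count x keyword sensitivity value)."""
    counts = Counter(tokenize(text))
    return sum(counts[kw] * value for kw, value in sensitivity.items())


def classify(text, review_at=5.0, block_at=10.0):
    """Map a utility score to an action; thresholds are assumptions."""
    score = page_utility(text)
    if score >= block_at:
        return "block"
    if score >= review_at:
        return "review"
    return "allow"


pages = {
    "a": "A history essay that mentions violence once.",
    "b": "Threat after threat, with violence promised.",
}

for name, text in pages.items():
    print(name, blacklist_hit(text), page_utility(text), classify(text))
```

Both sample pages trip the blacklist, but their utility scores differ, so a regulator could respond to each differently. Changing the values in `SENSITIVITY` changes which keywords dominate the score, which mirrors the abstract's remark about varying the sensitivity values.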