  • DocumentCode
    658375
  • Title
    Adaptive Topic Modeling for Detection Objectionable Text
  • Author
    Jianping Zeng ; Jiangjiao Duan ; Chengrong Wu
  • Author_Institution
    Sch. of Comput. Sci., Fudan Univ., Shanghai, China
  • Volume
    1
  • fYear
    2013
  • fDate
    17-20 Nov. 2013
  • Firstpage
    381
  • Lastpage
    388
  • Abstract
    Objectionable text content on the Web is harmful to young children. Although keyword-based methods achieve faster detection, they fail to detect text content that is semantically objectionable. A novel framework based on adaptive topic modeling is proposed to detect objectionable text content. First, a weighted graph is constructed from several seed words and a set of training texts. Feature words are then selected from the graph according to a measure of how likely a word is to be sensitive. An adaptive LDA (Latent Dirichlet Allocation) topic model, in which the number of topics is estimated automatically, is proposed to find the latent objectionable topic structure of the text set. An objectionable topic criterion, which takes the characteristics of objectionable topics into consideration, is devised for the adaptive selection method. Finally, detection for a given text is based on its probability value with respect to the model. Extensive comparison experiments on real-world text sets show that the proposed method can effectively detect objectionable text. Its performance is superior to that of keyword-based methods that use several different approaches to generating the keyword list. Experiments also show that its performance is better than that of detection methods based on traditional topic modeling.
  • Keywords
    graph theory; information retrieval; text analysis; adaptive LDA topic model; adaptive selection method; adaptive topic modeling; keyword-based methods; latent Dirichlet allocation; latent objectionable topic structure; objectionable text detection; objectionable topic criterion; seed words; topic number; weighted graph; Adaptation models; Computational modeling; Feature extraction; Filtering; Mathematical model; Semantics; Training; adaptive topic model; detection; feature selection; objectionable text
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2013 IEEE/WIC/ACM International Joint Conferences on
  • Conference_Location
    Atlanta, GA
  • Print_ISBN
    978-1-4799-2902-3
  • Type
    conf
  • DOI
    10.1109/WI-IAT.2013.54
  • Filename
    6690040