• DocumentCode
    244980
  • Title
    Mining Contentious Documents Using an Unsupervised Topic Model Based Approach
  • Author
    Trabelsi, Amine; Zaiane, Osmar R.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Alberta, Edmonton, AB, Canada
  • fYear
    2014
  • fDate
    14-17 Dec. 2014
  • Firstpage
    550
  • Lastpage
    559
  • Abstract
    This work proposes an unsupervised method intended to enhance the quality of opinion mining in contentious text. It presents a Joint Topic Viewpoint (JTV) probabilistic model to analyse the underlying divergent arguing expressions that may be present in a collection of contentious documents. It extends the original Latent Dirichlet Allocation (LDA), which makes it domain- and thesaurus-independent, e.g., it does not rely on WordNet coverage. The conceived JTV has the potential to automatically carry out the tasks of extracting associated terms denoting an arguing expression, according to the hidden topics it discusses and the embedded viewpoint it voices. Furthermore, JTV's structure enables the unsupervised grouping of the obtained arguing expressions according to their viewpoints, using a constrained clustering approach. Experiments are conducted on three types of contentious documents: polls, online debates and editorials. The qualitative and quantitative analyses of the experimental results show the effectiveness of our model in handling six different contentious issues when compared to a state-of-the-art method. Moreover, the ability to automatically generate distinctive and informative patterns of arguing expressions is demonstrated.
  • Keywords
    data mining; document handling; probability; JTV probabilistic model; LDA; WordNet coverage; arguing expression; contentious document mining; editorials; joint topic viewpoint probabilistic model; latent Dirichlet allocation; online debates; opinion mining; polls; qualitative analysis; quantitative analysis; unsupervised grouping; unsupervised method; unsupervised topic model based approach; Data mining; Data models; Editorials; Government; Insurance; Joints; Medical services
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2014 IEEE International Conference on
  • Conference_Location
    Shenzhen
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4799-4303-6
  • Type
    conf
  • DOI
    10.1109/ICDM.2014.120
  • Filename
    7023372