• DocumentCode
    3525706
  • Title

    Modeling Both Coarse-Grained and Fine-Grained Topics in Massive Text Data

  • Author

    Weifan Zhang ; Hui Zhang ; Yuan Zuo ; Deqing Wang

  • Author_Institution
    Sch. of Comput. Sci., Beihang Univ., Beijing, China
  • fYear
    2015
  • fDate
    March 30 2015-April 2 2015
  • Firstpage
    378
  • Lastpage
    383
  • Abstract
    Topic modeling has attracted much attention from researchers because it gives users insight into huge volumes of documents. However, most previous studies based on Non-negative Matrix Factorization (NMF) do not identify which topics are widespread across the documents and which are not. These widespread topics, which we refer to as coarse-grained topics, are of great significance to people who concentrate on the common topics in a given text set. For example, after reading a large number of job ads, jobseekers are eager to learn employers' basic requirements, which can be regarded as the coarse-grained topics, as well as the additional requirements, which can be regarded as the fine-grained topics. In this paper, we propose a novel method that applies two different sparseness constraints to NMF to tell coarse-grained and fine-grained topics apart. The experimental results demonstrate that the new model can not only discover coarse-grained topics but also extract fine-grained topics. We evaluate the performance of the new model via text clustering and classification, and the results show that the new model can learn more accurate topic representations of documents.
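    The abstract does not state the objective function or the exact form of the two sparseness constraints, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a document-term matrix X factorized as X ≈ W H, splits the topics into a coarse-grained group and a fine-grained group, and applies a different L1 penalty weight to the document-topic weights of each group. The function name sparse_nmf, the parameters lam_coarse and lam_fine, and the multiplicative update rule are all illustrative choices.

```python
import numpy as np


def sparse_nmf(X, n_coarse, n_fine, lam_coarse=0.0, lam_fine=0.5,
               n_iter=200, seed=0):
    """Hypothetical sketch: NMF with a per-topic L1 penalty on W.

    Minimizes 0.5 * ||X - W H||_F^2 + sum_k lam[k] * ||W[:, k]||_1
    with W >= 0 and H >= 0, using multiplicative updates.
    Assumption (not from the source): fine-grained topics receive a
    larger penalty so they are active in fewer documents, while
    coarse-grained topics are left dense (widespread).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    k = n_coarse + n_fine
    W = rng.random((n_docs, k)) + 0.1   # document-topic weights
    H = rng.random((k, n_terms)) + 0.1  # topic-term weights
    # One penalty weight per topic: coarse topics first, then fine topics.
    lam = np.concatenate([np.full(n_coarse, lam_coarse),
                          np.full(n_fine, lam_fine)])
    eps = 1e-10
    for _ in range(n_iter):
        # The L1 penalty enters the denominator of the update for W.
        W *= (X @ H.T) / (W @ (H @ H.T) + lam + eps)
        H *= (W.T @ X) / ((W.T @ W) @ H + eps)
        # Fix the scale of each topic so the penalty on W cannot be
        # evaded by shrinking W and inflating H.
        norms = np.linalg.norm(H, axis=1, keepdims=True) + eps
        H /= norms
        W *= norms.T
    return W, H


# Toy usage: 6 documents, 5 terms, 1 coarse-grained and 2 fine-grained topics.
X = np.abs(np.random.default_rng(42).random((6, 5)))
W, H = sparse_nmf(X, n_coarse=1, n_fine=2)
print(W.shape, H.shape)
```

    Under these assumptions, the first n_coarse columns of W correspond to widespread topics and the remaining columns to topics concentrated in fewer documents; the penalty weights would need tuning on real data.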
  • Keywords
    matrix decomposition; pattern classification; pattern clustering; text analysis; NMF; coarse-grained topic modeling; coarse-grained topics; document topic representations; fine-grained topic modeling; massive text data; nonnegative matrix factorization; performance evaluation; text classification; text clustering; Computers; Electronic publishing; Encyclopedias; Internet; Optimization; non-negative matrix factorization; text mining; topic model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Big Data Computing Service and Applications (BigDataService), 2015 IEEE First International Conference on
  • Conference_Location
    Redwood City, CA
  • Type

    conf

  • DOI
    10.1109/BigDataService.2015.21
  • Filename
    7184905