• DocumentCode
    1830918
  • Title
    A smoothed Latent Dirichlet Allocation model with application to Business Intelligence
  • Author
    Wei, Zhihua ; Zhao, Rui ; Wang, Ying ; Miao, Duoqian ; Yuan, Wenbo
  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tongji Univ., Shanghai, China
  • Volume
    5
  • fYear
    2011
  • fDate
    13-15 May 2011
  • Firstpage
    42
  • Lastpage
    46
  • Abstract
    As an intelligent component, text classification plays an important role in Business Intelligence (BI) applications such as client opinion classification and market feedback analysis. The Latent Dirichlet Allocation (LDA) model, an effective text representation model, has been widely used in various document processing applications. However, its performance is affected by the data sparseness problem. Existing smoothing techniques usually rely on statistical theory to assign a uniform distribution to absent words; they neither consider the real word distribution nor distinguish between words. In this paper, a method based on Tolerance Rough Set Theory (TRST) is proposed, which uses the upper and lower approximations of rough set theory to assign different values to absent words in different approximation regions. In theory, our algorithms can estimate the smoothing value for absent words according to their relation to existing words. Text classification experiments on public corpora show that our algorithms greatly improve the performance of the LDA model, especially on unbalanced corpora.
  • Keywords
    classification; competitive intelligence; rough set theory; text analysis; business intelligence application; client opinion classification; data sparseness problem; document processing application; intelligent component; market feedback analysis; smoothed latent Dirichlet allocation model; smoothing technique; statistic theory; text classification; text representation model; tolerance rough set theory; Approximation methods; Classification algorithms; Computational modeling; Resource management; Smoothing methods; Text categorization; Vocabulary; Business Intelligence (BI); Latent Dirichlet Allocation (LDA); Smoothing; Text Classification; Tolerance Rough Set;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Business Management and Electronic Information (BMEI), 2011 International Conference on
  • Conference_Location
    Guangzhou
  • Print_ISBN
    978-1-61284-108-3
  • Type
    conf
  • DOI
    10.1109/ICBMEI.2011.5914426
  • Filename
    5914426