DocumentCode
1830918
Title
A smoothed Latent Dirichlet Allocation model with application to Business Intelligence
Author
Wei, Zhihua ; Zhao, Rui ; Wang, Ying ; Miao, Duoqian ; Yuan, Wenbo
Author_Institution
Dept. of Comput. Sci. & Technol., Tongji Univ., Shanghai, China
Volume
5
fYear
2011
fDate
13-15 May 2011
Firstpage
42
Lastpage
46
Abstract
As an intelligent component, text classification plays an important role in Business Intelligence (BI) applications such as client opinion classification and market feedback analysis. The Latent Dirichlet Allocation (LDA) model, a powerful text representation model, has been widely used in various document processing applications. However, its performance suffers from the data sparseness problem. Existing smoothing techniques usually rely on statistical theory to assign a uniform distribution to absent words. They neither take the actual word distribution into account nor distinguish between words. In this paper, a method based on Tolerance Rough Set Theory (TRST) is proposed, which uses the upper and lower approximations of rough set theory to assign different values to absent words in different approximation regions. In principle, our algorithms estimate a smoothing value for each absent word according to its relation to the existing words. Text classification experiments on public corpora show that our algorithms greatly improve the performance of the LDA model, especially on unbalanced corpora.
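The contrast between uniform smoothing and region-dependent smoothing can be illustrated with a minimal sketch. This is NOT the authors' algorithm; it assumes the common tolerance-rough-set construction for text, in which a word's tolerance class contains the word itself plus every word that co-occurs with it in at least `theta` documents. An absent word whose tolerance class intersects the document lies in the document's upper approximation (boundary region) and receives a larger pseudo-count than an absent word outside it. The names `tolerance_classes`, `trs_smooth`, `uniform_smooth` and the parameters `theta`, `boundary_value`, `outside_value`, `eps` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def tolerance_classes(docs, theta):
    """Hypothetical helper: tolerance class of each word = itself plus every
    word co-occurring with it in at least `theta` documents."""
    vocab = sorted({w for d in docs for w in d})
    co = Counter()
    for d in docs:
        for a, b in combinations(sorted(set(d)), 2):
            co[(a, b)] += 1
    classes = {w: {w} for w in vocab}
    for (a, b), n in co.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return vocab, classes


def uniform_smooth(doc, vocab, eps=0.1):
    """Baseline sketch: every absent word gets the same pseudo-count."""
    counts = Counter(doc)
    return {w: float(counts[w]) if w in counts else eps for w in vocab}


def trs_smooth(doc, vocab, classes, boundary_value=0.5, outside_value=0.01):
    """Region-dependent sketch (assumed values): absent words in the upper
    approximation get `boundary_value`, all other absent words `outside_value`."""
    counts = Counter(doc)
    present = set(counts)
    smoothed = {}
    for w in vocab:
        if w in present:
            smoothed[w] = float(counts[w])      # observed word: keep its count
        elif classes[w] & present:
            smoothed[w] = boundary_value        # related to an existing word
        else:
            smoothed[w] = outside_value         # unrelated to every existing word
    return smoothed


if __name__ == "__main__":
    corpus = [["price", "discount", "sale"], ["price", "sale"],
              ["service", "staff"], ["service", "staff", "price"]]
    vocab, classes = tolerance_classes(corpus, theta=2)
    print(trs_smooth(["price", "price", "service"], vocab, classes))
```

With `theta=2`, only "price"/"sale" and "service"/"staff" are tolerance-related in this toy corpus, so for the document `["price", "price", "service"]` the absent words "sale" and "staff" receive 0.5, while "discount" receives 0.01. `uniform_smooth` would give all three the same value. The resulting pseudo-counts would still need to be fed to an LDA variant that accepts fractional counts; how the paper integrates them is not stated here.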
Keywords
classification; competitive intelligence; rough set theory; text analysis; business intelligence application; client opinion classification; data sparseness problem; document processing application; intelligent component; market feedback analysis; smoothed latent Dirichlet allocation model; smoothing technique; statistic theory; text classification; text representation model; tolerance rough set theory; Approximation methods; Classification algorithms; Computational modeling; Resource management; Smoothing methods; Text categorization; Vocabulary; Business Intelligence (BI); Latent Dirichlet Allocation (LDA); Smoothing; Text Classification; Tolerance Rough Set;
fLanguage
English
Publisher
IEEE
Conference_Titel
Business Management and Electronic Information (BMEI), 2011 International Conference on
Conference_Location
Guangzhou
Print_ISBN
978-1-61284-108-3
Type
conf
DOI
10.1109/ICBMEI.2011.5914426
Filename
5914426