DocumentCode :
1830918
Title :
A smoothed Latent Dirichlet Allocation model with application to Business Intelligence
Author :
Wei, Zhihua ; Zhao, Rui ; Wang, Ying ; Miao, Duoqian ; Yuan, Wenbo
Author_Institution :
Dept. of Comput. Sci. & Technol., Tongji Univ., Shanghai, China
Volume :
5
fYear :
2011
fDate :
13-15 May 2011
Firstpage :
42
Lastpage :
46
Abstract :
As an intelligent component, text classification plays an important role in Business Intelligence (BI) applications such as client opinion classification and market feedback analysis. The Latent Dirichlet Allocation (LDA) model, an effective text representation model, has been widely used in document processing applications. However, its performance is affected by the data sparseness problem. Existing smoothing techniques usually rely on statistical theory to assign a uniform distribution to absent words; they neither consider the real word distribution nor distinguish between words. This paper proposes a method based on Tolerance Rough Set Theory (TRST), which uses the upper and lower approximations of rough set theory to assign different values to absent words in different approximation regions. In theory, the proposed algorithms can estimate the smoothing value of an absent word according to its relation to the words that are present. Text classification experiments on public corpora show that the algorithms greatly improve the performance of the LDA model, especially on unbalanced corpora.
Keywords :
classification; competitive intelligence; rough set theory; text analysis; business intelligence application; client opinion classification; data sparseness problem; document processing application; intelligent component; market feedback analysis; smoothed latent Dirichlet allocation model; smoothing technique; statistic theory; text classification; text representation model; tolerance rough set theory; Approximation methods; Classification algorithms; Computational modeling; Resource management; Smoothing methods; Text categorization; Vocabulary; Business Intelligence (BI); Latent Dirichlet Allocation (LDA); Smoothing; Text Classification; Tolerance Rough Set;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Business Management and Electronic Information (BMEI), 2011 International Conference on
Conference_Location :
Guangzhou
Print_ISBN :
978-1-61284-108-3
Type :
conf
DOI :
10.1109/ICBMEI.2011.5914426
Filename :
5914426