Regional Information Center for Science and Technology - Bayesian feature selection for sparse topic model

DocumentCode :

2131600

Title :

Bayesian feature selection for sparse topic model

Author :

Chang, Ying-Lan ; Lee, Kuen-Feng ; Chien, Jen-Tzung

Author_Institution :

Dept. of Comput. Sci. & Inf. Eng., Cheng Kung Univ., Tainan, Taiwan

fYear :

2011

fDate :

18-21 Sept. 2011

Firstpage :

Lastpage :

Abstract :

This paper presents a new Bayesian sparse learning approach that selects salient lexical features and builds a sparse topic model (sTM). Bayesian learning is performed by incorporating spike-and-slab priors, so that words with spiky distributions are filtered out and words with slab distributions are selected as features for estimating the topic model (TM) based on latent Dirichlet allocation. A variational inference procedure is developed to train the sTM parameters. In document modeling experiments using TM and sTM, the proposed sTM not only reduces model perplexity but also reduces memory and computation costs. The Bayesian feature selection method effectively identifies representative topic words for building a sparse learning model.
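
As a rough illustration of two ideas named in the abstract, the sketch below masks a topic's word distribution with binary selection indicators (1 for a "slab" word that is kept, 0 for a "spike" word that is filtered) and computes model perplexity on a toy corpus. It is not the authors' variational inference procedure: the indicators are supplied by hand rather than inferred, and the function names `sparse_topic` and `perplexity`, the toy numbers, and the probability floor are all assumptions made for this example.

```python
import math


def sparse_topic(weights, indicators):
    """Zero out words whose indicator is 0 and renormalize the rest.

    Hypothetical helper: in the paper the indicators would be inferred
    under a spike-and-slab prior; here they are simply given.
    """
    masked = [w * b for w, b in zip(weights, indicators)]
    total = sum(masked)
    if total <= 0:
        raise ValueError("at least one word must be selected")
    return [m / total for m in masked]


def perplexity(docs, theta, beta, floor=1e-12):
    """Perplexity = exp(-(sum of token log-likelihoods) / (token count)).

    docs  : list of documents, each a list of word ids
    theta : theta[d][k], topic proportions of document d
    beta  : beta[k][v], probability of word v under topic k
    floor : assumed lower bound that avoids log(0) for filtered words
    """
    log_lik = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for v in doc:
            p = sum(theta[d][k] * beta[k][v] for k in range(len(beta)))
            log_lik += math.log(max(p, floor))
            n_tokens += 1
    if n_tokens == 0:
        raise ValueError("corpus contains no tokens")
    return math.exp(-log_lik / n_tokens)


if __name__ == "__main__":
    # Toy example: 2 topics over a 3-word vocabulary.
    dense_beta = [[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]]
    selected = [[1, 1, 0], [0, 0, 1]]  # hand-picked indicators
    sparse_beta = [sparse_topic(b, z) for b, z in zip(dense_beta, selected)]

    docs = [[0, 1], [2, 2]]
    theta = [[1.0, 0.0], [0.0, 1.0]]
    print("dense perplexity :", perplexity(docs, theta, dense_beta))
    print("sparse perplexity:", perplexity(docs, theta, sparse_beta))
```

A lower perplexity means the model assigns higher probability to the observed words. Storing only the selected entries of each topic is what would yield the memory savings the abstract mentions; this sketch keeps dense lists for readability.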

Keywords :

Bayes methods; belief networks; Bayesian feature selection; Bayesian sparse learning model; latent Dirichlet allocation; model perplexity; salient lexical features; slab distribution; sparse topic model; spiky distribution; variational inference procedure; Bayesian methods; Computational modeling; Machine learning; Slabs; Sparse matrices; Training; Vocabulary; Bayesian learning; feature selection; sparse features; topic model;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Machine Learning for Signal Processing (MLSP), 2011 IEEE International Workshop on

Conference_Location :

Santander

ISSN :

1551-2541

Print_ISBN :

978-1-4577-1621-8

Electronic_ISBN :

1551-2541

Type :

conf

DOI :

10.1109/MLSP.2011.6064568

Filename :

6064568

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2131600