Title :
Bayesian nonparametric modeling of hierarchical topics and sentences
Author :
Chang, Ying-Lan ; Hung, Jui-Jung ; Chien, Jen-Tzung
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Cheng Kung Univ., Tainan, Taiwan
Abstract :
Automatically scoring the sentences of multiple documents plays an important role in document summarization. This study presents a new Bayesian nonparametric approach to unsupervised learning of a hierarchical topic and sentence model (HTSM). The HTSM extends the hierarchy of the nested Chinese restaurant process (nCRP) so that each sentence is assigned a hierarchical topic path. A tree structure with distributions ranging from broad topics to precise topics is established, and the dependencies among sentences are characterized. The words in different sentences are represented by a shared hierarchical Dirichlet process (HDP). The topic mixtures at the word level and the sentence level are estimated by unsupervised nonparametric processes based on the HDP and the nCRP, respectively. Whereas the standard nCRP represents a document by a single path, the proposed HTSM is more flexible: its new nCRP incorporates multiple paths to generate the different sentences of a document. A new Gibbs sampling algorithm is developed to infer the structural parameters of the HTSM, and a summarization system is built to extract semantically rich sentences from documents. In experiments on the DUC corpus, the proposed HTSM outperforms the compared methods for document summarization in terms of ROUGE measures.
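The per-sentence path assignment described in the abstract can be illustrated with a minimal sketch of a nested CRP prior in which every sentence of a document draws its own root-to-leaf path. This is an illustration only, not the authors' HTSM: the names `Node` and `sample_path`, the fixed tree depth, and the concentration parameter `gamma` are assumptions introduced here, and the HDP word-level mixtures and the Gibbs sampler are not modeled.

```python
import random


class Node:
    """One topic node (a 'restaurant') in the nCRP tree. Hypothetical helper."""

    def __init__(self):
        self.count = 0      # number of sentence paths that pass through this node
        self.children = []  # child nodes, created lazily as new branches are drawn


def sample_path(root, depth, gamma, rng):
    """Draw one root-to-leaf path of `depth` nodes from a nested CRP prior.

    At each level an existing child is chosen with weight equal to its count,
    and a new child is created with weight `gamma` (assumed concentration).
    """
    node = root
    node.count += 1
    path = [node]
    for _ in range(depth - 1):
        weights = [child.count for child in node.children] + [gamma]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(node.children):  # the last index means "open a new branch"
            node.children.append(Node())
        node = node.children[k]
        node.count += 1
        path.append(node)
    return path


# One path per sentence, so a single document can spread over several branches.
rng = random.Random(0)
root = Node()
sentence_paths = [sample_path(root, depth=3, gamma=1.0, rng=rng) for _ in range(5)]
```

Because each sentence draws its own path, popular branches are reused by later sentences while `gamma` controls how often a new branch is opened. A document-level nCRP would instead call `sample_path` once per document.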
Keywords :
Bayes methods; document handling; learning (artificial intelligence); natural language processing; tree data structures; Bayesian nonparametric modeling; DUC corpus; Gibbs sampling algorithm; ROUGE measures; document summarization; hierarchical Dirichlet process; hierarchical topic and sentence model; nested Chinese restaurant process; tree structure; unsupervised learning; Approximation algorithms; Bayesian methods; Data models; Graphical models; Resource management; Vocabulary; Bayesian nonparametrics; Topic model
Conference_Title :
Machine Learning for Signal Processing (MLSP), 2011 IEEE International Workshop on
Conference_Location :
Santander
Print_ISBN :
978-1-4577-1621-8
ISSN :
1551-2541
DOI :
10.1109/MLSP.2011.6064569