DocumentCode :
3268912
Title :
An Automatic Multi-domain Thesauri Construction Method Based on LDA
Author :
Ni, Na ; Liu, Kai ; Li, YaoDong
Author_Institution :
Inst. of Autom., Beijing, China
Volume :
2
fYear :
2011
fDate :
18-21 Dec. 2011
Firstpage :
235
Lastpage :
240
Abstract :
This paper proposes a method for automatically building domain-specific thesauri from a plain-text corpus based on Latent Dirichlet Allocation (LDA). The method consists of two steps: 1) discovering domain-specific terms from document collections of multiple domains, and 2) learning hierarchical relations between the associated terms of each domain. The novelty of step 1 lies in using LDA to select, via latent topics, terms with a high predictive probability for a specific domain, which overcomes the drawbacks of the unigram model. In step 2, the hierarchical relations among domain terms are learned by a novel approach based on word association analysis. The proposed method is tested on two datasets in different languages. The experimental results show that the terms obtained by this method are intuitively relevant to the reference domain, that many term pairs with hierarchical relations are discovered, and that these relations reflect the structure of the domain rather well. Compared to other approaches, the proposed one is more accurate in both the domain term mining and hierarchical relation learning tasks.
Keywords :
data mining; learning (artificial intelligence); natural language processing; probability; text analysis; thesauri; LDA; automatic multidomain thesauri construction method; document collections; domain terms mining; hierarchical relation learning tasks; high predictive probability; latent Dirichlet allocation; plain text corpus; unigram model; word association analysis; Electronic publishing; Encyclopedias; Internet; Mathematical model; Semantics; Thesauri; Latent Dirichlet Allocation; domain-specific thesaurus; term hierarchy; word association;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
978-1-4577-2134-2
Type :
conf
DOI :
10.1109/ICMLA.2011.28
Filename :
6147680