DocumentCode
3466175
Title
A Method for the Construction of a Probabilistic Hierarchical Structure Based on a Statistical Analysis of a Large-scale Corpus
Author
Terai, Asuka ; Liu, Bin ; Nakagawa, Masanori
Author_Institution
Tokyo Inst. of Technol., Tokyo
fYear
2007
fDate
17-19 Sept. 2007
Firstpage
129
Lastpage
136
Abstract
The purpose of this study is to develop a method of constructing a probabilistic hierarchical structure based on a statistical analysis of a Japanese corpus using a combination of Kameya and Sato's statistical language analysis and Rose's model. First, the co-occurrence frequencies of adjectives and nouns are calculated from a Japanese corpus based on modification relations. Second, latent classes are extracted from a statistical language analysis of the co-occurrence data. Third, the centroid vectors of the latent classes are calculated from the analysis results and a probabilistic hierarchical structure of the latent classes is constructed by utilizing Rose's model. Finally, the conditional probabilities of the categories given the latent classes are computed as the association probabilities of the concepts to the categories and the conditional probabilities of the categories given the concepts are computed as the association probabilities of the concepts to the categories.
Keywords
computational linguistics; statistical analysis; Japanese corpus; conditional probabilities; cooccurrence frequencies; large-scale corpus; probabilistic hierarchical structure; statistical language analysis; Costs; Data mining; Frequency; Humans; Information analysis; Information technology; Large-scale systems; Natural languages; Probability; Statistical analysis
fLanguage
English
Publisher
IEEE
Conference_Titel
Semantic Computing, 2007. ICSC 2007. International Conference on
Conference_Location
Irvine, CA
Print_ISBN
978-0-7695-2997-4
Type
conf
DOI
10.1109/ICSC.2007.60
Filename
4338341