• DocumentCode
    3466175
  • Title
    A Method for the Construction of a Probabilistic Hierarchical Structure Based on a Statistical Analysis of a Large-scale Corpus
  • Author
    Terai, Asuka; Liu, Bin; Nakagawa, Masanori
  • Author_Institution
    Tokyo Inst. of Technol., Tokyo
  • fYear
    2007
  • fDate
    17-19 Sept. 2007
  • Firstpage
    129
  • Lastpage
    136
  • Abstract
    The purpose of this study is to develop a method of constructing a probabilistic hierarchical structure based on a statistical analysis of a Japanese corpus, using a combination of Kameya and Sato's statistical language analysis and Rose's model. First, the co-occurrence frequencies of adjectives and nouns are calculated from a Japanese corpus based on modification relations. Second, latent classes are extracted from a statistical language analysis of the co-occurrence data. Third, the centroid vectors of the latent classes are calculated from the analysis results, and a probabilistic hierarchical structure of the latent classes is constructed using Rose's model. Finally, the conditional probabilities of the categories given the latent classes are computed as the association probabilities of the latent classes to the categories, and the conditional probabilities of the categories given the concepts are computed as the association probabilities of the concepts to the categories.
  • Keywords
    computational linguistics; statistical analysis; Japanese corpus; conditional probabilities; cooccurrence frequencies; large-scale corpus; probabilistic hierarchical structure; statistical language analysis; Costs; Data mining; Frequency; Humans; Information analysis; Information technology; Large-scale systems; Natural languages; Probability; Statistical analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Semantic Computing, 2007. ICSC 2007. International Conference on
  • Conference_Location
    Irvine, CA
  • Print_ISBN
    978-0-7695-2997-4
  • Type
    conf
  • DOI
    10.1109/ICSC.2007.60
  • Filename
    4338341
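The four steps in the abstract can be sketched as follows. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation. A generic latent class model, p(n, a) = Σ_k p(k) p(n|k) p(a|k), fitted with EM, stands in for Kameya and Sato's analysis. A latent class's centroid vector is assumed to be its adjective distribution p(a|k). Rose's model is approximated by the basic deterministic-annealing update at a single fixed temperature; a hierarchy would come from repeating it on a decreasing temperature schedule. The function names (`fit_latent_classes`, `anneal`, `category_given_concept`), the toy counts, the number of classes and the temperature are all hypothetical.

```python
import math
import random

EPS = 1e-12  # tiny additive smoothing so no probability becomes exactly zero


def normalise(weights):
    """Rescale a dict of non-negative weights so its values sum to 1."""
    total = sum(weights.values())
    return {key: v / total for key, v in weights.items()}


def fit_latent_classes(counts, n_classes, iters=200, seed=0):
    """EM for a generic latent class model p(n, a) = sum_k p(k) p(n|k) p(a|k) (assumed stand-in)."""
    rng = random.Random(seed)
    nouns = sorted({n for n, _ in counts})
    adjs = sorted({a for _, a in counts})
    p_k = [1.0 / n_classes] * n_classes
    p_n_k = [normalise({n: rng.random() + 0.1 for n in nouns}) for _ in range(n_classes)]
    p_a_k = [normalise({a: rng.random() + 0.1 for a in adjs}) for _ in range(n_classes)]
    for _ in range(iters):
        # E-step: share each co-occurrence count among the latent classes.
        c_k = [EPS] * n_classes
        c_n_k = [dict.fromkeys(nouns, EPS) for _ in range(n_classes)]
        c_a_k = [dict.fromkeys(adjs, EPS) for _ in range(n_classes)]
        for (n, a), freq in counts.items():
            joint = [p_k[k] * p_n_k[k][n] * p_a_k[k][a] for k in range(n_classes)]
            z = sum(joint)
            for k in range(n_classes):
                r = freq * joint[k] / z
                c_k[k] += r
                c_n_k[k][n] += r
                c_a_k[k][a] += r
        # M-step: turn the expected counts back into probabilities.
        total = sum(c_k)
        p_k = [c / total for c in c_k]
        p_n_k = [normalise(c) for c in c_n_k]
        p_a_k = [normalise(c) for c in c_a_k]
    return nouns, adjs, p_k, p_n_k, p_a_k


def anneal(vectors, weights, n_cats, temperature, iters=100):
    """Basic deterministic-annealing soft clustering at one fixed temperature.

    A stand-in for Rose's model: returns p(j|k) for every class vector x_k and
    the category centroids y_j. No hierarchy is built at a single temperature.
    """
    dim = len(vectors[0])
    mean = [sum(w * x[d] for w, x in zip(weights, vectors)) for d in range(dim)]
    # Start all categories at the weighted mean, nudged apart so they can separate.
    ys = [[m + 1e-3 * (j + 1) * (-1) ** d for d, m in enumerate(mean)] for j in range(n_cats)]
    assoc = []
    for _ in range(iters):
        assoc = []
        for x in vectors:
            dist = [sum((xi - yi) ** 2 for xi, yi in zip(x, y)) for y in ys]
            base = min(dist)  # shift the exponents for numerical stability
            e = [math.exp(-(d - base) / temperature) for d in dist]
            s = sum(e)
            assoc.append([v / s for v in e])
        for j in range(n_cats):
            mass = sum(w * p[j] for w, p in zip(weights, assoc))
            if mass > 0:  # keep the old centroid if category j attracts no mass
                ys[j] = [sum(w * p[j] * x[d] for w, p, x in zip(weights, assoc, vectors)) / mass
                         for d in range(dim)]
    return assoc, ys


def category_given_concept(p_k, p_n_k, p_j_k, nouns):
    """p(j|n) = sum_k p(j|k) p(k|n), where p(k|n) comes from Bayes' rule."""
    result = {}
    for n in nouns:
        post = [pk * pnk[n] for pk, pnk in zip(p_k, p_n_k)]
        z = sum(post)
        result[n] = [sum(post[k] / z * p_j_k[k][j] for k in range(len(p_k)))
                     for j in range(len(p_j_k[0]))]
    return result


# Hypothetical noun-adjective counts (English stand-ins for Japanese words).
counts = {("apple", "red"): 8, ("apple", "sweet"): 6, ("strawberry", "red"): 7,
          ("strawberry", "sweet"): 5, ("car", "fast"): 9, ("car", "red"): 2,
          ("train", "fast"): 7, ("train", "long"): 4}
nouns, adjs, p_k, p_n_k, p_a_k = fit_latent_classes(counts, n_classes=3)
# Assumption: a class's "centroid vector" is its adjective distribution p(a|k).
vectors = [[p_a_k[k][a] for a in adjs] for k in range(len(p_k))]
p_j_k, _ = anneal(vectors, p_k, n_cats=2, temperature=0.05)
p_j_n = category_given_concept(p_k, p_n_k, p_j_k, nouns)
for n in nouns:
    print(n, [round(p, 3) for p in p_j_n[n]])
```

Each row of `p_j_n` is a distribution over categories and sums to 1. Because p(j|n) marginalises p(j|k) over the posterior p(k|n), a concept inherits its category associations from its latent classes. Which category a given noun leans toward depends on the random initialisation, the number of latent classes and the temperature, so this sketch implies no particular output.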