• DocumentCode
    3133694
  • Title

    A novel similarity measure for semantic class induction in human-computer spoken dialogues

  • Author

    Li, Yali ; Bao, Changchun ; Yan, Yonghong

  • Author_Institution
    ThinkIT Lab., Chinese Acad. of Sci., Beijing, China
  • fYear
    2009
  • fDate
    20-21 Sept. 2009
  • Firstpage
    351
  • Lastpage
    354
  • Abstract
    In this paper, we introduce a new semantic induction metric that induces semantic classes from a set of domain-specific unannotated data. We emphasize co-occurrence probability rather than only the distance between word probability distributions. In contrast to the traditional approach, which uses either the right or the left context to calculate similarity, our metric uses left and right context information simultaneously. Before processing, we remove fillers based on their unigram and bigram context distributions. We find that the co-occurrence metric is simple and effective and has a lower proportion of misclassifications. We test the metric on our Chinese voice-search data and obtain an F1 score of 84.3.
  • Keywords
    human computer interaction; interactive systems; natural language interfaces; natural language processing; probability; speech-based user interfaces; Chinese voice-search data; bigram context distribution; co-occurrence probability; domain-specific unannotated data; human-computer spoken dialogues; semantic class induction; similarity measure; unigram context distribution; word probability distribution; Acoustic measurements; Electrostatic precipitators; Entropy; Induction generators; Laboratories; Man machine systems; Natural languages; Probability distribution; Tagging; Testing; similarity metric
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information, Computing and Telecommunication, 2009. YC-ICT '09. IEEE Youth Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-5074-9
  • Electronic_ISBN
    978-1-4244-5076-3
  • Type

    conf

  • DOI
    10.1109/YCICT.2009.5382351
  • Filename
    5382351