Title :
A comparison of four metrics for auto-inducing semantic classes
Author :
Pargellis, Andrew ; Fosler-Lussier, Eric ; Potamianos, Alexandros ; Lee, Chin-Hui
Author_Institution :
Dialogue Syst. Res. Dept, Lucent Technol. Bell Labs., Murray Hill, NJ, USA
Abstract :
A speech understanding system typically includes a natural language understanding module that defines concepts, i.e., groups of semantically related words. It is a challenge to build a set of concepts for a new domain for which prior knowledge and training data are limited. In our work, concepts are induced automatically from unannotated training data by grouping semantically similar words and phrases together into concept classes. Four context-dependent similarity metrics are proposed and their performance for auto-inducing concepts is evaluated. Two of these metrics are based on the Kullback-Leibler (KL) distance measure, a third is the Manhattan norm, and the fourth is the vector product (VP) similarity measure. The KL and VP metrics consistently underperform the other metrics on the four tasks investigated: movie information, a children's game, travel reservations, and Wall Street Journal news articles. Correct concept classification rates are up to 90% for the movie task.
Keywords :
computational linguistics; natural language interfaces; speech recognition; Kullback-Leibler distance measure; Manhattan norm; Wall Street Journal news articles; auto-inducing concepts; concept classes; concept classification; context-dependent similarity metrics; game; movie information; natural language understanding module; performance; semantically related words; speech understanding system; travel reservations; unannotated training data; vector product similarity measure; Application software; Humans; Man machine systems; Motion pictures; Speech;
Conference_Title :
Automatic Speech Recognition and Understanding, 2001. ASRU '01. IEEE Workshop on
Print_ISBN :
0-7803-7343-X
DOI :
10.1109/ASRU.2001.1034626