DocumentCode :
1559475
Title :
Semiautomatic acquisition of semantic structures for understanding domain-specific natural language queries
Author :
Meng, Helen M. ; Siu, Kai-Chung
Author_Institution :
Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, Shatin, China
Volume :
14
Issue :
1
fYear :
2002
Firstpage :
172
Lastpage :
181
Abstract :
This paper describes a methodology for semiautomatic grammar induction from unannotated corpora of information-seeking queries in a restricted domain. The grammar contains both semantic and syntactic structures, which are conducive to (spoken) natural language understanding. Our work aims to ameliorate the reliance of grammar development on expert handcrafting or on the availability of annotated corpora. To strive for reasonable coverage on real data, as well as portability across domains and languages, we adopt a statistical approach. Agglomerative clustering using the symmetrized divergence criterion groups words "spatially". These words have similar left and right contexts and tend to form semantic classes. Agglomerative clustering using mutual information groups words "temporally". These words tend to co-occur sequentially to form phrases or multiword entities. Our approach is amenable to the optional injection of prior knowledge to catalyze grammar induction. The resultant grammar is interpretable by humans and is amenable to hand-editing for refinement. Hence, our approach is semiautomatic in nature. Experiments were conducted using the ATIS (Air Travel Information Service) corpus, and the semiautomatically induced grammar G_SA is compared to an entirely handcrafted grammar G_H. G_H took two months to develop and gave concept error rates of 7 percent and 11.3 percent, respectively, in language understanding of two test corpora. G_SA took only three days to produce and gave concept error rates of 14 percent and 12.2 percent on the corresponding test corpora. These results provide a desirable trade-off between language understanding performance and grammar development effort.
Keywords :
grammars; knowledge acquisition; natural language interfaces; query processing; agglomerative clustering; concepts extraction; grammar induction; information-seeking queries; natural language query; natural language understanding; semiautomatic grammar induction; natural languages
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/69.979980
Filename :
979980