DocumentCode
2019365
Title
Automatic word classification using simulated annealing
Author
Jardino, Michèle ; Adda, Gilles
Author_Institution
LIMSI-CNRS, Orsay, France
Volume
2
fYear
1993
fDate
27-30 April 1993
Firstpage
41
Abstract
A bigram class model, which gives the probability of a word class given its predecessor class, has been developed. Simulated annealing is used to automatically classify the words of large text corpora. A first validation of the use of simulated annealing in language modeling is presented, with results on a French corpus of 40000 words and a German corpus of 100000 words. It is demonstrated that simulated annealing makes it possible to classify words without any syntactic or semantic knowledge. The best results are obtained when all words are gathered into a single class at the beginning of the optimization. Simulated annealing is easy to implement, and the CPU time cost is not prohibitive: seven hours on a 486-33 MHz PC to classify 14000 words into 120 classes using a 75000-word training set, without any code optimization.
Keywords
classification; computational complexity; computational linguistics; simulated annealing; speech recognition; CPU time cost; French; German; automatic word classification; bigram class model; language modeling; training set
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 1993. ICASSP-93., 1993 IEEE International Conference on
Conference_Location
Minneapolis, MN, USA
ISSN
1520-6149
Print_ISBN
0-7803-7402-9
Type
conf
DOI
10.1109/ICASSP.1993.319224
Filename
319224