DocumentCode :
3425439
Title :
Text normalization in Mandarin text-to-speech system
Author :
Yuxiang Jia ; Dezhi Huang ; Wu Liu ; Shiwen Yu ; Haila Wang
Author_Institution :
Inst. of Comput. Linguistics, Peking Univ., Beijing
fYear :
2008
fDate :
March 31 2008-April 4 2008
Firstpage :
4693
Lastpage :
4696
Abstract :
Text normalization is an important component of a text-to-speech system, and its main difficulty is disambiguating non-standard words (NSWs). This paper develops a taxonomy of NSWs based on a large-scale Chinese corpus and proposes a two-stage NSW disambiguation strategy: finite state automata (FSA) for initial classification, followed by maximum entropy (ME) classifiers for subclass disambiguation. Using this taxonomy, the two-stage approach achieves an F-score of 98.53% in an open test, 5.23% higher than that of an FSA-based approach. Experiments show that the NSW taxonomy gives the FSA a high baseline performance, that the ME classifiers yield a considerable improvement, and that the two-stage approach adapts well to new domains.
Keywords :
finite state machines; maximum entropy methods; natural language processing; pattern classification; speech synthesis; Mandarin text-to-speech system; finite state automata; large scale Chinese corpus; maximum entropy classifiers; nonstandard words taxonomy; text normalization; Automata; Computational linguistics; Entropy; Large-scale systems; Natural language processing; Speech processing; Speech synthesis; Taxonomy; Telecommunications; Text analysis; Finite State Automata; Maximum Entropy Classifier; Text Normalization; Text-to-Speech (TTS);
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV
ISSN :
1520-6149
Print_ISBN :
978-1-4244-1483-3
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2008.4518704
Filename :
4518704