DocumentCode :
2065623
Title :
A Three-Stage Text Normalization Strategy for Mandarin Text-to-Speech Systems
Author :
Zhou, Tao ; Dong, Yuan ; Huang, Dezhi ; Liu, Wu ; Wang, Haila
Author_Institution :
Beijing Univ. of Posts & Telecommun., Beijing, China
fYear :
2008
fDate :
16-19 Dec. 2008
Firstpage :
1
Lastpage :
4
Abstract :
Text normalization is an important component of a Mandarin Text-to-Speech (TTS) system. This paper develops a taxonomy of Non-Standard Words (NSWs) based on a large-scale Chinese corpus and proposes a three-stage text normalization strategy: Finite State Automata (FSA) for initial classification, a Maximum Entropy (ME) classifier and rules for further classification, and general rules for standard word conversion. The three-stage approach achieves a precision of 96.02% in experiments, 5.21% higher than that of a simple rule-based approach and 2.21% higher than that of a simple machine learning method. Experimental results show that the three-stage disambiguation strategy for text normalization yields considerable improvement and works well in a real TTS system.
Keywords :
speech processing; text analysis; Mandarin text-to-speech systems; finite state automata; large-scale Chinese corpus; machine learning method; maximum entropy classifier; non-standard words; taxonomy; three-stage text normalization strategy; Entropy; Large-scale systems; Learning automata; Learning systems; Research and development; Speech synthesis; Support vector machines; Taxonomy; Telecommunications; Text analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Chinese Spoken Language Processing, 2008. ISCSLP '08. 6th International Symposium on
Conference_Location :
Kunming
Print_ISBN :
978-1-4244-2942-4
Electronic_ISBN :
978-1-4244-2943-1
Type :
conf
DOI :
10.1109/CHINSL.2008.ECP.43
Filename :
4730297