DocumentCode :
2260141
Title :
Document Structure Analysis and Text Normalization for Chinese Putonghua and Cantonese Text-to-Speech Synthesis
Author :
Zhou, Xinxin ; Wu, Zhiyong ; Yuan, Chun ; Zhong, Yuzhuo
Author_Institution :
Tsinghua-CUHK Joint Res. Center for Media Sci., Tsinghua Univ., Shenzhen
Volume :
1
fYear :
2008
fDate :
20-22 Dec. 2008
Firstpage :
477
Lastpage :
481
Abstract :
This paper describes our recent work on document structure analysis (DSA) and text normalization (NORM) for Chinese Putonghua and Cantonese text-to-speech synthesis. A unified framework is proposed in which the DSA and NORM procedures are language-independent across the two Chinese dialects. For document structure analysis, regular expressions are used to detect and identify the non-standard words (NSWs) and punctuation marks related to document structure; a new document segmentation approach is then proposed that takes into account the information provided by the NSWs and punctuation. For text normalization, a method that considers contextual information is put forward to resolve the ambiguity of NSWs, symbols, and punctuation marks.
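The abstract does not spell out the paper's rules. As a rough illustration only, the following minimal Python sketch shows the general idea of regular-expression NSW detection combined with context-based disambiguation for Putonghua. The same digit string is read digit by digit before 年 (a year), as a clock time around a colon, as a percentage before %, and otherwise as a cardinal number. The rule set, the function names (`read_digits`, `read_cardinal`, `normalize`) and the 0 to 9999 cardinal range are hypothetical assumptions made for this example. They are not the authors' method, and Cantonese readings are not covered.

```python
import re

# Hypothetical, simplified Putonghua NSW normalizer (illustration only).
DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]


def read_digits(s: str) -> str:
    """Read a digit string one digit at a time (years, phone numbers)."""
    return "".join(DIGITS[int(d)] for d in s)


def read_cardinal(n: int) -> str:
    """Read an integer in 0..9999 as a Putonghua cardinal number."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only handles 0..9999")
    if n == 0:
        return DIGITS[0]
    s = str(n)
    out, zero_pending = [], False
    for i, ch in enumerate(s):
        d = int(ch)
        if d == 0:
            zero_pending = True  # emit one 零 only if a non-zero digit follows
            continue
        if zero_pending:
            out.append(DIGITS[0])
            zero_pending = False
        out.append(DIGITS[d] + UNITS[len(s) - 1 - i])
    result = "".join(out)
    # 10..19 are read 十, 十一, ..., not 一十, 一十一
    if len(s) == 2 and result.startswith("一十"):
        result = result[1:]
    return result


def _clock_time(m: re.Match) -> str:
    hour, minute = int(m.group(1)), int(m.group(2))
    if hour > 23 or minute > 59:
        return m.group(0)  # not a valid clock time; leave it to the later rules
    text = read_cardinal(hour) + "点"
    if minute:
        text += ("零" if minute < 10 else "") + read_cardinal(minute) + "分"
    return text


def _plain_number(m: re.Match) -> str:
    s = m.group(0)
    # Short numbers as cardinals; long digit strings (e.g. phone numbers) digit by digit.
    return read_cardinal(int(s)) if len(s) <= 4 else read_digits(s)


# Ordered rules: the context around the digits (a following 年, a colon between
# two digit groups, a trailing %) decides the reading; the generic rule runs last.
RULES = [
    (re.compile(r"(?<!\d)(\d{4})(?=年)"), lambda m: read_digits(m.group(1))),
    (re.compile(r"(?<!\d)(\d{1,2}):(\d{2})(?!\d)"), _clock_time),
    (re.compile(r"(?<!\d)(\d{1,4})%"), lambda m: "百分之" + read_cardinal(int(m.group(1)))),
    (re.compile(r"\d+"), _plain_number),
]


def normalize(text: str) -> str:
    """Expand Arabic-digit NSWs into Chinese characters, rule by rule."""
    for pattern, handler in RULES:
        text = pattern.sub(handler, text)
    return text


if __name__ == "__main__":
    print(normalize("2008年12月20日10:30，增长50%。"))
    # prints 二零零八年十二月二十日十点三十分，增长百分之五十。
```

Because each rule's output contains no digits, the rules can be applied in sequence: the context-specific patterns run first, and the generic number rule only sees what is left. A fuller system would still need further context rules, for example reading 1 as 幺 in phone numbers.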
Keywords :
natural language processing; speech synthesis; text analysis; Cantonese; Chinese Putonghua; document segmentation approach; document structure analysis; non-standard words; text normalization; text-to-speech synthesis; Digital signal processing; Engines; Flowcharts; Information analysis; Information technology; Intelligent structures; Natural languages; Signal synthesis; Speech synthesis; Text analysis
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligent Information Technology Application, 2008. IITA '08. Second International Symposium on
Conference_Location :
Shanghai
Print_ISBN :
978-0-7695-3497-8
Type :
conf
DOI :
10.1109/IITA.2008.28
Filename :
4739619