DocumentCode
435445
Title
Transliteration system for Arabic-Numeral Expressions using decision tree for intelligent Korean TTS
Author
Jung, Youngim ; Lee, Donghun ; Yoon, Aesun ; Kwon, Hyuk-Chul
Author_Institution
Dept. of Electr. & Comput. Eng., Pusan Nat. Univ., South Korea
Volume
1
fYear
2004
fDate
2-6 Nov. 2004
Firstpage
657
Abstract
Although there has been much work on TTS technologies and several TTS systems have been customized for Korean, current TTS systems make many errors when transliterating non-alphabetic symbols such as Arabic numerals and text symbols. Arabic Numeral Expressions (ANEs) occur frequently and carry significant meaning, especially in scientific and informative documents. This paper proposes TAN (Transliteration system for Arabic-Numeral expressions), which uses a decision tree to efficiently disambiguate the meaning and reading of ANEs in text. For analyzing and learning the data, three phases of learning elements are proposed: patterns of Arabic numerals combined with text symbols, contextual features, and heuristic information, each classified according to the senses and readings of ANEs. Our corpus consists of news articles published from January 1st, 2000 to December 31st, 2001 in 10 major Korean newspapers. Trained on these three phases of learning elements, the system achieves accuracies of 97.52% on the training set and 97.29% on the test set. These results show that the accuracy of our system is 9.72% higher than that of a current TTS system for Korean.
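A minimal sketch of the general idea, assuming a plain ID3-style learner (information-gain splits on categorical features). The abstract does not say which decision-tree algorithm TAN uses, so this is not the authors' method. The feature names (`pattern`, `next`, `range_ok`), the class labels, and the toy training rows are all hypothetical, chosen only to loosely mirror the three learning elements: the numeral/symbol pattern, the surrounding context, and a heuristic check (for example, whether "3/4" falls in a valid month/day range and could therefore be read as a date instead of a fraction).

```python
from collections import Counter
from math import log2

# Hypothetical toy data: (features, reading class). Feature names and labels
# are illustrative assumptions, not taken from the TAN paper.
TRAIN = [
    ({"pattern": "N/N", "next": "day_word", "range_ok": "yes"}, "date"),
    ({"pattern": "N/N", "next": "none", "range_ok": "yes"}, "date"),
    ({"pattern": "N/N", "next": "unit", "range_ok": "no"}, "fraction"),
    ({"pattern": "N/N", "next": "none", "range_ok": "no"}, "fraction"),
    ({"pattern": "N:N", "next": "time_word", "range_ok": "yes"}, "time"),
    ({"pattern": "N:N", "next": "none", "range_ok": "yes"}, "time"),
    ({"pattern": "N:N", "next": "win_word", "range_ok": "yes"}, "score"),
    ({"pattern": "N:N", "next": "win_word", "range_ok": "no"}, "score"),
]


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def best_feature(rows, features):
    """Pick the feature with the highest information gain (ID3 criterion)."""
    base = entropy([y for _, y in rows])

    def gain(f):
        remainder = 0.0
        for v in {x[f] for x, _ in rows}:
            subset = [y for x, y in rows if x[f] == v]
            remainder += len(subset) / len(rows) * entropy(subset)
        return base - remainder

    return max(features, key=gain)


def build_tree(rows, features):
    """Return a leaf label, or a (feature, branches, majority_label) node."""
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    f = best_feature(rows, features)
    rest = [g for g in features if g != f]
    branches = {
        v: build_tree([(x, y) for x, y in rows if x[f] == v], rest)
        for v in {x[f] for x, _ in rows}
    }
    return (f, branches, majority)


def classify(tree, x):
    """Walk the tree; fall back to the node's majority label on unseen values."""
    while isinstance(tree, tuple):
        f, branches, majority = tree
        if x.get(f) not in branches:
            return majority
        tree = branches[x[f]]
    return tree


if __name__ == "__main__":
    tree = build_tree(TRAIN, ["pattern", "next", "range_ok"])
    print(classify(tree, {"pattern": "N/N", "next": "day_word", "range_ok": "yes"}))  # date
    print(classify(tree, {"pattern": "N:N", "next": "win_word", "range_ok": "no"}))  # score
```

On this toy data the following-context feature (`next`) has the highest information gain and becomes the root split. With a real corpus, the features would come from the annotated newspaper data, and a learner with pruning (such as C4.5) would likely be needed to generalize beyond the training set.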
Keywords
context-free languages; decision trees; language translation; natural languages; ANE; Arabic Numeral Expression; Arabic numeral; Korean TTS technologies; contextual feature; decision tree; heuristic information; informative document; learning element; scientific document; text symbol; training set; transliterating nonalphabetic symbol; Classification tree analysis; Computer errors; Data analysis; Decision trees; Electronic mail; Humans; Information analysis; Pattern analysis; Speech synthesis; System testing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Industrial Electronics Society, 2004. IECON 2004. 30th Annual Conference of IEEE
Print_ISBN
0-7803-8730-9
Type
conf
DOI
10.1109/IECON.2004.1433388
Filename
1433388