DocumentCode
144894
Title
A method for the extraction of phonetically-rich triphone sentences
Author
Mendonca, Gustavo ; Candeias, Sara ; Perdigao, Fernando ; Shulby, Christopher ; Toniazzo, Rean ; Klautau, Aldebaro ; Aluisio, Sandra
Author_Institution
Inst. de Cienc. Mat. e de Comput., Univ. de Sao Paulo, Sao Carlos, Brazil
fYear
2014
fDate
17-20 Aug. 2014
Firstpage
1
Lastpage
5
Abstract
A method is proposed for compiling a corpus of phonetically-rich triphone sentences, i.e., sentences with a high variety of triphones distributed in a uniform fashion. Such a corpus is of interest in a wide range of contexts, from automatic speech recognition to speech therapy. We evaluated the method by building phonetically-rich corpora for Brazilian Portuguese. The data come from Wikipedia's dumps, which were converted into plain text, segmented and phonetically transcribed. The method consists of comparing the triphone distribution of the available sentences with an ideal uniform distribution, in which all triphones are equiprobable, and measuring the distance between the two. A greedy algorithm was implemented to recognize and evaluate the distance among sentences. A heuristic metric is proposed for pre-selecting sentences for the algorithm, in order to speed up its execution. The results show that, by applying the proposed metric, one can build corpora with more uniform triphone distributions.
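The greedy selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the distance measure, so Kullback-Leibler divergence to the uniform distribution is assumed here, and the pre-selection heuristic is left out because its definition is not given. The function names (`triphones`, `kl_to_uniform`, `greedy_select`) and the toy phone sequences are hypothetical.

```python
import math
from collections import Counter


def triphones(phones):
    """Return the overlapping phone triples of one transcribed sentence."""
    return [tuple(phones[i:i + 3]) for i in range(len(phones) - 2)]


def kl_to_uniform(counts, inventory_size):
    """KL divergence (in nats) from the empirical triphone distribution
    to a uniform distribution over `inventory_size` triphones.
    Assumption: the source does not say which distance is used."""
    total = sum(counts.values())
    if total == 0:
        return math.inf
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.log(inventory_size) - entropy


def greedy_select(sentences, k):
    """Pick up to k sentences; at each step add the sentence whose triphones
    bring the pooled distribution closest to uniform."""
    tri = [Counter(triphones(s)) for s in sentences]
    # The inventory is taken to be every triphone seen in the candidate pool.
    inventory_size = len(set().union(*tri))
    chosen, pooled = [], Counter()
    remaining = set(range(len(sentences)))
    for _ in range(min(k, len(sentences))):
        best = min(
            sorted(remaining),
            key=lambda i: kl_to_uniform(pooled + tri[i], inventory_size),
        )
        chosen.append(best)
        pooled += tri[best]
        remaining.remove(best)
    return chosen, kl_to_uniform(pooled, inventory_size)


# Toy phone sequences (hypothetical, not taken from the paper's data).
toy_sentences = [
    ["a", "b", "a", "b", "a", "b", "a"],  # repeats the same two triphones
    ["k", "a", "z", "a"],
    ["p", "a", "t", "u"],
]
selected, distance = greedy_select(toy_sentences, 2)
```

In this toy run the repetitive first sentence is passed over, because adding it would skew the pooled counts toward two triphones. Each greedy step rescans every remaining sentence, which is why a cheap pre-selection step, as the abstract proposes, would matter on a Wikipedia-scale pool.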
Keywords
greedy algorithms; information retrieval; natural language processing; speech recognition; speech synthesis; text analysis; Brazilian Portuguese; Wikipedia dumps; automatic speech recognition; corpus compiling; distance evaluate; equiprobable triphones; greedy algorithm; heuristic metric; phonetically-rich triphone sentence extraction; sentences preselection; speech technology; speech therapy; text-to-speech systems; triphone distribution; Electronic publishing; Encyclopedias; Internet; Measurement; Speech; Speech recognition
fLanguage
English
Publisher
IEEE
Conference_Titel
Telecommunications Symposium (ITS), 2014 International
Conference_Location
Sao Paulo
Type
conf
DOI
10.1109/ITS.2014.6947957
Filename
6947957