• DocumentCode
    144894
  • Title

    A method for the extraction of phonetically-rich triphone sentences

  • Author

    Mendonca, Gustavo ; Candeias, Sara ; Perdigao, Fernando ; Shulby, Christopher ; Toniazzo, Rean ; Klautau, Aldebaro ; Aluisio, Sandra

  • Author_Institution
    Inst. de Cienc. Mat. e de Comput., Univ. de Sao Paulo, Sao Carlos, Brazil
  • fYear
    2014
  • fDate
    17-20 Aug. 2014
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    A method is proposed for compiling a corpus of phonetically-rich triphone sentences; i.e., sentences with a high variety of triphones, distributed in a uniform fashion. Such a corpus is of interest for a wide range of contexts, from automatic speech recognition to speech therapy. We evaluated this method by building phonetically-rich corpora for Brazilian Portuguese. The data employed comes from Wikipedia's dumps, which were converted into plain text, segmented and phonetically transcribed. The method consists of comparing the distance between the triphone distribution of the available sentences to an ideal uniform distribution, with equiprobable triphones. A greedy algorithm was implemented to recognize and evaluate the distance among sentences. A heuristic metric is proposed for pre-selecting sentences for the algorithm, in order to quicken its execution. The results show that, by applying the proposed metric, one can build corpora with more uniform triphone distributions.
  • Keywords
    greedy algorithms; information retrieval; natural language processing; speech recognition; speech synthesis; text analysis; Brazilian Portuguese; Wikipedia dumps; automatic speech recognition; corpus compiling; distance evaluate; equiprobable triphones; greedy algorithm; heuristic metric; phonetically-rich triphone sentence extraction; sentences preselection; speech technology; speech therapy; text-to-speech systems; triphone distribution; Electronic publishing; Encyclopedias; Internet; Measurement; Speech; Speech recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Telecommunications Symposium (ITS), 2014 International
  • Conference_Location
    Sao Paulo
  • Type

    conf

  • DOI
    10.1109/ITS.2014.6947957
  • Filename
    6947957
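
The greedy selection outlined in the abstract could look roughly like the sketch below. It is an illustration only, not the authors' implementation: the abstract does not name the distance measure, so total variation distance is assumed here, the triphone inventory is assumed to be the set of triphones seen in the candidate sentences, and the heuristic pre-selection step is left out because the abstract does not define it. The names `triphones`, `distance_to_uniform` and `greedy_select` are hypothetical.

```python
from collections import Counter


def triphones(phones):
    """Return the overlapping phone triples of one transcribed sentence."""
    return [tuple(phones[i:i + 3]) for i in range(len(phones) - 2)]


def distance_to_uniform(counts, inventory):
    """Total variation distance between the normalised triphone counts and a
    uniform distribution over `inventory` (assumed measure, see lead-in)."""
    total = sum(counts.values())
    if total == 0:
        return 1.0  # no triphones observed yet: treat as maximally distant
    uniform = 1.0 / len(inventory)
    return 0.5 * sum(abs(counts.get(t, 0) / total - uniform) for t in inventory)


def greedy_select(sentences, k):
    """Greedily pick up to k sentence indices. At each step, add the sentence
    whose triphones bring the pooled distribution closest to uniform."""
    per_sentence = [Counter(triphones(s)) for s in sentences]
    inventory = sorted(set().union(*per_sentence))
    if not inventory:
        return []  # no sentence is long enough to contain a triphone

    chosen, pooled = [], Counter()
    remaining = set(range(len(sentences)))
    while remaining and len(chosen) < k:
        # sorted() makes ties resolve to the lowest index, deterministically
        best = min(
            sorted(remaining),
            key=lambda i: distance_to_uniform(pooled + per_sentence[i], inventory),
        )
        chosen.append(best)
        pooled += per_sentence[best]
        remaining.remove(best)
    return chosen
```

Each sentence is passed as a list of phone symbols, for example `greedy_select([["a", "b", "c", "a"], ["d", "e", "f"]], 1)`. Every step rescans all remaining sentences, which is the cost a pre-selection heuristic such as the one mentioned in the abstract would aim to reduce.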