• DocumentCode
    454695
  • Title
    An Extremely Large Vocabulary Approach to Named Entity Extraction from Speech
  • Author
    Hori, Takaaki ; Nakamura, Atsushi
  • Author_Institution
    NTT Commun. Sci. Lab., NTT Corp., Kyoto
  • Volume
    1
  • fYear
    2006
  • fDate
    14-19 May 2006
  • Abstract
    This paper describes an approach to named entity (NE) extraction from speech data, in which an extremely large vocabulary lexicon including all NEs occurring in a large text corpus is used for automatic speech recognition (ASR). Accordingly, NEs appear in the recognition results just as they are. Our approach is implemented by the following steps: (1) run an NE-tagger for a whole text corpus and make an NE-tagged corpus in which each NE is padded with its category, (2) construct a lexicon and a language model for ASR using the tagged corpus where each NE is considered as a regular word, and (3) run the speech recognizer in one pass. Although a very large vocabulary is necessary to ensure a high coverage of NEs, that is no longer a major problem since we recently achieved real-time extremely large vocabulary ASR using a WFST framework. In experiments on NE extraction from spoken queries for an open-domain question-answering system, our approach yielded higher F-measure values than a conventional approach.
  • Keywords
    feature extraction; speech processing; speech recognition; automatic speech recognition; language model; large vocabulary approach; named entity extraction; named entity tagged corpus; open-domain question-answering system; speech data; spoken queries; vocabulary lexicon; Automatic speech recognition; Data mining; Information retrieval; Laboratories; Natural language processing; Natural languages; Speech analysis; Speech recognition; Text recognition; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
  • Conference_Location
    Toulouse
  • ISSN
    1520-6149
  • Print_ISBN
    1-4244-0469-X
  • Type
    conf
  • DOI
    10.1109/ICASSP.2006.1660185
  • Filename
    1660185