• DocumentCode
    430207
  • Title
    Spoken document summarization using topic-related corpus and semantic dependency grammar
  • Author
    Hsieh, Chia-Hsin ; Huang, Chien-Lin ; Wu, Chung-Hsien
  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
  • fYear
    2004
  • fDate
    15-18 Dec. 2004
  • Firstpage
    333
  • Lastpage
    336
  • Abstract
    This paper presents a spoken document summarization scheme that uses a topic-related corpus and semantic dependency grammar (SDG). The summarization score combines speech recognition confidence, word significance, word trigram, SDG, and probabilistic context-free grammar (PCFG) scores. A topic-related corpus consisting of keywords and articles is used to estimate the word significance score with latent semantic indexing (LSI). Semantic relations between words are determined by the SDG using HowNet and Sinica Treebank. A dynamic programming algorithm determines the summarization ratio and searches for the best summary according to the summarization scores. Experimental results indicate that the proposed approach effectively extracts important words with semantic dependency and produces promising speech summaries.
  • Keywords
    context-free grammars; dynamic programming; parameter estimation; speech processing; speech recognition; statistical analysis; text analysis; HowNet; Sinica Treebank; dynamic programming algorithm; keywords; latent semantic indexing; probabilistic context free grammar; semantic dependency grammar; speech recognition confidence; speech summary; spoken document summarization; summarization ratio; summarization score; topic-related corpus; word significance score; word trigram; Computer science; Dynamic programming; Heuristic algorithms; Humans; Indexing; Internet; Large scale integration; Multimedia databases; Speech analysis; Speech recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing, 2004 International Symposium on
  • Print_ISBN
    0-7803-8678-7
  • Type
    conf
  • DOI
    10.1109/CHINSL.2004.1409654
  • Filename
    1409654
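
The abstract describes a dynamic programming search that picks the best summary according to combined summarization scores. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes each recognized word already has a single score (for example, a weighted mix of confidence and significance) and that a pairwise `link` function stands in for the trigram, SDG, and PCFG terms. The names `combine_word_score`, `summarize`, and `link`, the weights, and the fixed summary length `m` are all hypothetical; the paper's algorithm also decides the summarization ratio, which this sketch does not attempt.

```python
from typing import Callable, List, Sequence, Tuple


def combine_word_score(confidence: float, significance: float,
                       w_conf: float = 0.5, w_sig: float = 0.5) -> float:
    """Hypothetical weighted sum of two per-word scores (weights are assumptions)."""
    return w_conf * confidence + w_sig * significance


def summarize(word_scores: Sequence[float],
              link: Callable[[int, int], float],
              m: int) -> Tuple[float, List[int]]:
    """Select m word indices, in order, maximizing word scores plus link scores.

    link(j, i) scores placing word i directly after word j in the summary
    (a stand-in for the linguistic and semantic terms named in the abstract).
    """
    n = len(word_scores)
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= len(word_scores)")

    neg_inf = float("-inf")
    # best[k][i]: best score of a k-word summary whose last word is word i.
    best = [[neg_inf] * n for _ in range(m + 1)]
    back = [[-1] * n for _ in range(m + 1)]

    for i in range(n):
        best[1][i] = word_scores[i]

    for k in range(2, m + 1):
        for i in range(k - 1, n):
            for j in range(k - 2, i):
                cand = best[k - 1][j] + link(j, i) + word_scores[i]
                if cand > best[k][i]:
                    best[k][i] = cand
                    back[k][i] = j

    # Pick the best final word, then follow back-pointers to recover the path.
    end = max(range(m - 1, n), key=lambda i: best[m][i])
    path: List[int] = []
    k, i = m, end
    while k >= 1:
        path.append(i)
        i = back[k][i]
        k -= 1
    path.reverse()
    return best[m][end], path


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.8, 0.2, 0.7]
    total, kept = summarize(scores, lambda j, i: 0.0, 3)
    print(kept, total)
```

With a zero `link` function the search reduces to keeping the m highest-scoring words in their original order; a non-trivial `link` lets word-to-word relations change which words are kept. The search takes O(m·n²) time for n words.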