• DocumentCode
    2447242
  • Title

    Automatic Extraction Method of Tibetan New Valid Words

  • Author

    Sun, Yuan ; Yan, Xiaodong ; Zhao, Xiaobing ; Yang, Guosheng

  • Author_Institution
    Minority Languages Branch, Nat. Language Resource & Monitoring Res. Center, Beijing, China
  • fYear
    2012
  • fDate
    1-3 Nov. 2012
  • Firstpage
    228
  • Lastpage
    231
  • Abstract
    This paper proposes a model for automatically extracting Tibetan new valid words. We build a dynamic Tibetan corpus covering 2009 to 2012, drawn from more than 18 Tibetan network media sources in Tibet, Qinghai, Sichuan, Gansu and Yunnan, and study the key techniques of Tibetan new valid word extraction: (1) using statistical methods to establish a Tibetan new word knowledge base, (2) using information entropy to filter Tibetan new valid words, and (3) using vector space module similarity calculation to extract Tibetan new valid words.
  • Keywords
    information filters; information retrieval; natural language processing; statistical analysis; Tibetan new valid words; automatic extraction method; dynamic Tibetan corpus; information entropy; information filter; statistical method; vector space module similarity calculation; Compounds; Data mining; Dictionaries; Educational institutions; Information processing; Statistical analysis; Tibetan new valid word; extraction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Networks and Intelligent Systems (ICINIS), 2012 Fifth International Conference on
  • Conference_Location
    Tianjin
  • Print_ISBN
    978-1-4673-3083-1
  • Type

    conf

  • DOI
    10.1109/ICINIS.2012.61
  • Filename
    6376528