• DocumentCode
    2669439
  • Title
    A Method for Collecting Tibetan-Websites
  • Author
    Zhi-juan, Wang ; Xiao-bin, Zhao ; Rui, Yang
  • Author_Institution
    Nat. Language Resource Monitoring & Res. Center, Minzu Univ. of China, Beijing, China
  • fYear
    2011
  • fDate
    1-3 Nov. 2011
  • Firstpage
    222
  • Lastpage
    224
  • Abstract
    This paper first analyzes the features of Tibetan websites. It then introduces a method for collecting Tibetan websites in three steps: first, collect web pages using Tibetan high-frequency words; second, judge whether each web page is in Tibetan according to the frequency of the Tibetan syllable dot in the page; finally, derive the URL of the Tibetan website from the URL of the Tibetan web page. The method proved efficient and fast in collecting Tibetan websites. The Tibetan website information collected with this method has already been submitted to the National Language Resource Monitoring & Research Center.
  • Keywords
    Web sites; natural languages; National Language Resource Monitoring & Research Center; Tibetan high-frequency words; Tibetan syllable dot; Tibetan web page URL; Tibetan-Websites collection method; Encoding; Equations; HTML; Internet; Mathematical model; Monitoring; Web pages; Tibetan-websites; web page collecting; web page language;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Networks and Intelligent Systems (ICINIS), 2011 4th International Conference on
  • Conference_Location
    Kunming
  • Print_ISBN
    978-1-4577-1626-3
  • Type
    conf
  • DOI
    10.1109/ICINIS.2011.3
  • Filename
    6104733
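
The second and third steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the 0.1 threshold, and the choice to measure the syllable dot as a share of non-whitespace characters are all assumptions made for illustration, since the record does not give the paper's actual formula or cutoff. The syllable dot is taken to be U+0F0B (TIBETAN MARK INTERSYLLABIC TSHEG). The first step (finding candidate pages by searching for high-frequency Tibetan words) and HTML-to-text extraction are left out; the input is assumed to be already-extracted page text keyed by page URL.

```python
from urllib.parse import urlsplit

# U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG, the dot that separates syllables.
TSHEG = "\u0f0b"


def tsheg_ratio(text):
    """Return the share of non-whitespace characters that are syllable dots."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return chars.count(TSHEG) / len(chars)


def is_tibetan_page(text, threshold=0.1):
    """Judge a page as Tibetan when the syllable-dot share reaches the threshold.

    The default threshold is a hypothetical value, not one taken from the paper.
    """
    return tsheg_ratio(text) >= threshold


def site_root(page_url):
    """Reduce a page URL to its scheme and host, used here as the website URL."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/"


def collect_sites(pages, threshold=0.1):
    """Map {page_url: page_text} to a sorted list of distinct Tibetan site URLs."""
    return sorted(
        {
            site_root(url)
            for url, text in pages.items()
            if is_tibetan_page(text, threshold)
        }
    )
```

Treating the scheme and host as the website URL is the simplest possible reading of the third step; sites hosted under a subdirectory of a shared host would need a different rule.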