DocumentCode
2669439
Title
A Method for Collecting Tibetan-Websites
Author
Zhi-juan, Wang ; Xiao-bin, Zhao ; Rui, Yang
Author_Institution
Nat. Language Resource Monitoring & Res. Center, Minzu Univ. of China, Beijing, China
fYear
2011
fDate
1-3 Nov. 2011
Firstpage
222
Lastpage
224
Abstract
This paper first analyzes the features of Tibetan-websites. It then introduces a method for collecting Tibetan-websites in three steps: first, collect web pages using Tibetan high-frequency words; second, judge whether each web page is in Tibetan according to the frequency of the Tibetan syllable dot in that page; finally, find the URL of the Tibetan-website from the URL of the Tibetan web page. The method proved efficient and fast in collecting Tibetan-websites. The Tibetan website information collected with this method has already been submitted to the National Language Resource Monitoring & Research Center.
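The second and third steps of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the ratio definition (syllable dots divided by non-whitespace characters), and the 0.1 threshold are all assumptions made for illustration, since the record does not state how the frequency is computed or which cutoff is used. The only fixed fact relied on is that the Tibetan syllable dot (tsheg) is the Unicode character U+0F0B.

```python
from urllib.parse import urlsplit

TSHEG = "\u0f0b"  # Tibetan syllable dot (tsheg), Unicode U+0F0B


def tsheg_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are the syllable dot."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c == TSHEG for c in chars) / len(chars)


def is_tibetan_page(text: str, threshold: float = 0.1) -> bool:
    """Step 2 (sketch): treat a page as Tibetan when the dot ratio is high.

    The threshold value is an assumed placeholder, not taken from the paper.
    """
    return tsheg_ratio(text) >= threshold


def site_url(page_url: str) -> str:
    """Step 3 (sketch): reduce a page URL to its scheme and host."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/"
```

In this sketch a page whose visible text passes `is_tibetan_page` would have its address passed to `site_url`, and the resulting site roots would be deduplicated to form the list of Tibetan-websites. The input is assumed to be already decoded Unicode text; pages in legacy Tibetan encodings would need conversion first.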
Keywords
Web sites; natural languages; National Language Resource Monitoring & Research Center; Tibetan high-frequency words; Tibetan syllable dot; Tibetan web page URL; Tibetan-Websites collection method; Encoding; Equations; HTML; Internet; Mathematical model; Monitoring; Web pages; Tibetan-websites; web page collecting; web page language
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Networks and Intelligent Systems (ICINIS), 2011 4th International Conference on
Conference_Location
Kunming
Print_ISBN
978-1-4577-1626-3
Type
conf
DOI
10.1109/ICINIS.2011.3
Filename
6104733