DocumentCode :
2447492
Title :
Model of Data Gathering and Processing on Tibetan and Uyghur Language
Author :
Weng, Yu ; Jia, Hanxin ; Ma, Qingli
Author_Institution :
Coll. of Inf. Eng., Minzu Univ. of China, Beijing, China
fYear :
2012
fDate :
1-3 Nov. 2012
Firstpage :
264
Lastpage :
266
Abstract :
A model for gathering and processing web data in the Tibetan and Uyghur languages is introduced in this paper, comprising a page crawler, content extraction, word segmentation, and frequency statistics and display. First, the software extracts a website's templates and uses them to extract the content and title of each web page, then transforms the HTML file into an XML file. Second, it segments the content of the XML file into words and counts them, storing the statistics in a database. Finally, a web page displays the results of the frequency statistics.
Keywords :
Web sites; XML; data handling; hypermedia markup languages; natural language processing; HTML; Tibetan language; Uyghur language; Web data gathering; Web page; Website templates; XML; content extraction; data processing; frequency statistics; page crawler; word segmentation; Data mining; Data models; Databases; Java; Transforms; Web pages; XML; Data Processing; Data gathering; Tibetan and Uyghur language;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligent Networks and Intelligent Systems (ICINIS), 2012 Fifth International Conference on
Conference_Location :
Tianjin
Print_ISBN :
978-1-4673-3083-1
Type :
conf
DOI :
10.1109/ICINIS.2012.81
Filename :
6376538