DocumentCode :
3300672
Title :
Information Extraction from Semi-structured WEB Page Based on DOM Tree and its Application in Scientific Literature Statistical Analysis System
Author :
Li Weidong ; Dong Yibing ; Wang Ruijiang ; Tian Hongxia
Author_Institution :
Sch. of Inf. Technol., Hebei Univ. of Econ. & Bus., Shijiazhuang, China
fYear :
2009
fDate :
11-12 July 2009
Firstpage :
124
Lastpage :
127
Abstract :
To extract information automatically from semi-structured Web pages, this paper proposes a method named IESS that discovers the record model based on DOM and the maximal similar subtree. The method identifies records automatically and correctly even when records of the same type differ in their expression models. To test the method's performance, a scientific literature statistical analysis system was designed. In practice, the system helps users quickly understand the distribution of papers in their retrieval field and grasp their importance.
Keywords :
Web sites; information retrieval; statistical analysis; tree data structures; DOM tree; IESS method; information extraction; maximal similar sub tree; scientific literature statistical analysis system; semistructured Web page; Application software; Conference management; Data mining; Databases; Engineering management; HTML; Information management; Statistical analysis; Technology management; Web pages; Automatic information extraction; DOM; Scientific Literature; Statistical Analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Services Science, Management and Engineering, 2009. SSME '09. IITA International Conference on
Conference_Location :
Zhangjiajie
Print_ISBN :
978-0-7695-3729-0
Type :
conf
DOI :
10.1109/SSME.2009.59
Filename :
5233332