DocumentCode
3300672
Title
Information Extraction from Semi-structured WEB Page Based on DOM Tree and its Application in Scientific Literature Statistical Analysis System
Author
Li Weidong ; Dong Yibing ; Wang Ruijiang ; Tian Hongxia
Author_Institution
Sch. of Inf. Technol., Hebei Univ. of Econ. & Bus., Shijiazhuang, China
fYear
2009
fDate
11-12 July 2009
Firstpage
124
Lastpage
127
Abstract
To extract information automatically from semi-structured Web pages, this paper proposes IESS, a method that discovers the record model based on the DOM tree and the maximal similar sub tree, so that records are identified automatically and correctly even when records of the same type differ in their expression models. To test the performance of the method, a scientific literature statistical analysis system is designed. Practice shows that, with the help of the system, users can quickly understand the distribution of papers in their retrieval field and grasp the importance.
Keywords
Web sites; information retrieval; statistical analysis; tree data structures; DOM tree; IESS method; information extraction; maximal similar sub tree; scientific literature statistical analysis system; semistructured Web page; Application software; Conference management; Data mining; Databases; Engineering management; HTML; Information management; Statistical analysis; Technology management; Web pages; Automatic information extraction; DOM; Scientific Literature; Statistical Analysis;
fLanguage
English
Publisher
IEEE
Conference_Titel
Services Science, Management and Engineering, 2009. SSME '09. IITA International Conference on
Conference_Location
Zhangjiajie
Print_ISBN
978-0-7695-3729-0
Type
conf
DOI
10.1109/SSME.2009.59
Filename
5233332