DocumentCode
3006526
Title
Research on Web Information Extraction Based on XML
Author
Hu, Yan ; Xuan, Yanyan
Author_Institution
Dept. Comput. Sci. & Technol., Wuhan Univ. of Technol., Wuhan
fYear
2008
fDate
25-26 Sept. 2008
Firstpage
201
Lastpage
204
Abstract
This paper applies standard XML technology to Web information extraction and proposes a generic XML-based extraction solution. Two key technologies are proposed and implemented to simplify the extraction work: XML-based Web data conversion and DOM-based XPath generation. XSLT is used as the description language for extraction rules, which is conducive to unifying extraction patterns.
Keywords
Internet; XML; information retrieval; DOM-based XPath generation technology; Web information extraction; XML technology; XML-based Web data conversion technology; XSLT description language; extraction rule pattern; Artificial intelligence; Computer science; Data conversion; Data mining; Genetics; HTML; Memory; Optimization methods; Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Genetic and Evolutionary Computing, 2008. WGEC '08. Second International Conference on
Conference_Location
Hubei
Print_ISBN
978-0-7695-3334-6
Type
conf
DOI
10.1109/WGEC.2008.16
Filename
4637427