Research on Web Information Extraction Based on XML

Author

Hu, Yan ; Xuan, Yanyan

Author_Institution

Dept. Comput. Sci. & Technol., Wuhan Univ. of Technol., Wuhan

fYear

2008

fDate

25-26 Sept. 2008

Firstpage

201

Lastpage

204

Abstract

This paper applies standard XML technology to Web information extraction and proposes a generic XML-based extraction solution. Two key technologies are proposed and implemented to simplify the extraction work: XML-based Web data conversion and DOM-based XPath generation. XSLT is used as the description language for extraction rules, which is conducive to unifying extraction patterns.

Keywords

Internet; XML; information retrieval; DOM-based XPath generation technology; Web information extraction; XML technology; XML-based Web data conversion technology; XSLT description language; extraction rule pattern; Artificial intelligence; Computer science; Data conversion; Data mining; Genetics; HTML; Memory; Optimization methods; Web pages

fLanguage

English

Publisher

IEEE

Conference_Title

Genetic and Evolutionary Computing, 2008. WGEC '08. Second International Conference on

Conference_Location

Hubei

Print_ISBN

978-0-7695-3334-6

Type

conf

DOI

10.1109/WGEC.2008.16

Filename

4637427

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3006526