Title :
Research on Web Information Extraction Based on XML
Author :
Hu, Yan ; Xuan, Yanyan
Author_Institution :
Dept. Comput. Sci. & Technol., Wuhan Univ. of Technol., Wuhan
Abstract :
This paper applies standard XML technology to Web information extraction and proposes a generic XML-based extraction solution. Two key techniques are proposed and implemented to simplify the extraction work: XML-based Web data conversion and DOM-based XPath generation. XSLT is used as the description language for extraction rules, which is conducive to unifying extraction patterns.
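The abstract does not describe how DOM-based XPath generation works internally, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the page has already been converted to well-formed XML, and it uses Python's standard xml.etree.ElementTree rather than XSLT. The sample page and the helper name xpath_for are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page, standing in for HTML that was already converted to XML.
PAGE = """<html><body>
<div><p>intro</p></div>
<div><p>Price:</p><p>42</p></div>
</body></html>"""


def xpath_for(root, target):
    """Build a positional XPath from root down to target by walking the tree."""
    # ElementTree keeps no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        # XPath positions are 1-based and count only siblings with the same tag.
        same_tag = [c for c in parent if c.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    return "./" + "/".join(reversed(steps)) if steps else "."


root = ET.fromstring(PAGE)
target = root.findall(".//p")[2]    # the node a user would select: <p>42</p>
path = xpath_for(root, target)      # path expressed relative to the root element
extracted = root.find(path).text    # re-apply the generated path to extract the value
```

A path generated this way could then be placed in the select attribute of an XSLT template to act as an extraction rule, which is the role the abstract assigns to XSLT. Purely positional paths are brittle when the page layout changes, so a practical system would likely generalize them.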
Keywords :
Internet; XML; information retrieval; DOM-based XPath generation technology; Web information extraction; XML technology; XML-based Web data conversion technology; XSLT description language; extraction rule pattern; Artificial intelligence; Computer science; Data conversion; Data mining; Genetics; HTML; Memory; Optimization methods; Web pages
Conference_Title :
Genetic and Evolutionary Computing, 2008. WGEC '08. Second International Conference on
Conference_Location :
Hubei
Print_ISBN :
978-0-7695-3334-6
DOI :
10.1109/WGEC.2008.16