Title :
Web information extraction and its application
Author :
Peng, Yan ; Zhang, Chenyue
Author_Institution :
Sch. of Manage., Capital Normal Univ., Beijing, China
Abstract :
Information extraction (IE) addresses the problem of extracting specific information from a collection of documents. This paper describes an approach to designing an information extraction system and puts forward a basic system architecture. It details the steps of web information extraction: web page organization, rule generation, and result presentation. Successfully extracted information is then placed in an XML template designed to capture the information needed by the teaching-learning system. Although the work is restricted to HTML course outlines, the concepts and methods are readily applicable to other domains.
Keywords :
Internet; XML; computer aided instruction; hypermedia markup languages; teaching; HTML course outlines; Web information extraction; Web page; XML template; basic system architecture; teaching-learning system; Data mining; Databases; HTML; Service oriented architecture; Web pages; Extraction Rule; Information Extraction
Conference_Title :
Cloud Computing and Intelligence Systems (CCIS), 2011 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-61284-203-5
DOI :
10.1109/CCIS.2011.6045107