DocumentCode :
1747118
Title :
Extracting information from semi-structured Internet sources
Author :
Jeong, Jong-Seok ; Oh, Dong-Ik
Author_Institution :
Div. of Inf. Technol. Eng., Soonchunhyang Univ., Asan, South Korea
Volume :
2
fYear :
2001
fDate :
2001
Firstpage :
1378
Abstract :
Information Harvest Warehouse (IHWA) is a web-based information search system. It is designed using the Component Based Software Engineering (CBSE) paradigm, in which applications are developed by integrating server-side EJB and client-side JCC components. The search system is undergoing a major reconstruction to make it more general and robust, and to prepare it for evolving electronic commerce demands. In this paper, we describe the development of the meta-information gathering service of IHWA (the meta gatherer), which collects and extracts information from semi-structured or unstructured data sources. The focus is on the gatherer's information extraction service for semi-structured (DTD-unknown XML data) Internet information sources. The implemented information extraction module provides clean Java programming interfaces, so that it can be easily integrated with other applications. Its implementation is also efficient, since it analyzes a source XML file in one pass, whereas most other systems use a two-pass approach.
Keywords :
Internet; data warehouses; distributed object management; electronic commerce; information resources; information retrieval; software engineering; Component Based Software Engineering; Information Harvest Warehouse; Java Commerce Client; Java programming interfaces; client-side JCC; enterprise JavaBeans; information extraction module; meta gatherer; meta-information gathering service; semi-structured Internet sources; server-side EJB; source XML file; unstructured data sources; web-based information search system; Application software; Data mining; Educational institutions; Electronic commerce; Information management; Information technology; Internet; Java; Software engineering; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Industrial Electronics, 2001. Proceedings. ISIE 2001. IEEE International Symposium on
Conference_Location :
Pusan
Print_ISBN :
0-7803-7090-2
Type :
conf
DOI :
10.1109/ISIE.2001.931683
Filename :
931683