Information mining system design and implementation based on web crawler

Author

Lin, Shan ; Li, You-meng ; Li, Qing-Cheng

Author_Institution

Coll. of Inf. Tech. Sci., Nankai Univ., Tianjin

fYear

2008

fDate

2-4 June 2008

Firstpage

1

Lastpage

5

Abstract

With the information explosion caused by the World Wide Web in recent years, the issue of how to process this enormous volume of information efficiently and at a reasonable cost has become a concern of information providers, service agencies and end users. While much research focuses on how to design an efficient Web crawler, we turn our attention to how to make the best use of a Web crawler's results. In this paper, we describe the design and implementation of an information mining system that runs on the results of a Web crawler to extract more metadata from unstructured documents for focused search (such as RSS search). We present the software architecture of the system, describe efficient techniques for achieving high performance, and report preliminary experimental results showing that the system can address the issues of robustness, flexibility and accuracy at a low cost.

Keywords

Internet; data mining; document handling; information retrieval; meta data; software architecture; Web crawler; World Wide Web; information mining system; information provider; metadata; service agency; Costs; Crawlers; Educational institutions; Electronic mail; Fuzzy logic; Search engines; Web pages; Web sites; Crawler; RSS; information mining; low cost

fLanguage

English

Publisher

IEEE

Conference_Titel

System of Systems Engineering, 2008. SoSE '08. IEEE International Conference on

Conference_Location

Singapore

Print_ISBN

978-1-4244-2172-5

Electronic_ISBN

978-1-4244-2173-2

Type

conf

DOI

10.1109/SYSOSE.2008.4724148

Filename

4724148