DocumentCode
1987987
Title
Information mining system design and implementation based on web crawler
Author
Lin, Shan ; Li, You-meng ; Li, Qing-Cheng
Author_Institution
Coll. of Inf. Tech. Sci., Nankai Univ., Tianjin
fYear
2008
fDate
2-4 June 2008
Firstpage
1
Lastpage
5
Abstract
With the information explosion caused by the World Wide Web in recent years, the question of how to process this enormous volume of information efficiently and at a reasonable cost has become a concern for information providers, service agencies and end users. While much research focuses on how to design an efficient Web crawler, we focus on how to make the best use of a Web crawler's results. In this paper, we describe the design and implementation of an information mining system that runs on the results of a Web crawler to extract more metadata from unstructured documents for focused search (such as RSS search). We present the software architecture of the system, describe efficient techniques for achieving high performance, and report preliminary experimental results showing that the system can address the issues of robustness, flexibility and accuracy at a low cost.
Keywords
Internet; data mining; document handling; information retrieval; meta data; software architecture; Web crawler; World Wide Web; information mining system; information provider; metadata; service agency; Costs; Crawlers; Data mining; Educational institutions; Electronic mail; Fuzzy logic; Search engines; Web pages; Web sites; Crawler; RSS; information mining; low cost
fLanguage
English
Publisher
IEEE
Conference_Titel
System of Systems Engineering, 2008. SoSE '08. IEEE International Conference on
Conference_Location
Singapore
Print_ISBN
978-1-4244-2172-5
Electronic_ISBN
978-1-4244-2173-2
Type
conf
DOI
10.1109/SYSOSE.2008.4724148
Filename
4724148