Title :
The design and implementation of a subject-oriented Web information classification system
Author :
Huang, Yishan ; Wang, Qianping ; Yang, Jing ; Ding, Quan
Author_Institution :
Sch. of Comput., China Univ. of Min. & Technol., JiangSu, China
Abstract :
With the explosive growth of the World Wide Web, it is becoming increasingly difficult for users to collect and analyze Web pages that are relevant to a particular subject. In this paper, a subject-oriented Web information classification system (WICS) is presented, by which Web pages can be efficiently collected and classified into several subjects, and the search results are provided to users. Based on an analysis of ordinary search engines, Web text mining is introduced and applied to the WICS. Text preprocessing, indexing, inverted files, and a vector space distance algorithm (based on the vector space model, VSM) are introduced in the prototype. Initial experiments show that classifying Web information with the prototype makes it more convenient for users to query information, and that relevancy and precision are improved.
Keywords :
classification; data mining; document handling; search engines; Web page classification; Web page collection; Web text mining; World Wide Web; information inquiry; subject-oriented Web information classification system; vector space distance algorithm; vector space model; Explosives; Frequency; Information analysis; Internet; Prototypes; Text mining; Web pages; Wide area networks
Conference_Title :
Computer Supported Cooperative Work in Design, 2005. Proceedings of the Ninth International Conference on
Print_ISBN :
1-84600-002-5
DOI :
10.1109/CSCWD.2005.194294