DocumentCode
2664583
Title
A Novel Efficient Classification Algorithm for Search Engines
Author
Alla, H.A.H.M.A. ; Al-Ghreimil, N.
Author_Institution
Inf. Technol. Dept., King Saud Univ., Riyadh, Saudi Arabia
fYear
2008
fDate
10-12 Dec. 2008
Firstpage
773
Lastpage
778
Abstract
In this paper, a new algorithm for classifying Web documents into a set of categories is proposed. The proposed technique analyzes the relationships between documents and the terms they contain, producing a set of rules that relate a document's category to its terms and their frequencies. Each document is represented by a graph that correlates its most frequent combined words with its category. The relationships among these graphs and the documents' categories are captured. The proposed technique has three phases. The first is a training phase in which human experts determine the categories of different Web pages and articles and combine these categories with an appropriate weighted index. The second is the blind categorization phase, which builds a database that is categorized according to the results of the first phase. The third phase applies the proposed graph representation technique to the whole set of documents in each category to determine that category's final graph representation. The third phase will produce better classification rules because the sample size is larger, with no additional cost of supervised categorization. Experiments are conducted using data sets collected from different Web portals.
Keywords
document handling; graph theory; search engines; Web documents; Web portals; blind categorization phase; classification algorithm; documents categories; graph representation technique; training phase; weighted index; Classification algorithms; Database systems; Educational institutions; Humans; Information technology; Portals; Search engines; Web mining; Web pages; World Wide Web; Document Classification; Information Processing; Supervised Classification;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence for Modelling Control & Automation, 2008 International Conference on
Conference_Location
Vienna
Print_ISBN
978-0-7695-3514-2
Type
conf
DOI
10.1109/CIMCA.2008.68
Filename
5172723