• DocumentCode
    2664583
  • Title
    A Novel Efficient Classification Algorithm for Search Engines
  • Author
    Alla, H.A.H.M.A. ; Al-Ghreimil, N.
  • Author_Institution
    Inf. Technol. Dept., King Saud Univ., Riyadh, Saudi Arabia
  • fYear
    2008
  • fDate
    10-12 Dec. 2008
  • Firstpage
    773
  • Lastpage
    778
  • Abstract
    This paper proposes a new algorithm for classifying Web documents into a set of categories. The technique analyzes the relationships between documents and the terms they contain, producing a set of rules that relate a document's category to its terms and their frequencies. Each document is represented by a graph that correlates its most frequent combined words with its category, and the relationships among these graphs and the documents' categories are captured. The technique has three phases. The first is a training phase, in which human experts determine the categories of different Web pages and articles and combine these categories with an appropriate weighted index. The second is a blind categorization phase, which builds a database categorized according to the results of the first phase. The third applies the proposed graph representation technique to the whole set of documents in each category to determine that category's final graph representation. The third phase will produce better classification rules because the sample size is larger, at no additional cost of supervised categorization. Experiments are conducted using data sets collected from different Web portals.
  • Keywords
    document handling; graph theory; search engines; Web documents; Web portals; blind categorization phase; classification algorithm; documents categories; graph representation technique; search engines; training phase; weighted index; Classification algorithms; Database systems; Educational institutions; Humans; Information technology; Portals; Search engines; Web mining; Web pages; World Wide Web; Document Classification; Information Processing; Supervised Classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence for Modelling Control & Automation, 2008 International Conference on
  • Conference_Location
    Vienna
  • Print_ISBN
    978-0-7695-3514-2
  • Type
    conf
  • DOI
    10.1109/CIMCA.2008.68
  • Filename
    5172723