• DocumentCode
    2467181
  • Title

    A frame work for search forms classification

  • Author

    Klassen, Myungsook

  • Author_Institution
    Comput. Sci. Dept., California Lutheran Univ., Thousand Oaks, CA, USA
  • fYear
    2012
  • fDate
    14-17 Oct. 2012
  • Firstpage
    1029
  • Lastpage
    1034
  • Abstract
    Deep web pages provide highly relevant, high-quality content and represent a large share of online information sources, yet general-purpose search engines do not find them because they are difficult to identify. In this paper, we present a framework to distinguish search forms from non-search forms using a small number of HTML input elements extracted from user-input HTML forms and a few keywords. It applies pre-query and post-query techniques in a hierarchical manner. Decision trees and multilayer artificial neural networks achieved classification rates over 91% in separating search forms from non-search forms. In the new framework, a post-query technique is additionally proposed as the last step to distinguish site search forms from deep search forms.
  • Keywords
    Web sites; classification; decision trees; multilayer perceptrons; search engines; HTML; deep Web pages; multilayer artificial neural networks; post query technique; search forms classification; Biological neural networks; Databases; Vegetation; Web pages; deep web; framework; hidden web; mining; multi layer neural networks; random forests; search form
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man, and Cybernetics (SMC), 2012 IEEE International Conference on
  • Conference_Location
    Seoul
  • Print_ISBN
    978-1-4673-1713-9
  • Electronic_ISBN
    978-1-4673-1712-2
  • Type

    conf

  • DOI
    10.1109/ICSMC.2012.6377864
  • Filename
    6377864