• DocumentCode
    2407793
  • Title

    Automatic classification of deep web databases with simple query interface

  • Author

    Xian, Xuefeng ; Zhao, Pengpeng ; Fang, Wei ; Xin, Jie ; Cui, Zhiming

  • Author_Institution
    Inst. of Intell. Inf. Process. & Applic., Soochow Univ., Suzhou, China
  • fYear
    2009
  • fDate
    15-16 May 2009
  • Firstpage
    85
  • Lastpage
    88
  • Abstract
    Classifying Deep Web databases is a key operation in organizing Deep Web resources. We address the problem of identifying the domain of Web databases with simple query interfaces. Existing methods cannot effectively classify this type of Web database. To solve this problem, we propose a new framework that automatically and accurately classifies Web databases with simple query interfaces based on probing queries. The core of this framework is a domain-specific classifier (DSC), which is constructed from features that can be easily extracted from the advanced query interfaces (forms) in a domain. Exploiting the similarity among the result schemas, interface schemas, and global schemas of Web databases, the DSC classifies a new Web database with a simple query interface based on its result schema. Experiments on real structured Web databases collected from the Internet show that our framework provides an effective and scalable solution for classifying Web databases with simple query interfaces.
  • Keywords
    Internet; pattern classification; query processing; Web database; Web resource; automatic classification; domain specific classifier; simple query interface; Application software; Automation; Computer industry; Data mining; Deductive databases; Image databases; Information retrieval; Mechatronics; Spatial databases; component; deep web; probing query; result schema
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Industrial Mechatronics and Automation, 2009. ICIMA 2009. International Conference on
  • Conference_Location
    Chengdu
  • Print_ISBN
    978-1-4244-3817-4
  • Type

    conf

  • DOI
    10.1109/ICIMA.2009.5156566
  • Filename
    5156566
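
The abstract describes classifying a Web database by comparing the schema of its probe-query results against features drawn from advanced query forms in each domain. The record does not specify how the DSC scores a match, so the following is only a minimal sketch under stated assumptions: the domain vocabularies in `DOMAIN_FEATURES`, the function `classify_result_schema`, and the use of Jaccard overlap as the score are all hypothetical choices made for illustration, not the authors' method.

```python
# Hypothetical domain vocabularies: attribute labels one might collect from
# advanced query forms in each domain (illustrative values, not from the paper).
DOMAIN_FEATURES = {
    "books": {"title", "author", "isbn", "publisher", "price"},
    "jobs": {"title", "company", "location", "salary", "category"},
    "cars": {"make", "model", "year", "price", "mileage"},
}


def normalize(label):
    """Lower-case and trim an attribute label so comparisons are uniform."""
    return label.strip().lower()


def classify_result_schema(result_labels, domain_features=DOMAIN_FEATURES):
    """Score each domain by Jaccard overlap with the result-schema labels.

    result_labels: attribute labels extracted from the result page returned
    by a probing query (extraction itself is outside this sketch).
    Returns (best_domain, scores); best_domain is None if nothing overlaps.
    """
    labels = {normalize(label) for label in result_labels}
    scores = {}
    for domain, features in domain_features.items():
        union = labels | features
        scores[domain] = len(labels & features) / len(union) if union else 0.0
    best = max(scores, key=scores.get)
    if scores[best] == 0.0:
        return None, scores
    return best, scores


if __name__ == "__main__":
    domain, scores = classify_result_schema(["Title", "Author", "ISBN", "Price"])
    print(domain, scores)
```

A real classifier would likely need label synonym handling and a trained model rather than raw set overlap; the sketch only shows the idea of matching a result schema against per-domain form features.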