DocumentCode :
2092644
Title :
A CS Grammar Based Query Form Information Extraction Method
Author :
Liu, Fujiang ; Deng, Shichun ; Wang, Nianbin ; Li, Xinping
Author_Institution :
Coll. of Comput. Sci. & Technol., Harbin Eng. Univ., Harbin, China
Volume :
1
fYear :
2008
fDate :
20-22 Dec. 2008
Firstpage :
394
Lastpage :
398
Abstract :
The number of Web databases is growing at a surprising rate. The data in these databases are hidden behind query forms, and because general-purpose crawlers have difficulty reaching them, massive resources are wasted. To integrate Web databases and make querying convenient for users, one of the important problems in this research area is understanding what a query form says. This paper introduces a form information extraction method based on an analysis of query forms. By observing a large number of Web pages containing query forms, we identified their basic structure and confirmed the existence of a syntax that guides their creation. We therefore established a method to extract query form information, captured the syntax through a derived grammar, the Code Sequence (CS) grammar, and designed an automaton parser to understand query forms automatically.
Keywords :
Web sites; database management systems; grammars; query processing; Web databases; Web pages; grammar-code; query form information extraction method; sequence grammar; users' query; Automata; Books; Computer science; Data engineering; Data mining; Databases; Educational institutions; Information analysis; Web pages; Web sites; CS grammar; Web database; deep Web; query form
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science and Computational Technology, 2008. ISCSCT '08. International Symposium on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-3746-7
Type :
conf
DOI :
10.1109/ISCSCT.2008.190
Filename :
4731452