Title :
An Effective Schema Extraction Algorithm on the Deep Web
Author :
Qiang, Bao-hua ; Xi, Jian-qing ; Zhang, Long
Author_Institution :
Sch. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou
Abstract :
The Deep Web, a complex entity that contains information from a variety of source types, has received considerable attention in recent years. To unlock the vast Deep Web content, effective approaches to extracting, indexing, and searching the query interfaces of dynamic Web pages need careful study. Building on our previously proposed grouping patterns and pre-clustering algorithm, this paper presents an effective schema extraction algorithm. Three metrics, (LCA) precision, (LCA) recall, and (LCA) F1, are used to evaluate the performance of the schema extraction algorithm. The experimental results indicate that our algorithm clearly improves the performance of schema extraction for query interfaces on the Deep Web and avoids inconsistencies between the subsets produced by the pre-clustering algorithm and those produced by the schema extraction algorithm.
Keywords :
Internet; Web sites; query processing; Deep Web content; dynamic Web pages; preclustering algorithm; query interfaces; schema extraction algorithm; Clustering algorithms; Computer science; Data mining; Databases; Educational institutions; Information science; Merging; Search engines; Web pages;
Conference_Title :
Wireless Communications, Networking and Mobile Computing, 2008. WiCOM '08. 4th International Conference on
Conference_Location :
Dalian
Print_ISBN :
978-1-4244-2107-7
Electronic_ISBN :
978-1-4244-2108-4
DOI :
10.1109/WiCom.2008.2552