Title :
Quality-based data source selection for web-scale Deep Web data integration
Author :
Xian, Xue-Feng ; Zhao, Peng-Peng ; Fang, Wei ; Xin, Jie ; Cui, Zhi-Ming
Author_Institution :
Inst. of Intell. Inf. Process. & Applic., Soochow Univ., Suzhou, China
Abstract :
The deep Web has become an important resource on the Web because of its rich, high-quality information, giving rise to a new application area in data mining, information retrieval, and information integration. In Web-scale deep Web data integration tasks, where hundreds or thousands of data sources may provide data relevant to a particular domain, integrating all available deep Web sources is inefficient. This paper proposes a data source selection approach based on the quality of deep Web sources. The approach automatically finds the highest-quality set of deep Web sources related to a particular domain, which is a prerequisite for effective deep Web data integration. The quality of a data source is assessed by evaluating quality dimensions that represent the characteristics of deep Web sources. Experiments on real deep Web sources collected from the Internet show that our approach provides an effective and scalable solution for selecting data sources for deep Web data integration.
Keywords :
Internet; data mining; Web-scale deep Web data integration; World Wide Web; quality-based data source selection; Cybernetics; Machine learning; Deep Web; data integration; quality assessment; source selection;
Conference_Title :
Machine Learning and Cybernetics, 2009 International Conference on
Conference_Location :
Baoding
Print_ISBN :
978-1-4244-3702-3
Electronic_ISBN :
978-1-4244-3703-0
DOI :
10.1109/ICMLC.2009.5212537