Title :
Improving Similarity Join Algorithms Using Fuzzy Clustering Technique
Author :
Tan, Lisa ; Fotouhi, Farshad ; Grosky, William ; Pop, Horia F. ; Mouaddib, Noureddine
Author_Institution :
Dept. of Comput. Sci., Wayne State Univ., Detroit, MI, USA
Abstract :
In this paper, we propose a pre-processing technique that uses fuzzy clustering to improve existing string similarity join algorithms. Our approach first identifies groups of related attributes using fuzzy techniques and then applies existing string similarity join algorithms to these attributes. The approach can be applied to the integration of knowledge bases and databases, and it can handle inconsistent values and naming conventions, incorrect or missing data values, and incomplete information from multiple sources with semi-compatible or homogeneous attributes. In an experimental study, our pre-processing approach improved the precision and recall of existing string similarity join algorithms by about 10 percent.
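The two-stage idea in the abstract (cluster attributes first, then join only related attributes) can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the attribute features, or the join measure, so everything here is an assumption made for illustration: fuzzy c-means as the clustering method, a hypothetical `profile` of each attribute (scaled mean length and digit ratio), 2-gram Jaccard similarity as the join predicate, and the thresholds `min_membership` and `threshold`. It is not the authors' implementation.

```python
import math


def qgrams(s, q=2):
    """Set of character q-grams of a lowercased string padded with '#'."""
    s = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def jaccard(a, b):
    """Jaccard similarity of the q-gram sets of two strings."""
    ga, gb = qgrams(a), qgrams(b)
    return len(ga & gb) / len(ga | gb)


def profile(values):
    """Hypothetical attribute features: scaled mean length and digit ratio."""
    text = "".join(values)
    mean_len = sum(len(v) for v in values) / len(values)
    digit_ratio = sum(ch.isdigit() for ch in text) / max(len(text), 1)
    return [mean_len / 20.0, digit_ratio]


def memberships(p, centers, m):
    """Fuzzy c-means membership of point p in each cluster center."""
    d = [math.dist(p, ctr) for ctr in centers]
    zeros = [k for k, dk in enumerate(d) if dk == 0.0]
    if zeros:  # p coincides with one or more centers
        return [1.0 / len(zeros) if k in zeros else 0.0 for k in range(len(d))]
    inv = [dk ** (-2.0 / (m - 1.0)) for dk in d]
    total = sum(inv)
    return [x / total for x in inv]


def fuzzy_c_means(points, c, m=2.0, iters=50):
    """Plain fuzzy c-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < c:
        far = max(points, key=lambda p: min(math.dist(p, ctr) for ctr in centers))
        centers.append(list(far))
    for _ in range(iters):
        u = [memberships(p, centers, m) for p in points]
        for k in range(c):
            w = [row[k] ** m for row in u]
            total = sum(w)
            if total > 0:  # keep the old center if no point belongs to it
                centers[k] = [sum(wi * p[dim] for wi, p in zip(w, points)) / total
                              for dim in range(len(points[0]))]
    return [memberships(p, centers, m) for p in points]


def clustered_similarity_join(left, right, c=2, min_membership=0.5, threshold=0.5):
    """Join values only across attributes that share a fuzzy cluster.

    left and right map attribute names to lists of string values.
    """
    cols = list(left.values()) + list(right.values())
    u = fuzzy_c_means([profile(v) for v in cols], c)
    matches = []
    for i, a in enumerate(left):
        for j, b in enumerate(right, start=len(left)):
            # Strength with which both attributes belong to the same cluster.
            shared = max(min(u[i][k], u[j][k]) for k in range(c))
            if shared < min_membership:
                continue  # unrelated attributes are never compared
            for x in left[a]:
                for y in right[b]:
                    if jaccard(x, y) >= threshold:
                        matches.append((a, x, b, y))
    return matches


# Illustrative data only; not taken from the paper.
left = {"name": ["John Smith", "Mary Jones"],
        "phone": ["3135550101", "7345559876"]}
right = {"full_name": ["Jon Smith", "Marie Jones"],
         "tel": ["3135550101", "2485550123"]}
matches = clustered_similarity_join(left, right)
for match in matches:
    print(match)
```

In this toy input, the clustering step groups `name` with `full_name` and `phone` with `tel`, so the join never compares names against phone numbers. Restricting the comparison space in this way is the kind of pre-processing the abstract describes, under the assumptions stated above.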
Keywords :
database management systems; fuzzy set theory; knowledge based systems; pattern clustering; string matching; databases; fuzzy clustering technique; homogenous attributes; knowledge bases; naming conventions; pre-processing technique; related attributes; semicompatible attributes; string similarity join algorithms; Clustering algorithms; Computer science; Conferences; Data mining; Detection algorithms; Distributed algorithms; Monitoring; NASA; Space technology; Statistical distributions;
Conference_Title :
Data Mining Workshops, 2009. ICDMW '09. IEEE International Conference on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4244-5384-9
Electronic_ISBN :
978-0-7695-3902-7
DOI :
10.1109/ICDMW.2009.50