Title :
An Adjusted-Edit Distance Algorithm Applying to Web Environment
Author :
Zhu, Mingdong ; Shen, Derong ; Nie, Tiezheng ; Kou, Yue
Author_Institution :
Dept. of Comput. Sci. & Eng., Northeastern Univ., Shenyang, China
Abstract :
Identifying string similarity is an essential step in data cleaning and data integration. However, information on the Web consists mostly of semi-structured and unstructured data and is mixed with various kinds of inaccurate information, such as noisy data, repeated characters, and abbreviated names. As a result, traditional string similarity algorithms, which are designed for particular environments, do not achieve good results on Web data. This paper proposes an improved edit-distance-based algorithm, the Adjusted-Edit distance algorithm, for the Web environment. Experiments demonstrate the feasibility of the improved algorithm.
Keywords :
Internet; pattern recognition; Web environment; adjusted-edit distance algorithm; data cleaning process; data integration process; string similarity algorithm; Algorithm design and analysis; Application software; Cleaning; Computer science; Costs; Data engineering; Information systems; Sequences; Testing; Working environment noise; Web; edit distance;
Conference_Title :
Web Information Systems and Applications Conference, 2009. WISA 2009. Sixth
Conference_Location :
Xuzhou, Jiangsu
Print_ISBN :
978-0-7695-3874-7
DOI :
10.1109/WISA.2009.45
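Note :
The abstract builds on edit distance but does not describe the adjustments themselves. For orientation only, the following is a minimal sketch of the classic (unadjusted) Levenshtein edit distance that such algorithms start from. It is not the Adjusted-Edit distance algorithm of the paper; the function names `edit_distance` and `similarity`, the unit operation costs, and the length-based normalization are illustrative assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance with unit costs (baseline, not the paper's method)."""
    # Keep the shorter string in the inner loop so the row buffer stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            substitution_cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,                      # delete ch_a
                current[j - 1] + 1,                   # insert ch_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 minus distance divided by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(a, b) / longest
```

With this baseline, an abbreviation such as "Intl" is 9 edits away from "International", so the two score as dissimilar even though they name the same thing. This is the kind of Web-data case (abbreviated names, repeated characters, noise) that the abstract says motivates an adjusted variant.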