Title :
HIMA: A Holistic Data Instance Matching Approach
Author :
Miao, Jiajia ; Chen, Guoyou ; Li, Aiping ; Yan, Jia ; Jiang, Siyu
Author_Institution :
Inst. of Command Autom., PLA Univ. of Sci. & Technol., Nanjing, China
Abstract :
To address consistency at the instance level, we propose a Holistic Data Instance Matching Approach (HIMA). First, we measure the similarity of instances with string-distance algorithms. HIMA then applies a clustering algorithm, which allows it to handle large-scale data sources holistically. In addition, we use a keyword extraction method based on the maximum entropy model to filter out irrelevant information. The experimental results show that the keyword extraction algorithm reaches 70% precision, and that the conditional-probability-based algorithm is more precise than the token-based algorithm. The HIMA method achieves 83% accuracy.
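The two core steps named in the abstract (string-distance similarity followed by clustering) can be illustrated with a minimal sketch. The abstract does not say which distance or which clustering algorithm the authors use, so everything below is an assumption made for illustration: normalized Levenshtein distance as the similarity measure, single-link threshold clustering via union-find, and the hypothetical names `levenshtein`, `similarity`, `cluster_instances`, and `threshold`. It is not the authors' implementation.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def cluster_instances(instances, threshold=0.8):
    """Group instances whose pairwise similarity reaches the threshold.

    Assumption: single-link clustering with union-find stands in for the
    unspecified clustering algorithm of the paper.
    """
    parent = list(range(len(instances)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(instances)), 2):
        if similarity(instances[i].lower(), instances[j].lower()) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for idx, value in enumerate(instances):
        groups.setdefault(find(idx), []).append(value)
    return list(groups.values())
```

Calling `cluster_instances(["Nanjing University", "Nanjing Univercity", "Wuhan"])` places the two near-identical spellings in one cluster and leaves "Wuhan" on its own. The all-pairs comparison is quadratic in the number of instances, so a large-scale setting like the one the abstract targets would likely need blocking or indexing on top of this sketch.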
Keywords :
data analysis; entropy; information retrieval; pattern clustering; string matching; HIMA; clustering algorithm; holistic data instance matching; keyword extracting method; maximum entropy model; string distance; Clustering algorithms; Computational modeling; Computers; Couplings; Data mining; Entropy; Programmable logic arrays; clustering; instance matching; maximum entropy model; string distance;
Conference_Title :
Electrical and Control Engineering (ICECE), 2010 International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-6880-5
DOI :
10.1109/iCECE.2010.1272