DocumentCode :
2992887
Title :
HIMA: A Holistic Data Instance Matching Approach
Author :
Miao, Jiajia ; Chen, Guoyou ; Li, Aiping ; Yan, Jia ; Jiang, Siyu
Author_Institution :
Inst. of Command Autom., PLA Univ. of Sci. & Technol., Nanjing, China
fYear :
2010
fDate :
25-27 June 2010
Firstpage :
5242
Lastpage :
5245
Abstract :
To address consistency at the instance level, we propose a Holistic Data Instance Matching Approach (HIMA). First, we measure the similarity of instances using string distance algorithms. HIMA uses a clustering algorithm, which allows it to handle large-scale data sources holistically. In addition, we use a keyword extraction method based on the maximum entropy model to discard useless information. Experimental results show that the keyword extraction algorithm achieves 70% precision and that the conditional-probability-based algorithm is more precise than the token-based algorithm. The HIMA method achieves 83% accuracy.
Keywords :
data analysis; entropy; information retrieval; pattern clustering; string matching; HIMA; clustering algorithm; holistic data instance matching; keyword extracting method; maximum entropy model; string distance; Clustering algorithms; Computational modeling; Computers; Couplings; Data mining; Entropy; Programmable logic arrays; clustering; instance matching; maximum entropy model; string distance;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Electrical and Control Engineering (ICECE), 2010 International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-6880-5
Type :
conf
DOI :
10.1109/iCECE.2010.1272
Filename :
5630513