Title :
Using dependencies between attributes to identify and correct the mistakes in SARS data set
Author :
Feng, Hong-Hai ; Liu, Bao-yan ; He, Li-yun ; Yang, Bing-ru ; Chen, Yu-Mei
Author_Institution :
Urban & Rural Constr. Sch., Hebei Agric. Univ., Baoding, China
Abstract :
While processing the SARS data set, we found a number of mistakes. Because the data set contains more than 700,000 values, the mistakes cannot be identified and corrected manually. In addition, because different doctors observe or measure different attributes for different patients, the information table contains a large number of missing values, so conventional methods such as ANN, SVM, and Bayesian networks cannot be applied in this case. Fortunately, some attributes were measured together, so rough set theory can be used to induce the dependencies among them, and those dependencies can be used to identify and correct the mistakes. We also give a measure for finding and correcting the mistakes. The values corrected by the algorithm agree with those corrected by medical experts, which indicates that the algorithm is usable in practice.
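The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm or their measure, which the abstract does not specify. It assumes that records are dictionaries with `None` marking an unmeasured attribute, that the dependency is the standard rough-set dependency degree (the share of records in the positive region), and that a value is suspect when it disagrees with a dominant value among records sharing the same condition values. The attribute names `temp_band` and `fever`, the sample records, and the thresholds `min_support` and `min_ratio` are all hypothetical.

```python
from collections import Counter, defaultdict


def complete_rows(rows, attrs):
    """Keep only rows in which every attribute in attrs was measured (not None)."""
    return [r for r in rows if all(r.get(a) is not None for a in attrs)]


def dependency_degree(rows, cond, dec):
    """Rough-set dependency degree of dec on cond: the fraction of usable rows
    whose condition values determine the decision value unambiguously."""
    usable = complete_rows(rows, cond + [dec])
    if not usable:
        return 0.0
    classes = defaultdict(set)
    for r in usable:
        classes[tuple(r[a] for a in cond)].add(r[dec])
    consistent = sum(
        1 for r in usable if len(classes[tuple(r[a] for a in cond)]) == 1
    )
    return consistent / len(usable)


def suggest_corrections(rows, cond, dec, min_support=3, min_ratio=0.75):
    """Return (row_index, current_value, suggested_value) for rows whose decision
    value disagrees with a dominant value in their equivalence class.
    The thresholds are illustrative assumptions, not values from the paper."""
    groups = defaultdict(list)
    for i, r in enumerate(rows):
        if all(r.get(a) is not None for a in cond + [dec]):
            groups[tuple(r[a] for a in cond)].append(i)
    suggestions = []
    for idxs in groups.values():
        counts = Counter(rows[i][dec] for i in idxs)
        value, n = counts.most_common(1)[0]
        dominant = len(idxs) >= min_support and n / len(idxs) >= min_ratio
        if dominant and n < len(idxs):
            suggestions.extend(
                (i, rows[i][dec], value) for i in idxs if rows[i][dec] != value
            )
    return suggestions


# Hypothetical records; None means the attribute was not measured.
rows = [
    {"temp_band": "high", "fever": "yes"},
    {"temp_band": "high", "fever": "yes"},
    {"temp_band": "high", "fever": "yes"},
    {"temp_band": "high", "fever": "no"},    # disagrees with its class
    {"temp_band": "normal", "fever": "no"},
    {"temp_band": "normal", "fever": "no"},
    {"temp_band": None, "fever": "yes"},     # skipped: condition not measured
]

print(dependency_degree(rows, ["temp_band"], "fever"))
print(suggest_corrections(rows, ["temp_band"], "fever"))
```

Restricting each computation to the rows where the relevant attributes were actually measured is what lets a sketch like this tolerate a sparse table: only attributes that were recorded together contribute to a dependency.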
Keywords :
data analysis; data mining; diseases; medical information systems; rough set theory; SARS data set; attribute dependency; data cleaning; data handling; medical experts; mistake correction; mistake identification; patients; rough set; Agricultural engineering; Bayesian methods; Chemical technology; Databases; Helium; Rough sets; Software quality; Support vector machines
Conference_Title :
Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
Conference_Location :
Guangzhou, China
Print_ISBN :
0-7803-9091-1
DOI :
10.1109/ICMLC.2005.1527948