Title :
Web-Based Knowledge Acquisition to Impute Missing Values for Classification
Author :
Tang, Na ; Vemuri, V. Rao
Author_Institution :
University of California, Davis
Abstract :
Machine learning is the science of building predictors from data while accounting for the predictor's accuracy on future data. Many machine learning classifiers can make accurate predictions when the data is complete. When some data is missing, statistical methods can be applied to fill in a few missing items. But these methods rely only on the available data to calculate the missing values, and they perform poorly if the percentage of missing values exceeds a threshold. An alternative is to fill in the missing data through an automated knowledge discovery process that mines the WWW. This novel procedure first restores the missing information and then learns the parameters of the classifier from the restored data. Using a Bayesian network as the classifier, the parameters, i.e., the probabilities associated with the causal relationships in the network, are deduced using the knowledge mined from the WWW in conjunction with the data available on hand. The method, when tested with heart disease data sets from the UC Irvine Machine Learning Repository, gave satisfactory results.
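The two-step procedure described in the abstract (restore missing values, then learn the classifier's probabilities from the restored data) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how knowledge is mined from the WWW or how it is combined with the data, so the `external_prior` table below is a hypothetical stand-in for web-mined knowledge, the fill rule (pick the most probable value given the class) is an assumption, and the records and numbers are invented for illustration only.

```python
from collections import Counter, defaultdict


def impute(records, feature, external_prior):
    """Step 1: fill missing (None) values of `feature`.

    Assumption: each missing value is replaced by the most probable value
    under `external_prior[label]`, a hypothetical stand-in for knowledge
    obtained from an outside source such as the web.
    """
    restored = []
    for rec in records:
        rec = dict(rec)  # copy so the input records are left unchanged
        if rec[feature] is None:
            dist = external_prior[rec["label"]]
            rec[feature] = max(dist, key=dist.get)
        restored.append(rec)
    return restored


def estimate_cpt(records, feature, values, alpha=1.0):
    """Step 2: estimate P(feature | label) from the restored records,
    using Laplace smoothing with pseudo-count `alpha`."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec["label"]][rec[feature]] += 1
    cpt = {}
    for label, c in counts.items():
        total = sum(c.values()) + alpha * len(values)
        cpt[label] = {v: (c[v] + alpha) / total for v in values}
    return cpt


# Invented toy records; None marks a missing value.
records = [
    {"chest_pain": "yes", "label": "disease"},
    {"chest_pain": None, "label": "disease"},
    {"chest_pain": "no", "label": "healthy"},
    {"chest_pain": None, "label": "healthy"},
]

# Hypothetical external knowledge: P(chest_pain | label).
external_prior = {
    "disease": {"yes": 0.8, "no": 0.2},
    "healthy": {"yes": 0.1, "no": 0.9},
}

restored = impute(records, "chest_pain", external_prior)
cpt = estimate_cpt(restored, "chest_pain", ["yes", "no"])
```

In this toy setup the resulting table `cpt` plays the role of one conditional probability table in a Bayesian network classifier. A fuller treatment would weight the external knowledge against the observed data rather than trusting it outright.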
Keywords :
Bayesian methods; Cardiac disease; Computer science; Data mining; Information retrieval; Knowledge acquisition; Machine learning; Niobium; Training data; World Wide Web
Conference_Title :
Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
Print_ISBN :
0-7695-2100-2
DOI :
10.1109/WI.2004.10114