Title :
Improve Computer-Aided Diagnosis With Machine Learning Techniques Using Undiagnosed Samples
Author :
Li, Ming ; Zhou, Zhi-Hua
Author_Institution :
Nanjing Univ., Nanjing
Abstract :
In computer-aided diagnosis (CAD), machine learning techniques have been widely applied to learn a hypothesis from diagnosed samples in order to assist medical experts in making a diagnosis. To learn a well-performing hypothesis, a large number of diagnosed samples is required. Although samples can be easily collected from routine medical examinations, it is usually impossible for medical experts to diagnose every collected sample. If a hypothesis could be learned in the presence of a large number of undiagnosed samples, the heavy burden on the medical experts could be relieved. In this paper, a new semisupervised learning algorithm named Co-Forest is proposed. It extends the co-training paradigm by using a well-known ensemble method named Random Forest, which enables Co-Forest to estimate the labeling confidence of undiagnosed samples and to produce the final hypothesis easily. Experiments on benchmark data sets verify the effectiveness of the proposed algorithm. Case studies on three medical data sets and a successful application to microcalcification detection for breast cancer diagnosis show that undiagnosed samples are helpful in building CAD systems, and that Co-Forest, by utilizing the available undiagnosed samples, is able to enhance the performance of a hypothesis learned from only a small number of diagnosed samples.
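The abstract does not spell out how labeling confidence is estimated, so the following is only a minimal sketch of one plausible reading: an ensemble member receives a pseudo-labeled undiagnosed sample when the other members agree on its label strongly enough. The function names (`vote_confidence`, `pseudo_label_for`), the 0.75 threshold, and the toy threshold classifiers standing in for trees are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

# A classifier is any callable mapping a feature vector to a label.
# (Hypothetical stand-in for one tree of a random forest.)
Classifier = Callable[[Sequence[float]], Hashable]


def vote_confidence(members: List[Classifier],
                    x: Sequence[float]) -> Tuple[Hashable, float]:
    """Return the majority label and the fraction of members voting for it."""
    votes = Counter(member(x) for member in members)
    label, count = votes.most_common(1)[0]
    return label, count / len(members)


def pseudo_label_for(i: int,
                     ensemble: List[Classifier],
                     unlabeled: List[Sequence[float]],
                     threshold: float = 0.75):
    """Select unlabeled samples that all members except member i label confidently.

    The threshold value is an assumption chosen for this sketch.
    """
    others = ensemble[:i] + ensemble[i + 1:]
    selected = []
    for x in unlabeled:
        label, confidence = vote_confidence(others, x)
        if confidence > threshold:
            selected.append((x, label))
    return selected


# Toy ensemble: four one-feature threshold rules (illustrative only).
ensemble = [
    lambda x: int(x[0] > 0.4),
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[0] > 0.6),
    lambda x: int(x[0] > 0.5),
]
unlabeled = [(0.1,), (0.55,), (0.9,)]

# Samples that the other three members would hand to member 0.
picked = pseudo_label_for(0, ensemble, unlabeled)
```

In this toy run the borderline sample `(0.55,)` is left out because the remaining members split two to one, while the two clear-cut samples are pseudo-labeled. In a full co-training loop, each member would then be retrained on its diagnosed samples plus the samples selected for it, and the final hypothesis would be the majority vote of the whole ensemble.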
Keywords :
cancer; gynaecology; learning (artificial intelligence); medical diagnostic computing; medical expert systems; pattern clustering; tumours; breast cancer diagnosis; co-forest semi supervised learning algorithm; computer-aided diagnosis; machine learning technique; medical data sets; medical expert system; microcalcification cluster detection; random forest ensemble method; routine medical examination; undiagnosed samples; Breast cancer; Cancer detection; Computer aided diagnosis; Labeling; Machine learning; Machine learning algorithms; Medical diagnostic imaging; Semisupervised learning; Supervised learning; Technological innovation; Computer-aided diagnosis (CAD); co-training; ensemble learning; machine learning; microcalcification cluster detection; random forest; semisupervised learning;
Journal_Title :
Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on
DOI :
10.1109/TSMCA.2007.904745