Title :
Clinical multi-label free text classification by exploiting disease label relation
Author :
Rui-Wei Zhao ; Guo-Zheng Li ; Jia-Ming Liu ; Xiao Wang
Author_Institution :
Dept. of Control Sci. & Eng., Tongji Univ., Shanghai, China
Abstract :
Clinical data describing a patient's health status can be multi-labelled. For example, a clinical record describing a patient suffering from cough and fever should be tagged with both disease labels. Such co-occurring labels are often interrelated, and these relations can be exploited to improve disease classification. In this work, we treat the categorization of free clinical text as a multi-label learning problem. However, we find that some commonly used multi-label learning methods can suffer from severe side effects when exploiting complicated disease label relations, such as over-exploitation of label relations and error propagation in label prediction. Based on these findings, we propose a novel multi-label learning algorithm called Ensemble of Sampled Classifier Chains (ESCC) to improve the classification of clinical text data. ESCC automatically learns to select the relevant disease information that helps improve classification performance when exploiting possible disease relations. In our experiments, ESCC shows strong advantages over other state-of-the-art multi-label algorithms on medical text data, with significant improvements in performance. The proposed algorithm is promising for mining knowledge from a wide range of multi-label medical text data.
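The abstract's point about label relations and error propagation can be illustrated with a minimal sketch of a standard ensemble of classifier chains (ECC). This is not the authors' ESCC: the abstract does not specify how ESCC samples or selects disease information, so none of that is reproduced here. Assumptions: a toy perceptron stands in for a real base learner, each chain uses a random label order, and the bootstrap sampling of training instances used in the original ECC is omitted to keep the example deterministic. All class names are hypothetical.

```python
import random
from typing import List, Sequence

Vector = List[float]


class Perceptron:
    """Toy binary perceptron, a hypothetical stand-in for a real base learner."""

    def __init__(self, max_epochs: int = 200) -> None:
        self.max_epochs = max_epochs
        self.w: Vector = []
        self.b = 0.0

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "Perceptron":
        self.w = [0.0] * len(X[0])
        self.b = 0.0
        for _ in range(self.max_epochs):
            mistakes = 0
            for xi, yi in zip(X, y):
                err = yi - self.predict_one(xi)
                if err != 0:
                    mistakes += 1
                    self.w = [wj + err * xj for wj, xj in zip(self.w, xi)]
                    self.b += err
            if mistakes == 0:  # one clean pass: converged on separable data
                break
        return self

    def predict_one(self, x: Vector) -> int:
        score = sum(wj * xj for wj, xj in zip(self.w, x)) + self.b
        return 1 if score > 0 else 0


class ClassifierChain:
    """One binary model per label; each also sees the labels earlier in `order`."""

    def __init__(self, order: List[int]) -> None:
        self.order = order
        self.models: List[Perceptron] = []

    def fit(self, X, Y) -> "ClassifierChain":
        self.models = []
        for pos, label in enumerate(self.order):
            prev = self.order[:pos]
            # Training appends the TRUE values of earlier labels as features.
            Xa = [list(x) + [float(y[p]) for p in prev] for x, y in zip(X, Y)]
            self.models.append(Perceptron().fit(Xa, [y[label] for y in Y]))
        return self

    def predict_one(self, x) -> List[int]:
        pred = [0] * len(self.order)
        for pos, label in enumerate(self.order):
            prev = self.order[:pos]
            # Prediction appends PREDICTED earlier labels, so a single early
            # mistake can propagate down the rest of the chain.
            xa = list(x) + [float(pred[p]) for p in prev]
            pred[label] = self.models[pos].predict_one(xa)
        return pred


class EnsembleOfChains:
    """Chains with random label orders, combined by majority vote per label."""

    def __init__(self, n_chains: int = 5, seed: int = 0) -> None:
        self.n_chains = n_chains
        self.seed = seed
        self.chains: List[ClassifierChain] = []

    def fit(self, X, Y) -> "EnsembleOfChains":
        rng = random.Random(self.seed)
        self.chains = []
        for _ in range(self.n_chains):
            order = list(range(len(Y[0])))
            rng.shuffle(order)
            self.chains.append(ClassifierChain(order).fit(X, Y))
        return self

    def predict(self, X) -> List[List[int]]:
        out = []
        for x in X:
            votes = [c.predict_one(x) for c in self.chains]
            out.append([1 if sum(v[j] for v in votes) / len(votes) >= 0.5 else 0
                        for j in range(len(votes[0]))])
        return out
```

As a toy usage example with made-up data, two binary features could mark whether a record mentions cough or fever, with labels `[cough, fever, flu]` where the hypothetical `flu` label co-occurs only with both symptoms: `EnsembleOfChains().fit(X, Y).predict(X)` should reproduce `Y` on these separable training records. Averaging chains with different label orders is the usual ECC remedy for a poor choice of chain order, though it does not by itself prevent a chain from over-relying on a related label.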
Keywords :
classification; diseases; learning (artificial intelligence); medical information systems; text analysis; ESCC; classification performance; clinical multilabel free text classification; clinical multilabelled data; clinical record; clinical text data classification; cooccurred labels; disease classifications; disease information; disease label relation; ensemble of sampled classifier chains; free clinical text categorization; label prediction error-propagation; label relation over-exploitation; multilabel learning algorithm; multilabel learning problem; multilabel medical text data; patient health status; Bioinformatics; Conferences; Diseases; Measurement; Medical diagnostic imaging; Prediction algorithms; Training; clinical text classification; disease relation learning; multi-label learning;
Conference_Title :
2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Conference_Location :
Shanghai
DOI :
10.1109/BIBM.2013.6732508