Title :
Machine Learning Methods for Medical Text Categorization
Author :
Zhang, Qirui ; Tan, Jinghua ; Zhou, Huaying ; Tao, Weiye ; He, Kejing
Author_Institution :
Coll. of Med. Inf. Eng., Guangdong Pharm. Univ., Guangzhou, China
Abstract :
This paper reports a comparative study of four machine learning methods for medical text categorization: k-nearest neighbor (kNN), support vector machines (SVM), naive Bayes (NB), and a clonal selection algorithm based on antibody density (CSABAD). CSABAD is an improved immune algorithm that we propose. Following the clonal selection principle and a density control mechanism, it selects only those cells with higher affinity and lower density to proliferate. In addition, we propose an improved approach to computing term weights in document indexing, called term frequency, inverted document frequency and inverted entropy (TFIDFIE). It takes into account the distribution, within the training set, of the documents in which the term occurs. Our experiments show that SVM and CSABAD significantly outperform kNN and naive Bayes, and that TFIDFIE is more effective than TFIDF on the OHSCAL data set.
Keywords :
document handling; indexing; learning (artificial intelligence); medical computing; support vector machines; text analysis; antibody density; clonal selection algorithm; density control mechanism; document indexing; improved immune algorithm; inverted document frequency; inverted entropy; k nearest neighbor; machine learning methods; medical text categorization; naive Bayes; term frequency; Entropy; Frequency; Immune system; Indexing; Learning systems; Machine learning algorithms; Nearest neighbor searches; Niobium; Text categorization; immune algorithm; machine learning
Conference_Title :
Circuits, Communications and Systems, 2009. PACCS '09. Pacific-Asia Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-0-7695-3614-9
DOI :
10.1109/PACCS.2009.156