DocumentCode :
2115601
Title :
Research on Medical Document Categorization
Author :
Zhang, Qirui ; Xue, Yonggang ; Zhou, Huaying ; Tan, Jinghua
Author_Institution :
Coll. of Med. Inf. Eng., Guangdong Pharm. Univ., Guangzhou
fYear :
2008
fDate :
18-18 Dec. 2008
Firstpage :
437
Lastpage :
440
Abstract :
Medical document categorization is the process of automatically assigning one or more predefined category labels to medical documents. Document indexing plays an important role in classification. This paper proposes an improved term-weighting method called tfidfie (term frequency, inverted document frequency and inverted entropy). Compared with the tfidf (term frequency and inverted document frequency) function, the tfidfie function adds an information entropy factor, H, which represents the distribution of the training-set documents in which the term occurs. We then discuss the effect of the training set on medical document categorization: an imbalanced training set decreases classifier performance. Considering the characteristics of medical documents, the medical classifiers are constructed using the Naive Bayes and Rocchio methods, respectively. The experimental results show that tfidfie improves classification performance and that Naive Bayes outperforms Rocchio.
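The abstract does not give the tfidfie formula, so the following is only a minimal sketch of how an entropy factor might be combined with tfidf. It assumes that H is the entropy of a term's document distribution across categories, and that the "inverted entropy" factor is 1 - H / H_max; both choices, along with the function name `tfidfie_weights` and its parameters, are illustrative assumptions and not the authors' definition.

```python
import math
from collections import Counter


def tfidfie_weights(docs, labels):
    """Weight terms by tf * idf * (assumed) inverted-entropy factor.

    docs   : list of token lists, one per training document
    labels : category label of each document (same order as docs)
    Returns one dict (term -> weight) per document.
    NOTE: the factor 1 - H / H_max is an assumption made for illustration.
    """
    n_docs = len(docs)
    n_categories = len(set(labels))

    # Document frequency of each term, overall and per category.
    df = Counter()
    df_by_category = {}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_by_category.setdefault(term, Counter())[label] += 1

    # Largest possible entropy: a term spread evenly over all categories.
    max_entropy = math.log(n_categories) if n_categories > 1 else 1.0

    def entropy(term):
        # Entropy of the term's document distribution over categories.
        h = 0.0
        for count in df_by_category[term].values():
            p = count / df[term]
            h -= p * math.log(p)
        return h

    weights = []
    for tokens in docs:
        doc_weights = {}
        for term, tf in Counter(tokens).items():
            idf = math.log(n_docs / df[term])
            inverted_entropy = 1.0 - entropy(term) / max_entropy
            doc_weights[term] = tf * idf * inverted_entropy
        weights.append(doc_weights)
    return weights
```

Under these assumptions, a term whose documents all fall in one category keeps its full tfidf weight, while a term spread evenly across categories is weighted down toward zero.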
Keywords :
classification; medical information systems; text analysis; Naive Bayes; document indexing; information entropy factor; inverted document frequency; inverted entropy; medical document categorization; term frequency; Biomedical engineering; Educational institutions; Frequency; Indexing; Information entropy; Pharmaceuticals; Research and development; Seminars; Telecommunications; Text categorization; Naïve Bayes; Rocchio; document categorization; medical information
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Future BioMedical Information Engineering, 2008. FBIE '08. International Seminar on
Conference_Location :
Wuhan, Hubei
Print_ISBN :
978-0-7695-3561-6
Type :
conf
DOI :
10.1109/FBIE.2008.83
Filename :
5076776