DocumentCode :
3703583
Title :
Patient classification based on expanded query using 5-gram collocation and binary tree
Author :
Jaya Sil;Indrani Bhattacharya
Author_Institution :
Dept. of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, India
fYear :
2015
Firstpage :
1
Lastpage :
10
Abstract :
Patients in rural India express their discomfort using keywords as queries because they lack knowledge of the intended domain. Therefore, unlike in existing query expansion methods, there is no scope for automatically revising the query through a feedback mechanism. The paper aims to develop a primary-level disease diagnosis system for patients in rural India by expanding the query using a 5-gram collocation model. First, a string of five co-occurring words is obtained for each query by consulting several medical documents. We call this string of five terms a bag of symptoms (BoS), which represents a concept. Each query yields multiple BoSs, from which we select only ten based on their rank, measured using the log-likelihood ratio. However, not all terms in a BoS represent disease symptoms; some are semantically related concepts similar to the symptoms in the symptom vocabulary (SV). We propose a novel binary-tree-based approach to calculate the degree of similarity (DoS) between the terms in the BoS and the symptoms in the SV, using the topology of the tree and the term frequency-inverse document frequency (tf-idf) of the symptoms. The SV with respect to each BoS is encoded with DoS values and framed as feature vectors, which are mostly sparse. To remove sparsity in the feature vectors, we apply singular value decomposition (SVD). Finally, the patients are classified into four probable diseases using 10-fold cross-validation, where the SV consists of the optimum number of symptoms for such diseases. We classify pregnant women separately into two probable diseases, and in each case the system shows satisfactory performance.
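The abstract does not state which form of the log-likelihood ratio is used to rank candidates. As a minimal sketch, the Python below assumes Dunning's G² statistic on a 2x2 co-occurrence table and, for simplicity, scores a single query/term pair rather than a full five-term string. The function names and the counts in the usage example are hypothetical and are not taken from the paper.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 for a 2x2 contingency table.

    k11: text windows containing both the query and the candidate term
    k12: windows containing the query but not the term
    k21: windows containing the term but not the query
    k22: windows containing neither
    """
    total = k11 + k12 + k21 + k22
    if total == 0:
        return 0.0
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def top_candidates(tables, n=10):
    """Rank candidates by G^2 (highest first) and keep the top n.

    tables maps each candidate to its (k11, k12, k21, k22) counts.
    """
    scored = {cand: log_likelihood_ratio(*t) for cand, t in tables.items()}
    return sorted(scored, key=scored.get, reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical counts for three candidate terms around one query.
    tables = {
        "chills": (30, 10, 5, 955),
        "weather": (8, 32, 192, 768),
        "headache": (20, 20, 40, 920),
    }
    print(top_candidates(tables, n=2))
```

A table whose cells match their expected values scores zero, so candidates that co-occur with the query no more often than chance fall to the bottom of the ranking.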
Keywords :
"Diseases","Medical diagnostic imaging","Binary trees","Vocabulary","Temperature sensors","Computer science"
Publisher :
IEEE
Conference_Title :
2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
Print_ISBN :
978-1-4673-8272-4
Type :
conf
DOI :
10.1109/DSAA.2015.7344864
Filename :
7344864