Title :
Monitoring human larynx by random forests using questionnaire data
Author :
Verikas, Antanas ; Bacauskiene, Marija ; Gelzinis, Adas ; Uloza, Virgilijus
Author_Institution :
Dept. of Electr. & Control Equip., Kaunas Univ. of Technol., Kaunas, Lithuania
Abstract :
This paper is concerned with noninvasive monitoring of the human larynx using subjects' questionnaire data. By applying random forests (RF), questionnaire data are categorized into a healthy class and several disorder classes: cancerous, noncancerous, diffuse, nodular, paralysis, and an overall pathological class. The most important questionnaire statements are determined using RF variable importance evaluations. To explore the multidimensional data, t-Distributed Stochastic Neighbor Embedding (t-SNE) and multidimensional scaling (MDS) are applied to the RF data proximity matrix. When the developed tools were tested on data collected from 109 subjects, 100% classification accuracy was obtained on unseen data for the two-class task (healthy versus pathological). An accuracy of 80.7% was achieved when classifying the data into the healthy, cancerous, and noncancerous classes. The t-SNE and MDS mapping techniques facilitate data exploration aimed at identifying subjects belonging to a "risk group". It is expected that the developed tools will be of great help in preventive health care in laryngology.
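The RF data proximity matrix mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual definition: the proximity of two subjects is the fraction of trees in which both fall into the same terminal leaf. The abstract does not say how the authors computed or transformed the matrix, so the function names, the toy leaf indices, and the 1 - proximity dissimilarity below are illustrative assumptions, not the paper's implementation.

```python
from typing import List


def rf_proximity(leaves: List[List[int]]) -> List[List[float]]:
    """Proximity matrix from per-tree leaf assignments.

    leaves[i][t] is the index of the terminal leaf that sample i reaches
    in tree t (for example, what a fitted forest's leaf lookup returns).
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Count the trees in which samples i and j share a leaf.
            shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


def to_dissimilarity(prox: List[List[float]]) -> List[List[float]]:
    """Assumed conversion: 1 - proximity, one common input for MDS or t-SNE."""
    return [[1.0 - p for p in row] for row in prox]


# Hypothetical toy data: 4 subjects, 3 trees (not taken from the paper).
toy_leaves = [
    [2, 5, 1],
    [2, 5, 3],
    [4, 7, 3],
    [4, 7, 6],
]
proximity = rf_proximity(toy_leaves)
dissimilarity = to_dissimilarity(proximity)
```

A dissimilarity matrix of this kind could then be passed to an MDS or t-SNE routine that accepts precomputed distances, which yields a 2-D map in which subjects the forest treats alike lie close together.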
Keywords :
cancer; data handling; health care; matrix algebra; patient monitoring; pattern classification; stochastic processes; MDS mapping techniques; RF data proximity matrix; RF variable importance evaluations; cancerous disorder class; diffuse disorder class; healthy class; human larynx monitoring; laryngology; multidimensional data; multidimensional scaling; nodular disorder class; noncancerous disorder class; paralysis disorder class; pathological class; preventive health care; questionnaire data categorization; random forests; t-SNE mapping techniques; t-distributed stochastic neighbor embedding mapping technique; Accuracy; Data visualization; Distance measurement; Intelligent systems; Pathology; Radio frequency; Vegetation; Classifier; Data proximity; Human larynx; Random forests; Variable importance; Variable selection;
Conference_Title :
Intelligent Systems Design and Applications (ISDA), 2011 11th International Conference on
Conference_Location :
Cordoba
Print_ISBN :
978-1-4577-1676-8
DOI :
10.1109/ISDA.2011.6121774