DocumentCode :
586380
Title :
Visualizing high dimensional datasets using parallel coordinates: Application to gene prioritization
Author :
Boogaerts, T. ; Tranchevent, L. ; Pavlopoulos, Georgios A. ; Aerts, Jan ; Vandewalle, Joos
Author_Institution :
Leuven Future Health Dept., Katholieke Univ. Leuven, Leuven, Belgium
fYear :
2012
fDate :
11-13 Nov. 2012
Firstpage :
52
Lastpage :
57
Abstract :
In this paper, we introduce a visualization tool for interactive and efficient exploration of high-dimensional data using parallel coordinates. An algorithm is developed to find an optimal permutation of the dimensions, which allows the data miner to immediately see the most important features or irregularities in the dataset. It is implemented as a genetic algorithm based on the travelling salesman problem, using maximal correlation as fitness. Other features of the tool include selection operators to group the data, such as selection by intersection or by angle; orthogonal and density plots that complement the parallel coordinates plot; manual arrangement of the order of the dimensions; the option to show all plots needed to see every relation between dimensions; and the display of a chosen number of standard deviations for each dimension separately. The tool is applied to multiple gene prioritization cases in search of genes that are relevant to certain genetic disorders. The datasets were obtained with the MerKator and Endeavour tools and include breast cancer, cataract, Charcot-Marie-Tooth and cardiomyopathy datasets, as well as a dataset relating 29 diseases to 22206 genes. Our tool, manual and data can be downloaded from http://www.toomas.be/parcoord/.
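The abstract does not say how the dimension ordering is computed beyond "a genetic algorithm based on the travelling salesman problem using maximal correlation as fitness". The following is a minimal sketch of one way such an ordering could be searched for, not the authors' implementation. It assumes the fitness of an ordering is the sum of absolute Pearson correlations between neighbouring axes, and all function names, operators (order crossover, swap mutation) and parameters (`pop_size`, `generations`, `rate`) are hypothetical choices made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def abs_correlation_matrix(columns):
    """Matrix of |correlation| between every pair of dimensions (columns)."""
    n = len(columns)
    return [[abs(pearson(columns[i], columns[j])) for j in range(n)]
            for i in range(n)]


def adjacency_fitness(order, abs_corr):
    """Assumed fitness: total |correlation| between neighbouring axes."""
    return sum(abs_corr[a][b] for a, b in zip(order, order[1:]))


def order_crossover(p1, p2, rng):
    """Order crossover: keep a slice of p1, fill the gaps in p2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    kept = set(p1[i:j + 1])
    fill = iter(g for g in p2 if g not in kept)
    return [p1[k] if i <= k <= j else next(fill) for k in range(n)]


def swap_mutation(order, rng, rate=0.2):
    """With probability `rate`, swap two randomly chosen axes."""
    order = list(order)
    if rng.random() < rate:
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order


def order_dimensions(columns, pop_size=40, generations=200, seed=0):
    """Search for an axis order with high adjacent correlation (needs >= 2 columns)."""
    rng = random.Random(seed)
    abs_corr = abs_correlation_matrix(columns)
    n = len(columns)

    def score(order):
        return adjacency_fitness(order, abs_corr)

    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: the better half survives unchanged.
        population.sort(key=score, reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(swap_mutation(order_crossover(p1, p2, rng), rng))
        population = elite + children
    return max(population, key=score)
```

Calling `order_dimensions(columns)` with a list of equal-length numeric columns returns a permutation of the column indices, which can then be used as the left-to-right axis order of a parallel coordinates plot. A genetic algorithm is a heuristic, so the returned order is not guaranteed to be the global optimum.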
Keywords :
cancer; data mining; data visualisation; genetic algorithms; genetics; medical computing; medical disorders; travelling salesman problems; Charcot-Marie-Tooth; Endeavour tool; MerKator tool; breast cancer; cardiomyopathy dataset; cataract; data grouping; data miner; diseases; fitness; gene prioritization; genetic algorithm; genetic disorders; high-dimensional dataset visualization tool; maximal correlation; optimal dimension permutation; parallel coordinate plot; selection operators; travelling salesman problem; correlation; proteins; parallel coordinates
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics & Bioengineering (BIBE), 2012 IEEE 12th International Conference on
Conference_Location :
Larnaca
Print_ISBN :
978-1-4673-4357-2
Type :
conf
DOI :
10.1109/BIBE.2012.6399706
Filename :
6399706