Title :
Comparative study of probability distribution distances to define a metric for the stability of multi-source biomedical research data
Author :
Saez, Carlos ; Robles, Montserrat ; Garcia-Gomez, Juan M.
Author_Institution :
Grupo de Inf. Biomed. (IBIME), Univ. Politec. de Valencia, Valencia, Spain
Abstract :
Research biobanks are often composed of data from multiple sources. In some cases, these subsets of data may present dissimilarities among their probability density functions (PDFs) due to spatial shifts. This may lead to wrong hypotheses when treating the data as a whole, and it diminishes the overall quality of the data. With the purpose of developing a generic and comparable metric to assess the stability of multi-source datasets, we have studied the applicability and behaviour of several PDF distances under shifts in different conditions which may appear in real biomedical data (such as uni- and multivariate data, different types of variable, and multi-modality). Among the studied distances, we found the information-theoretic distances and the Earth Mover's Distance to be the most practical for most conditions. We discuss the properties and usefulness of each distance according to the possible requirements of a general stability metric.
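As a rough illustration of the kind of comparison the abstract describes, the sketch below measures how two PDF distances respond when one data source is shifted away from a reference source. It is not the authors' method: the abstract does not name a specific information-theoretic distance, so the Jensen-Shannon distance is used here as an assumed representative, and the Gaussian sources, the grid limits, and the function names (`gaussian_hist`, `js_distance`, `emd_1d`) are all hypothetical choices made for the example.

```python
import math


def gaussian_hist(mean, std, lo, hi, bins):
    """Discretise a Gaussian PDF into a normalised histogram on a shared grid."""
    width = (hi - lo) / bins
    centres = [lo + (i + 0.5) * width for i in range(bins)]
    dens = [math.exp(-0.5 * ((c - mean) / std) ** 2) for c in centres]
    total = sum(dens)
    return [d / total for d in dens], width


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the base-2 JS divergence."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))


def emd_1d(p, q, width):
    """1-D Earth Mover's Distance: area between the two cumulative sums."""
    cp = cq = total = 0.0
    for x, y in zip(p, q):
        cp += x
        cq += y
        total += abs(cp - cq) * width
    return total


# Assumed setup: a reference source N(0, 1) and sources shifted by a known amount.
LO, HI, BINS = -10.0, 20.0, 3000
ref, width = gaussian_hist(0.0, 1.0, LO, HI, BINS)

results = []
for shift in (0.0, 1.0, 2.0, 4.0, 8.0):
    src, _ = gaussian_hist(shift, 1.0, LO, HI, BINS)
    results.append((shift, js_distance(ref, src), emd_1d(ref, src, width)))

for shift, js, emd in results:
    print(f"shift={shift:4.1f}  JS distance={js:.3f}  EMD={emd:.3f}")
```

Under these assumptions the two distances behave differently as the shift grows. The Jensen-Shannon distance is bounded by 1 and flattens out once the two sources barely overlap, while the Earth Mover's Distance keeps growing in the units of the variable. Which behaviour is preferable depends on what a stability metric is required to do.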
Keywords :
bioinformatics; data handling; information theory; statistical distributions; Earth Mover Distance; data quality; generic comparable metric; information theoretic based distance; metric definition; multimodality; multisource biomedical research data stability; multivariate data; probability density function; probability distribution distances; research biobanks; spatial shift; univariate data; variable types; Bioinformatics; Biomedical measurement; Earth; Numerical stability; Probability density function; Stability analysis;
Conference_Title :
Engineering in Medicine and Biology Society (EMBC), 2013 35th Annual International Conference of the IEEE
Conference_Location :
Osaka
DOI :
10.1109/EMBC.2013.6610228