• DocumentCode
    260319
  • Title

    An Improved Ratio-Based (IRB) Batch Effects Removal Algorithm for Cancer Data in a Co-Analysis Framework

  • Author

    Shuchu Han ; Hong Qin ; Dantong Yu

  • Author_Institution
    Comput. Sci. Dept., Stony Brook Univ. (SUNY), Stony Brook, NY, USA
  • fYear
    2014
  • fDate
    10-12 Nov. 2014
  • Firstpage
    212
  • Lastpage
    219
  • Abstract
    Ratio-based algorithms have proven effective for removing batch effects among microarray expression data from different data sources. They outperform other methods in enhancing cross-batch prediction, especially for cancer data sets. However, their overall power is limited in two ways: (1) not every batch has control samples, and the original method uses all negative samples to calculate the subtrahend; (2) microarray experimental data may lack clear labels, especially in prediction applications, where the labels of the test data set are unknown. In this paper, we propose an Improved Ratio-Based (IRB) method to relieve these two constraints for cross-batch prediction applications. For each batch in a single study, we select one reference sample based on the idea of aligning the probability density functions (pdfs) of each gene across batches. Moreover, for data sets without label information, we transform the problem of finding a reference sample into the dense subgraph problem in graph theory. Our newly proposed IRB method is straightforward and efficient, and can be extended to integrate large-volume microarray data sets. The experiments show that our method is stable and achieves high performance in tumor/non-tumor prediction.
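The general ratio-based idea the abstract refers to can be illustrated with a minimal sketch. It assumes log-scale expression values, so that the "ratio" becomes a per-gene subtraction of a reference value within each batch. The function name `ratio_based_correct`, its parameters, and the toy numbers are hypothetical, and the sketch does not reproduce the IRB reference-sample selection (pdf alignment or the dense subgraph step).

```python
from statistics import mean


def ratio_based_correct(batch, reference_columns):
    """Subtract, per gene, the mean log-expression of the reference samples.

    batch: list of gene rows; each row holds one log-scale value per sample.
    reference_columns: indices of the samples treated as the reference
        (hypothetical input; how to pick them is the subject of the paper).
    """
    corrected = []
    for row in batch:
        # Per-gene subtrahend computed from the chosen reference samples.
        reference = mean(row[j] for j in reference_columns)
        corrected.append([value - reference for value in row])
    return corrected


# Toy data (made up): batch_b is batch_a with a constant offset of +2,
# standing in for an additive batch effect on the log scale.
batch_a = [[8.0, 9.0, 10.0], [5.0, 5.5, 6.0]]
batch_b = [[10.0, 11.0, 12.0], [7.0, 7.5, 8.0]]

corrected_a = ratio_based_correct(batch_a, [0])
corrected_b = ratio_based_correct(batch_b, [0])
```

In this toy case the two corrected batches coincide, because the additive offset cancels once each batch is expressed relative to its own reference sample. Real batch effects are rarely a single constant, which is why the choice of reference matters.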
  • Keywords
    cancer; data analysis; genetics; graph theory; lab-on-a-chip; medical computing; probability; tumours; cancer data; coanalysis framework; cross-batch prediction; data sources; dense subgraph problem; gene; graph theory; improved ratio-based batch effect removal algorithm; large volume microarray data sets; microarray expression data; nontumor prediction; probability density functions; tumor prediction; Algorithm design and analysis; Bipartite graph; Cancer; Correlation; Gene expression; Lungs; Tumors;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Bioinformatics and Bioengineering (BIBE), 2014 IEEE International Conference on
  • Conference_Location
    Boca Raton, FL
  • Type

    conf

  • DOI
    10.1109/BIBE.2014.47
  • Filename
    7033583