• DocumentCode
    1986593
  • Title
    Clustering binary fingerprint vectors with missing values for DNA array data analysis
  • Author
    Figueroa, Andres ; Borneman, James ; Jiang, Tao
  • Author_Institution
    Dept. of Comput. Sci., California Univ., Riverside, CA, USA
  • fYear
    2003
  • fDate
    11-14 Aug. 2003
  • Firstpage
    38
  • Lastpage
    47
  • Abstract
    Oligonucleotide fingerprinting is a powerful DNA array-based method to characterize cDNA and ribosomal RNA gene (rDNA) libraries and has many applications including gene expression profiling and DNA clone classification. We are especially interested in the latter application. A key step in the method is the cluster analysis of fingerprint data obtained from DNA array hybridization experiments. Most of the existing approaches to clustering use (normalized) real intensity values and thus do not treat positive and negative hybridization signals equally (positive signals are emphasized much more). In this paper, we consider a discrete approach. Fingerprint data are first normalized and binarized using control DNA clones. Because there may exist unresolved (or missing) values in this binarization process, we formulate the clustering of (binary) oligonucleotide fingerprints as a combinatorial optimization problem that attempts to identify clusters and resolve the missing values in the fingerprints simultaneously. We study the computational complexity of this clustering problem and a natural parameterized version, and present an efficient greedy algorithm based on minimum clique partition on graphs. The algorithm takes advantage of some unique properties of the graphs considered here, which allow us to efficiently find the maximum cliques as well as some special maximal cliques. Our experimental results on simulated and real data demonstrate that the algorithm runs faster and performs better than some popular hierarchical and graph-based clustering methods. The results on real data from DNA clone classification also suggest that this discrete approach is more accurate than clustering methods based on real intensity values, in terms of separating clones that have different characteristics with respect to the given oligonucleotide probes.
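    The clique-partition view described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes fingerprints are strings over `0`, `1`, and a hypothetical missing-value symbol `N`, treats two fingerprints as compatible when they agree wherever both are resolved, and grows each cluster with a simple first-fit greedy pass rather than by finding maximum cliques. The function names are illustrative only.

```python
from typing import List

MISSING = "N"  # assumed symbol for an unresolved (missing) value


def compatible(a: str, b: str) -> bool:
    """True if a and b agree at every position where both are resolved."""
    return all(x == y or MISSING in (x, y) for x, y in zip(a, b))


def merge(a: str, b: str) -> str:
    """Combine two compatible fingerprints, filling missing values in a from b."""
    return "".join(y if x == MISSING else x for x, y in zip(a, b))


def greedy_clique_partition(fps: List[str]) -> List[List[int]]:
    """Partition fingerprint indices into groups of pairwise-compatible vectors.

    Each group is a clique in the compatibility graph. This is a first-fit
    heuristic, so the number of groups is not guaranteed to be minimum.
    """
    unassigned = list(range(len(fps)))
    clusters: List[List[int]] = []
    while unassigned:
        seed = unassigned.pop(0)
        consensus = fps[seed]      # resolved values shared by the cluster so far
        members = [seed]
        rest = []
        for i in unassigned:
            # Compatible with the consensus means compatible with every member.
            if compatible(consensus, fps[i]):
                consensus = merge(consensus, fps[i])
                members.append(i)
            else:
                rest.append(i)
        unassigned = rest
        clusters.append(members)
    return clusters


if __name__ == "__main__":
    data = ["10N1", "1N01", "0110", "N110", "1001"]
    print(greedy_clique_partition(data))
```

    Tracking a running consensus keeps each membership check linear in the fingerprint length, and the consensus doubles as one possible resolution of the missing values within that cluster.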
  • Keywords
    DNA; arrays; biology computing; cellular biophysics; data analysis; genetics; microorganisms; molecular biophysics; pattern clustering; probes; proteins; statistical analysis; DNA array data analysis; DNA array hybridization; DNA clone classification; binarization process; cluster analysis; clustering binary fingerprint vectors; fingerprint data; oligonucleotide fingerprinting; oligonucleotide probes; ribosomal RNA gene; Cloning; Clustering algorithms; Clustering methods; Fingerprint recognition; Gene expression; Libraries; Partitioning algorithms; RNA
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics Conference, 2003. CSB 2003. Proceedings of the 2003 IEEE
  • Print_ISBN
    0-7695-2000-6
  • Type
    conf
  • DOI
    10.1109/CSB.2003.1227302
  • Filename
    1227302