• DocumentCode
    515343
  • Title

    Clustering DNA sequences by self-organizing map and similarity functions

  • Author

    Elhadi, Gamal F. ; Abbas, Mohamed A.

  • Author_Institution
    Comput. Sci. Dept., Menofia Univ., Menouf, Egypt
  • fYear
    2010
  • fDate
    28-30 March 2010
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Deoxyribonucleic acid (DNA) is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms and some viruses. We propose an approach to DNA clustering based on an algorithm that clusters DNA sequences using the self-organizing map (SOM) technique. The main objective of this paper is to analyze biological data and to group DNA sequences into clusters more easily and efficiently. Clustering is a process that groups a set of objects into clusters so that the similarity among objects in the same cluster is high, while the similarity among objects in different clusters is low. Since the work entails processing huge amounts of incomplete or ambiguous data, the learning ability of neural networks, the uncertainty-handling capacity of fuzzy sets, and the search potential of genetic algorithms are utilized. We use the proposed approach to analyze both large and small sets of input DNA sequences. The results show that the similarity of the sequences does not depend on the number of input sequences. Our approach evaluates the degree of similarity between DNA sequences using a hierarchical representation, the dendrogram. Representing a large amount of data as a hierarchical tree makes it possible to compare large sequences efficiently.
  • Keywords
    DNA; biology computing; fuzzy set theory; genetic algorithms; self-organising feature maps; uncertainty handling; DNA sequence clustering process; SOM technique; hierarchical representation dendrogram; neural networks; self-organizing map technique; similarity functions; Clustering algorithms; Data analysis; Fuzzy sets; Genetics; Organisms; Sequences; Uncertainty; Viruses (medical); DNA sequences; clustering; nucleic acids; self-organizing map
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Informatics and Systems (INFOS), 2010 The 7th International Conference on
  • Conference_Location
    Cairo
  • Print_ISBN
    978-1-4244-5828-8
  • Type

    conf

  • Filename
    5461737