• DocumentCode
    3495459
  • Title
    Storage and Matching Technique in Genomic Sequences for Approximate Motif Searching
  • Author
    Srinivasa, K.G.; Shashidhara, H.S.
  • Author_Institution
    Data Min. Lab., M S Ramaiah Inst. of Technol., Bangalore
  • Volume
    1
  • fYear
    2008
  • fDate
    11-13 Nov. 2008
  • Firstpage
    968
  • Lastpage
    973
  • Abstract
    Sequence retrieval serves as a preprocessing step for a number of other processes, including motif discovery, in which retrieved sequences are scored against a consensus before being recognized as a motif. Retrieval depends on how the sequences are stored beforehand. Using two bits to represent each genomic character is optimal in terms of storage, but it provides no information about the length of runs of repeated characters or other details of positional significance. This paper presents an alternative storage technique for sequences and a corresponding retrieval technique. We describe the technique using integers for clarity; using the bit equivalents of these integers in the actual representation could reduce storage complexity significantly. We first give a clear picture of what motif discovery requires of a storage technique before presenting our proposal.
  • Keywords
    bioinformatics; genetics; information retrieval; molecular biophysics; molecular configurations; approximate motif searching; genomic sequences; matching; sequence retrieval; storage; Arithmetic; Data mining; Genomics; Information technology; Laboratories; Proposals; Redundancy; Sequences; DNA; biological data mining; sequence alignment; storage techniques
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Convergence and Hybrid Information Technology, 2008. ICCIT '08. Third International Conference on
  • Conference_Location
    Busan
  • Print_ISBN
    978-0-7695-3407-7
  • Type
    conf
  • DOI
    10.1109/ICCIT.2008.201
  • Filename
    4682157