• DocumentCode
    106942
  • Title

    Optimizing Spaced k-mer Neighbors for Efficient Filtration in Protein Similarity Search

  • Author

    Weiming Li ; Bin Ma ; Kaizhong Zhang

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Western Ontario, London, ON, Canada
  • Volume
    11
  • Issue
    2
  • fYear
    2014
  • fDate
    March-April 2014
  • Firstpage
    398
  • Lastpage
    406
  • Abstract
    Large-scale comparison or similarity search of genomic DNA and protein sequences is of fundamental importance in modern molecular biology. To perform DNA and protein sequence similarity search efficiently, seeding (or filtration) methods have been widely used, in which only sequences sharing a common pattern or “seed” are subject to detailed comparison. These methods therefore trade search sensitivity for search speed. In this paper, we introduce a new seeding method, called spaced k-mer neighbors, which provides a better tradeoff between sensitivity and speed in protein sequence similarity search. In this method, a set of spaced k-mers is selected as the neighbors of each spaced k-mer. These pre-selected spaced k-mer neighbors are then used to detect hits between the query sequence and database sequences. We propose an efficient heuristic algorithm for spaced neighbor selection. Our computational experiments demonstrate that the method of spaced k-mer neighbors can improve the overall tradeoff efficiency over existing seeding methods.
  • Keywords
    DNA; biology computing; filtration; heuristic programming; molecular biophysics; molecular configurations; optimisation; proteins; query processing; sensitivity; database sequences; efficient filtration; genomic DNA; heuristic algorithm; modern molecular biology; preselected spaced k-mer neighbors; protein sequence; protein similarity search; query sequence; seeding method; sensitivity; spaced k-mer neighbor optimization; spaced neighbor selection; Amino acids; Bioinformatics; DNA; Databases; Frequency modulation; Proteins; Sensitivity; Spaced seeds; homology search; similarity search
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2014.2306831
  • Filename
    6744614