  • DocumentCode
    1932327
  • Title
    A Greedy Two-stage Gibbs Sampling Method for Motif Discovery in Biological Sequences
  • Author
    Liu Li-fang ; Jiao Li-Cheng ; Huo Hong-wei
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Xidian Univ., Xi'an
  • Volume
    1
  • fYear
    2008
  • fDate
    27-30 May 2008
  • Firstpage
    13
  • Lastpage
    17
  • Abstract
    For the motif discovery problem in DNA sequences, a greedy two-stage Gibbs sampling algorithm is presented; the accompanying software package is called Greedy MotifSAM. Based on the position weight matrix (PWM) motif model, a greedy strategy is employed to choose the initial PWM parameters. Two sampling methods, a site sampler and a motif sampler, are used. The site sampler finds one occurrence of the motif per sequence in the dataset. The motif sampler finds zero or more non-overlapping occurrences of the motif in each sequence. The algorithm is capable of discovering several different motifs with differing numbers of occurrences in a single dataset. We use binding-site (motif) information for eukaryotic transcription factors stored in the TRANSFAC database to test our methods. Prediction accuracy, scalability, and reliability are compared with those of several other methods.
  • Keywords
    DNA; biological techniques; biology computing; cellular biophysics; Greedy MotifSAM; TRANSFAC database; binding sites; biological DNA sequences; eukaryotic transcription factors; greedy two-stage Gibbs sampling algorithm; motif discovery problem; position weight matrix motif model; related software package; Accuracy; Biological system modeling; DNA; Databases; Pulse width modulation; Sampling methods; Sequences; Software algorithms; Software packages; Testing; Binding sites; Gibbs sampling; Motif discovery; Transcription factors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    BioMedical Engineering and Informatics, 2008. BMEI 2008. International Conference on
  • Conference_Location
    Sanya
  • Print_ISBN
    978-0-7695-3118-2
  • Type
    conf
  • DOI
    10.1109/BMEI.2008.111
  • Filename
    4548627