Title of article :
Sparse nonnegative matrix factorization for protein sequence motif discovery
Author/Authors :
Kim, Wooyoung and Chen, Bernard and Kim, Jingu and Pan, Yi and Park, Haesun
Issue Information :
Journal issue with serial number, year 2011
Pages :
10
From page :
13198
To page :
13207
Abstract :
Discovering motifs from protein sequences is a critical and challenging task in bioinformatics. The task involves clustering relatively similar protein segments from a huge collection of protein sequences and culling high-quality motifs from the resulting set of clusters. A granular computing strategy combined with the K-means clustering algorithm was previously proposed for the task, but this strategy requires manual selection of biologically meaningful clusters, which are then used as an initial condition. This manipulated clustering method is undisciplined as well as computationally expensive. In this paper, we utilize sparse non-negative matrix factorization (SNMF) to cluster a large protein data set. We show how to combine this method with the Fuzzy C-means algorithm and incorporate bio-statistics information to increase the number of clusters whose structural similarity is high. Our experimental results show that an SNMF approach provides better protein groupings in terms of similarities in secondary structures while maintaining similarities in protein primary sequences.
Keywords :
Sparse non-negative matrix factorization, protein sequence motif, Clustering, Chou–Fasman parameters, Fuzzy C-Means
Journal title :
Expert Systems with Applications
Serial Year :
2011
Record number :
2350386