Conference record number:
578
Paper title:
Fast and Scalable Protein Motif Sequence Clustering based on Hadoop Framework
Authors:
Farhangi Erfan, Ghadiri Nasser, Asadi Mahsa, Nikbakht Mohammad Amin, Pitre Sylvain
Number of pages:
8
Keywords:
large-scale fuzzy clustering, Hadoop, MapReduce, motif sequence clustering, cluster validity measure
Publication year (Solar Hijri):
1396
Conference title:
3rd International Conference on Web Research
Document language:
Persian
Abstract:
In recent years, we have been faced with large amounts of scattered, unstructured data on the web. With the explosive growth of such data, there is a growing need for effective methods, such as clustering, to analyze it and extract information. Biological data forms an important part of the unstructured data on the web, and protein sequence databases are considered a primary source of biological data. Clustering can help organize sequences into homologous and functionally similar groups and can speed up data processing and analysis. Proteins are responsible for most of the activities in cells, and the majority of proteins perform their function through interactions with other proteins. Hence, the prediction of protein interactions is an important research area in the biomedical sciences. Motifs are fragments that occur frequently in protein sequences, and a well-known method for identifying protein interactions is based on motif clustering. Existing motif clustering methods share a common problem: they are limited in the number of clusters they can produce. However, given the vast number of motifs and the need for a large number of clusters, an efficient, scalable, and fast method is needed to cluster such a large number of sequences. In this paper, we propose a novel approach to clustering a large number of motifs. Our approach includes extracting motifs from protein sequences, feature selection, preprocessing, dimensionality reduction, and applying BigFCM (a large-scale fuzzy clustering algorithm) on several distributed nodes with the Hadoop framework to take advantage of the MapReduce programming model. Experimental results show that our approach performs very well.
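The abstract names BigFCM running on Hadoop with MapReduce but does not describe its internals. As a rough illustration only, the following sketch shows plain fuzzy c-means (FCM) with each iteration split into a "map" step that computes per-split partial sums and a "reduce" step that combines them into new cluster centers. It is not BigFCM and uses no Hadoop API; the function names (`memberships`, `map_chunk`, `reduce_partials`, `fcm`), the fuzzifier m = 2, and the toy 2-D points standing in for reduced motif feature vectors are all hypothetical choices made for this example.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
# (sum of u^m * x per cluster, sum of u^m per cluster) for one data split
Partial = Tuple[List[Vector], List[float]]


def memberships(x: Sequence[float], centers: Sequence[Sequence[float]], m: float) -> List[float]:
    """Standard FCM membership of point x in every cluster (fuzzifier m > 1)."""
    dists = [math.dist(x, c) for c in centers]
    zero = [i for i, d in enumerate(dists) if d == 0.0]
    if zero:
        # Point lies exactly on one or more centers: share membership among them.
        return [1.0 / len(zero) if i in zero else 0.0 for i in range(len(centers))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** p for dk in dists) for di in dists]


def map_chunk(chunk: Sequence[Sequence[float]], centers: Sequence[Sequence[float]], m: float) -> Partial:
    """Hypothetical 'map' step: partial sums of u^m * x and u^m for one split."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    weights = [0.0] * k
    for x in chunk:
        for i, u in enumerate(memberships(x, centers, m)):
            w = u ** m
            weights[i] += w
            for d in range(dim):
                sums[i][d] += w * x[d]
    return sums, weights


def reduce_partials(partials: Sequence[Partial]) -> List[Vector]:
    """Hypothetical 'reduce' step: add partial sums from all splits, form new centers."""
    k, dim = len(partials[0][0]), len(partials[0][0][0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_w = [0.0] * k
    for sums, weights in partials:
        for i in range(k):
            total_w[i] += weights[i]
            for d in range(dim):
                total_sums[i][d] += sums[i][d]
    # Assumes every cluster receives some positive weight (true for typical data).
    return [[s / total_w[i] for s in total_sums[i]] for i in range(k)]


def fcm(chunks, centers, m: float = 2.0, iters: int = 50, tol: float = 1e-6) -> List[Vector]:
    """Repeat map/reduce rounds until the centers stop moving (or iters is reached)."""
    for _ in range(iters):
        new_centers = reduce_partials([map_chunk(c, centers, m) for c in chunks])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


# Example: two well-separated 2-D groups spread over two "splits".
chunks = [
    [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0]],
    [[0.2, 0.1], [5.1, 4.9], [4.9, 5.2]],
]
centers = fcm(chunks, [[1.0, 1.0], [4.0, 4.0]])
```

Because the center update only needs the per-cluster sums of u^m·x and u^m, each split can be processed independently and the partial sums added in any order; this is what makes FCM a natural fit for MapReduce-style distribution, whatever additional optimizations a system such as BigFCM may apply.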
Conference document number:
4445660
From page:
1
To page:
8