Regional Information Center for Science and Technology - Classification of multi-genomic data using MapReduce paradigm

DocumentCode :

719120

Title :

Classification of multi-genomic data using MapReduce paradigm

Author :

Pahadia, Mayank ; Srivastava, Akash ; Srivastava, Divyang ; Patil, Nagamma

Author_Institution :

Dept. of Inf. Technol., Nat. Inst. of Technol. Karnataka, Surathkal, India

fYear :

2015

fDate :

15-16 May 2015

Firstpage :

678

Lastpage :

682

Abstract :

Counting the number of occurrences of a substring in a string is a problem that arises in many applications. This paper proposes a fast and efficient solution for the field of bioinformatics. A k-mer is a substring of length k of a biological sequence, and k-mer counting is the task of counting the occurrences of every possible k-mer in a biological sequence. k-mer counting is used in applications ranging from error correction of sequencing reads and genome assembly to disease prediction and feature extraction. We provide a Hadoop-based solution to the k-mer counting problem and then use it to classify multi-genomic data. Classification is performed with classifiers such as Naive Bayes, Decision Tree and Support Vector Machine (SVM). An accuracy of more than 99% is observed.
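The abstract describes k-mer counting cast as a MapReduce job whose counts then serve as classifier features. The sketch below is a minimal, in-process illustration of that general pattern, not the authors' Hadoop implementation. It assumes DNA sequences over the alphabet A, C, G, T and skips windows containing any other character (such as N). The function names `mapper`, `shuffle`, `reducer`, `count_kmers` and `feature_vector` are hypothetical. The shuffle step stands in for the grouping that a framework like Hadoop performs between the map and reduce phases.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, Iterator, List, Tuple

ALPHABET = "ACGT"  # assumption: DNA sequences over A, C, G, T only


def mapper(sequence: str, k: int) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (k-mer, 1) for every length-k window of the sequence."""
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: skip windows containing ambiguous bases such as 'N'.
        if all(base in ALPHABET for base in kmer):
            yield kmer, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group emitted values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(kmer: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the partial counts for one k-mer."""
    return kmer, sum(values)


def count_kmers(sequences: Iterable[str], k: int) -> Dict[str, int]:
    """Run map, shuffle and reduce over several input records (sequences)."""
    emitted = (pair for seq in sequences for pair in mapper(seq, k))
    return dict(reducer(kmer, vals) for kmer, vals in shuffle(emitted).items())


def feature_vector(counts: Dict[str, int], k: int) -> List[int]:
    """Fixed-length vector over all 4**k possible k-mers, in lexicographic order."""
    return [counts.get("".join(p), 0) for p in product(ALPHABET, repeat=k)]


if __name__ == "__main__":
    counts = count_kmers(["ACGTAC", "GTACN"], k=2)
    print(sorted(counts.items()))  # → [('AC', 3), ('CG', 1), ('GT', 2), ('TA', 2)]
    print(feature_vector(counts, k=2))
```

Each vector from `feature_vector` has one slot per possible k-mer (16 for k = 2). This gives every sequence a vector of the same length, which a Naive Bayes, decision tree or SVM classifier can consume. A real Hadoop job would typically also add a combiner to pre-sum counts on each mapper before the shuffle, which reduces network traffic. That choice is an assumption here, not something the abstract states.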

Keywords :

bioinformatics; data handling; feature extraction; genomics; parallel processing; support vector machines; Hadoop; Naive Bayes decision tree; SVM; bioinformatics; biological sequence; disease prediction; error correction; feature extraction; genome assembly; k-length substring; k-mer counting problem; multigenomic data; support vector machine; Accuracy; Bioinformatics; DNA; Decision trees; Genomics; Support vector machines;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computing, Communication & Automation (ICCCA), 2015 International Conference on

Conference_Location :

Noida

Print_ISBN :

978-1-4799-8889-1

Type :

conf

DOI :

10.1109/CCAA.2015.7148460

Filename :

7148460

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=719120