Title of article
DNA barcoding using particle swarm optimization on Apache Spark SQL case study: DNA of COVID-19
Author/Authors
Septem Riza, Lala Department of Computer Science Education - Universitas Pendidikan Indonesia, Indonesia; Ilham Nurfathiya, Muhammad Department of Computer Science Education - Universitas Pendidikan Indonesia, Indonesia; Kusnendar, Jajang Department of Computer Science Education - Universitas Pendidikan Indonesia, Indonesia; Fariza Abu Samah, Khyrina Airin Faculty of Computer and Mathematical Sciences - Universiti Teknologi MARA Cawangan Melaka Kampus Jasin - Melaka, Malaysia
Pages
12
From page
1561
To page
1572
Abstract
The objective of this research is to design and implement a computational model that determines DNA
barcodes using the Particle Swarm Optimization (PSO) algorithm on Big Data
platforms, namely Apache Hadoop and Apache Spark. The steps are as follows: (i) loading DNA
sequences into the Hadoop Distributed File System (HDFS) in Apache Hadoop, (ii) pre-processing the data,
(iii) implementing PSO through a User Defined Function (UDF) in Apache Spark, and (iv) collecting
the results and saving them to HDFS. With the computational model in place, two simulation scenarios
were run: the first uses 4 cores and several worker nodes, while the second
uses a cluster with 2 worker nodes and several cores. In terms of computational time, the results
show a significant speed-up of the big data platform over the standalone version in both experimental
scenarios. This study shows that the computational model built on the big data platform extends the
features of previous research and runs faster.
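As an illustration of step (iii), the following is a minimal sketch of a PSO search for a conserved window (a barcode or motif candidate) across DNA sequences. It is not the authors' implementation: the abstract does not give the fitness function or the PSO parameters, so the Hamming-distance fitness, the inertia and acceleration values, and every function name here (`hamming`, `min_window_distance`, `fitness`, `pso_find_motif`) are assumptions made for illustration. The sketch runs in plain Python on one machine; on a cluster, a function of this kind would be the part wrapped in a Spark UDF.

```python
import random


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def min_window_distance(kmer, seq):
    """Smallest Hamming distance between kmer and any window of seq."""
    k = len(kmer)
    return min(hamming(kmer, seq[i:i + k]) for i in range(len(seq) - k + 1))


def fitness(pos, reference, others, k):
    """Assumed cost: how poorly the reference window at pos is conserved."""
    kmer = reference[pos:pos + k]
    return sum(min_window_distance(kmer, s) for s in others)


def pso_find_motif(reference, others, k, n_particles=10, n_iter=30,
                   w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for the start position in reference with the lowest cost.

    Each particle is a 1-D continuous position, rounded to an integer
    index when the fitness is evaluated. Parameter values are assumptions.
    """
    rng = random.Random(seed)
    max_pos = len(reference) - k

    def clamp(x):
        return min(max(x, 0.0), float(max_pos))

    xs = [rng.uniform(0, max_pos) for _ in range(n_particles)]
    vs = [0.0] * n_particles
    pbest = list(xs)
    pbest_cost = [fitness(round(x), reference, others, k) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g], pbest_cost[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Standard velocity update: inertia + cognitive + social terms.
            vs[i] = (w * vs[i]
                     + c1 * r1 * (pbest[i] - xs[i])
                     + c2 * r2 * (gbest - xs[i]))
            xs[i] = clamp(xs[i] + vs[i])
            cost = fitness(round(xs[i]), reference, others, k)
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = xs[i], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = xs[i], cost

    pos = round(gbest)
    return pos, reference[pos:pos + k], gbest_cost


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    ref = "TTGACCGATTACAGGCTA"
    others = ["CCGATTACAGTT", "AAGATTACATGC"]
    print(pso_find_motif(ref, others, k=7))
```

Because the search space here is a single integer index, an exhaustive scan would also work on small inputs; PSO is shown only to mirror the approach named in the abstract.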
Keywords
Big data, Algorithm, Particle swarm optimization, Similarity check, Motif discovery, DNA barcoding
Journal title
International Journal of Nonlinear Analysis and Applications
Serial Year
2021
Record number
2703084