Title :
High-Performance Biomedical Association Mining with MapReduce
Author :
Yanqing Ji ; Yun Tian ; Fangyang Shen ; John Tran
Author_Institution :
Gonzaga Univ., Spokane, WA, USA
Abstract :
MapReduce has been applied to data-intensive applications in many domains because of its simplicity, scalability, and fault tolerance. However, its use in biomedical association mining is still very limited. In this paper, we investigate using MapReduce to efficiently mine the associations between biomedical terms extracted from a set of biomedical articles. First, biomedical terms were obtained by matching text to the Unified Medical Language System (UMLS) Metathesaurus, a biomedical vocabulary and standards database. Then we developed a MapReduce algorithm that can calculate a category of interestingness measures defined on the basis of a 2×2 contingency table. This algorithm consists of two MapReduce jobs and takes a stripes approach to reduce the number of intermediate results. Experiments were conducted using Amazon Elastic MapReduce (EMR) with an input of 3610 articles retrieved from two biomedical journals. Test results indicate that our algorithm has linear scalability.
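The stripes idea and the 2×2 contingency table mentioned in the abstract can be illustrated with a small in-memory sketch. This is not the authors' implementation: the abstract does not say how work is split between the two jobs, so the function names (`map_stripes`, `reduce_stripes`, `contingency`, `lift`), the `"*"` key used to carry each term's article count, and the choice of lift as the example measure are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def map_stripes(terms):
    """Mapper (hypothetical): emit one stripe per distinct term in an article.

    A stripe is a single associative array of co-occurring terms, so one
    record is emitted per term instead of one record per term pair.
    """
    unique = sorted(set(terms))
    for a in unique:
        stripe = Counter(b for b in unique if b != a)
        stripe["*"] = 1  # assumed marker key: one article contains term a
        yield a, stripe


def reduce_stripes(pairs):
    """Reducer (hypothetical): sum the stripes of each term element-wise."""
    merged = defaultdict(Counter)
    for term, stripe in pairs:
        merged[term].update(stripe)  # Counter.update adds counts
    return merged


def contingency(merged, a, b, n_docs):
    """Build the 2x2 table (n11, n10, n01, n00) for terms a and b."""
    n11 = merged[a][b]            # articles containing both a and b
    n10 = merged[a]["*"] - n11    # articles with a but not b
    n01 = merged[b]["*"] - n11    # articles with b but not a
    n00 = n_docs - n11 - n10 - n01
    return n11, n10, n01, n00


def lift(n11, n10, n01, n00):
    """One example of a measure defined on the 2x2 table."""
    n = n11 + n10 + n01 + n00
    return (n11 * n) / ((n11 + n10) * (n11 + n01))
```

A usage example on four toy "articles", each already reduced to a list of terms:

```python
docs = [["aspirin", "headache"], ["aspirin", "headache", "nausea"],
        ["nausea"], ["aspirin"]]
merged = reduce_stripes(kv for d in docs for kv in map_stripes(d))
table = contingency(merged, "aspirin", "headache", len(docs))
print(table, lift(*table))
```

Other measures built on the same four counts could be substituted for `lift` without changing the counting steps.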
Keywords :
data mining; distributed processing; medical computing; Amazon Elastic MapReduce; EMR; MapReduce algorithm; UMLS; biomedical articles; biomedical terms; biomedical vocabulary; data intensive applications; high-performance biomedical association mining; linear scalability; metathesaurus; standard database; text matching; unified medical language system; Biomedical measurement; Clustering algorithms; Data mining; Databases; Servers; Standards; Unified modeling language; Association Mining; Biomedical Literature; High-Performance Computing; MapReduce
Conference_Title :
2015 12th International Conference on Information Technology - New Generations (ITNG)
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4799-8827-3
DOI :
10.1109/ITNG.2015.80