DocumentCode :
2545668
Title :
Ontology-Based Temporal Relation Modeling with MapReduce Latent Dirichlet Allocations for Big EHR Data
Author :
Dingcheng Li ; Cui Tao ; Hongfang Liu ; Chute, C.
Author_Institution :
Biomed. Stat. & Inf., Mayo Clinic, Rochester, MN, USA
fYear :
2012
fDate :
1-3 Nov. 2012
Firstpage :
708
Lastpage :
715
Abstract :
In this paper, we propose a model called Temporal & Coreference Topic Modeling (TCTM) to automatically annotate large-scale Electronic Health Record (EHR) data with respect to the Time Event Ontology (TEO). TCTM, based on Latent Dirichlet Allocations (LDA) and integrated into the MapReduce framework, inherently addresses the twin problems of data sparseness and high dimensionality. As a non-parametric Bayesian model, it can flexibly add new attributes or features. Side information associated with corpora, such as section header, timestamp, sentence distance, event distance, or disease category in clinical notes, makes latent topics more interpretable and more biased toward coreferring events. Furthermore, TCTM integrates Hidden Markov Model LDA (HMM-LDA) to obtain the power of both sequential modeling and exchangeability. A MapReduce-based variational method is employed for parameter estimation and inference, enabling TCTM to overcome the bottleneck brought by big data.
Keywords :
hidden Markov models; inference mechanisms; medical information systems; ontologies (artificial intelligence); parallel processing; MapReduce framework; MapReduce latent Dirichlet allocation; big EHR data; clinical note; data dimensionality; data sparseness; disease category information; electronic health record; event distance information; hidden Markov model LDA; inference; nonparametric Bayesian model; ontology-based temporal relation modeling; parameter estimation; section header information; sentence distance information; sequential exchangeability; sequential modeling; temporal-and-coreference topic modeling; time event ontology; timestamp information; Computational modeling; Data handling; Data storage systems; Hidden Markov models; Information management; Tin; Wireless sensor networks; MapReduce; event coreference resolution; latent Dirichlet allocations; temporal relation annotation
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Cloud and Green Computing (CGC), 2012 Second International Conference on
Conference_Location :
Xiangtan
Print_ISBN :
978-1-4673-3027-5
Type :
conf
DOI :
10.1109/CGC.2012.112
Filename :
6382894