Title :
A General Purpose Phenotype Algorithm for Venous Thromboembolism Using Billing Codes and Natural Language Processing
Author :
Hinz, Eugenia R. McPeek ; Bastarache, Lisa ; Denny, Joshua C.
Author_Institution :
Dept. of Biomed. Inf., Vanderbilt Univ., Nashville, TN, USA
Abstract :
Deep venous thrombosis and pulmonary embolism are diseases associated with significant morbidity and mortality. Well-described risk factors for venous thromboembolic disease (VTE) include immobility, trauma, and genetic hypercoagulability states, yet many cases have no known antecedent risk. Studies that aim to define the missing risk factors should ideally identify all cases of VTE. Defining VTE in the electronic health record is challenging because of the variable duration of VTE treatment, the overlap of therapeutic modalities with other chronic diseases, and preventive treatment related to hospitalizations. We designed a general-purpose natural language processing (NLP) algorithm to capture acute and historical cases of thromboembolic disease retrospectively in a de-identified electronic health record. Applying the NLP algorithm to a separate evaluation set yielded a positive predictive value (PPV) of 84.7% and a sensitivity of 95.3%, for an F-measure of 0.897, similar to the F-measure of 0.925 on the training set. Applying the same algorithm to problem lists of patients without VTE ICD-9 codes resulted in a PPV of 83%. NLP of VTE ICD-9-positive cases and of problem lists for patients without VTE ICD-9 codes provides an effective means of capturing both acute and historical cases of venous thromboembolic disease.
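The reported evaluation-set F-measure can be checked from the PPV and sensitivity given in the abstract. The following is a minimal sketch, assuming the F-measure is the balanced F1 score (the harmonic mean of PPV and sensitivity), which the abstract does not state explicitly; the function name `f_measure` is illustrative and not taken from the paper.

```python
def f_measure(ppv: float, sensitivity: float) -> float:
    """Balanced F-measure (F1): harmonic mean of PPV and sensitivity.

    Assumption: the abstract's "F-measure" is the F1 score.
    """
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Evaluation-set values reported in the abstract.
evaluation_f = f_measure(ppv=0.847, sensitivity=0.953)
print(round(evaluation_f, 3))  # 0.897
```

Under this assumption the computed value matches the 0.897 reported for the evaluation set.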
Keywords :
cancer; injuries; medical information systems; natural language processing; patient treatment; risk analysis; F-measure; NLP algorithm; VTE treatment; billing codes; chronic diseases; deep venous thrombosis; electronic health record; general purpose phenotype algorithm; heritable hypercoagulability state; hospitalization; immobility; morbidity; mortality; positive predictive value; pulmonary embolism; risk factors; sensitivity; therapeutic modality; trauma; venous thromboembolic disease; Diseases; Educational institutions; Informatics; Prediction algorithms; Training
Conference_Title :
Healthcare Informatics, Imaging and Systems Biology (HISB), 2012 IEEE Second International Conference on
Conference_Location :
San Diego, CA
Print_ISBN :
978-1-4673-4803-4
DOI :
10.1109/HISB.2012.74