DocumentCode :
1800043
Title :
Development of large-scale TCM corpus using hybrid named entity recognition methods for clinical phenotype detection: An initial study
Author :
Lizhi Feng ; Xuezhong Zhou ; Haixun Qi ; Runshun Zhang ; YingHui Wang ; Baoyan Liu
Author_Institution :
School of Computer and Information Technology and Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, China
fYear :
2014
fDate :
9-12 Dec. 2014
Firstpage :
1
Lastpage :
7
Abstract :
Clinical data are a core data resource in traditional Chinese medicine (TCM) because TCM is a clinically based medicine. However, most TCM clinical data, such as electronic medical records, are still stored as free text. Because the TCM field lacks a large-scale annotated corpus, this paper aims to develop an annotation system for a TCM clinical text corpus. To reduce manual labor, we implement three named entity recognition methods (a supervised machine learning method, an unsupervised method, and structured data comparison) to assist the batch annotation of clinical records before manual checking. We developed the system in Java and have effectively curated more than 2,000 chief complaint records.
Keywords :
Java; electronic health records; natural language processing; text analysis; unsupervised learning; TCM clinical text corpus; annotation system; batch annotations; clinical data; clinical phenotype detection; clinical records; clinically based medicine; core data repositories; electronic medical record; large-scale annotation corpus; manual checking; named entity recognition methods; structured data comparison; supervised machine learning method; traditional Chinese medicine; unsupervised method; Data mining; Databases; Hidden Markov models; Manuals; Medical diagnostic imaging; Standards; Training; named entity recognition
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Intelligence in Big Data (CIBD), 2014 IEEE Symposium on
Conference_Location :
Orlando, FL
Type :
conf
DOI :
10.1109/CIBD.2014.7011532
Filename :
7011532