DocumentCode :
2766024
Title :
An initial study of full parsing of clinical text using the Stanford Parser
Author :
Xu, Hua ; AbdelRahman, Samir ; Jiang, Min ; Fan, Jung-wei ; Huang, Yang
Author_Institution :
Dept. of Biomed. Inf., Vanderbilt Univ., Nashville, TN, USA
fYear :
2011
fDate :
12-15 Nov. 2011
Firstpage :
607
Lastpage :
614
Abstract :
Full parsing recognizes a sentence and generates its syntactic structure (a parse tree), which is useful for many natural language processing (NLP) applications. The Stanford Parser is one of the state-of-the-art parsers in the general English domain. However, there has been no formal evaluation of its performance on clinical text, which often contains ungrammatical structures. In this study, we randomly selected 50 sentences from the clinical corpus of the 2010 i2b2 NLP challenge and manually annotated them to create a gold standard of parse trees. Our evaluation showed that the original Stanford Parser achieved a bracketing F-measure (BF) of 77% on the gold standard. Moreover, we assessed the effect of part-of-speech (POS) tags on parsing, and our results showed that using manually corrected POS tags achieved a maximum BF of 81%. Furthermore, we analyzed the errors of the Stanford Parser and provided valuable insights for large-scale parse tree annotation of clinical text.
Keywords :
grammars; medical computing; natural language processing; text analysis; 2011; F-measure; Stanford parser; clinical corpus; clinical text parsing; natural language processing application; parse tree; part-of-speech tag; Gold; Guidelines; Manuals; Medical services; Natural language processing; Syntactics; Tagging
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Biomedicine Workshops (BIBMW), 2011 IEEE International Conference on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4577-1612-6
Type :
conf
DOI :
10.1109/BIBMW.2011.6112438
Filename :
6112438