Title :
Automatic segmentation of clinical texts
Author :
Apostolova, Emilia ; Channin, David S. ; Demner-Fushman, Dina ; Furst, Jacob ; Lytinen, Steven ; Raicu, Daniela
Author_Institution :
Coll. of Comput. & Digital Media, DePaul Univ., Chicago, IL, USA
Abstract :
Clinical narratives, such as radiology and pathology reports, are commonly available in electronic form. However, they are typically entered and stored as free text. Knowledge of the structure of clinical narratives is necessary for enhancing the productivity of healthcare departments and facilitating research. This study attempts to automatically segment medical reports into semantic sections. Our goal is to develop a robust and scalable medical report segmentation system that requires minimal user input and supports efficient retrieval and extraction of information from free-text clinical narratives. Hand-crafted rules were used to automatically identify a high-confidence training set. This automatically created training set was then used to develop metrics and an algorithm that determines the semantic structure of the medical reports. A word-vector cosine similarity metric combined with several heuristics was used to classify each report sentence into one of several pre-defined semantic sections. This baseline algorithm achieved 79% accuracy. A support vector machine (SVM) classifier trained on additional formatting and contextual features achieved 90% accuracy. Plans for future work include developing a configurable system that could accommodate various medical report formatting and content standards.
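The baseline described in the abstract, classifying each sentence by word-vector cosine similarity against pre-defined sections, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the section names, the toy training sentences, and the helper names (`bag_of_words`, `cosine`, `build_section_vectors`, `classify`) are all assumptions made for illustration, and the additional heuristics and SVM features mentioned in the abstract are not reproduced.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def build_section_vectors(training):
    """Sum the word counts of all training sentences in each section."""
    vectors = {}
    for section, sentences in training.items():
        total = Counter()
        for sentence in sentences:
            total.update(bag_of_words(sentence))
        vectors[section] = total
    return vectors


def classify(sentence, section_vectors):
    """Return the section whose word vector is most similar to the sentence."""
    vec = bag_of_words(sentence)
    return max(section_vectors, key=lambda s: cosine(vec, section_vectors[s]))


# Hypothetical toy training set; in the paper the training set is
# produced automatically by hand-crafted rules.
TRAINING = {
    "History": ["Patient presents with chest pain and shortness of breath."],
    "Findings": ["The lungs are clear.", "No pleural effusion is seen."],
    "Impression": ["No acute cardiopulmonary disease."],
}

SECTION_VECTORS = build_section_vectors(TRAINING)
predicted = classify("The lungs are clear bilaterally.", SECTION_VECTORS)
```

With this toy data, the example sentence shares words only with the "Findings" vector, so that section is selected. A practical system would need far more training text per section and some handling of ties and very short sentences.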
Keywords :
data structures; health care; information retrieval; support vector machines; text analysis; baseline algorithm; clinical narratives; clinical text automatic segmentation; free-text clinical narratives; hand-crafted rules; healthcare departments; information extraction; medical report formatting; medical report segmentation system; pathology; radiology; support vector machine classifier; word-vector cosine similarity metric; Algorithms; Artificial Intelligence; Documentation; Information Storage and Retrieval; Medical Records; Natural Language Processing; Pattern Recognition, Automated; Semantics
Conference_Title :
Engineering in Medicine and Biology Society, 2009. EMBC 2009. Annual International Conference of the IEEE
Conference_Location :
Minneapolis, MN
Print_ISBN :
978-1-4244-3296-7
Electronic_ISBN :
1557-170X
DOI :
10.1109/IEMBS.2009.5334831