Title :
Data deidentification in medical transcriptions using regular expressions and machine learning
Author :
Joshua Seeger;Aron Culotta;Jason Keller;Patrick van Kessel;Michael Jugovich
Author_Institution :
NORC at the University of Chicago, 1 North State Street, 14th Floor, Chicago, IL 60602
Abstract :
A system is developed that redacts personally identifiable information (PII) from millions of medical transcriptions with very high precision, using a combination of entity recognition, regular expressions, and machine learning. The system is trained and tested on manually redacted medical transcriptions, coded with an internally developed coding system that provides double-blind classification capabilities.
Keywords :
"Medical services","Medical diagnostic imaging","Encoding","Pipelines","Manuals","Floors","Big data"
Conference_Title :
2015 IEEE International Conference on Big Data (Big Data)
DOI :
10.1109/BigData.2015.7363889
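The abstract names regular expressions as one component of the redaction pipeline. The following is a minimal sketch of only that stage, not the authors' implementation: the category names and patterns in PII_PATTERNS are hypothetical illustrations, and the entity-recognition and machine-learning components the abstract describes are not shown.

```python
import re

# Hypothetical patterns for a few PII categories (illustration only).
# A production system would need far broader coverage and, per the abstract,
# would combine regexes with entity recognition and a learned classifier.
PII_PATTERNS = {
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "MRN": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
}


def redact(text: str) -> str:
    """Replace every match of each pattern with a bracketed category tag."""
    # Patterns are applied in dict insertion order; each pass rewrites the text.
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    note = "Pt called from 312-555-0100 on 03/14/2015, SSN 123-45-6789, MRN: 00123456."
    print(redact(note))
```

Replacing matches with category tags rather than deleting them keeps the transcription readable and preserves which kind of identifier was removed. Hand-written patterns like these tend to favor precision over recall, which is one reason a learned component is useful alongside them.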