A machine learning methodology for medical imaging anonymization

Author

Eriksson Monteiro;Carlos Costa;José Luis Oliveira

Author_Institution

Univ. of Aveiro, Portugal

fYear

2015

Firstpage

1381

Lastpage

1384

Abstract

Privacy protection is a major requirement for the complete success of EHR systems, becoming even more critical in collaborative scenarios, where data is shared among institutions and practitioners. While textual data can be easily de-identified, patient data in medical images requires a more elaborate approach. In this work we present a solution for sensitive word identification in medical images based on a combination of two machine-learning models, achieving an F1-score of 0.94. Three experts evaluated the system performance. They analyzed the output of the present methodology and categorized the studies into three groups: studies that had their sensitive words removed (true positive), studies that retained the complete patient identity (false negative) and studies with mistakenly removed data (false positive). The experts were unanimous regarding the relevance of the present tool in collaborative medical environments, as it may improve the exchange of anonymized patient data between institutions.
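
The F1-score mentioned in the abstract is conventionally derived from the three categories the experts used (true positive, false positive, false negative). The following is a minimal sketch of that standard calculation, not the authors' evaluation code. The function name f1_score and the counts in the usage lines are hypothetical and chosen only for illustration; they are not figures from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score from counts of true positives, false positives
    and false negatives, using the standard definition
    F1 = 2*TP / (2*TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        # No positives were predicted or expected; report 0.0 by convention.
        return 0.0
    return 2 * tp / denominator


# Hypothetical counts, for illustration only (not taken from the paper):
# 8 studies correctly anonymized, 1 with data removed by mistake,
# 1 that still exposed the patient identity.
example = f1_score(tp=8, fp=1, fn=1)
print(f"F1 = {example:.3f}")
```

The single-fraction form is algebraically the same as the harmonic mean of precision, TP / (TP + FP), and recall, TP / (TP + FN), and it avoids a separate division-by-zero check for each of those two terms.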

Keywords

"DICOM","Optical character recognition software","Text recognition","Metadata","Image recognition","Pipelines"

Publisher

IEEE

Conference_Titel

Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE

ISSN

1094-687X

Electronic_ISBN

1558-4615

Type

conf

DOI

10.1109/EMBC.2015.7318626

Filename

7318626