• DocumentCode
    3706619
  • Title
    An HL7 Data Pseudonymization Pipeline
  • Author
    Thusitha Mabotuwana;Christopher S. Hall;Rob van Ommering;Ranjith Tellis;Merlijn Sevenster
  • Author_Institution
    Philips Healthcare, New York, NY, USA
  • fYear
    2015
  • Firstpage
    303
  • Lastpage
    309
  • Abstract
    The increasing uptake of information technology in the healthcare domain has resulted in a large volume of digital health data being generated on a regular basis. Most health information systems exchange information using HL7 messages, making HL7 a good source for data collection. Despite the large volume of data generated within treatment facilities, sharing this data for research and development purposes is not straightforward due to patient privacy protection legislation. In this paper, we present an HL7 data processing pipeline that extracts information from complex HL7 data structures, de-identifies fields containing identifiable patient information, and reconstructs HL7 messages using the de-identified data, while providing a secure mechanism for re-identification. Using a dataset of 149,647 production HL7 messages, we demonstrate that the system achieves over 99% accuracy, suggesting that this is a feasible approach to de-identifying patient data.
  • Keywords
    "Data mining","Hospitals","Pipelines","Data processing","Production","Standards"
  • Publisher
    IEEE
  • Conference_Titel
    Healthcare Informatics (ICHI), 2015 International Conference on
  • Type
    conf
  • DOI
    10.1109/ICHI.2015.43
  • Filename
    7349704