Title :
De-identification in natural language processing
Author :
Vincze, Veronika ; Farkas, RichaÌrd
Author_Institution :
MTA-SZTE Res. Group on Artificial Intell., Univ. of Szeged Szeged, Szeged, Hungary
Abstract :
Natural language processing (NLP) systems usually require a huge amount of textual data but the publication of such datasets is often hindered by privacy and data protection issues. Here, we discuss the questions of de-identification related to three NLP areas, namely, clinical NLP, NLP for social media and information extraction from resumes. We also illustrate how de-identification is related to named entity recognition and we argue that de-identification tools can be successfully built on named entity recognizers.
Keywords :
data privacy; natural language processing; NLP areas; NLP systems; data protection; information extraction; natural language processing; privacy protection; social media; textual data; Databases; Educational institutions; Electronic mail; Informatics; Information retrieval; Media; Natural language processing;
Conference_Titel :
Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2014 37th International Convention on
Conference_Location :
Opatija
Print_ISBN :
978-953-233-081-6
DOI :
10.1109/MIPRO.2014.6859768