DocumentCode :
1803924
Title :
t-Plausibility: Semantic Preserving Text Sanitization
Author :
Jiang, Wei ; Murugesan, Mummoorthy ; Clifton, Chris ; Si, Luo
Author_Institution :
Dept. of Comput. Sci., Missouri Univ. of Sci. & Technol., Rolla, MO, USA
Volume :
3
fYear :
2009
fDate :
29-31 Aug. 2009
Firstpage :
68
Lastpage :
75
Abstract :
Text documents play significant roles in decision making and scientific research. Under federal regulations, documents (e.g., pathology records) containing personally identifiable information cannot be shared freely, unless properly sanitized. Generally speaking, document sanitization consists of finding and hiding personally identifiable information. The first task has received much attention from the research community, but the main strategy for the second task has been to simply remove personal identifiers and very sensitive information (e.g., diseases and treatment). It is not hard to see that if important information (e.g., diagnoses and personal medical histories) is completely removed from pathology records, these records are no longer readable, and even worse, they no longer contain sufficient information for research purposes. Observe that the sensitive information "tuberculosis" can be replaced with the less sensitive term "infectious disease". That is, instead of simply removing sensitive terms, these terms can be hidden by more general but semantically related terms to protect sensitive information, without unnecessarily degrading the amount of information contained in the document. Based on this observation, the main contribution of this paper is to provide a novel information theoretic approach to text sanitization, and develop efficient heuristics to sanitize text documents.
Keywords :
data privacy; information dissemination; medical administrative data processing; document sanitization; federal regulation; pathology record; personal identifier removal; personal medical history; personally identifiable information; semantic preserving text sanitization; sensitive information protection; sensitive information removal; t-plausibility; text document; Computer science; Degradation; Diseases; Drugs; History; Medical diagnostic imaging; Medical treatment; Pain; Pathology; Protection; anonymization; privacy; text documents;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Science and Engineering, 2009. CSE '09. International Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
978-1-4244-5334-4
Electronic_ISBN :
978-0-7695-3823-5
Type :
conf
DOI :
10.1109/CSE.2009.353
Filename :
5283278