  • DocumentCode
    3692792
  • Title
    Towards an information type lexicon for privacy policies
  • Author
    Jaspreet Bhatia;Travis D. Breaux
  • Author_Institution
    Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
  • fYear
    2015
  • fDate
    8/25/2015 12:00:00 AM
  • Firstpage
    19
  • Lastpage
    24
  • Abstract
    Privacy policies serve to inform consumers about a company's data practices, and to protect the company from legal risk due to undisclosed uses of consumer data. In addition, US and EU regulators require companies to accurately describe their practices in these policies, and some laws prescribe how companies should write these policies. Despite these aims, privacy policies are frequently criticized for being vague and uninformative. To support and improve the analysis of privacy policies, we report results from constructing an information type lexicon from manual, human annotations and an entity extractor based on part-of-speech tagging. The lexicon was constructed from 3,850 annotations obtained from crowd workers analyzing 15 privacy policies. An entity extractor was designed to extract entities from these annotations. The extractor succeeds at finding entities in 92% of annotations, and the lexicon consists of 725 unique entities. Finally, we measured the terminological reuse across all 15 policies and observed the lexicon has a 31-78% chance of containing a word from any previously seen policy.
  • Keywords
    "Privacy","Crowdsourcing","Data privacy","Natural language processing","Companies","Manuals"
  • Publisher
    ieee
  • Conference_Titel
    Requirements Engineering and Law (RELAW), 2015 IEEE Eighth International Workshop on
  • Type
    conf
  • DOI
    10.1109/RELAW.2015.7330207
  • Filename
    7330207