• DocumentCode
    2052238
  • Title

    LAIR: A Language for Automated Semantics-Aware Text Sanitization Based on Frame Semantics

  • Author

    Hedegaard, Steffen ; Houen, Søren ; Simonsen, Jakob Grue

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Copenhagen (DIKU), Copenhagen, Denmark
  • fYear
    2009
  • fDate
    14-16 Sept. 2009
  • Firstpage
    47
  • Lastpage
    52
  • Abstract
    We present LAIR: a domain-specific language that enables users to specify actions to be taken upon meeting specific semantic frames in a text, in particular to rephrase and redact the textual content. While LAIR presupposes superficial knowledge of frames and frame semantics, it requires only limited prior programming experience. It contains neither scripting nor I/O primitives, nor does it contain general loop constructions, and it is not Turing-complete. We have implemented a LAIR compiler and integrated it in a pipeline for automated redaction of web pages. We detail our experience with automated redaction of web pages for subjectively undesirable content; initial experiments suggest that using a small language based on semantic recognition of undesirable terms can be highly useful as a supplement to traditional methods of text sanitization.
  • Keywords
    Internet; computational linguistics; natural languages; program compilers; text analysis; LAIR compiler; automated Web page redaction; automated semantics-aware text sanitization; domain-specific language; frame semantics; language for automatically inferred redaction; semantic recognition; textual content; Computer science; Data security; Domain specific languages; Government; Hospitals; Information security; Natural languages; Pipelines; Text recognition; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Semantic Computing, 2009. ICSC '09. IEEE International Conference on
  • Conference_Location
    Berkeley, CA
  • Print_ISBN
    978-1-4244-4962-0
  • Electronic_ISBN
    978-0-7695-3800-6
  • Type

    conf

  • DOI
    10.1109/ICSC.2009.79
  • Filename
    5298551