• DocumentCode
    117249
  • Title

    Computing on masked data: a high performance method for improving big data veracity

  • Author

    Kepner, Jeremy ; Gadepally, Vijay ; Michaleas, Pete ; Schear, Nabil ; Varia, Mayank ; Yerukhimovich, Arkady ; Cunningham, Robert K.

  • Author_Institution
    MIT Lincoln Lab., Lexington, MA, USA
  • fYear
    2014
  • fDate
    9-11 Sept. 2014
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity and variety. Along with these standard three V's of big data, an emerging fourth “V” is veracity, which addresses the confidentiality, integrity, and availability of the data. Traditional cryptographic techniques that ensure the veracity of data can have overheads that are too large to apply to big data. This work introduces a new technique called Computing on Masked Data (CMD), which improves data veracity by allowing computations to be performed directly on masked data and ensuring that only authorized recipients can unmask the data. Using the sparse linear algebra of associative arrays, CMD can be performed with significantly less overhead than other approaches while still supporting a wide range of linear algebraic operations on the masked data. Databases with strong support of sparse operations, such as SciDB or Apache Accumulo, are ideally suited to this technique. Examples are shown for the application of CMD to a complex DNA matching algorithm and to database operations over social media data.
  • Keywords
    Big Data; cryptography; data integrity; data privacy; linear algebra; Apache Accumulo; CMD; SciDB; associative array; big data variety; big data velocity; big data veracity; big data volume; complex DNA matching algorithm; computing on masked data; cryptographic technique; data availability; data confidentiality; data integrity; database operation; high performance method; linear algebraic operation; social media data; sparse linear algebra; Arrays; Big data; DNA; Databases; Encryption; Sparse matrices; Accumulo; Big Data; D4M; Encryption; Security;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    High Performance Extreme Computing Conference (HPEC), 2014 IEEE
  • Conference_Location
    Waltham, MA
  • Print_ISBN
    978-1-4799-6232-7
  • Type

    conf

  • DOI
    10.1109/HPEC.2014.7040946
  • Filename
    7040946