• DocumentCode
    3663176
  • Title

    Contamination estimation via convex relaxations

  • Author

    Matthew L. Malloy;Scott Alfeld;Paul Barford

  • Author_Institution
    comScore, USA
  • fYear
    2015
  • fDate
    6/1/2015 12:00:00 AM
  • Firstpage
    1189
  • Lastpage
    1193
  • Abstract
    Identifying anomalies and contamination in datasets is important in a wide variety of settings. In this paper, we describe a new technique for estimating contamination in large, discrete-valued datasets. Our approach considers the normal condition of the data to be specified by a model consisting of a set of distributions. Our key contribution is in our approach to contamination estimation. Specifically, we develop a technique that identifies the minimum number of data points that must be discarded (i.e., the level of contamination) from an empirical data set in order to match the model to within a specified goodness-of-fit, controlled by a p-value. Appealing to results from large deviations theory, we show that a lower bound on the level of contamination is obtained by solving a series of convex programs. Theoretical results guarantee the bound converges at a rate of O(√(log(p)/p)), where p is the size of the empirical data set.
  • Keywords
    "Computational modeling","Indexes","Computers","Biological system modeling","Monitoring","Biomedical monitoring"
  • Publisher
    ieee
  • Conference_Titel
    Information Theory (ISIT), 2015 IEEE International Symposium on
  • Electronic_ISBN
    2157-8117
  • Type

    conf

  • DOI
    10.1109/ISIT.2015.7282643
  • Filename
    7282643