DocumentCode :
3644235
Title :
Interestingness -- Directing Analyst Focus to Significant Data
Author :
M. Bourassa;J. Fugère;D. Skillicorn
Author_Institution :
Dept. of Math &
fYear :
2011
Firstpage :
300
Lastpage :
307
Abstract :
Faced with a deluge of data, an analyst must ask "what data records are important?" This paper answers that question by first defining a continuous spectrum of data record significance: "known", "anomalous", "interesting", "novel", and "noise". The definition has a geometric interpretation in that the significance of a data record in a predictor system is inversely proportional to its distance from the decision boundary of the predictor. Meta-analysis of data means that the performance of the predictor is constantly evaluated to detect cues that the model is still valid for the current reality of the data it processes. A principled approach to the meta-analysis of data using the preceding definition was outlined and implemented using a predictor scenario. Support vector machine ensembles were used as novelty, prediction and interestingness models. The system was successfully used to rank the significance of data records and to assess the performance of the predictor for increasingly complex toy and real-world data. A "NOVINT" plot was introduced as a means of visualizing data record significance and drawing an analyst's attention to significant information. The plot was also shown to be equally useful in providing insight into both the nature of the data and the performance of the predictor.
Keywords :
"Support vector machines","Predictive models","Data models","Monitoring","Iris","Detectors","Geometry"
Publisher :
IEEE
Conference_Titel :
Intelligence and Security Informatics Conference (EISIC), 2011 European
Print_ISBN :
978-1-4577-1464-1
Type :
conf
DOI :
10.1109/EISIC.2011.53
Filename :
6061222