DocumentCode :
3765330
Title :
Automated variability selection in time-domain imaging surveys using sparse representations with learned dictionaries
Author :
Daniela I. Moody;Przemek R. Wozniak;Steven P. Brumby
Author_Institution :
Descartes Labs, Los Alamos, NM 87544, United States
fYear :
2015
Firstpage :
1
Lastpage :
8
Abstract :
Exponential growth in data streams and discovery power delivered by modern time-domain imaging surveys creates a pressing need for variability extraction algorithms that are both fully automated and highly reliable. Current state-of-the-art methods based on image differencing are limited because, for every real variable source, the algorithm returns a large number of bogus “detections” caused by atmospheric effects and instrumental signatures coupled with imperfect image processing. Here we present a new approach to this problem, inspired by recent advances in computer vision, in which the machine learns new features directly from pixel data. The training data set comes from the Palomar Transient Factory survey and consists of small images centered on transient candidates with known real/bogus classification. This set of high-dimensional vectors (~1000 features) is then transformed into a linear representation using a so-called dictionary, an overcomplete feature set constructed separately for each class. The data vectors are well approximated by a small number of dictionary elements; that is, the dictionary representation is sparse. We show how sparse representations can be used to construct informative features for any suitable machine learning classifier. Our top-level classifier is based on the random forest algorithm (an ensemble of decision trees), with input data vectors consisting of up to 6 computer vision features and 20 additional context features designed by domain experts. Machine-learned features alone provide only an approximate classification, with a 20% missed detection rate at a fixed false positive rate of 1%. When automatically extracted features are appended to those constructed by humans, the missed detection rate at a 1% false positive rate is reduced from 8% to about 4%.
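The pipeline in the abstract (one overcomplete dictionary per class, sparse codes, features for a downstream classifier, missed detection rate at a fixed false positive rate) can be illustrated with a small NumPy sketch. It is not the authors' implementation: the dictionary-learning method (Method of Optimal Directions), the sparse coder (orthogonal matching pursuit), the choice of per-class reconstruction error as the feature, the helper names (`omp`, `learn_dictionary`, `residual_features`, `missed_detection_rate`, `make_class`), and the toy 16-pixel "stamps" are all assumptions made for illustration; no PTF data is used.

```python
import numpy as np


def omp(D, x, k):
    """Orthogonal matching pursuit: approximate x with at most k columns (atoms) of D."""
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the still-unexplained part of x.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    a = np.zeros(D.shape[1])
    a[support] = coef
    return a


def learn_dictionary(X, n_atoms, k, n_iter=10, seed=0):
    """Method of Optimal Directions (MOD); X holds one training vector per column."""
    rng = np.random.default_rng(seed)
    D = X[:, rng.choice(X.shape[1], n_atoms, replace=False)].copy()
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        # Sparse-coding step: a k-sparse code for every training vector.
        A = np.column_stack([omp(D, X[:, j], k) for j in range(X.shape[1])])
        # Dictionary step: least-squares fit D = X A^+.
        D = X @ np.linalg.pinv(A)
        # Re-seed atoms that no training vector used, then renormalize columns.
        dead = np.linalg.norm(D, axis=0) < 1e-8
        D[:, dead] = X[:, rng.choice(X.shape[1], int(dead.sum()))]
        D /= np.linalg.norm(D, axis=0, keepdims=True)
    return D


def residual_features(x, dictionaries, k):
    """Assumed feature: one reconstruction error per class dictionary."""
    return np.array([np.linalg.norm(x - D @ omp(D, x, k)) for D in dictionaries])


def missed_detection_rate(scores, labels, fpr=0.01):
    """Fraction of real (label 1) candidates at or below the threshold that passes ~fpr of bogus (label 0)."""
    threshold = np.quantile(scores[labels == 0], 1.0 - fpr)
    return float(np.mean(scores[labels == 1] <= threshold))


def make_class(rng, n, n_features, n_true_atoms, k, noise=0.05):
    """Hypothetical toy data: noisy k-sparse mixtures of a class-specific random dictionary."""
    D_true = rng.standard_normal((n_features, n_true_atoms))
    D_true /= np.linalg.norm(D_true, axis=0)
    X = np.zeros((n_features, n))
    for j in range(n):
        idx = rng.choice(n_true_atoms, k, replace=False)
        X[:, j] = D_true[:, idx] @ rng.standard_normal(k)
    return X + noise * rng.standard_normal(X.shape)


rng = np.random.default_rng(1)
n_features, n_atoms, k = 16, 24, 3  # 4x4 toy stamps; 24 > 16 atoms, so each dictionary is overcomplete
X_real = make_class(rng, 400, n_features, 20, k)
X_bogus = make_class(rng, 400, n_features, 20, k)

# Learn one dictionary per class from the first half of each class.
dicts = [learn_dictionary(X_real[:, :200], n_atoms, k),
         learn_dictionary(X_bogus[:, :200], n_atoms, k, seed=1)]

# Held-out candidates: features = [error under "real" dictionary, error under "bogus" dictionary].
X_test = np.hstack([X_real[:, 200:], X_bogus[:, 200:]])
labels = np.r_[np.ones(200, dtype=int), np.zeros(200, dtype=int)]
F = np.array([residual_features(X_test[:, j], dicts, k) for j in range(X_test.shape[1])])

# Stand-in score instead of a trained classifier: larger means the "real" dictionary fits better.
scores = F[:, 1] - F[:, 0]
mdr = missed_detection_rate(scores, labels, fpr=0.01)
print(f"missed detection rate at 1% FPR: {mdr:.3f}")
```

In a setup closer to the one described, the columns of `F` would be appended to the expert-designed context features and passed to a random forest (for example scikit-learn's `RandomForestClassifier`), and `missed_detection_rate` would be applied to its predicted probabilities for the "real" class rather than to the simple difference score used here.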
Keywords :
"Dictionaries","Transient analysis","Training","Classification algorithms","Feature extraction","Convergence","Machine learning algorithms"
Publisher :
IEEE
Conference_Titel :
Applied Imagery Pattern Recognition Workshop (AIPR), 2015 IEEE
Electronic_ISBN :
2332-5615
Type :
conf
DOI :
10.1109/AIPR.2015.7444552
Filename :
7444552