Automated variability selection in time-domain imaging surveys using sparse representations with learned dictionaries

Author

Daniela I. Moody;Przemek R. Wozniak;Steven P. Brumby

Author_Institution

Descartes Labs, Los Alamos, NM 87544, United States

fYear

2015

Firstpage

Lastpage

Abstract

Exponential growth in the data streams and discovery power of modern time-domain imaging surveys creates a pressing need for variability extraction algorithms that are both fully automated and highly reliable. Current state-of-the-art methods based on image differencing are limited because, for every real variable source, the algorithm returns a large number of bogus “detections” caused by atmospheric effects and instrumental signatures coupled with imperfect image processing. Here we present a new approach to this problem, inspired by recent advances in computer vision, in which the machine learns new features directly from pixel data. The training data set comes from the Palomar Transient Factory survey and consists of small images centered on transient candidates with known real/bogus classification. This set of high-dimensional vectors (~1000 features) is then transformed into a linear representation using a so-called dictionary, an overcomplete feature set constructed separately for each class. The data vectors are well approximated by a small number of dictionary elements, i.e., the dictionary representation is sparse. We show how sparse representations can be used to construct informative features for any suitable machine learning classifier. Our top-level classifier is based on the random forest algorithm (an ensemble of decision trees), with input data vectors consisting of up to 6 computer vision features and 20 additional context features designed by domain experts. Machine-learned features alone provide only an approximate classification, with a 20% missed detection rate at a fixed false positive rate of 1%. When automatically extracted features are appended to those constructed by humans, the rate of missed detections is reduced from 8% to about 4% at a 1% false positive rate.
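
The idea of turning per-class sparse representations into classifier features can be illustrated with a minimal sketch. The abstract does not say which sparse coder or dictionary-learning algorithm was used, so this example assumes plain matching pursuit as a stand-in and uses tiny hand-made dictionaries rather than learned, overcomplete ones. All names (`matching_pursuit`, `sparse_features`, `DICT_REAL`, `DICT_BOGUS`) are hypothetical, and the residual-energy features are one plausible choice, not necessarily those of the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]


def matching_pursuit(x, atoms, n_nonzero):
    """Greedy sparse coding (assumed stand-in for the paper's coder).

    Approximates x using at most n_nonzero selection steps over
    unit-norm atoms. Returns the coefficients and the final residual.
    """
    residual = list(x)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        corr = [dot(residual, a) for a in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        if abs(corr[k]) < 1e-12:
            break  # nothing left that this dictionary can explain
        coeffs[k] += corr[k]
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


def sparse_features(x, dict_real, dict_bogus, n_nonzero=2):
    """Residual energy under each class dictionary, plus their difference.

    A small residual under one dictionary suggests the cutout resembles
    that class; the values could be appended to expert-designed features.
    """
    _, r_real = matching_pursuit(x, dict_real, n_nonzero)
    _, r_bogus = matching_pursuit(x, dict_bogus, n_nonzero)
    e_real = dot(r_real, r_real)
    e_bogus = dot(r_bogus, r_bogus)
    return [e_real, e_bogus, e_real - e_bogus]


# Toy 4-pixel "cutouts" (hypothetical): real sources are centered blobs,
# bogus detections are single hot pixels at the edges.
DICT_REAL = [normalize([0.0, 1.0, 1.0, 0.0]), normalize([1.0, 1.0, 1.0, 1.0])]
DICT_BOGUS = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

if __name__ == "__main__":
    for name, cutout in [("blob", [0.0, 2.0, 2.0, 0.0]),
                         ("hot pixel", [3.0, 0.0, 0.0, 0.0])]:
        print(name, sparse_features(cutout, DICT_REAL, DICT_BOGUS))
```

In a full pipeline along the lines the abstract describes, feature vectors like these would be concatenated with the context features and passed to a random forest; that step is omitted here to keep the sketch dependency-free.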

Keywords

"Dictionaries","Transient analysis","Training","Classification algorithms","Feature extraction","Convergence","Machine learning algorithms"

Publisher

IEEE

Conference_Titel

Applied Imagery Pattern Recognition Workshop (AIPR), 2015 IEEE

Electronic_ISBN

2332-5615

Type

conf

DOI

10.1109/AIPR.2015.7444552

Filename

7444552

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3765330