We show that multiple machine learning algorithms can match human performance in classifying transient imaging data from the SDSS supernova survey into real objects and artefacts. This classification is the first step in any transient science pipeline and is currently still done by humans, but future surveys such as LSST will necessitate fully machine-enabled solutions. Using features derived from eigenimage analysis (PCA) of single-epoch g, r and i difference images, we can reach a completeness (recall) of 95% while incorrectly classifying only 18% of artefacts as real objects, corresponding to a precision (purity) of 85%. In general, the k-nearest neighbour and SkyNet artificial neural network algorithms performed more robustly than other methods such as naive Bayes and kernel SVM. Our results show that PCA-based machine learning can match human success levels and can naturally be extended to include multiple epochs of data, transient colours and host galaxy information, which should allow for significant further improvements, especially at low signal-to-noise.
L. Buisson, N. Sivanandam, B. Bassett, et al.
Thu, 17 Jul 14
Comments: 11 pages, 8 figures
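
The three figures quoted in the abstract (completeness, the fraction of artefacts misclassified as real, and purity) are all ratios of confusion-matrix counts. The short Python sketch below shows how they relate. The counts are hypothetical and assume equal numbers of real objects and artefacts; they are not taken from the paper, and the function name `classification_metrics` is chosen here for illustration only.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return (recall, false_positive_rate, precision) from confusion counts.

    tp: real objects classified as real
    fn: real objects classified as artefacts
    fp: artefacts classified as real
    tn: artefacts classified as artefacts
    """
    recall = tp / (tp + fn)                # completeness
    false_positive_rate = fp / (fp + tn)   # artefacts wrongly kept as real
    precision = tp / (tp + fp)             # purity of the "real" sample
    return recall, false_positive_rate, precision


# Hypothetical, balanced sample: 1000 real objects and 1000 artefacts
# (an assumption for illustration, not the survey's actual class ratio).
tp, fn = 950, 50     # 95% of real objects recovered
fp, tn = 180, 820    # 18% of artefacts misclassified as real

recall, fpr, precision = classification_metrics(tp, fn, fp, tn)
print(f"recall={recall:.3f}  fpr={fpr:.3f}  precision={precision:.3f}")
```

Under this balanced-sample assumption the purity comes out at about 0.84, close to the 85% quoted above. Unlike recall and the false-positive rate, purity depends on the ratio of real objects to artefacts in the sample, so a different class ratio would shift it.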