Title :
A Study on Classification in Imbalanced and Partially-Labelled Data Streams
Author :
Lyon, R.J. ; Brooke, J.M. ; Knowles, J.D. ; Stappers, B.W.
Author_Institution :
Sch. of Comput. Sci., Univ. of Manchester, Manchester, UK
Abstract :
The domain of radio astronomy is currently facing significant computational challenges, foremost amongst which are those posed by the development of the world's largest radio telescope, the Square Kilometre Array (SKA). Preliminary specifications for this instrument suggest that the final design will incorporate between 2000 and 3000 individual 15-metre receiving dishes, which together can be expected to produce a data rate of many TB/s. Given such a high data rate, it becomes crucial to consider how this information will be processed and stored to maximise its scientific utility. In this paper, we consider one possible data processing scenario for the SKA, for the purposes of an all-sky pulsar survey. In particular, we treat the selection of promising signals from the SKA processing pipeline as a data stream classification problem. We consider the feasibility of classifying signals that arrive via an unlabelled and heavily class-imbalanced data stream, using currently available algorithms and frameworks. Our results indicate that existing stream learners exhibit unacceptably low recall on real astronomical data when used in their standard configuration. However, their good false positive performance and accuracy comparable to that of static learners suggest that they have definite potential as an on-line solution to this particular big data challenge.
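The abstract's central observation, that a stream learner can pair good false positive performance and high accuracy with unacceptably low recall, follows directly from class imbalance. The Python sketch below illustrates this under stated assumptions: a hypothetical synthetic stream in which 1 in 100 items is positive, a trivial majority-class learner standing in for real stream learners, and prequential (test-then-train) evaluation in which every label is revealed right after prediction, which a partially-labelled stream would not guarantee. The names `synthetic_stream`, `MajorityClassLearner` and `prequential`, and all numbers, are illustrative only and do not come from the paper.

```python
# Illustrative sketch only: the stream, class ratio, and learner below are
# hypothetical assumptions, not the data, algorithms, or results of the paper.
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Confusion:
    """Running confusion-matrix counts for a binary (0 = negative, 1 = positive) task."""
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0

    def update(self, y_true: int, y_pred: int) -> None:
        if y_true == 1 and y_pred == 1:
            self.tp += 1
        elif y_true == 0 and y_pred == 1:
            self.fp += 1
        elif y_true == 0 and y_pred == 0:
            self.tn += 1
        else:
            self.fn += 1

    def recall(self) -> float:
        pos = self.tp + self.fn
        return self.tp / pos if pos else 0.0

    def false_positive_rate(self) -> float:
        neg = self.fp + self.tn
        return self.fp / neg if neg else 0.0

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total if total else 0.0


class MajorityClassLearner:
    """Hypothetical incremental learner: predicts the most frequent label seen so far."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def predict(self, x: float) -> int:
        # Default to the negative class (0) before any labels arrive or on ties.
        return 1 if self.counts[1] > self.counts[0] else 0

    def learn(self, x: float, y: int) -> None:
        self.counts[y] += 1


def synthetic_stream(n: int = 1000, positive_every: int = 100) -> Iterable[Tuple[float, int]]:
    """Yield (feature, label) pairs in which 1 in `positive_every` items is positive."""
    for i in range(n):
        label = 1 if i % positive_every == positive_every - 1 else 0
        yield float(i), label


def prequential(learner: MajorityClassLearner, stream: Iterable[Tuple[float, int]]) -> Confusion:
    """Test-then-train: score each example before the learner sees its label."""
    cm = Confusion()
    for x, y in stream:
        cm.update(y, learner.predict(x))  # test first...
        learner.learn(x, y)               # ...then train on the same example
    return cm


if __name__ == "__main__":
    cm = prequential(MajorityClassLearner(), synthetic_stream())
    print(f"accuracy={cm.accuracy():.3f} recall={cm.recall():.3f} "
          f"fpr={cm.false_positive_rate():.3f}")
```

Run as a script, this prints accuracy=0.990 recall=0.000 fpr=0.000: the learner never raises a false alarm, yet misses every positive. On a heavily imbalanced stream, accuracy and false positive rate alone can therefore look excellent while recall, the quantity that matters when searching for rare pulsar signals, is unacceptable.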
Keywords :
Big Data; astronomy computing; pattern classification; radioastronomy; radiotelescopes; Big Data challenge; SKA; Square Kilometre Array; all-sky pulsar survey; astronomical data; data processing; imbalanced data stream classification; partially-labelled data stream classification; radio astronomy; radio telescope; signal classification; Accuracy; Labeling; Pipelines; Radio astronomy; Real-time systems; Support vector machines; Telescopes; Astroinformatics; Classification; Data Streams; Imbalanced Learning; Unlabelled Data;
Conference_Title :
Systems, Man, and Cybernetics (SMC), 2013 IEEE International Conference on
Conference_Location :
Manchester
DOI :
10.1109/SMC.2013.260