  • DocumentCode
    3040190
  • Title
    A Study on Classification in Imbalanced and Partially-Labelled Data Streams
  • Author
    Lyon, R.J.; Brooke, J.M.; Knowles, J.D.; Stappers, B.W.
  • Author_Institution
    Sch. of Comput. Sci., Univ. of Manchester, Manchester, UK
  • fYear
    2013
  • fDate
    13-16 Oct. 2013
  • Firstpage
    1506
  • Lastpage
    1511
  • Abstract
    The domain of radio astronomy is currently facing significant computational challenges, foremost amongst which are those posed by the development of the world's largest radio telescope, the Square Kilometre Array (SKA). Preliminary specifications for this instrument suggest that the final design will incorporate between 2000 and 3000 individual 15-metre receiving dishes, which together can be expected to produce a data rate of many TB/s. Given such a high data rate, it becomes crucial to consider how this information will be processed and stored to maximise its scientific utility. In this paper, we consider one possible data processing scenario for the SKA, for the purposes of an all-sky pulsar survey. In particular, we treat the selection of promising signals from the SKA processing pipeline as a data stream classification problem. We consider the feasibility of classifying signals that arrive via an unlabelled and heavily class-imbalanced data stream, using currently available algorithms and frameworks. Our results indicate that existing stream learners exhibit unacceptably low recall on real astronomical data when used in their standard configuration. However, their good false positive performance, and accuracy comparable to that of static learners, suggest they have definite potential as an on-line solution to this particular big data challenge.
  • Keywords
    Big Data; astronomy computing; pattern classification; radioastronomy; radiotelescopes; Big Data challenge; SKA; Square Kilometre Array; all-sky pulsar survey; astronomical data; data processing; imbalanced data stream classification; partially-labelled data stream classification; radio astronomy; radio telescope; signal classification; Accuracy; Labeling; Pipelines; Radio astronomy; Real-time systems; Support vector machines; Telescopes; Astroinformatics; Classification; Data Streams; Imbalanced Learning; Unlabelled Data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Systems, Man, and Cybernetics (SMC), 2013 IEEE International Conference on
  • Conference_Location
    Manchester
  • Type
    conf
  • DOI
    10.1109/SMC.2013.260
  • Filename
    6722013