  • DocumentCode
    2331142
  • Title
    A novel framework to elucidate core classes in a dataset
  • Author
    Soria, Daniele; Garibaldi, Jonathan M.
  • Author_Institution
    School of Computer Science, University of Nottingham, Nottingham, UK
  • fYear
    2010
  • fDate
    18-23 July 2010
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    In this paper we present an original framework for extracting representative groups from a dataset, and we validate it on a novel case study. The framework applies several different clustering algorithms; statistical and visualisation techniques are then used to characterise the results, and core classes are defined by consensus clustering. The classes may be verified using supervised classification algorithms to obtain a set of rules that may be useful for classifying new data points in the future. The framework is validated on a novel set of histone markers for breast cancer patients. From a technical perspective, the resulting classes are well separated and characterised by low, medium and high levels of biological markers. Clinically, the groups appear to distinguish patients with poor overall survival from those with a low grading score and better survival. Overall, this framework offers a promising methodology for elucidating core consensus groups from data.
  • Keywords
    cancer; data handling; data visualisation; medical diagnostic computing; pattern classification; pattern clustering; statistical analysis; biological marker; breast cancer patient; clustering algorithm; consensus clustering; core class elucidation; grading score; histone marker; representative group; statistical technique; supervised classification algorithm; visualisation technique; Algorithm design and analysis; Breast cancer; Clustering algorithms; Educational institutions; Indexes; Partitioning algorithms; Robustness;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Evolutionary Computation (CEC), 2010 IEEE Congress on
  • Conference_Location
    Barcelona
  • Print_ISBN
    978-1-4244-6909-3
  • Type
    conf
  • DOI
    10.1109/CEC.2010.5586331
  • Filename
    5586331