Analysis of correlation structure of data set for efficient pattern classification

Author

Goswami, Saptarsi ; Chakrabarti, Amlan ; Chakraborty, Basabi

Author_Institution

Inst. of Eng. & Manage., Kolkata, India

fYear

2015

fDate

24-26 June 2015

Firstpage

24

Lastpage

29

Abstract

Pattern classification or clustering plays an important role in a wide variety of applications in areas such as psychology and other social sciences, biology and medical sciences, pattern recognition, and data mining. Many algorithms for supervised or unsupervised classification have been developed to achieve high classification accuracy at lower computational cost. However, some methods or algorithms work well on some data sets and perform poorly on others. For any particular data set, it is difficult to find the most suitable algorithm without a trial-and-error process. It seems that the characteristics of the data set might have some influence on which algorithm is suitable for classification. In this work, data set characteristics are studied in terms of intra-attribute relationships, and a measure, MVS (multivariate score), is proposed to quantify the correlation structure and group data sets into strong independent, weak independent, weak correlated, and strong correlated data sets. The performance of different feature selection algorithms on the different groups of data is studied by simulation experiments with 63 publicly available benchmark data sets. It has been verified that univariate methods lead to significant performance gain for strong independent data sets compared to multivariate methods, while multivariate methods perform better for strong correlated data sets.
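
The abstract does not define MVS, so the following is only a rough sketch of the general idea of scoring a data set's correlation structure and binning it into four groups. It uses the mean absolute pairwise Pearson correlation as a stand-in score; this is an assumption and not the paper's MVS. The function names (`pearson`, `mean_abs_correlation`, `correlation_group`) and the cut-off values are hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant attribute: treated as uncorrelated (assumption)
    return cov / sqrt(var_x * var_y)


def mean_abs_correlation(columns):
    """Average |r| over all attribute pairs. A stand-in score, NOT the paper's MVS."""
    pairs = list(combinations(columns, 2))
    return sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs)


def correlation_group(score, cuts=(0.25, 0.5, 0.75)):
    """Map a score in [0, 1] to one of four groups; the cut-offs are hypothetical."""
    labels = ("strong independent", "weak independent",
              "weak correlated", "strong correlated")
    for cut, label in zip(cuts, labels):
        if score < cut:
            return label
    return labels[-1]


# Toy data: three attributes (one list per attribute), five samples each.
correlated = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 4, 3, 2, 1]]
score = mean_abs_correlation(correlated)
print(round(score, 3), correlation_group(score))
```

Under this reading, a data set falling in the "strong independent" group would be a candidate for univariate feature selection and one in the "strong correlated" group for multivariate selection, in line with the finding reported in the abstract.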

Keywords

data analysis; feature selection; pattern classification; pattern clustering; MVS; correlation structure analysis; data set characteristics; feature selection algorithms; intra attribute relationship; multivariate methods; multivariate score; strong correlated data set; strong independent data set; univariate methods; weak correlated data set; weak independent data set; Accuracy; Classification algorithms; Clustering algorithms; Correlation; Data models; Histograms; Iris; Pattern classification algorithm; correlation structure

fLanguage

English

Publisher

IEEE

Conference_Titel

Cybernetics (CYBCONF), 2015 IEEE 2nd International Conference on

Conference_Location

Gdynia

Print_ISBN

978-1-4799-8320-9

Type

conf

DOI

10.1109/CYBConf.2015.7175901

Filename

7175901