DocumentCode :
3393568
Title :
Analysis of correlation structure of data set for efficient pattern classification
Author :
Goswami, Saptarsi ; Chakrabarti, Amlan ; Chakraborty, Basabi
Author_Institution :
Inst. of Eng. & Manage., Kolkata, India
fYear :
2015
fDate :
24-26 June 2015
Firstpage :
24
Lastpage :
29
Abstract :
Pattern classification and clustering play an important role in a wide variety of applications in areas such as psychology and other social sciences, biology and medical sciences, pattern recognition, and data mining. Many algorithms for supervised and unsupervised classification have been developed to achieve high classification accuracy at lower computational cost. However, some algorithms work well on some data sets and perform poorly on others. For any particular data set, it is difficult to find the most suitable algorithm without some trial and error. It seems that the characteristics of the data set might influence how well a classification algorithm performs. In this work, data set characteristics are studied in terms of intra-attribute relationships, and a measure, MVS (multivariate score), is proposed to quantify the correlation structure and group data sets into strong independent, weak independent, weak correlated, and strong correlated data sets. The performance of different feature selection algorithms on the different groups is studied by simulation experiments with 63 publicly available benchmark data sets. The results verify that univariate methods lead to a significant performance gain over multivariate methods for strong independent data sets, while multivariate methods perform better for strong correlated data sets.
Keywords :
data analysis; feature selection; pattern classification; pattern clustering; MVS; correlation structure analysis; data set characteristics; feature selection algorithms; intra attribute relationship; multivariate methods; multivariate score; strong correlated data set; strong independent data set; univariate methods; weak correlated data set; weak independent data set; Accuracy; Classification algorithms; Clustering algorithms; Correlation; Data models; Histograms; Iris; Pattern classification algorithm; correlation structure
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cybernetics (CYBCONF), 2015 IEEE 2nd International Conference on
Conference_Location :
Gdynia
Print_ISBN :
978-1-4799-8320-9
Type :
conf
DOI :
10.1109/CYBConf.2015.7175901
Filename :
7175901