DocumentCode
3393568
Title
Analysis of correlation structure of data set for efficient pattern classification
Author
Goswami, Saptarsi ; Chakrabarti, Amlan ; Chakraborty, Basabi
Author_Institution
Inst. of Eng. & Manage., Kolkata, India
fYear
2015
fDate
24-26 June 2015
Firstpage
24
Lastpage
29
Abstract
Pattern classification or clustering plays an important role in a wide variety of applications in areas such as psychology and other social sciences, biology and medical sciences, pattern recognition, and data mining. Many algorithms for supervised or unsupervised classification have been developed to achieve high classification accuracy at lower computational cost. However, some methods work well on some data sets and perform poorly on others. For any particular data set, it is difficult to find the most suitable algorithm without a trial-and-error process. It seems that the characteristics of the data set may influence how well a classification algorithm performs. In this work, data set characteristics are studied in terms of intra attribute relationships, and a measure, MVS (multivariate score), is proposed to quantify the correlation structure and group data sets into four categories: strong independent, weak independent, weak correlated, and strong correlated. The performance of different feature selection algorithms on these groups is studied through simulation experiments with 63 publicly available benchmark data sets. The results verify that univariate methods lead to a significant performance gain over multivariate methods on strong independent data sets, while multivariate methods perform better on strong correlated data sets.
Keywords
data analysis; feature selection; pattern classification; pattern clustering; MVS; correlation structure analysis; data set characteristics; feature selection algorithms; intra attribute relationship; multivariate methods; multivariate score; strong correlated data set; strong independent data set; univariate methods; weak correlated data set; weak independent data set; Accuracy; Classification algorithms; Clustering algorithms; Correlation; Data models; Histograms; Iris; Pattern classification algorithm; correlation structure
fLanguage
English
Publisher
IEEE
Conference_Titel
Cybernetics (CYBCONF), 2015 IEEE 2nd International Conference on
Conference_Location
Gdynia
Print_ISBN
978-1-4799-8320-9
Type
conf
DOI
10.1109/CYBConf.2015.7175901
Filename
7175901