Author_Institution :
Dept. of Electr. & Comput. Eng., Memphis Univ., TN, USA
Abstract :
We propose a new data transformation approach that facilitates many data mining, interpretation, and analysis tasks. Our approach, called the membershipmap, automatically extracts the underlying structure, or sub-concepts, of each raw attribute and uses the orthogonal union of these sub-concepts to define a new, semantically richer space. The sub-concept labels of each point in the original space determine the position of that point in the transformed space. Since sub-concept labels are prone to the uncertainty inherent in the original data and in the initial extraction process, we present a combination of labeling schemes based on different measures of uncertainty. In particular, we introduce the crispmap, the fuzzymap, and the possibilisticmap. We outline the advantages and disadvantages of each mapping scheme, and we show that the three transformed spaces are complementary. The proposed transformation is illustrated with several data sets, and we show that it can be used as a flexible pre-processing tool to support tasks such as sampling, data cleaning, and outlier detection.
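The transformation described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how sub-concepts are extracted or how memberships are computed, so the sketch substitutes a small 1-D k-means for the extraction step, a fuzzy c-means style formula (fuzzifier 2) for the fuzzy labels, and a Cauchy-shaped typicality with a hypothetical scale parameter `eta` for the possibilistic labels. All function names are invented for this example.

```python
# Illustrative sketch only: the clustering method, the membership formulas,
# and every name below are assumptions, not the authors' implementation.

def kmeans_1d(values, k, iters=50):
    """Tiny 1-D k-means; stands in for the sub-concept extraction step."""
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly over the attribute's range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Keep the old center when a cluster receives no points.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def crisp(v, centers):
    """One-hot label: 1 for the nearest sub-concept, 0 elsewhere."""
    nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
    return [1.0 if j == nearest else 0.0 for j in range(len(centers))]

def fuzzy(v, centers, eps=1e-12):
    """Fuzzy c-means style memberships (fuzzifier 2); they sum to 1."""
    inv = [1.0 / ((v - c) ** 2 + eps) for c in centers]
    total = sum(inv)
    return [x / total for x in inv]

def possibilistic(v, centers, eta=1.0):
    """Typicality values in (0, 1]; they need not sum to 1."""
    return [1.0 / (1.0 + (v - c) ** 2 / eta) for c in centers]

def membership_map(rows, k=2, scheme=fuzzy):
    """Map each row to the concatenation of its per-attribute labels."""
    columns = list(zip(*rows))
    centers = [kmeans_1d(col, k) for col in columns]
    return [
        [m for v, cs in zip(row, centers) for m in scheme(v, cs)]
        for row in rows
    ]

rows = [(0.0, 10.0), (0.2, 10.2), (5.0, 20.0), (5.2, 20.2)]
crisp_rows = membership_map(rows, k=2, scheme=crisp)
fuzzy_rows = membership_map(rows, k=2, scheme=fuzzy)
poss_rows = membership_map(rows, k=2, scheme=possibilistic)
```

Each input row with d attributes becomes a vector of d × k labels, one block of k values per attribute. Under these assumed formulas, a point far from every sub-concept center receives uniformly low possibilistic values, which is one plausible way such a map could support the outlier detection mentioned in the abstract.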
Keywords :
data analysis; data mining; fuzzy set theory; pattern clustering; sampling methods; crispmap; data cleaning; data interpretation; data transformation; flexible preprocessing tool; fuzzymap; knowledge discovery; knowledge extraction process; membershipmap method; possibilisticmap; sampling method; subconcept labels; transformed spaces; Association rules; Cleaning; Data preprocessing; Databases; Electronic mail; Labeling; Measurement uncertainty; Phase noise