Title :
FAR-HD: A fast and efficient algorithm for mining fuzzy association rules in large high-dimensional datasets
Author :
Mangalampalli, Ashish ; Pudi, Vikramkumar
Author_Institution :
Centre for Data Eng., Int. Inst. of Inf. Technol. (IIIT-H), Hyderabad, India
Abstract :
Fuzzy Association Rule Mining (Fuzzy ARM) has been used extensively on relational and transactional datasets with a small-to-medium number of attributes/dimensions. The mined fuzzy association rules (patterns) are used not only for manual analysis by domain experts, but also to drive further mining tasks, such as classification and clustering, which automate decision-making. Such rules can also be derived from high-dimensional numerical datasets, such as image datasets, in order to train fuzzy associative classifiers or clustering algorithms. Traditional Fuzzy ARM algorithms cannot mine rules from such datasets efficiently, because they are designed for datasets with far fewer attributes/dimensions. We therefore propose FAR-HD, a Fuzzy ARM algorithm designed specifically for large high-dimensional datasets. FAR-HD processes fuzzy frequent itemsets in a depth-first (DFS) manner using a two-phased, multiple-partition, tidlist-based strategy. It uses a byte-vector representation of tidlists, which are stored in main memory in compressed form (using a fast generic compression method). Additionally, FAR-HD uses Fuzzy Clustering to convert each numerical vector of the original input dataset to a fuzzy-cluster-based representation, which is then used for the actual Fuzzy ARM process. In experiments, FAR-HD is 7-15 times faster than Fuzzy Apriori, the most popular Fuzzy ARM algorithm, and 1.1-4 times faster than a Fuzzy ARM algorithm that we proposed earlier, which is designed for very large but traditional datasets (with fewer attributes).
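The byte-vector tidlist idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one byte per transaction holding a membership degree quantized to 0..255, uses `zlib` as a stand-in for the unspecified "fast generic compression method", and assumes the min t-norm for combining itemsets. All function names are hypothetical, and the DFS traversal, the two-phase partitioning, and the fuzzy clustering step are not shown.

```python
import zlib

SCALE = 255  # assumption: membership in [0, 1] is quantized to one byte (0..255)

def to_byte_vector(memberships):
    """Quantize fuzzy membership degrees (floats in [0, 1]) to one byte each."""
    return bytes(round(m * SCALE) for m in memberships)

def compress(tidlist):
    """Compress a byte-vector tidlist (zlib is an assumed stand-in)."""
    return zlib.compress(tidlist)

def decompress(blob):
    """Restore a byte-vector tidlist from its compressed form."""
    return zlib.decompress(blob)

def intersect(a, b):
    """Combine two tidlists element-wise with the min t-norm (an assumption)."""
    return bytes(min(x, y) for x, y in zip(a, b))

def support(tidlist):
    """Fuzzy support: mean membership degree over all transactions."""
    return sum(tidlist) / (SCALE * len(tidlist))

# Hypothetical fuzzy items "A" and "B" over four transactions,
# kept in memory in compressed form.
store = {
    "A": compress(to_byte_vector([0.8, 0.2, 1.0, 0.0])),
    "B": compress(to_byte_vector([0.6, 0.4, 0.0, 1.0])),
}

# Tidlist of the 2-itemset {A, B}: decompress, then intersect.
ab = intersect(decompress(store["A"]), decompress(store["B"]))
print(support(ab))
```

Storing one byte per transaction keeps intersection a simple element-wise pass, at the cost of quantizing membership degrees to 256 levels.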
Keywords :
data mining; decision making; fuzzy set theory; pattern classification; pattern clustering; DFS; FAR-HD; classification; decision-making; fuzzy ARM algorithms; fuzzy apriori; fuzzy association rule mining; fuzzy associative classifiers; fuzzy frequent itemsets; fuzzy-cluster-based representation; large high-dimensional numerical datasets; relational datasets; tidlists byte-vector representation; transactional datasets; two-phased multiple-partition tidlist-based strategy; Algorithm design and analysis; Association rules; Clustering algorithms; Itemsets; Memory management; Partitioning algorithms; Vectors; Fuzzy Association Rule Mining; Fuzzy Clustering; Fuzzy Partitioning; Fuzzy Relations; High Dimensions; Large Datasets; Partitions; Tidlists;
Conference_Title :
Fuzzy Systems (FUZZ), 2013 IEEE International Conference on
Conference_Location :
Hyderabad
Print_ISBN :
978-1-4799-0020-6
DOI :
10.1109/FUZZ-IEEE.2013.6622333