Regional Information Center for Science and Technology (RICEST) - A novel data structure for efficient representation of large data sets in data mining

DocumentCode :

3264390

Title :

A novel data structure for efficient representation of large data sets in data mining

Author :

Pai, Radhika M. ; Ananthanarayana, V.S.

Author_Institution :

Nat. Inst. of Technol. Karnataka, Mangalore

fYear :

2006

fDate :

20-23 Dec. 2006

Firstpage :

547

Lastpage :

552

Abstract :

An important goal in data mining is to generate an abstraction of the data. Such an abstraction helps reduce the time and space requirements of the overall decision-making process. It is also important that the abstraction be generated from the data in a small number of scans. In this paper, we propose a novel data structure called the prefix-postfix structure (PP-structure), which is an abstraction of the data that can be built by scanning the database only once. We prove that this structure is compact, complete, and incremental, and is therefore suitable for representing dynamic databases. Further, we propose a clustering algorithm using this structure. The proposed algorithm is tested on different real-world datasets and is shown to be both space efficient and time efficient for large datasets without sacrificing accuracy. We compare our algorithm with other algorithms and show its effectiveness.

Keywords :

data mining; data structures; decision making; pattern clustering; very large databases; PP-structure; clustering algorithm; data abstraction; data mining; data structure; decision making process; large data set representation; prefix-postfix structure; Clustering algorithms; Data mining; Data structures; Decision making; Electronic mail; Heuristic algorithms; Image databases; Testing; Transaction databases; Tree data structures; Clustering; Data Mining; Data structure; PC-tree; PPC-tree; Prefix-Postfix structure; dynamic databases; incremental algorithm;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Advanced Computing and Communications, 2006. ADCOM 2006. International Conference on

Conference_Location :

Surathkal

Print_ISBN :

1-4244-0716-8

Electronic_ISBN :

1-4244-0716-8

Type :

conf

DOI :

10.1109/ADCOM.2006.4289952

Filename :

4289952

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3264390