Regional Information Center for Science and Technology - Sampling from databases for rule induction methods based on likelihood ratio test

DocumentCode :

2540502

Title :

Sampling from databases for rule induction methods based on likelihood ratio test

Author :

Tsumoto, Shusaku ; Hirano, Shoji ; Abe, Hidenao

Author_Institution :

Dept. of Med. Inf., Shimane Univ., Izumo, Japan

fYear :

2010

fDate :

7-9 July 2010

Firstpage :

174

Lastpage :

179

Abstract :

One of the most important problems in data mining is how to manage a large amount of data and efficiently extract knowledge from large databases. Although many machine learning and statistical methods have been proposed to solve this problem, they are not effective when there are more than 1000 samples, since the computational complexity of these algorithms is approximately n² or greater. In this paper, we introduce the idea of the log-likelihood ratio to measure the similarity between generated training samples and the original training samples before rule induction methods are applied to the selected samples. The method was evaluated on three medical domains. The results show that the proposed method selects training samples that reflect the statistical characteristics of the original training samples, although performance with small samples is not as good.
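The abstract does not give the paper's exact formulation, so the following is only a minimal sketch of the general idea, assuming a single categorical attribute and the standard G statistic (G = 2 Σ O ln(O/E)) as the log-likelihood ratio. The function names `g_statistic` and `draw_representative_sample`, the retry loop, and the threshold 3.841 (the chi-square critical value for one degree of freedom at the 0.05 level) are illustrative assumptions, not the authors' method.

```python
import math
import random
from collections import Counter


def g_statistic(sample, population):
    """Log-likelihood ratio (G) of a sample against the population it came from.

    Expected counts are the population proportions scaled to the sample size.
    Values that do not occur in the sample contribute zero to the sum.
    Assumes every value in `sample` also occurs in `population`.
    """
    n = len(sample)
    total = len(population)
    pop_counts = Counter(population)
    observed = Counter(sample)
    g = 0.0
    for value, o in observed.items():
        expected = n * pop_counts[value] / total
        g += o * math.log(o / expected)
    return 2.0 * g


def draw_representative_sample(population, n, threshold, rng, max_tries=100):
    """Hypothetical selection loop: redraw until G falls at or below threshold.

    Returns the accepted sample, or None if no draw passed within max_tries.
    """
    for _ in range(max_tries):
        candidate = rng.sample(population, n)
        if g_statistic(candidate, population) <= threshold:
            return candidate
    return None


if __name__ == "__main__":
    # Toy data: one binary attribute with a 60/40 split (made up for illustration).
    population = ["a"] * 60 + ["b"] * 40
    chosen = draw_representative_sample(population, 20, 3.841, random.Random(0))
    if chosen is not None:
        print(Counter(chosen), g_statistic(chosen, population))
```

A small G means the sample's value frequencies are close to those of the original data, so a rule induction method run on the sample sees roughly the same distribution. With very small samples the statistic has little power to reject unrepresentative draws, which is consistent with the abstract's remark about small-sample performance.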

Keywords :

computational complexity; data mining; statistical analysis; databases; likelihood ratio test; machine learning method; rule induction method; sampling; statistical method; training samples; Accuracy; Equations; Learning systems; Training

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Cognitive Informatics (ICCI), 2010 9th IEEE International Conference on

Conference_Location :

Beijing

Print_ISBN :

978-1-4244-8041-8

Type :

conf

DOI :

10.1109/COGINF.2010.5599746

Filename :

5599746

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2540502