• DocumentCode
    1655216
  • Title
    An examination and comparison of conflicting data in granulized datasets: EWI vs. EFI
  • Author
    Wu, Chien-Hsing ; Okuhara, Koji ; Kao, Shu-Chen ; Yang, Cheng Han ; Yang, Chung Han
  • Author_Institution
    Dept. of Inf. Manage., Nat. Univ. of Kaohsiung, Kaohsiung, Taiwan
  • fYear
    2010
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Knowledge discovery in databases (KDD) frequently needs to make the granulized dataset consistent when working with a continuous database. Common granulization techniques are equal width interval (EWI) and equal frequency interval (EFI). However, they may produce different numbers of conflicting records in a granulized dataset. Wu's research found significant data inconsistency in datasets granulized with EWI. Following this finding, our research conducts an experiment to examine and compare the performance of EWI and EFI. Eighteen continuous datasets are examined. The TCRM model introduced by Wu, which embeds a database structured query language technique, is used to derive results efficiently. Experimental results indicate that (1) of the 18 datasets examined, 7 were not recognized via EWI and 8 via EFI, implying that almost 40% of the granulized datasets contained conflicting data; (2) comparatively, we did not find a notable difference between EWI and EFI with respect to their granulization performance; and (3) no remarkable tendency of conflicting data production against dataset size, the number of attributes, or the number of classes was found.
  • Keywords
    database management systems; query languages; continuous database; data inconsistency; database structured query language; equal frequency interval; equal width interval; granulized datasets; knowledge discovery in databases; Conferences; Data mining; Database languages; Databases; Machine learning; Presses; conflicting data; granulation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computers and Industrial Engineering (CIE), 2010 40th International Conference on
  • Conference_Location
    Awaji
  • Print_ISBN
    978-1-4244-7295-6
  • Type
    conf
  • DOI
    10.1109/ICCIE.2010.5668394
  • Filename
    5668394
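
The two granulization techniques named in the abstract, and the idea of counting conflicting records, can be illustrated with a minimal Python sketch. This is not the TCRM model or the authors' SQL-based procedure, which this record does not describe. It assumes that a "conflicting record" is one whose granulized attribute values match another record that carries a different class label. The function names `equal_width_bins`, `equal_frequency_bins`, and `count_conflicts` are hypothetical, and the sample data is made up for illustration.

```python
from collections import defaultdict


def equal_width_bins(values, k):
    """EWI: split [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it to k - 1.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """EFI: assign bins by rank so each holds roughly len(values) / k items.

    Assumption: equal values share the bin of their first occurrence,
    so bin sizes can be uneven when the data contains ties.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    prev_value, prev_bin = None, 0
    for rank, i in enumerate(order):
        if values[i] == prev_value:
            bins[i] = prev_bin
        else:
            bins[i] = min(rank * k // n, k - 1)
        prev_value, prev_bin = values[i], bins[i]
    return bins


def count_conflicts(rows, labels):
    """Count records whose granulized attributes match a record of another class.

    Assumption: this is an illustrative definition of "conflicting record",
    not necessarily the one used in the paper.
    """
    classes_by_key = defaultdict(set)
    for row, label in zip(rows, labels):
        classes_by_key[tuple(row)].add(label)
    return sum(1 for row in rows if len(classes_by_key[tuple(row)]) > 1)


# Made-up single-attribute dataset with two classes.
values = [1.0, 2.0, 3.0, 10.0]
labels = ["a", "a", "b", "b"]

ewi_rows = [(b,) for b in equal_width_bins(values, 2)]
efi_rows = [(b,) for b in equal_frequency_bins(values, 2)]

print("EWI conflicts:", count_conflicts(ewi_rows, labels))
print("EFI conflicts:", count_conflicts(efi_rows, labels))
```

On this toy input the outlier 10.0 stretches the equal-width intervals so that three records with mixed labels share one bin, while the equal-frequency split separates the classes. This shows only how the two techniques can diverge on a given dataset. It says nothing about their relative performance in general, for which the abstract reports no notable difference.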