• DocumentCode
    14762
  • Title
    On Identifying Critical Nuggets of Information during Classification Tasks
  • Author
    Sathiaraj, David ; Triantaphyllou, Evangelos
  • Author_Institution
    Dept. of Comput. Sci., Louisiana State Univ., Baton Rouge, LA, USA
  • Volume
    25
  • Issue
    6
  • fYear
    2013
  • fDate
    Jun-13
  • Firstpage
    1354
  • Lastpage
    1367
  • Abstract
    In large databases, there may exist critical nuggets: small collections of records or instances that contain domain-specific important information. This information can be used for future decision making such as labeling of critical, unlabeled data records and improving classification results by reducing false positive and false negative errors. This work introduces the idea of critical nuggets, proposes an innovative domain-independent method to measure criticality, suggests a heuristic to reduce the search space for finding critical nuggets, and isolates and validates critical nuggets from some real-world data sets. It seems that only a few subsets may qualify to be critical nuggets, underscoring the importance of finding them. The proposed methodology can detect them. This work also identifies certain properties of critical nuggets and provides experimental validation of the properties. Experimental results also helped validate that critical nuggets can assist in improving classification accuracies in real-world data sets.
  • Keywords
    data mining; decision making; pattern classification; classification accuracy improvement; critical information nugget identification; critical nugget isolation; critical nugget validation; critical-unlabeled data record labeling; domain-independent method; domain-specific important information; false negative error reduction; false positive error reduction; real-world data sets; search space reduction; Accuracy; Cancer; Complexity theory; Data models; Measurement; Switches; class boundary; classification; classification accuracy; critical nuggets; duality; outliers
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2012.112
  • Filename
    6205754