• DocumentCode
    866700
  • Title
    Algorithms for finding attribute value group for binary segmentation of categorical databases
  • Author
    Morimoto, Yasuhiko; Fukuda, Takeshi; Tokuyama, Takeshi
  • Author_Institution
    IBM Tokyo Res. Lab., Kanagawa, Japan
  • Volume
    14
  • Issue
    6
  • fYear
    2002
  • Firstpage
    1269
  • Lastpage
    1279
  • Abstract
    We consider the problem of finding a set of attribute values that gives a high-quality binary segmentation of a database. The quality of a segmentation is defined by an objective function suitable for the user's objective, such as "mean squared error," "mutual information," or "χ²," each of which is defined in terms of the distribution of a given target attribute. Our goal is to find value groups on a given conditional domain that split the database into two segments, optimizing the value of an objective function. Though the problem is intractable for general objective functions, there are feasible algorithms for finding high-quality binary segmentations when the objective function is convex, and we prove that the typical criteria mentioned above are all convex. We propose two practical algorithms, based on computational geometry techniques, which find much better value groups than conventional heuristics.
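To make the role of convexity concrete, here is a minimal sketch for the simplest case only: a Boolean target attribute, with mutual information (information gain) as the objective. It relies on the classic two-class result that, for such a convex criterion, an optimal value group is a prefix of the attribute values sorted by their ratio of positive rows, so only k - 1 candidate groups need to be scored instead of 2^k. This is not the authors' computational-geometry algorithm. The function names `best_value_group`, `info_gain`, and `entropy`, and the `(row_count, positive_count)` input format, are hypothetical choices made for illustration, and every value is assumed to have at least one row.

```python
import math
from typing import Dict, Set, Tuple


def entropy(pos: float, total: float) -> float:
    """Binary entropy (bits) of a segment with `pos` positives out of `total` rows."""
    if total == 0 or pos == 0 or pos == total:
        return 0.0
    q = pos / total
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))


def info_gain(n1: float, p1: float, n: float, p: float) -> float:
    """Mutual information between the split indicator and the Boolean target.

    (n1, p1) are the row and positive counts of one segment; (n, p) are totals.
    """
    n2, p2 = n - n1, p - p1
    return entropy(p, n) - (n1 / n) * entropy(p1, n1) - (n2 / n) * entropy(p2, n2)


def best_value_group(stats: Dict[str, Tuple[int, int]]) -> Tuple[Set[str], float]:
    """Return (value group, score) maximizing information gain.

    Hypothetical input format: each attribute value maps to
    (row_count, positive_count), with row_count > 0 (assumption).
    """
    n = sum(count for count, _ in stats.values())
    p = sum(pos for _, pos in stats.values())
    # Sort values by positive ratio; for a Boolean target and a convex
    # criterion, some prefix of this order is an optimal value group.
    order = sorted(stats, key=lambda v: stats[v][1] / stats[v][0])
    best_group: Set[str] = set()
    best_score = -1.0
    n1 = p1 = 0
    # Scan proper, nonempty prefixes (the full set is not a split).
    for i, value in enumerate(order[:-1]):
        n1 += stats[value][0]
        p1 += stats[value][1]
        score = info_gain(n1, p1, n, p)
        if score > best_score:
            best_group, best_score = set(order[: i + 1]), score
    return best_group, best_score
```

For example, with `stats = {"a": (10, 9), "b": (10, 1), "c": (10, 8), "d": (10, 2)}`, the scan checks only the groups {b}, {b, d}, and {b, d, c} and selects {b, d}, which separates the low-ratio values from the high-ratio ones. Because the split is symmetric, the complementary group {a, c} scores the same.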
  • Keywords
    computational geometry; data mining; data reduction; database theory; decision trees; very large databases; attribute value group; binary segmentation; categorical databases; convex objective function; decision tree; heuristics; Computer Society; Computer errors; Marketing and sales; Spatial databases; Testing
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2002.1047767
  • Filename
    1047767