  • DocumentCode
    2404003
  • Title
    Attribute classification using feature analysis
  • Author
    Naumann, F. ; Ho, Ching-Tien ; Tian, Xuqing ; Haas, Laura ; Megiddo, Nimrod
  • Author_Institution
    IBM Almaden Res. Center, San Jose, CA, USA
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    271
  • Abstract
    The basis of many systems that integrate data from multiple sources is a set of correspondences between source schemata and a target schema. Correspondences express a relationship between sets of source attributes, possibly from multiple sources, and a set of target attributes. Clio is an integration tool that assists users in defining value correspondences between attributes. In real-life scenarios there may be many sources, and the source relations may have many attributes. Users can get lost and might miss or be unable to find some correspondences. Also, in many real-life schemata the attribute names reveal little or nothing about the semantics of the data values. Only the data values in the attribute columns can convey the semantic meaning of the attribute. Our work relieves users of the problems of too many attributes and meaningless attribute names by automatically suggesting correspondences between source and target attributes. For each attribute, we analyze the data values and derive a set of features
  • Keywords
    database management systems; pattern classification; Clio integration tool; attribute classification; feature analysis; multiple sources; source attributes; source schemata; target attributes; target schema; Data engineering; Insurance; Semiconductor device manufacture; Spatial databases; Testing; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2002. Proceedings. 18th International Conference on
  • Conference_Location
    San Jose, CA
  • ISSN
    1063-6382
  • Print_ISBN
    0-7695-1531-2
  • Type
    conf
  • DOI
    10.1109/ICDE.2002.994725
  • Filename
    994725
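The abstract describes suggesting source-to-target correspondences by deriving features from each attribute's data values rather than from its name, but it is truncated before naming the features or the matching method. The sketch below is a hypothetical illustration of that general idea, not the authors' or Clio's method. It assumes four simple character-level features (average value length, fraction of digit characters, fraction of letters, fraction of purely numeric values) and a nearest-neighbour match by Euclidean distance. The function names `features` and `suggest`, as well as the sample column names, are invented for this example.

```python
import math
from statistics import mean

def features(values):
    """Hypothetical feature vector for one attribute, computed only from its values.

    Assumes a non-empty list of values; names play no role here.
    """
    texts = [str(v) for v in values]
    chars = "".join(texts)
    n_chars = len(chars) or 1  # avoid division by zero when every value is empty
    return [
        mean(len(t) for t in texts),                   # average value length
        sum(c.isdigit() for c in chars) / n_chars,     # fraction of digit characters
        sum(c.isalpha() for c in chars) / n_chars,     # fraction of letters
        sum(t.isdigit() for t in texts) / len(texts),  # fraction of purely numeric values
    ]

def suggest(source, target):
    """Suggest, for each source attribute, the target attribute with the nearest features.

    `source` and `target` map attribute names to lists of sample values.
    Matching uses plain Euclidean distance (an assumption, not the paper's classifier).
    """
    target_feats = {name: features(vals) for name, vals in target.items()}
    suggestions = {}
    for name, vals in source.items():
        f = features(vals)
        suggestions[name] = min(target_feats, key=lambda t: math.dist(f, target_feats[t]))
    return suggestions

# Opaque source column names: only the values reveal what each column holds.
source = {"c1": ["95120", "10001", "60614"], "c2": ["Alice Smith", "Bob Jones"]}
target = {"zip": ["94107", "02139"], "name": ["Carol White", "Dan Brown", "Eve Black"]}
print(suggest(source, target))  # {'c1': 'zip', 'c2': 'name'}
```

Because the features are left unscaled, average length can dominate the distance. A more careful version would normalize each feature or train a classifier on labelled target attributes, which is closer to what the title's "attribute classification" suggests.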