• DocumentCode
    3603560
  • Title
    Association Discovery in Two-View Data
  • Author
    van Leeuwen, Matthijs; Galbrun, Esther
  • Author_Institution
    Dept. of Comput. Sci., KU Leuven, Heverlee, Belgium
  • Volume
    27
  • Issue
    12
  • fYear
    2015
  • Firstpage
    3190
  • Lastpage
    3202
  • Abstract
    Two-view datasets are datasets whose attributes are naturally split into two sets, each providing a different view on the same set of objects. We introduce the task of finding small and non-redundant sets of associations that describe how the two views are related. To achieve this, we propose a novel approach in which sets of rules are used to translate one view to the other and vice versa. Our models, dubbed translation tables, contain both unidirectional and bidirectional rules that span both views and provide lossless translation from either of the views to the opposite view. To be able to evaluate different translation tables and perform model selection, we present a score based on the Minimum Description Length (MDL) principle. Next, we introduce three TRANSLATOR algorithms to find good models according to this score. The first algorithm is parameter-free and iteratively adds the rule that improves compression most. The other two algorithms use heuristics to achieve better trade-offs between runtime and compression. The empirical evaluation on real-world data demonstrates that only modest numbers of associations are needed to characterize the two-view structure present in the data, while the obtained translation rules are easily interpretable and provide insight into the data.
  • Keywords
    data mining; feature selection; iterative methods; MDL; TRANSLATOR algorithm; association discovery; bidirectional rule; iterative algorithm; minimum description length; model selection method; two-view data; unidirectional rule; Association rules; Context awareness; Data models; Encoding; Itemsets; Association rule mining; Redescription mining
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2015.2453159
  • Filename
    7152902