• DocumentCode
    2866019
  • Title

    Average number of frequent (closed) patterns in Bernoulli and Markovian databases

  • Author

    Lhote, Loïck ; Rioult, François ; Soulet, Arnaud

  • Author_Institution
    GREYC, CNRS - UMR, Univ. de Caen Basse-Normandie, France
  • fYear
    2005
  • fDate
    27-30 Nov. 2005
  • Abstract
    In data mining, enumerate the frequent or the closed patterns is often the first difficult task leading to the association rules discovery. The number of these patterns represents a great interest. The lower bound is known to be constant whereas the upper bound is exponential, but both situations correspond to pathological cases. For the first time, we give an average analysis of the number of frequent or closed patterns. Average analysis is often closer to real situations and gives more information about the role of the parameters. In this paper, two probabilistic models are studied: a Bernoulli and a Markovian. In both models and for large databases, we prove that the number of frequent patterns, for a fixed frequency threshold, is exponential in the number of items and polynomial in the number of transactions. On the other hand, for a proportional frequency threshold, the number of frequent patterns is polynomial in the number of items and does not involve the number of transactions. Finally, we prove in the Bernoulli model that the number of closed patterns, for a proportional frequency threshold, is polynomial in the number of items.
  • Keywords
    data mining; Bernoulli model; Markovian model; association rules discovery; average analysis; closed patterns; data mining; fixed frequency threshold; frequent enumeration; frequent patterns; probabilistic models; proportional frequency threshold; Association rules; Data mining; Frequency; Information analysis; Pathology; Pattern analysis; Polynomials; Sampling methods; Transaction databases; Upper bound;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Mining, Fifth IEEE International Conference on
  • ISSN
    1550-4786
  • Print_ISBN
    0-7695-2278-5
  • Type

    conf

  • DOI
    10.1109/ICDM.2005.31
  • Filename
    1565764