• DocumentCode
    2410244
  • Title
    About softness for inductive querying on sequence databases
  • Author
    Mitasiunaite, Ieva ; Boulicaut, Jean-François
  • Author_Institution
    INSA Lyon, LIRIS CNRS UMR, Villeurbanne
  • fYear
    2006
  • fDate
    0-0 0
  • Firstpage
    77
  • Lastpage
    82
  • Abstract
    In many application domains (e.g., WWW usage mining, telecommunication data analysis, molecular biology), large sequence databases are available and yet under-exploited. The inductive database framework assumes that both such databases and the various patterns holding within them can be queried. In this setting, queries that return patterns are called inductive queries, and solving them is one of the main topics in database mining research. Indeed, constraint-based mining techniques on sequence databases have been studied extensively over the last few years, and efficient algorithms make it possible to compute complete collections of patterns (e.g., sequences) that satisfy conjunctions of monotonic and/or anti-monotonic constraints (e.g., minimal and maximal frequency constraints) in potentially large sequence databases. Studying new applications of these techniques, we consider that fault-tolerance and softness are extremely important issues for tackling real-life data analysis. In this paper, we address some of the open problems that arise when computing soft occurrences of patterns within database sequences instead of the classical exact-matching ones. Such an extension is not trivial, since it prevents the clever use of monotonicity for pruning the search space. We describe our proposal and provide an experimental validation on real-life clickstream data, which confirms the added value of this approach.
  • Keywords
    constraint handling; data analysis; data mining; fault tolerant computing; query processing; very large databases; antimonotonic constraints; clickstream data; constraint-based mining; data analysis; database mining; fault-tolerance; inductive database; inductive querying; large sequence databases; monotonic constraint; pattern soft occurrences computation; search space pruning; Data analysis; Data mining; Databases; Electronic commerce; Fault tolerance; Frequency; Pattern matching; Proposals; Sequences; World Wide Web;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Databases and Information Systems, 2006 7th International Baltic Conference on
  • Conference_Location
    Vilnius
  • Print_ISBN
    1-4244-0345-6
  • Type
    conf
  • DOI
    10.1109/DBIS.2006.1678478
  • Filename
    1678478