  • DocumentCode
    2007758
  • Title
    Detection of Sequential Outliers Using a Variable Length Markov Model
  • Author
    Low-Kam, Cécile ; Laurent, Anne ; Teisseire, Maguelonne
  • Author_Institution
    Inst. de Math. et Modelisation de Montpellier, Univ. Montpellier 2, Montpellier
  • fYear
    2008
  • fDate
    11-13 Dec. 2008
  • Firstpage
    571
  • Lastpage
    576
  • Abstract
    Mining for outliers in sequential datasets is crucial for appropriate data analysis, and many approaches for discovering such anomalies have been proposed. However, most of them rely on a sample of known typical sequences to build the model, and they remain memory-intensive. In this paper we propose an extension of one such approach, based on a probabilistic suffix tree and a similarity measure. We add a pruning criterion that reduces the size of the tree while improving the model, and a sharp concentration inequality for the similarity measure, to better sort the outliers. We demonstrate the feasibility of our approach through a set of experiments on a protein database.
  • Keywords
    Markov processes; data analysis; database management systems; tree data structures; probabilistic suffix tree; protein database; pruning criterion; sequential datasets; sequential outliers; variable length Markov model; DNA; Data analysis; Databases; Genetic mutations; Machine learning; Proteins; Robots; Sequences; Size measurement; Testing; Concentration Inequality; Information Criterion; Outliers; Sequential Databases
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Machine Learning and Applications, 2008. ICMLA '08. Seventh International Conference on
  • Conference_Location
    San Diego, CA
  • Print_ISBN
    978-0-7695-3495-4
  • Type
    conf
  • DOI
    10.1109/ICMLA.2008.137
  • Filename
    4725031