• DocumentCode
    1055915
  • Title
    Algorithms for Storytelling
  • Author
    Kumar, Deept ; Ramakrishnan, Naren ; Helm, Richard F. ; Potts, Malcolm
  • Author_Institution
    Feeva Technol., San Francisco, CA
  • Volume
    20
  • Issue
    6
  • fYear
    2008
  • fDate
    6/1/2008 12:00:00 AM
  • Firstpage
    736
  • Lastpage
    751
  • Abstract
    We formulate a new data mining problem called storytelling as a generalization of redescription mining. In traditional redescription mining, we are given a set of objects and a collection of subsets defined over these objects. The goal is to view the set system as a vocabulary and identify two expressions in this vocabulary that induce the same set of objects. Storytelling, on the other hand, aims to explicitly relate object sets that are disjoint (and, hence, maximally dissimilar) by finding a chain of (approximate) redescriptions between the sets. This problem finds applications in bioinformatics, for instance, where the biologist is trying to relate a set of genes expressed in one experiment to another set, implicated in a different pathway. We outline an efficient storytelling implementation that embeds the CARTwheels redescription mining algorithm in an A* search procedure, using the former to supply next move operators on search branches to the latter. This approach is practical and effective for mining large data sets and, at the same time, exploits the structure of partitions imposed by the given vocabulary. Three application case studies are presented: a study of word overlaps in large English dictionaries, exploring connections between gene sets in a bioinformatics data set, and relating publications in the PubMed index of abstracts.
  • Keywords
    data mining; redescription mining generalization; set system; storytelling data mining problem; vocabulary; Data mining; Graph and tree search strategies; Mining methods and algorithms; Retrieval models;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2008.32
  • Filename
    4445670