• DocumentCode
    628151
  • Title
    Terms extraction from unstructured data silos
  • Author
    Lomotey, Richard K. ; Deters, Ralph
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Saskatchewan, Saskatoon, SK, Canada
  • fYear
    2013
  • fDate
    2-6 June 2013
  • Firstpage
    19
  • Lastpage
    24
  • Abstract
    The major challenge that the big data era brings to the services computing landscape is debris of unstructured data. The high-dimensional data is in heterogeneous formats, schemaless, and requires multiple storage APIs in some cases. This situation has made it almost impractical to apply existing data mining techniques, which are designed for schema-based data sources, in a knowledge discovery in database (KDD) process. In this paper, a tool called TouchR is proposed which algorithmically relies on the Hidden Markov Model (HMM) to extract terms from data silos; specifically, distributed NoSQL databases, which we model as a network graph. Our use case graph consists of storage nodes such as CouchDB, Neo4J, DynamoDB, etc. The evaluation of TouchR shows high accuracy for terms extraction and organization.
  • Keywords
    SQL; data mining; distributed databases; document handling; graph theory; hidden Markov models; network theory (graphs); API; CouchDB; DynamoDB; HMM; KDD process; Neo4J; TouchR tool; data mining techniques; distributed NoSQL database; heterogeneous-schemaless high-dimensional data; hidden Markov model; knowledge discovery-in-database process; network graph; schema-based data sources; storage nodes; term extraction; term organization; unstructured data silos; Data mining; Dictionaries; Distributed databases; Feature extraction; Hidden Markov models; Mathematical model; Hidden Markov Model (HMM); NoSQL; Unstructured data mining; big data; terms extraction;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    System of Systems Engineering (SoSE), 2013 8th International Conference on
  • Conference_Location
    Maui, HI
  • Print_ISBN
    978-1-4673-5596-4
  • Type
    conf
  • DOI
    10.1109/SYSoSE.2013.6575236
  • Filename
    6575236