  • DocumentCode
    2054336
  • Title
    A Framework for the Classification of Unstructured Data
  • Author
    Ostrowski, David Alfred
  • fYear
    2009
  • fDate
    14-16 Sept. 2009
  • Firstpage
    373
  • Lastpage
    377
  • Abstract
    The increasing number of sources and quantity of unstructured information have created a further need to categorize and interpret its content. This paper describes the design of an interchangeable framework that supports learning from an unstructured data source. Our approach integrates two or more learning mechanisms with a traditional indexing method. The goal is to identify higher-level semantic content and more meaningful keyword combinations, using both supervised and unsupervised techniques. In a specific implementation, Bayesian learning and clustering are integrated to supply a boost parameter for the classification of unstructured text. We find that an implementation of this framework, applied to a set of Reuters news feeds, achieves a vastly improved recognition rate. Our effort is directed towards making associations between structured and unstructured information.
  • Keywords
    Bayes methods; pattern classification; text analysis; unsupervised learning; Bayesian learning; indexing method; keyword combinations; semantic content; supervised technique; unstructured data classification; unstructured information; unstructured text classification; unsupervised technique; Employment; Engines; Indexing; Learning systems; Machine learning; Machine learning algorithms; Ontologies; Sections; Supervised learning; Technological innovation; Bayesian Learning; Clustering; Lucene Index; Unstructured Data;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Semantic Computing, 2009. ICSC '09. IEEE International Conference on
  • Conference_Location
    Berkeley, CA
  • Print_ISBN
    978-1-4244-4962-0
  • Electronic_ISBN
    978-0-7695-3800-6
  • Type
    conf
  • DOI
    10.1109/ICSC.2009.48
  • Filename
    5298655