DocumentCode
2054336
Title
A Framework for the Classification of Unstructured Data
Author
Ostrowski, David Alfred
fYear
2009
fDate
14-16 Sept. 2009
Firstpage
373
Lastpage
377
Abstract
The growing number of sources and quantity of unstructured information have created a further need for categorization and interpretation of its content. This paper describes the design of an interchangeable framework to support learning from an unstructured data source. Our approach supports the integration of two or more learning mechanisms with a traditional indexing method. The goal is to identify higher semantic content and more meaningful keyword combinations, considering both supervised and unsupervised techniques. In a specific implementation, Bayesian learning and clustering are integrated to support a boost parameter for the classification of unstructured text. We find that an implementation of this framework applied to a set of Reuters news feeds supports a vastly improved recognition rate. Our effort is directed towards making associations between structured and unstructured information.
Keywords
Bayes methods; pattern classification; text analysis; unsupervised learning; Bayesian learning; indexing method; keyword combinations; semantic content; supervised technique; unstructured data classification; unstructured information; unstructured text classification; unsupervised technique; Employment; Engines; Indexing; Learning systems; Machine learning; Machine learning algorithms; Ontologies; Sections; Supervised learning; Technological innovation; Bayesian Learning; Clustering; Lucene Index; Unstructured Data
fLanguage
English
Publisher
IEEE
Conference_Titel
Semantic Computing, 2009. ICSC '09. IEEE International Conference on
Conference_Location
Berkeley, CA
Print_ISBN
978-1-4244-4962-0
Electronic_ISBN
978-0-7695-3800-6
Type
conf
DOI
10.1109/ICSC.2009.48
Filename
5298655