DocumentCode :
3723123
Title :
Advancing the Terminological Classification of Semi-structured Documents
Author :
Georgios Stratogiannis;Georgios Siolas;Georgios Stamou;Andreas Stafylopatis;Alexandros Chortaras;Athanasios Tagaris
Author_Institution :
Dept. of Electr. &
fYear :
2015
Firstpage :
333
Lastpage :
339
Abstract :
Documents are usually given in textual form, accompanied by a set of terminological classifications (metadata) based on the vocabularies of domain ontologies. This paper presents a novel method for advancing this classification by extracting more properties of the analyzed documents. We first extract additional roles from the textual part and, together with roles extracted from the ontology statements, construct an extended document vector representation. We then introduce a pruning algorithm that, for a given document collection, merges concepts of the ontology to produce classes with a sufficient number of corresponding instances. We then classify the documents into ontology classes using the Stanford linear classifier. Finally, we propose an algorithm that assigns additional concept labels to documents, using the output of the classifier. Our system is evaluated on a set of real data and ontological descriptions. Its performance, measured in terms of various accuracy and specificity measures, indicates that the proposed approach to document classification produces correct labels for the majority of items.
Keywords :
"Ontologies","Semantics","Natural languages","Feature extraction","Data mining","Training","Clothing"
Publisher :
IEEE
Conference_Title :
Tools with Artificial Intelligence (ICTAI), 2015 IEEE 27th International Conference on
ISSN :
1082-3409
Type :
conf
DOI :
10.1109/ICTAI.2015.58
Filename :
7372154