Title :
Automatic Classification and Taxonomy Generation for Semi-structured Data
Author :
Bernardo Pereira Nunes;Giseli Rabello Lopes;Marco Antonio Casanova
Author_Institution :
Dept. of Inf., UNIRIO, Rio de Janeiro, Rio de Janeiro, Brazil
Abstract :
The problem of data classification goes back to the definition of taxonomies covering knowledge areas. With the advent of the Web, the amount of available data increased by several orders of magnitude, making manual data classification impossible. This work presents an approach based on prototype theory to automatically classify semi-structured data, represented by frames, without any prior knowledge of the class structure. Our approach uses a variation of the K-Means algorithm that organizes a set of frames into classes structured as a strict hierarchy.
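The abstract names the ingredients (frames, prototypes, a K-Means variant, a strict hierarchy) but not the algorithm itself. The sketch below is a rough illustration under stated assumptions and is not the authors' method. It assumes that each frame is reduced to the set of its slot names, that a class prototype is the set of slots held by a strict majority of its members, that distance is Jaccard distance, and that the hierarchy is built by recursive two-way (bisecting) K-Means splits. All names (`build_taxonomy`, `two_means`, `ClassNode`, `min_size`) are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical frame representation: slot name -> filler value.
Frame = dict[str, object]


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def prototype(members: list[frozenset]) -> frozenset:
    """Assumed prototype: slots present in a strict majority of the members."""
    counts: dict[str, int] = {}
    for s in members:
        for slot in s:
            counts[slot] = counts.get(slot, 0) + 1
    return frozenset(slot for slot, c in counts.items() if 2 * c > len(members))


def two_means(sets: list[frozenset], max_iter: int = 20) -> list[int]:
    """K-Means with k=2 over slot sets; prototypes replace arithmetic centroids."""
    # Deterministic seeding: the first set, then the set farthest from it.
    seed_a = sets[0]
    seed_b = max(sets, key=lambda s: jaccard_distance(s, seed_a))
    protos = [seed_a, seed_b]
    labels: list[int] = []
    for _ in range(max_iter):
        new_labels = [0 if jaccard_distance(s, protos[0]) <= jaccard_distance(s, protos[1]) else 1
                      for s in sets]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        groups = [[s for s, lab in zip(sets, labels) if lab == k] for k in (0, 1)]
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller rejects it
        protos = [prototype(g) for g in groups]
    return labels


@dataclass
class ClassNode:
    prototype: frozenset
    members: list[int]  # indices into the input frame list
    children: list["ClassNode"] = field(default_factory=list)


def _split(idx: list[int], sets: list[frozenset], min_size: int) -> ClassNode:
    node = ClassNode(prototype([sets[i] for i in idx]), idx)
    if len(idx) < 2 * min_size:
        return node  # too small to yield two valid children
    labels = two_means([sets[i] for i in idx])
    parts = [[i for i, lab in zip(idx, labels) if lab == k] for k in (0, 1)]
    if min(len(p) for p in parts) < min_size:
        return node
    # Each frame goes to exactly one child, so the result is a strict hierarchy.
    node.children = [_split(p, sets, min_size) for p in parts]
    return node


def build_taxonomy(frames: list[Frame], min_size: int = 2) -> ClassNode:
    sets = [frozenset(f) for f in frames]
    return _split(list(range(len(frames))), sets, min_size)


frames: list[Frame] = [
    {"title": "A", "author": "X", "isbn": "1"},
    {"title": "B", "author": "Y", "isbn": "2", "pages": 300},
    {"title": "C", "author": "Z", "isbn": "3"},
    {"name": "Ana", "email": "a@x"},
    {"name": "Bia", "email": "b@x", "phone": "123"},
    {"name": "Caio", "email": "c@x"},
]
root = build_taxonomy(frames)
for child in root.children:
    print(sorted(child.prototype), child.members)
# prints:
# ['author', 'isbn', 'title'] [0, 1, 2]
# ['email', 'name'] [3, 4, 5]
```

Because each split assigns every frame to exactly one child, the tree is a strict hierarchy by construction. The root prototype here is empty because no slot is shared by a majority of all six frames. A real system would also need a stopping rule based on cluster quality rather than the fixed `min_size` assumed here.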
Keywords :
"Prototypes","Taxonomy","Libraries","Informatics","Electronic mail","Colon","XML"
Conference_Title :
Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing (CIT/IUCC/DASC/PICOM), 2015 IEEE International Conference on
DOI :
10.1109/CIT/IUCC/DASC/PICOM.2015.30