DocumentCode :
2049120
Title :
Cataloga: A Software for Semantic-Based Terminological Data Mining
Author :
Elia, Annibale ; Monteleone, Mario ; Postiglione, Alberto
Author_Institution :
Dipt. di Sci. Politiche, Sociali e della Comun., Univ. degli Studi di Salerno, Fisciano, Italy
fYear :
2011
fDate :
21-24 June 2011
Firstpage :
153
Lastpage :
156
Abstract :
This paper is focused on Cataloga, a software package based on the theoretical and practical analytical framework of Lexicon-Grammar and embedding a lingware module built on compressed terminological electronic dictionaries. We will show here how Cataloga can be used to achieve efficient data mining and information retrieval by means of a lexical ontology associated with terminology-based automatic textual analysis. We will also show why accurate data compression is necessary to build efficient textual analysis software. We will therefore discuss the creation and functioning of a software package for semantic-based terminological data mining, in which a crucial role is played by Italian simple-word and compound-word electronic dictionaries. Lexicon-Grammar is one of the most fruitful and consistent methods for natural language formalization and automatic textual analysis; it was set up by the French linguist Maurice Gross during the 1960s, and subsequently developed for and applied to Italian by Annibale Elia, Emilio D'Agostino and Maurizio Martinelli. Basically, Lexicon-Grammar establishes morphosyntactic and statistical sets of analytic rules to read and parse large textual corpora. The analytical procedure described here will prove appropriate for any type of digitized text, and will represent a relevant support for building and implementing Semantic Web (SW) interactive platforms.
Keywords :
cataloguing; data compression; data mining; dictionaries; grammars; information retrieval; interactive systems; ontologies (artificial intelligence); semantic Web; text analysis; Cataloga software package; Italian compound-word electronic dictionary; Italian simple-word electronic dictionary; automatic text analysis; compressed terminological electronic dictionary; data compression; information retrieval; lexical ontology; lexicon-grammar practical analytical framework; lexicon-grammar theoretical analytical framework; lingware module; morphosyntactic analytic rule sets; natural language formalization; semantic Web interactive platform; semantic-based terminological data mining; statistical analytic rule sets; terminology-based automatic text analysis; text analysis software; textual corpora; Compounds; Data mining; Dictionaries; Geology; Pragmatics; Semantics; Software; Automatic Textual Analysis; Cataloga software; Information Retrieval; Lexicon-Grammar; Semantic-Based Terminological Data Mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Compression, Communications and Processing (CCP), 2011 First International Conference on
Conference_Location :
Palinuro
Print_ISBN :
978-1-4577-1458-0
Electronic_ISBN :
978-0-7695-4528-8
Type :
conf
DOI :
10.1109/CCP.2011.42
Filename :
6061017