DocumentCode :
2690595
Title :
An Approach to Document Warehousing System Lifecycle from Textual ETL to Multidimensional Queries: A Proof-of-Concept Prototype
Author :
Cembalo, Assuntina ; Pisano, Francesca M. ; Romano, Gianpaolo
Author_Institution :
SoftComputing Lab., C.I.R.A - Italian Aerosp. Res. Center, Capua, Italy
fYear :
2012
fDate :
4-6 July 2012
Firstpage :
828
Lastpage :
835
Abstract :
For years, businessmen have used ad hoc technologies to analyze huge amounts of data related to their domain of interest, aiming to extract relevant information for elaborating successful company strategies. Such technologies focused essentially on structured data. In particular, Data Warehousing systems are the decision support systems on which academia and industry have focused their attention. It is believed that "about 80% of the information of any organization is contained in unstructured and semi-structured documents"[1], so limiting the analysis to structured data alone, as has been done so far, is likely to lose a high percentage of potentially useful knowledge. Since text is the primary means of disseminating information and knowledge, it is necessary to introduce concepts related to text-oriented Business Intelligence and Document Warehousing systems, which could have many useful applications in industry or in large domains. In this paper we present a prototype application of a Document Warehousing system, highlighting challenges and solutions for each phase of its lifecycle. The prototype relates to the Security and Prevention domain and is built with a set of open-source tools whose features and limitations are highlighted. To our knowledge, the organization and setup of the fundamental elements of a Document Warehouse system lifecycle have not yet been investigated in depth. Furthermore, we have not yet found a Document Warehousing application implemented by integrating the open-source tools that we use in our prototype.
Keywords :
business data processing; competitive intelligence; data analysis; data warehouses; decision support systems; information dissemination; organisational aspects; public domain software; query processing; security of data; text analysis; ad hoc technology; decision support system; document warehousing system lifecycle; information dissemination; information extraction; multidimensional query; open source tool; organization information; prevention domain; prototype application; security domain; semi-structured document; structured data analysis; text oriented business intelligence; textual ETL; unstructured document; Data mining; Drugs; Hypercubes; Open source software; Prototypes; Terminology; Warehousing; Document Warehousing; OLAP; Security and Prevention; Textual ETL;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Complex, Intelligent and Software Intensive Systems (CISIS), 2012 Sixth International Conference on
Conference_Location :
Palermo
Print_ISBN :
978-1-4673-1233-2
Type :
conf
DOI :
10.1109/CISIS.2012.185
Filename :
6245783