Title :
A Multi-dimensional Analysis and Data Cube for Unstructured Text and Social Media
Author :
Suan Lee ; Namsoo Kim ; Jinho Kim
Author_Institution :
Dept. of Comput. Sci., Kangwon Nat. Univ., Chuncheon, South Korea
Abstract :
Recently, unstructured data such as texts, documents, and SNS messages, rather than structured data consisting of simple numbers or characters, are increasingly being used in many applications. It has therefore become more important to analyze unstructured text data to extract valuable information for users' decision making. Just as OLAP (On-Line Analytical Processing) supports analysis over structured data, multi-dimensional analysis of unstructured data is increasingly in demand. To support such analysis, a text cube model over multi-dimensional text databases has been proposed. In this paper, we extend the existing text cube model to incorporate TF-IDF (Term Frequency-Inverse Document Frequency) and LM (Language Model) as measures. Because the proposed text cube model uses measures that are widely adopted in information retrieval systems, it analyzes text databases more efficiently and effectively. Our experiments show that the proposed text cube outperforms the existing one in both performance and effectiveness.
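The abstract does not give the paper's formulas, so the following is only a minimal sketch of what "TF-IDF and LM as cube measures" could look like. It assumes that a cube cell is the set of documents sharing a combination of dimension values and that a cell's measure is computed by pooling the term counts of all documents in that cell. The toy records in `DOCS`, the helper names `cell`, `idf`, `tfidf_measure`, and `lm_measure`, and the choice of Jelinek-Mercer smoothing with `lam=0.5` are all hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical toy multi-dimensional text database: each record carries
# dimension values (location, month) plus an already tokenized text field.
DOCS = [
    {"location": "Sydney", "month": "2014-01", "tokens": ["cloud", "data", "cube"]},
    {"location": "Sydney", "month": "2014-02", "tokens": ["cloud", "cloud", "olap"]},
    {"location": "Seoul", "month": "2014-01", "tokens": ["text", "data", "mining"]},
]

def cell(docs, **dims):
    """Select the documents falling into one cube cell (a dimension-value combination)."""
    return [d for d in docs if all(d[k] == v for k, v in dims.items())]

def idf(docs):
    """Inverse document frequency over the whole collection: log(N / df(t))."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d["tokens"]))
    return {t: math.log(n / c) for t, c in df.items()}

def tfidf_measure(cell_docs, idf_table):
    """Assumed TF-IDF measure: term frequency pooled over the cell, times collection IDF."""
    tf = Counter(t for d in cell_docs for t in d["tokens"])
    return {t: f * idf_table[t] for t, f in tf.items()}

def lm_measure(cell_docs, all_docs, lam=0.5):
    """Assumed LM measure: unigram model of the cell, Jelinek-Mercer smoothed
    with the collection model so every collection term gets nonzero probability."""
    cell_tf = Counter(t for d in cell_docs for t in d["tokens"])
    coll_tf = Counter(t for d in all_docs for t in d["tokens"])
    cell_len = sum(cell_tf.values())
    coll_len = sum(coll_tf.values())
    return {
        t: (1 - lam) * (cell_tf[t] / cell_len if cell_len else 0.0)
        + lam * coll_tf[t] / coll_len
        for t in coll_tf
    }

if __name__ == "__main__":
    sydney = cell(DOCS, location="Sydney")  # roll up over the month dimension
    print(tfidf_measure(sydney, idf(DOCS)))
    print(lm_measure(sydney, DOCS))
```

In this sketch, rolling up or drilling down only changes which dimensions are passed to `cell`; the same measure functions are then applied to the selected documents. A real text cube would precompute and store such measures per cell rather than recomputing them on every query.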
Keywords :
data analysis; data mining; database management systems; information retrieval; social networking (online); text analysis; LM; OLAP analysis; SNS messages; TF-IDF; data cube; decision making; information retrieval systems; language model; multidimensional analysis; multidimensional text database; online analytical processing; term frequency inverse document frequency; text cube model; unstructured text data analysis; Analytical models; Computational modeling; Data models; Databases; Frequency measurement; Information retrieval; Mathematical model; Multi-dimensional analysis; OLAP; TF-IDF; data cube; information retrieval; language model; text cube; text databases;
Conference_Title :
Big Data and Cloud Computing (BdCloud), 2014 IEEE Fourth International Conference on
Conference_Location :
Sydney, NSW
DOI :
10.1109/BDCloud.2014.117