  • DocumentCode
    262585
  • Title
    A Multi-dimensional Analysis and Data Cube for Unstructured Text and Social Media
  • Author
    Suan Lee ; Namsoo Kim ; Jinho Kim
  • Author_Institution
    Dept. of Comput. Sci., Kangwon Nat. Univ., Chuncheon, South Korea
  • fYear
    2014
  • fDate
    3-5 Dec. 2014
  • Firstpage
    761
  • Lastpage
    764
  • Abstract
    Recently, unstructured data such as texts, documents, and SNS messages has increasingly been used in many applications in place of structured data consisting of simple numbers or characters. It has therefore become more important to analyze unstructured text data to extract valuable information for users' decision making. As with OLAP (On-Line Analytical Processing) analysis over structured data, multi-dimensional analysis of such unstructured data is in growing demand. To meet these analysis requirements on unstructured data, a text cube model over a multi-dimensional text database has been proposed. In this paper, we extend the existing text cube model to incorporate TF-IDF (Term Frequency-Inverse Document Frequency) and LM (Language Model) as measures. Because the proposed text cube model uses measures that are more widely adopted in information retrieval systems, it is more efficient and effective for analyzing text databases. Our experiments show that the proposed text cube outperforms the existing one in both performance and effectiveness.
  • Keywords
    data analysis; data mining; database management systems; information retrieval; social networking (online); text analysis; LM; OLAP analysis; SNS messages; TF-IDF; data cube; decision making; information retrieval systems; language model; multidimensional analysis; multidimensional text database; online analytical processing; term frequency inverse document frequency; text cube model; unstructured text data analysis; Analytical models; Computational modeling; Data models; Databases; Frequency measurement; Information retrieval; Mathematical model; Multi-dimensional analysis; OLAP; TF-IDF; data cube; information retrieval; language model; text cube; text databases;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data and Cloud Computing (BdCloud), 2014 IEEE Fourth International Conference on
  • Conference_Location
    Sydney, NSW
  • Type
    conf
  • DOI
    10.1109/BDCloud.2014.117
  • Filename
    7034871