DocumentCode :
3461148
Title :
Topics and Terms Mining in Unstructured Data Stores
Author :
Lomotey, Richard K. ; Deters, Ralph
Author_Institution :
Dept. of Comput. Sci., Univ. of Saskatchewan, Saskatoon, SK, Canada
fYear :
2013
fDate :
3-5 Dec. 2013
Firstpage :
854
Lastpage :
861
Abstract :
One of the major challenges of the "Big Data" epoch is unstructured data mining. The problem arises from the storage of high-dimensional data that has no standard schema. While knowledge discovery in database (KDD) algorithms were designed for data extraction, they are best suited to structured data stores. Moreover, at the data storage level, NoSQL databases have been deployed to accommodate unstructured data. However, the over-reliance of NoSQL stores on multiple APIs hampers efficient data extraction across different NoSQL stores. Also, few tools are available that can perform KDD tasks on NoSQL data stores. In this work, we explore trends in unstructured data mining and detail future directions and challenges. Then, focusing on topic and term extraction from NoSQL databases, we propose a tool called TouchR2, which relies algorithmically on Bloom filtering and parallelization. Using the CouchDB data store as the test case, the evaluation of TouchR2 shows high accuracy for term extraction and organization within a considerably optimized processing time.
Keywords :
application program interfaces; data mining; data structures; software tools; storage management; Big Data epoch; CouchDB data storage; KDD algorithms; NoSQL databases; TouchR2 tool; bloom filtering; data extraction; data storage level; high-dimensional data storage; knowledge discovery in database algorithms; multiple APIs; structured data storages; terms extraction; terms mining; topics extraction; topics mining; unstructured data mining; unstructured data stores; Association rules; Data handling; Data storage systems; Databases; Information management; Information retrieval; Association Rules; Big Data; Bloom Filtering; NoSQL; Terms; Topics; Unstructured Data Mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Science and Engineering (CSE), 2013 IEEE 16th International Conference on
Conference_Location :
Sydney, NSW
Type :
conf
DOI :
10.1109/CSE.2013.129
Filename :
6755309