• DocumentCode
    228749
  • Title

    FAST: Near Real-Time Searchable Data Analytics for the Cloud

  • Author

    Yu Hua ; Hong Jiang ; Dan Feng

  • Author_Institution
    Wuhan Nat. Lab. for Optoelectron., Huazhong Univ. of Sci. & Technol., Wuhan, China
  • fYear
    2014
  • fDate
    16-21 Nov. 2014
  • Firstpage
    754
  • Lastpage
    765
  • Abstract
    With the explosive growth in data volume and complexity and the increasing need for highly efficient searchable data analytics, existing cloud storage systems have largely failed to offer an adequate capability for real-time data analytics. Since the true value or worth of data heavily depends on how efficiently data analytics can be carried out on the data in (near-) real-time, large fractions of data end up with their values being lost or significantly reduced due to the data staleness. To address this problem, we propose a near-real-time and cost-effective searchable data analytics methodology, called FAST. The idea behind FAST is to explore and exploit the semantic correlation within and among datasets via correlation-aware hashing and manageable flat-structured addressing to significantly reduce the processing latency, while incurring acceptably small loss of data-search accuracy. The near-real-time property of FAST enables rapid identification of correlated files and the significant narrowing of the scope of data to be processed. FAST supports several types of data analytics, which can be implemented in existing searchable storage systems. We conduct a real-world use case in which children reported missing in an extremely crowded environment (e.g., A highly popular scenic spot on a peak tourist day) are identified in a timely fashion by analyzing 60 million images using FAST. Extensive experimental results demonstrate the efficiency and efficacy of FAST in the performance improvements and energy savings.
  • Keywords
    cloud computing; file organisation; FAST; cloud storage system; correlation-aware hashing; real-time searchable data analytics; semantic correlation; Complexity theory; Correlation; Data analysis; Feature extraction; Real-time systems; Semantics; Vectors; Cloud storage; data analytics; real-time performance; semantic correlation;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis, SC14: International Conference for
  • Conference_Location
    New Orleans, LA
  • Print_ISBN
    978-1-4799-5499-5
  • Type

    conf

  • DOI
    10.1109/SC.2014.67
  • Filename
    7013049