• DocumentCode
    2938541
  • Title

    Characterizing E-Science File Access Behavior via Latent Dirichlet Allocation

  • Author

    Kim, Yusik ; Germain-Renaud, C.

  • Author_Institution
    LRI, Univ. Paris-Sud 11, Orsay, France
  • fYear
    2011
  • fDate
    5-8 Dec. 2011
  • Firstpage
    162
  • Lastpage
    169
  • Abstract
    E-science is moving from grids to clouds. Getting the best of both worlds requires building on the experience gained from several years of steady operation of production grids. We propose a new approach for analyzing behavioral traces: since most of them are in fact text documents, state-of-the-art text mining techniques, and specifically latent Dirichlet allocation, can be exploited. The advantages are twofold: the approach provides some level of explanation inferred from the data, and it offers a relatively scalable way to capture the temporal variability of the behavior of interest while retaining the full dimensionality of the problem at hand. We test the text mining analogy by characterizing file access behavior on data from the steady operation of the largest production grid. We validate the resulting probabilistic model by showing that it is capable of generating synthetic traces that are statistically consistent with the real ones. The approach would apply equally to wider contexts such as social network activity or web access.
  • Keywords
    cloud computing; data mining; grid computing; information retrieval; natural sciences computing; social networking (online); text analysis; Web access; behavioral trace analysis; cloud computing; e-science file access behavior characterization; grid computing; latent Dirichlet allocation; probabilistic model; production grid; social network activity; text document; text mining; Correlation; Joints; Maximum likelihood estimation; Measurement; Probabilistic logic; Text mining; Vectors; Graphical Models; Trace Analysis; e-science infrastructures;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Utility and Cloud Computing (UCC), 2011 Fourth IEEE International Conference on
  • Conference_Location
    Victoria, NSW
  • Print_ISBN
    978-1-4577-2116-8
  • Type

    conf

  • DOI
    10.1109/UCC.2011.31
  • Filename
    6123494