Title :
Filtering log data: Finding the needles in the Haystack
Author :
Yu, Li ; Zheng, Ziming ; Lan, Zhiling ; Jones, Terry ; Brandt, Jim M. ; Gentile, Ann C.
Author_Institution :
Illinois Inst. of Technol., Chicago, IL, USA
Abstract :
Log data is an invaluable asset for troubleshooting large-scale systems. However, as system scale keeps growing, the volume of such data becomes overwhelming, imposing enormous burdens on both data storage and data analysis. To address this problem, we present a two-dimensional online filtering mechanism that removes redundant and noisy data via feature selection and instance selection. The objective of this work is twofold: (i) to significantly reduce data volume without losing important information, and (ii) to facilitate more effective data analysis. We evaluate this new filtering mechanism using real environmental data from the production supercomputers at Oak Ridge National Laboratory and Sandia National Laboratory. Our preliminary results demonstrate that our method can reduce disk space usage by more than 85%, thereby significantly reducing analysis time. Moreover, it improves failure prediction and diagnosis by more than 20% compared to the conventional predictive approach that relies on RAS (Reliability, Availability, and Serviceability) events alone.
Keywords :
data analysis; failure analysis; feature extraction; information filtering; mainframes; Oak Ridge National Laboratory; Sandia National Laboratory; analysis time reduction; data storage; failure diagnosis; failure prediction; feature selection; instance selection; large-scale system troubleshooting; log data filtering; noisy data removal; production supercomputers; redundant data removal; two-dimensional online filtering mechanism; Accuracy; Correlation; Large-scale systems; Measurement; Monitoring; Supercomputers
Conference_Title :
Dependable Systems and Networks (DSN), 2012 42nd Annual IEEE/IFIP International Conference on
Conference_Location :
Boston, MA
Print_ISBN :
978-1-4673-1624-8
Electronic_ISBN :
1530-0889
DOI :
10.1109/DSN.2012.6263948