Title :
Survival analysis for HDLSS data with time dependent variables: Lessons from predictive maintenance at a mining service provider
Author :
Hochstein, Axel ; Ahn, Hyung-Il ; Leung, Ying Tat ; Denesuk, Matthew
Author_Institution :
IBM Almaden Res., San Jose, CA, USA
Abstract :
In gene expression analysis, the goal is often to predict survival from a high-dimensional space of covariates. The corresponding literature describes models that handle the low sample sizes typical of such studies. Low sample sizes also arise in asset management services: because asset downtime is very costly, replacements are scheduled long before the actual risk of failure increases, so few failures are observed. Although good surrogates of the true failure probability are sometimes available, in practice there are often only a number of weak predictors, which need to be filtered from a large set of candidates. The challenge is thus similar to gene expression analysis, with one crucial difference: covariates in condition monitoring are dynamic, whereas genes are not. Consequently, gene expression models can omit any data recorded between failures, whereas doing the same in condition monitoring can introduce a potentially high bias into variable selection. The authors are not aware of any survival models that handle high dimensional low sample size (HDLSS) data with time-dependent covariates. In this paper we evaluate the performance of different modeling techniques on HDLSS survival data, including a discrete-time model in which survival is modeled as a locally independent binary outcome variable. In doing so, we study the trade-off between omitting measurements between failure times and disregarding temporal dependencies. The analysis is based on a real-life case study in which 39 components of 50 mining haul trucks were monitored in operation over almost six years.
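The discrete-time formulation and the trade-off it involves can be illustrated with a minimal sketch, assuming each component life is stored as a sequence of per-interval covariate vectors that ends in either failure or censoring. The names `Life`, `person_period_rows`, `snapshot_row`, `fit_logistic` and `survival_curve` are hypothetical, and plain logistic regression fitted by gradient descent stands in for whatever estimator the paper actually uses. Keeping every interval as its own binary row, with rows treated as independent, disregards temporal dependencies; keeping only the last measurement of each life omits the data between failures.

```python
import math
from dataclasses import dataclass


@dataclass
class Life:
    """One component life from installation to failure or censoring (hypothetical schema)."""
    covariates: list[list[float]]  # one covariate vector per monitoring interval
    failed: bool                   # True if the life ended in failure, False if censored


def person_period_rows(life):
    """Keep every interval at risk: (covariates, 1 if failure occurred in that interval else 0)."""
    last = len(life.covariates) - 1
    return [(x, int(life.failed and i == last)) for i, x in enumerate(life.covariates)]


def snapshot_row(life):
    """Static-covariate alternative: keep only the last measurement before failure/censoring."""
    return (life.covariates[-1], int(life.failed))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, lr=0.5, epochs=3000):
    """Batch gradient descent on the logistic log-loss; rows are assumed locally independent."""
    d = len(rows[0][0])
    w = [0.0] * (d + 1)  # w[0] is the intercept
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for x, y in rows:
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def survival_curve(w, path):
    """S(t) = prod_{s<=t} (1 - h(s)), where h(s) is the predicted per-interval failure probability."""
    s, curve = 1.0, []
    for x in path:
        h = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
        s *= 1.0 - h
        curve.append(s)
    return curve


# Toy, made-up lives with one hypothetical covariate (e.g. a wear indicator); not case-study data.
lives = [
    Life([[0.1], [0.3], [0.9]], failed=True),
    Life([[0.2], [0.2], [0.3], [0.4]], failed=False),
    Life([[0.1], [0.8]], failed=True),
]
pp = [row for life in lives for row in person_period_rows(life)]
snap = [snapshot_row(life) for life in lives]
w = fit_logistic(pp)
print(len(pp), len(snap))  # 9 3
print([round(s, 3) for s in survival_curve(w, [[0.1], [0.5], [0.9]])])
```

The person-period table uses all nine interval measurements, while the snapshot table reduces the same data to three rows, the static view that gene-expression-style models assume. In a genuine HDLSS setting the logistic fit would also need regularization or feature selection to filter weak predictors, which this sketch omits.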
Keywords :
data analysis; failure analysis; maintenance engineering; materials handling equipment; mining; road vehicles; HDLSS survival data; discrete time model; failure times; high dimensional low sample size data; locally independent binary outcome variable; mining haul trucks; mining service provider; modeling techniques; predictive maintenance; survival analysis; time dependent variables; time-dependent covariates; Data mining; Data models; Gene expression; Hazards; Hidden Markov models; Input variables; Prognostics and health management; Adaboost; Cox proportional hazards; HDLSS; Predictive maintenance; condition monitoring; feature selection; hidden Markov models; mining operations; sensor data;
Conference_Title :
Service Operations and Logistics, and Informatics (SOLI), 2013 IEEE International Conference on
Conference_Location :
Dongguan
Print_ISBN :
978-1-4799-0529-4
DOI :
10.1109/SOLI.2013.6611443