DocumentCode :
3106034
Title :
Forecasting Skewed Biased Stochastic Ozone Days: Analyses and Solutions
Author :
Zhang, Kun ; Fan, Wei ; Yuan, Xiaojing ; Davidson, Ian ; Li, Xiangshang
Author_Institution :
Dept. of EECS, Tulane Univ., New Orleans, LA
fYear :
2006
fDate :
18-22 Dec. 2006
Firstpage :
753
Lastpage :
764
Abstract :
Much work on skewed, stochastic, high-dimensional, and biased datasets usually implicitly solves each problem separately. Recently, we were approached by the Texas Commission on Environmental Quality (TCEQ) to help them build highly accurate ozone level alarm forecasting models for the Houston area, where these technical difficulties come together in a single problem. Key characteristics of this problem that are challenging and interesting include: 1) the dataset is sparse (72 features, and 2% or 5% positives depending on the criteria for "ozone days"), 2) it evolves over time from year to year, 3) it is limited in size (7 years, or around 2500 data entries), 4) it contains a large number of irrelevant features, 5) it is biased in terms of "sample selection bias", and 6) the true model is stochastic as a function of measurable factors. Besides solving a difficult application problem, this dataset offers a unique opportunity to explore new and existing data mining techniques and to provide experience and guidance for similar problems. Our main technical focus is on how to estimate reliable probabilities given both sample selection bias and a large number of irrelevant features, and how to choose the most reliable decision threshold to predict an unknown future with a different distribution. On the application side, the prediction accuracy of our approach is 20% higher in recall (correctly detecting 1 to 3 more ozone days, depending on the year) and 10% higher in precision (15 to 30 fewer false alarm days per year) than state-of-the-art methods used by air quality control scientists, and these results are significant for TCEQ.
Keywords :
data mining; environmental science computing; Texas Commission on Environmental Quality; data mining techniques; forecasting models; sample selection bias; skewed biased stochastic ozone days; Accuracy; Data engineering; Data mining; Geophysical measurements; Predictive models; Quality control; Size measurement; Stochastic processes; Technology forecasting; Time measurement;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2006. ICDM '06. Sixth International Conference on
Conference_Location :
Hong Kong
ISSN :
1550-4786
Print_ISBN :
0-7695-2701-7
Type :
conf
DOI :
10.1109/ICDM.2006.73
Filename :
4053100