CART Decision-Tree Statistical Analysis and Prediction of Summer Season Maximum Surface Ozone for the Vancouver, Montreal, and Atlantic Regions of Canada
Environment Canada began predicting daily maximum surface ozone (O<SUB>3</SUB>) concentrations in the spring of 1993 for the Vancouver, Montreal, and Atlantic regions in order to advise the public of expected air quality. The province of Ontario has issued forecasts for southern Ontario for many years, but forecasting is a new undertaking in other parts of the country, where air quality has become a concern in recent years. Forecasters need guidance, particularly for predicting surface O<SUB>3</SUB> concentrations near or above the Canadian 1-h maximum acceptable concentration of 82 ppb. Such occurrences are episodic and relatively rare in southern Canada: their probability of occurrence ranges from 0.00 to 0.08 at the sites in the regions studied here, so reliable prediction is difficult without guidance. Mesoscale numerical meteorological–photochemical models are not currently available for routine operational use, but the capability exists to develop and use sophisticated multivariate statistical techniques for predicting daily maximum O<SUB>3</SUB> concentration. Most statistical ozone forecast procedures used in Canada to date have been based on multiple linear regression with a limited number of predictors, drawn mainly from surface meteorology and from a subjective classification of the synoptic meteorological flow pattern. Since the relationship between surface O<SUB>3</SUB> and meteorology is nonlinear, tree-based statistical models with several predictors are appropriate for developing objective forecast guidance.
Surface and upper-air meteorological predictors, along with other predictors, were matched with several years of observed daily maximum O<SUB>3</SUB> concentrations for the months of May–September at air-monitoring sites in the three regions. The data at each site were analyzed with CART (classification and regression trees), a recent nonparametric, data-driven, tree-based analysis method.
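The abstract does not describe how CART grows a tree, so the following is only a minimal Python sketch of the core idea, not the study's software. At each node it tries every predictor and every midpoint threshold, keeps the binary split that most reduces the within-node sum of squared errors, and recurses; each leaf then predicts the mean observed O<SUB>3</SUB> of the training cases that reach it. It deliberately omits the cost-complexity pruning and surrogate splits that full CART uses, and every name here (`grow`, `best_split`, `predict`, `max_depth`, `min_leaf`) is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    """A regression-tree node: an internal split, or a leaf holding a mean."""
    value: float                       # mean O3 (ppb) of training cases in this node
    feature: Optional[int] = None      # index of the predictor used to split
    threshold: Optional[float] = None  # cases with x[feature] <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sse(ys: Sequence[float]) -> float:
    """Sum of squared deviations from the node mean (regression-tree impurity)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(X: List[List[float]], y: List[float],
               min_leaf: int) -> Optional[Tuple[float, int, float]]:
    """Return (total_sse, feature, threshold) of the best admissible split, or None."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            total = sse(left) + sse(right)
            if best is None or total < best[0]:
                best = (total, j, t)
    return best


def grow(X: List[List[float]], y: List[float],
         max_depth: int = 3, min_leaf: int = 2) -> Node:
    """Recursively partition the cases; stop at max_depth or when no split helps."""
    node = Node(value=sum(y) / len(y))
    if max_depth == 0:
        return node
    split = best_split(X, y, min_leaf)
    if split is None or split[0] >= sse(y):
        return node  # no admissible split reduces the squared error
    _, j, t = split
    li = [i for i, row in enumerate(X) if row[j] <= t]
    ri = [i for i, row in enumerate(X) if row[j] > t]
    node.feature, node.threshold = j, t
    node.left = grow([X[i] for i in li], [y[i] for i in li], max_depth - 1, min_leaf)
    node.right = grow([X[i] for i in ri], [y[i] for i in ri], max_depth - 1, min_leaf)
    return node


def predict(node: Node, x: Sequence[float]) -> float:
    """Follow the node-splitting rules from the root down to a leaf."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value
```

If each row of `X` held a day's predictors (for example, hypothetically, forecast maximum temperature and dew point) and `y` the observed daily maximum O<SUB>3</SUB>, then `predict(grow(X, y), x_tomorrow)` would return the mean O<SUB>3</SUB> of the leaf that tomorrow's forecast predictors fall into. The `min_leaf` limit keeps leaves from being fitted to one or two days, which matters when high-O<SUB>3</SUB> days are rare.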
The decision trees built by CART fit the data reasonably well, and their node-splitting rules were physically realistic. Some important aspects of the analyses are noted. One interesting result is that the moisture content of the air limits the maximum surface O<SUB>3</SUB> concentration that can be reached when other factors point to the occurrence of high values.
Given forecasts of the predictor variables, the decision trees can be used to predict maximum surface O<SUB>3</SUB> concentrations, providing an inexpensive site-specific model for forecasting and climate impact analysis. Performance on independent data was estimated for the Vancouver lower Fraser River valley and Montreal regions for each of the five years 1988–92. Verification of the ensemble of forecasts in the two regions shows that the technique would have had reasonably good skill in forecasting surface O<SUB>3</SUB> concentrations near or above the acceptable 1-h limit. A computer version of the technique has been provided for use in the regional forecast offices.
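The abstract does not say which verification scores were used. As an assumption for illustration, the sketch below verifies "above threshold" forecasts with a standard 2×2 contingency table and reports probability of detection (POD), false alarm ratio (FAR), and critical success index (CSI). The function name `exceedance_scores` is hypothetical, and treating an event as strictly greater than 82 ppb is also an assumption; a lower threshold could be passed to cover "near" exceedances.

```python
from typing import Dict, Sequence


def exceedance_scores(observed: Sequence[float], forecast: Sequence[float],
                      threshold: float = 82.0) -> Dict[str, float]:
    """Categorical verification of 'O3 above threshold' forecasts."""
    if len(observed) != len(forecast):
        raise ValueError("observed and forecast must have the same length")
    hits = misses = false_alarms = correct_negatives = 0
    for obs, fc in zip(observed, forecast):
        event_obs, event_fc = obs > threshold, fc > threshold
        if event_obs and event_fc:
            hits += 1
        elif event_obs:
            misses += 1
        elif event_fc:
            false_alarms += 1
        else:
            correct_negatives += 1

    def ratio(num: int, den: int) -> float:
        # Undefined scores (e.g. no observed events) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "base_rate": ratio(hits + misses, len(observed)),
        "POD": ratio(hits, hits + misses),
        "FAR": ratio(false_alarms, hits + false_alarms),
        "CSI": ratio(hits, hits + misses + false_alarms),
        "percent_correct": ratio(hits + correct_negatives, len(observed)),
    }
```

Event-oriented scores such as POD, FAR, and CSI are preferred here because exceedances are rare: with a base rate of 0.08, a forecast that never predicts an exceedance is still 92% correct while detecting none of the events.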