[Image: world crystal ball (credit unknown)]

The forecasting accuracy of global climate models has long been at the heart of the global warming debate. Leif Svalgaard turned me on to this paper in GRL today:

Reifen, C., and R. Toumi (2009), Climate projections: Past performance no guarantee of future skill?, Geophysical Research Letters, 36, L13704, doi:10.1029/2009GL038082.

PDF available here

It makes a very interesting point about the "stationarity" of climate feedback strengths. In a nutshell, it says that climate model skill breaks down over time because neither forcings nor feedbacks remain static, and the models cannot anticipate such changes.

Gavin Schmidt of NASA GISS says something similar in a recent interview:
The problem with climate prediction and projections going out to 2030 and 2050 is that we don't anticipate that they can be tested in the way you can test a weather forecast. It takes about 20 years to evaluate because there is so much unforced variability in the system which we can't predict - the chaotic component of the climate system - which is not predictable beyond two weeks, even theoretically. That is something that we can't really get a handle on.

From Edge: THE PHYSICS THAT WE KNOW: A Conversation With Gavin Schmidt
Some excerpts from the paper:
The principle of selecting climate models based on their agreement with observations has been tested for surface temperature using 17 of the IPCC AR4 models.


There is no evidence that any subset of models delivers significant improvement in prediction accuracy compared to the total ensemble.

With the ever increasing number of models, the question arises of how to make a best estimate prediction of future temperature change. The Intergovernmental Panel on Climate Change (IPCC) Fourth Assessment Report (AR4) combines the results of the available models to form an ensemble average, giving all models equal weight. Other studies argue in favor of treating some models as more reliable than others [Shukla et al., 2006; Giorgi and Mearns, 2002]. However, determining which models, if any, are superior is not straightforward. The IPCC comments:

''What does the accuracy of a climate model's simulation of past or contemporary climate say about the accuracy of its projections of climate change? This question is just beginning to be addressed. . .'' [Intergovernmental Panel on Climate Change, 2007, p. 594].

One key assumption, on which the principle of performance-based selection rests, is that a model which performs better in one time period will continue to perform better in the future. This has been studied in terms of pattern-scaling using the ''perfect model assumption'' [Whetton et al., 2007]. We examine the question in an observational context for temperature here for the first time. We will also quantify the effect of ensemble size on the global mean, Siberian and European temperature error.

The principle of averaging results from different models to form a multi-model ensemble prediction also has potential problems, since models share biases and there is no guarantee that their errors will neatly cancel out. For this reason groups of models thus combined have been termed ''ensembles of opportunity'' [Piani et al., 2005]. Various studies have shown that multi-model ensembles produce more accurate results than single models [Kiktev et al., 2007; Mullen and Buizza, 2002]. Our examination of ensemble performance aims to address the question in the context of the current generation of climate models.
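The ensemble-averaging idea is easy to illustrate with a toy sketch. Everything below is synthetic, assumed data (none of it comes from the paper): each "model" is the same underlying series plus its own independent bias and noise, in which case the equal-weight ensemble mean tends to beat a typical single model because independent errors partially cancel. As the excerpt warns, biases shared across models would not cancel this way.

```python
import numpy as np

# Toy illustration (synthetic data, not the paper's): equal-weight
# multi-model ensemble mean vs. individual models of a fake temperature series.
rng = np.random.default_rng(0)

n_years, n_models = 100, 17                      # 17 models, echoing the AR4 subset
truth = np.cumsum(rng.normal(0, 0.1, n_years))   # synthetic "observed" anomaly

# Each model = truth + its own constant bias + independent year-to-year noise.
biases = rng.normal(0, 0.3, n_models)
noise = rng.normal(0, 0.2, (n_models, n_years))
models = truth + biases[:, None] + noise

def rmse(x):
    return np.sqrt(np.mean((x - truth) ** 2))

individual = np.array([rmse(m) for m in models])
ensemble = rmse(models.mean(axis=0))

print(f"mean single-model RMSE: {individual.mean():.3f}")
print(f"ensemble-mean RMSE:     {ensemble:.3f}")
```

Under these (independent-error) assumptions the ensemble-mean RMSE comes out well below the average single-model RMSE; make the biases correlated across models and the advantage shrinks, which is exactly the "ensembles of opportunity" caveat.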


In our analysis there is no evidence of future prediction skill delivered by past performance-based model selection. There seems to be little persistence in relative model skill, as illustrated by the percentage turnover in Figure 3. We speculate that the cause of this behavior is the non-stationarity of climate feedback strengths. Models that respond accurately in one period are likely to have the correct feedback strength at that time. However, the feedback strength and forcing is not stationary, favoring no particular model or groups of models consistently. For example, one could imagine that in certain time periods the sea-ice albedo feedback is more important, favoring those models that simulate sea-ice well. In another period, El Niño may be the dominant mode, favoring those models that capture tropical climate better. On average all models have a significant signal to contribute.
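The "percentage turnover" idea can be made concrete with a simple null model. This is my own toy assumption, not the paper's analysis: if relative model skill has no persistence at all, the expected overlap between the top-k performers of one period and the next is only k²/n models, so most of the "best" models churn each period.

```python
import numpy as np

rng = np.random.default_rng(1)
n_models, n_trials, top_k = 17, 1000, 5

stay = 0
for _ in range(n_trials):
    # Fully non-persistent skill: each model's error is drawn independently
    # in each period, so past rank carries no information about future rank.
    err1 = rng.random(n_models)
    err2 = rng.random(n_models)
    top1 = set(np.argsort(err1)[:top_k])
    top2 = set(np.argsort(err2)[:top_k])
    stay += len(top1 & top2)

print(f"average top-{top_k} overlap across periods: {stay / n_trials:.2f} models")
# Expected overlap under no persistence is k*k/n = 25/17, i.e. roughly
# 1.5 of 5 models, or about 70% turnover per period.
```

A measured turnover close to this null-model level is what "no evidence of future prediction skill from past performance" looks like in practice.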


While the authors of this paper still profess faith in model ensembles, the issues they point out with non-stationarity call into question the ability of any model to stay on track over an extended forecast period.