Earlier this week, a new paper appeared in GRL by Nicola Scafetta (Scafetta, 2022) which purported to conclude that the CMIP6 models with medium or high climate sensitivity (higher than 3ºC) were not consistent with recent historical temperature changes. Since there have been a number of papers already on this topic, notably Tokarska et al (2020), which did not come to such a conclusion, it is worthwhile to investigate where Scafetta’s result comes from. Unfortunately, it appears to emerge from a mis-appreciation of what is in the CMIP6 archive, an inappropriate statistical test, and a total neglect of observational uncertainty and internal variability.
I, together with John Kennedy and Gareth Jones from the UK Met Office, have put together a short explanation of what we think was done wrong. There are three main points:
- Not taking into account uncertainty in the observational data.
- Looking only at the model ensemble mean instead of the individual simulations.
- Applying a statistical test that is guaranteed to reject any specific realization of internal variability if the forced signal is well constrained.
The first two points are clearly seen in the following figure.
The pink shading is the uncertainty in the observational temperature difference, and the individual model runs (172 of them) are the black dots (those from the same model are in a horizontal line). The green triangles are the ensemble mean for each of the 37 models (for models with only one ensemble member in the archive, the green triangle lies on top of the black dot). A couple of things are obvious. First, there are 3 models with ECS > 3ºC that have an ensemble mean consistent with the ERA5 change within the uncertainty, but more importantly, 49 ensemble members from 18 models are compatible with the ERA5 result. Of those 18 models, half have ECS above 3ºC. This is in direct contradiction with Scafetta’s claim that “all models with ECS > 3.0ºC overestimate the observed global surface warming”.
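The compatibility check described above is straightforward to sketch. The values below are placeholders, not the actual ERA5 or CMIP6 numbers; the real analysis uses the published ERA5 uncertainty range and the 172 archived ensemble members:

```python
import numpy as np

# Hypothetical illustration only -- these are NOT the real ERA5/CMIP6 values.
obs_warming = 1.0   # degC: placeholder for the observed ERA5 change
obs_err = 0.1       # degC: placeholder half-width of the observational uncertainty

rng = np.random.default_rng(0)

# Fake "models": each has a forced signal (placeholder numbers) plus
# several ensemble members that differ by simulated internal variability.
forced_signals = [0.85, 1.0, 1.15, 1.3]
members = {f: f + rng.normal(0.0, 0.1, size=5) for f in forced_signals}

# A run is compatible with the observations if it falls within the
# observational uncertainty range -- the test applied run by run, not
# only to the ensemble mean.
compatible = {
    f: int(np.sum(np.abs(runs - obs_warming) <= obs_err))
    for f, runs in members.items()
}
print(compatible)
```

The point of the sketch is that individual runs from a model can land inside the observational range even when that model's ensemble mean does not, which is exactly what the 49 compatible ensemble members in the figure show.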
Scafetta’s analysis only used the ensemble mean from each model (the green triangles), despite claiming to look at single simulations, and totally ignored the ERA5 uncertainty. These choices give a fundamentally misleading result. Curiously, when referencing the ERA5 data, he cites the description paper for ERSSTv5 – an ocean temperature dataset – instead of Hersbach et al (2020).
The error in the second half of his analysis is related to the error made by Douglass et al (2008) and discussed in Santer et al (2008). Scafetta tests the difference between the model ensemble mean (the forced pattern) and the exact observational pattern (which is a combination of a forced signal and a realization of the internal variability), against the uncertainty in the forced pattern. This has the bizarre property that you would be almost guaranteed to eventually reject all of the specific model realizations as the number of ensemble members increases (since the uncertainty on the mean decreases as 1/√N). The results from Scafetta’s test are therefore not reliable.
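A quick simulation makes the flaw concrete. Here observations and model runs are drawn from the *same* distribution (so the model is correct by construction), yet a Douglass-style test against the standard error of the ensemble mean rejects more and more often as the ensemble grows. The signal and variability values are made-up illustrative numbers:

```python
import numpy as np

rng = np.random.default_rng(42)
truth, sigma = 1.0, 0.15  # made-up forced signal and internal variability

rates = []
trials = 2000
for n in [5, 20, 100, 1000]:
    rejections = 0
    for _ in range(trials):
        obs = rng.normal(truth, sigma)             # one realization, like ERA5
        runs = rng.normal(truth, sigma, size=n)    # n ensemble members
        se_mean = runs.std(ddof=1) / np.sqrt(n)    # shrinks like 1/sqrt(n)
        # Douglass-style test: compare the single observed realization
        # against 2x the uncertainty of the ensemble *mean*
        if abs(obs - runs.mean()) > 2 * se_mean:
            rejections += 1
    rates.append(rejections / trials)

print(rates)  # rejection rate climbs toward ~95% as n grows
```

Because the observed realization retains the full spread of internal variability (σ) while the test threshold shrinks as 1/√N, the rejection rate approaches the 2σ tail probability of ~95% for large ensembles, even though nothing is wrong with the model.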
What to do?
GRL does not accept comments on its published papers, a situation we have discussed here before. However, it does have a complaints procedure. This involves the issue(s) being submitted to the GRL Editorial Office, which then asks for a response from the author(s). Upon receipt of the response, the Editorial Board will decide how to proceed. This could be anything from doing nothing to publishing a correction or, ultimately, forcing a retraction. Thus the three of us have formally submitted the note linked above to the GRL Editors. So we will see!
Note that we were able to put together this complaint very quickly because of the availability of public archives of ERA5 data, ECS numbers for the CMIP6 models, and the Climate Explorer site, and the fact that errors like this have been made many times before.
What will this mean?
As we state in the note, just because Scafetta’s analysis is flawed, that doesn’t mean that all CMIP6 models perform skillfully in the historical period. As we’ve discussed previously, the CMIP6 archive needs to be dealt with more carefully than in previous iterations (#NotAllModels, Making predictions with the CMIP6 ensemble). Additionally, the poor performance of a specific model with respect to these kinds of observations might still be a function of incorrect forcings (such as aerosols where there is still a lot of uncertainty).
We will keep people informed of what happens…
N. Scafetta, “Advanced Testing of Low, Medium, and High ECS CMIP6 GCM Simulations Versus ERA5‐T2m”, Geophysical Research Letters, vol. 49, 2022. http://dx.doi.org/10.1029/2022GL097716
K.B. Tokarska, M.B. Stolpe, S. Sippel, E.M. Fischer, C.J. Smith, F. Lehner, and R. Knutti, “Past warming trend constrains future warming in CMIP6 models”, Science Advances, vol. 6, 2020. http://dx.doi.org/10.1126/sciadv.aaz9549
H. Hersbach, B. Bell, P. Berrisford, S. Hirahara, A. Horányi, J. Muñoz‐Sabater, J. Nicolas, C. Peubey, R. Radu, D. Schepers, A. Simmons, C. Soci, S. Abdalla, X. Abellan, G. Balsamo, P. Bechtold, G. Biavati, J. Bidlot, M. Bonavita, G. Chiara, P. Dahlgren, D. Dee, M. Diamantakis, R. Dragani, J. Flemming, R. Forbes, M. Fuentes, A. Geer, L. Haimberger, S. Healy, R.J. Hogan, E. Hólm, M. Janisková, S. Keeley, P. Laloyaux, P. Lopez, C. Lupu, G. Radnoti, P. Rosnay, I. Rozum, F. Vamborg, S. Villaume, and J. Thépaut, “The ERA5 global reanalysis”, Quarterly Journal of the Royal Meteorological Society, vol. 146, pp. 1999-2049, 2020. http://dx.doi.org/10.1002/qj.3803
D.H. Douglass, J.R. Christy, B.D. Pearson, and S.F. Singer, “A comparison of tropical temperature trends with model predictions”, International Journal of Climatology, vol. 28, pp. 1693-1701, 2008. http://dx.doi.org/10.1002/joc.1651