Towards an ensemble-based evaluation of land surface models in light of uncertain forcings and observations
Vivek K. Arora
Christian Seiler
Libo Wang
Sian Kou-Giesbrecht
- Final revised paper (published on 06 Apr 2023)
- Preprint (discussion started on 29 Jul 2022)
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2022-641', Anonymous Referee #1, 26 Aug 2022
Review of egusphere-2022-641:
Towards an ensemble-based evaluation of land surface models in light of uncertain forcings and observations
Summary
The authors present an evaluation of their updated land model, CLASSIC. The primary update from a previous model is a new nitrogen cycle, although new land cover reference data have been implemented also. Notably, they evaluate the model using two different meteorological driving data sets, and also comparing new vs old land cover data and with/without the new nitrogen cycle. Their evaluation system effectively compares model scores to benchmark scores that are based on observational uncertainty. They conclude that their new model is reasonable compared to both observations and other models, and also that the present-day land-atmosphere CO2 flux is independent of the initial land carbon state, with respect to the variations included in this experiment.
Overall review
I appreciate the authors’ more comprehensive approach to evaluating their updated land model. The evaluation system provides clear results. However, in the end it isn’t clear that the model is better, but it appears that the GSWP3 forcing gives better results than the CRU forcing. I also think that the conclusions regarding the independence of flux from initial land state are overstated, mainly because this is a highly constrained case where the land start and end points are shifted by a similar amount and the land change trajectory is nearly identical (but shifted) between the two cases. I recommend the following main revisions (see below for additional details):
1) The different land cover cases need to be redefined. They are not different reconstructions. They just reflect an update in the present-day reference land cover data that are used to anchor the model’s land trajectory. While this is a reasonable update, it does not represent the uncertainty of land use/cover change.
2) Qualify your conclusions regarding the robustness of model fluxes under different initial carbon states. This is a very specific, highly constrained comparison and there are many factors and uncertainties, particularly in the land space, that are not considered here but have substantial impacts on carbon flux and storage estimates.
3) Complete your background and comparisons with literature on land data uncertainties. Some suggestions are below, but there is more out there showing the complexity of this problem.
4) Swap the more useful appendix figures for the unreadable paper figures. Try to make the figures more readable.
Specific comments/suggestions
Abstract
line 6:
I would not call these two land cover sets different historical reconstructions. You just replaced your current-day reference land cover with newer data. There are many other factors that affect the reconstruction, most notably the land use data and the assumptions used to apply land use to land cover.
line 12:
Awkward sentence transition. Probably do not need the beginning of this sentence; just start with “Simulated area burned…”
Introduction
lines 90-94:
There are other studies on this topic. The most relevant one is probably the following, because it addresses uncertainty in land cover in conjunction with the effects of CO2, nitrogen deposition, and climate:
A.V. Di Vittorio, J. Mao, X. Shi, L. Chini, G. Hurtt, and W.D. Collins, “Quantifying the effects of historical land cover uncertainty on global carbon and climate estimates”, Geophysical Research Letters. doi: 10.1002/2017GL075124.
This one looks at land change emissions across several land cover representations:
Peng, S., P. Ciais, F. Maignan, W. Li, J. Chang, T. Wang, and C. Yue (2017), Sensitivity of land use change emission estimates to historical land use and land cover mapping, Global Biogeochem. Cycles, 31, 626–643, doi:10.1002/2015GB005360.
But this one is also relevant:
A.V. Di Vittorio, X. Shi, B. Bond-Lamberty, K. Calvin, A. Jones, 2020, “Initial land use/cover distribution substantially affects global carbon and local temperature projections in the integrated Earth system model”, Global Biogeochemical Cycles. doi: 10.1029/2019GB006383.
CLASSIC modeling framework
lines 178-180:
Some folks may disagree here. Different types of trees have different leaf/canopy shapes, orientations, and colors that may affect interception and also radiative processes.
Driving data
line 230:
This may be true in some cases, but the later step of creating the reconstruction by applying the land use trajectory to this static cover map can generate greater uncertainty, not to mention the additional uncertainty in the land use data. See the papers above. See comment below. And also these:
Di Vittorio, A.V., L.P. Chini, B. Bond-Lamberty, J. Mao, X. Shi, J. Truesdale, A. Craig, K. Calvin, A. Jones, W.D. Collins, J. Edmonds, G.C. Hurtt, P. Thornton, and A. Thomson (2014). From land use to land cover: restoring the afforestation signal in a coupled integrated assessment - earth system model and the implications for CMIP5 RCP simulations, Biogeosciences, 11:6435-6450, 2014, doi: 10.5194/bg-11-6435-2014.
Meiyappan, P., and A. K. Jain (2012), Three distinct global estimates of historical land-cover change and land-use conversions for over 200 years, Front. Earth Sci., 6(2), 122–139, doi:10.1007/s11707-012-0314-2.
lines 266-286:
Figure 1 indicates that changing your reference cover map does not generate the greatest uncertainty. The range of vegetation across the other models is much greater than the difference you show for your data. What is the nominal year for your data sets? What about for the other models? What is actually driving the variability in these data across TRENDY? Are they all using different reference remote sensing data? Or are other factors contributing?
lines 358-387:
This section is unclear, particularly with respect to how the benchmark scores are calculated (the ones comparing the obs). You show the benchmark scores in figure 10, but it isn’t clear how these are calculated. I do like this benchmarking system, though.
line 393:
Figures A2-A16 are much more useful than Figures 3-9, which are unreadable.
Results
lines 558-559:
It is unclear how the benchmark scores are determined.
Conclusion
lines 649-651:
The key word here is “present-day flux.” The cumulative emissions over time are dependent on the land cover change trajectory, which you do not alter in these scenarios. This is also why your previous statement regarding model response being independent of initial land state makes sense here; you do not have a different transient land path that would change the outcome. Your two land covers are both estimates of “present-day” cover, and as such are not that different from each other. And because you use the same land cover backcasting, your initial state changes by a similar amount. So the flux is also constrained by similarly adjusted endpoints.
Note that there are grammatical typos throughout.
Figures and Tables
Figures 3-5:
I cannot determine which simulations are where. A couple of colors are clear, but the groups of lines are muddled together and I cannot tell which simulations are in which group. If you colored them by output group it would be easier to tell which sims have similar results. Using a temporal average may also help (without the annual values shown, which make it messy).
Figures 6-9:
These are difficult to read. Since output groupings are less apparent here, I suggest selecting colors that reflect the experiment groupings. Temporal averages may also help here (without the annual values shown).
Figures A2-A16:
Make sure the axis scales match across all panels in each figure.
Citation: https://doi.org/10.5194/egusphere-2022-641-RC1
AC1: 'Reply to Referee #1', Vivek Arora, 07 Sep 2022
We thank Referee #1 for their helpful comments. Our replies to their comments are shown in bold below.
Summary
The authors present an evaluation of their updated land model, CLASSIC. The primary update from a previous model is a new nitrogen cycle, although new land cover reference data have been implemented also. Notably, they evaluate the model using two different meteorological driving data sets, and also comparing new vs old land cover data and with/without the new nitrogen cycle. Their evaluation system effectively compares model scores to benchmark scores that are based on observational uncertainty. They conclude that their new model is reasonable compared to both observations and other models, and also that the present-day land-atmosphere CO2 flux is independent of the initial land carbon state, with respect to the variations included in this experiment.
Overall review
I appreciate the authors’ more comprehensive approach to evaluating their updated land model. The evaluation system provides clear results. However, in the end it isn’t clear that the model is better, but it appears that the GSWP3 forcing gives better results than the CRU forcing. I also think that the conclusions regarding the independence of flux from initial land state are overstated, mainly because this is a highly constrained case where the land start and end points are shifted by a similar amount and the land change trajectory is nearly identical (but shifted) between the two cases. I recommend the following main revisions (see below for additional details):
Thank you for your overall positive review of our manuscript.
1) The different land cover cases need to be redefined. They are not different reconstructions. They just reflect an update in the present-day reference land cover data that are used to anchor the model’s land trajectory. While this is a reasonable update, it does not represent the uncertainty of land use/cover change.
Thank you for noting this. Yes, it is correct that the change in crop area over the historical period is the same in both land cover data sets in our study. In this sense, we agree that it is not entirely correct to call these land cover data sets two reconstructions. The distinction here is between land use change (LUC) and land cover. We treat LUC similarly despite the differences in the two land cover cases. If we are given the opportunity to revise our manuscript, we will clarify this distinction and redefine the two land cover cases.
2) Qualify your conclusions regarding the robustness of model fluxes under different initial carbon states. This is a very specific, highly constrained comparison and there are many factors and uncertainties, particularly in the land space, that are not considered here but have substantial impacts on carbon flux and storage estimates.
The response of the terrestrial biosphere over the historical period is driven by four primary global change drivers – increasing CO2, changing climate, land use change (LUC), and N deposition. We agree that in our current framework we haven’t taken into account the uncertainty associated with LUC, and yes, there are several other uncertainties as well. However, our simulations do allow us to evaluate how the response to the three other global change drivers depends on two meteorological driving data sets, two land cover cases, and two model variations (with and without an interactive N cycle). We will clarify this point when revising our manuscript.
Please also note that our statement about little dependence on the initial land carbon state is in the context of the NET atmosphere-land CO2 flux. The reason why this happens is that the model is first spun up to equilibrium conditions and then forced with time-variant forcings. So while the absolute fluxes (gross primary productivity and respiratory fluxes) are different, the NET atmosphere-land CO2 flux is similar across simulations in that the net flux from all simulations lies within the uncertainty range from the Global Carbon Project.
3) Complete your background and comparisons with literature on land data uncertainties. Some suggestions are below, but there is more out there showing the complexity of this problem.
Thank you for pointing to the additional references that highlight the uncertainty associated with LUC emissions. These additional references will help us highlight and clarify that our framework does not account for uncertainty associated with LUC.
4) Swap the more useful appendix figures for the unreadable paper figures. Try to make the figures more readable.
We were not sure whether the figures showing the spread across the simulations or those showing the effects of land cover, meteorological data, and the inclusion or absence of the N cycle separately were more useful. We will swap the figures between the appendix and the main text.
Specific comments/suggestions
Abstract
line 6: I would not call these two land cover sets different historical reconstructions. You just replaced your current-day reference land cover with newer data. There are many other factors that affect the reconstruction, most notably the land use data and the assumptions used to apply land use to land cover.
Yes, we agree as mentioned above and we will redefine the two land cover cases.
line 12: Awkward sentence transition. Probably do not need the beginning of this sentence; just start with “Simulated area burned…”
Thanks for your suggestion.
Introduction
lines 90-94: There are other studies on this topic. The most relevant one is probably the following, because it addresses uncertainty in land cover in conjunction with the effects of CO2, nitrogen deposition, and climate:
A.V. Di Vittorio, J. Mao, X. Shi, L. Chini, G. Hurtt, and W.D. Collins, “Quantifying the effects of historical land cover uncertainty on global carbon and climate estimates”, Geophysical Research Letters. doi: 10.1002/2017GL075124.
This one looks at land change emissions across several land cover representations:
Peng, S., P. Ciais, F. Maignan, W. Li, J. Chang, T. Wang, and C. Yue (2017), Sensitivity of land use change emission estimates to historical land use and land cover mapping, Global Biogeochem. Cycles, 31, 626–643, doi:10.1002/2015GB005360.
but this one is also relevant:
A.V. Di Vittorio, X. Shi, B. Bond-Lamberty, K. Calvin, A. Jones, 2020, “Initial land use/cover distribution substantially affects global carbon and local temperature projections in the integrated Earth system model”, Global Biogeochemical Cycles. doi: 10.1029/2019GB006383.
Thank you for mentioning these references that will help us highlight and clarify that our framework does not account for uncertainty associated with LUC.
CLASSIC modeling framework
lines 178-180: Some folks may disagree here. Different types of trees have different leaf/canopy shapes, orientations, and colors that may affect interception and also radiative processes.
This statement was made in the context of current formulations used in land surface models which typically only use leaf area index (or plant area index) and a PFT-dependent parameter to calculate the storage capacity of leaves for calculating how much precipitation is intercepted. Such an approach is used in CLASSIC. In this context, the PFT-dependent parameter accounts for leaf shape and orientation but not the underlying deciduous or evergreen phenology of the leaves. We will reword our statement to clarify this.
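Schematically, this class of interception scheme takes the generic form

    W_cap = c_PFT * LAI,

where W_cap is the canopy storage capacity for intercepted water (kg m-2), LAI is the leaf (or plant) area index, and c_PFT is the PFT-dependent parameter, commonly of order 0.2 kg m-2 per unit leaf area in land surface models. This is an illustration of the general formulation, not necessarily CLASSIC's exact expression; interception during a time step is then limited by the difference between W_cap and the water already stored on the canopy.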
Driving data
line 230: This may be true in some cases, but the later step of creating the reconstruction by applying the land use trajectory to this static cover map can generate greater uncertainty, not to mention the additional uncertainty in the land use data. See the papers above. See comment below. And also these:
Di Vittorio, A.V., L.P. Chini, B. Bond-Lamberty, J. Mao, X. Shi, J. Truesdale, A. Craig, K. Calvin, A. Jones, W.D. Collins, J. Edmonds, G.C. Hurtt, P. Thornton, and A. Thomson (2014). From land use to land cover: restoring the afforestation signal in a coupled integrated assessment - earth system model and the implications for CMIP5 RCP simulations, Biogeosciences, 11:6435-6450, 2014, doi: 10.5194/bg-11-6435-2014.
Meiyappan, P., and A. K. Jain (2012), Three distinct global estimates of historical land-cover change and land-use conversions for over 200 years, Front. Earth Sci., 6(2), 122–139, doi:10.1007/s11707-012-0314-2.
We take your point. In fact, there are two sets of uncertainties here. The first is converting 20-40 land cover classes to a much smaller set of plant functional types (PFTs) that a model simulates, and as we showed in our manuscript this affects the pre-industrial state of vegetation and soil carbon (899 vs 1171 Pg C in our case) but also the magnitude of the current terrestrial sink. The second set of uncertainties is related to incorporating LUC data into a model’s land cover over the historical period (as you highlighted) which is what leads to uncertainties in LUC emissions and therefore also affects the terrestrial sink. These two sets of uncertainties affect model behaviour differently. This latter set of uncertainties is not taken into account in our framework and we will revise our manuscript to clarify this.
lines 266-286: Figure 1 indicates that changing your reference cover map does not generate the greatest uncertainty. The range of vegetation across the other models is much greater than the difference you show for your data. What is the nominal year for your data sets? What about for the other models? What is actually driving the variability in these data across TRENDY? Are they all using different reference remote sensing data? Or are other factors contributing?
Yes, while the GLC2000- and ESA-CCI-based land covers are not that different in terms of total vegetated area, the difference is large for the area of grasses, and this leads to the 899 vs 1171 Pg C soil carbon difference. This is noted in the manuscript. The data are averaged over the period 1992-2018 for CLASSIC and all TRENDY models. The reason for the variability in vegetated area, and in the areas of trees and grasses, across models is the subjectiveness of the process of mapping/reclassifying the 20-40 land cover classes in land cover products to a selected number of PFTs in land models, and the fact that land modelling groups use different land cover products. We will clarify this when revising our manuscript.
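To make this subjectiveness concrete: a reclassification crosswalk is essentially a weighted mapping from land cover classes to PFT fractions, and mosaic or sparse classes force a judgment call on how their area is split. The short Python sketch below is purely illustrative; the class names, PFT names, and split fractions are hypothetical and are not the actual GLC2000 or ESA-CCI tables used for CLASSIC.

    # Hypothetical crosswalk from land cover classes to model PFT fractions.
    # Names and split fractions are illustrative only, not the actual
    # GLC2000 or ESA-CCI reclassification tables used for CLASSIC.
    crosswalk = {
        "broadleaf_evergreen_forest": {"broadleaf_evergreen_tree": 1.0},
        "mosaic_tree_grass": {"broadleaf_deciduous_tree": 0.4, "c3_grass": 0.6},
        "sparse_vegetation": {"c3_grass": 0.3, "bare_ground": 0.7},
    }

    def to_pft_fractions(class_fractions):
        """Aggregate a grid cell's land cover class fractions into PFT fractions."""
        pfts = {}
        for lc_class, frac in class_fractions.items():
            for pft, weight in crosswalk[lc_class].items():
                pfts[pft] = pfts.get(pft, 0.0) + frac * weight
        return pfts

    # A cell that is half forest and half mosaic: the simulated tree vs grass
    # area depends directly on the subjective 0.4/0.6 mosaic split above.
    print(to_pft_fractions({"broadleaf_evergreen_forest": 0.5,
                            "mosaic_tree_grass": 0.5}))

Changing the mosaic split in such a table directly changes the simulated tree vs grass area, and hence the pre-industrial carbon state, without any change to the underlying satellite product.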
lines 358-387: This section is unclear, particularly with respect to how the benchmark scores are calculated (the ones comparing the obs). You show the benchmark scores in figure 10, but it isn’t clear how these are calculated. I do like this benchmarking system, though.
The benchmarking scores and how they are calculated are explained in the following paper.
Seiler, C., Melton, J. R., Arora, V., Sitch, S., Friedlingstein, P., Arneth, A., Goll, D. S., Jain, A., Joetzjer, E., Lienert, S., Lombardozzi, D., Luyssaert, S., Nabel, J. E. M. S., Tian, H., Vuichard, N., Walker, A. P., Yuan, W., and Zaehle, S. 2022. Are terrestrial biosphere models fit for simulating the global land carbon sink? Journal of Advances in Modeling Earth Systems, p.e2021MS002946. https://doi.org/10.1029/2021MS002946.
Short of including the whole description in the main text, we will include the details in our appendix when revising our manuscript so that a reader doesn’t have to refer to the above paper.
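In the meantime, for orientation: AMBER follows the ILAMB approach of Collier et al. (2018), in which each statistic is converted to a dimensionless score on (0, 1] through an exponential mapping of a nondimensionalized error, schematically

    S = exp(-epsilon),   e.g.  S_bias = exp(-epsilon_bias),

with epsilon_bias the absolute bias scaled by the reference data. A benchmark score is obtained by applying the same formula with one observation-based data set playing the role of the model and another playing the role of the reference, so that model scores can be judged against the level of agreement among the observations themselves. This is a schematic description; the precise definitions are those of Seiler et al. (2022).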
line 393: Figures A2-A16 are much more useful than Figures 3-9, which are unreadable.
We will swap the figures between the main text and the appendix when revising our manuscript.
Results
lines 558-559: It is unclear how the benchmark scores are determined.
An additional section in the appendix of the revised manuscript will explain the benchmarking process in more detail as mentioned above.
Conclusion
lines 649-651: The key word here is “present-day flux.” The cumulative emissions over time are dependent on the land cover change trajectory, which you do not alter in these scenarios. This is also why your previous statement regarding model response being independent of initial land state makes sense here; you do not have a different transient land path that would change the outcome. Your two land covers are both estimates of “present-day” cover, and as such are not that different from each other. And using the same land cover backcasting your initial state changes by a similar amount. So the flux is also constrained by similarly adjusted endpoints.
Since we do not take into account different LUC trajectories, it is also implied that had we not taken LUC into account at all (i.e. had the simulations been driven with only increasing CO2, changing climate, and increasing N deposition), even then the net atmosphere-land CO2 flux would have been similar across the different simulations. This suggests that the present-day net atmosphere-land CO2 flux is indeed largely independent of the pre-industrial land carbon state insofar as the response to the three other global drivers is concerned (CO2, climate, and N deposition). As mentioned earlier, we agree that the caveat related to LUC emissions has to be made clearer in our manuscript, and we note your feedback that we have overstated our conclusion related to the independence of the present-day net atmosphere-land CO2 flux. We will reword this conclusion and tone down the message.
If we were to plot the cumulative emissions from 1960 onwards so that they can be compared to estimates from the Global Carbon Project (GCP), similar to Figure 9a in the manuscript, even then all eight simulations lie within the uncertainty of the estimates from the GCP, as shown below.
Note that there are grammatical typos throughout.
Thank you for noting these. We will address all these when revising our manuscript.
Figures and Tables
Figures 3-5:
I cannot determine which simulations are where. A couple of colors are clear, but the groups of lines are muddled together and I cannot tell which simulations are in which group. If you colored them by output group it would be easier to tell which sims have similar results. Using a temporal average may also help (without the annual values shown, which make it messy).
Figures 6-9:
These are difficult to read. Since output groupings are less apparent here, I suggest selecting colors that reflect the experiment groupings. Temporal averages may also help here (without the annual values shown).
The purpose of these figures is to give the reader an idea of the spread across the eight simulations. Even if the colours are chosen wisely, some lines will overlap others for some variables. As you suggested, we will move these figures to the appendix and the appendix figures to the main text.
Figures A2-A16:
make sure the axis scales match across all panels in each figure.
Thank you for noting this. We will make the y-axis scale similar for all sub-panels of a figure.
Citation: https://doi.org/10.5194/egusphere-2022-641-AC1
RC2: 'Comment on egusphere-2022-641', Anonymous Referee #2, 02 Sep 2022
General comments:
Arora et al. evaluate land model uncertainty using an ensemble of simulations with different model structure, forcings, and observations. This type of model study is useful for understanding quantities like the land carbon sink in the context of these uncertainties. The results show that biogeophysical variables like runoff and sensible heat flux are most impacted by meteorological forcing, while biogeochemical variables like vegetation biomass are most impacted by having an interactive nitrogen cycle. This is not necessarily surprising, but useful to have summarized here. The results on net atmosphere-land CO2 flux being independent of land carbon state are interesting and could be highlighted more visually. The benchmarking is also useful, and hopefully the AMBER tool can be shared more widely.
The premise that each of the 8 simulations is “equally probable” (abstract) or “equally likely” (introduction) needs more explanation. For example, is the model simulation without carbon-nitrogen coupling as likely as the simulation with this coupling? Are the different datasets equally plausible representations given the details discussed in sections 3.1-3.2? Some discussion on these points would be helpful. While the NBP results (Figure 9) show that the simulations are all within the historical uncertainty range, it is unclear how you would know that a priori to determine which structure, forcings, and observations to sample in an ensemble like this.
There are also a large number of figures, and condensing or selecting the most salient results as main text figures would help with length and clarity. For example, do the full timeseries plots of all variables need to be included in the main text? Especially since there are clear groupings of variables that show more sensitivity to, for example, meteorological forcing vs. N cycle. I also felt that some of the ensemble mean figures (i.e., Figures A2-A16) were more interesting than the timeseries plots with all 8 simulations (i.e., Figures 3-9) because they nicely summarize the strongest effects for different variables. In general, the figure organization, number of figures, and placement of figures in the main text vs. the appendix could be improved to highlight the most interesting results.
In general, the presentation quality needs improvement to better communicate the results before I can recommend this paper for publication. I have highlighted some areas in the specific comments below.
Specific comments:
Line 33 and following: Some useful references to include here would be:
- Fisher and Koven 2020, https://doi.org/10.1029/2018MS001453
- Kyker-Snowman et al. 2022, https://doi.org/10.1111/gcb.15894
- Bonan and Doney 2018, https://doi.org/10.1126/science.aam8328
Lines 110-111: Is AMBER available to the community? Here it is listed as “open-source”, but following the link in Seiler et al. (2021b) leads to a dead end: https://cran.r-project.org/web/packages/amber/index.html. Suggest adding an updated link for AMBER to the Code/data availability section in this manuscript.
Lines 177-180: There are other physical processes that could benefit from using a larger number of PFTs, for example, sensible and latent heat flux calculations.
Section 3.1: Here I started to get a little confused with specific land cover datasets. Line 219 states “two observation-based data sets are used”, a remotely-sensed product (assuming that is ESA CCI) and the “LUH product as part of TRENDY”. The next paragraph describes the process of generating land cover data with the “older” GLC 2000 product, with some information from LUH. Figure 1 compares these three datasets and a fourth one which is based on ESA CCI. Then line 337 (section 3.4) states that the two land cover reconstructions used in the model simulations are GLC 2000 and ESA CCI. Please clarify in the text which datasets are used for land cover in this study and how they are used.
Line 255: What land cover data is used for years prior to 1992 in the case of the simulations with ESA CCI?
Lines 349-351: Do the different end dates for the simulations with different meteorological forcings affect the analysis?
Lines 367-368: Some justification for the doubled weighting of S_rmse would be helpful here, even if a brief sentence/reference. One could argue the other scores also have “importance”.
Lines 450-452: This sentence should be rephrased/expanded on since it doesn’t add much on its own. Or it could be removed, as Table 3 summarizes the differences in cv values.
Line 529: Curious why the simulations show a land carbon source in the 1930s? Is that realistic?
Line 571-573: Is there a reason to include the SG250m dataset here since the model compares better with HWSD?
Line 574 and following: More discussion of Figure 11 is needed – there are a lot of model/data comparisons here that are summarized very briefly as “the model is overall able to capture the latitudinal distribution of most land surface quantities”. For example, the aboveground biomass observations are very different from each other, and different from the model spread.
Line 599: The fact that the interactive N cycle degrades model performance for certain variables is an interesting result that merits some discussion. Some readers may be surprised that something that is essentially a model improvement for more realistic process representation doesn’t necessarily improve performance.
Lines 601-602: Thanks for including the full AMBER results. This sentence and link should probably be moved (or repeated) in the Code/data availability section.
Conclusions section: The first two paragraphs of this section could benefit from linking back to specific results in this study with the results placed in the context of other studies (as is done in the third paragraph). Especially for the second paragraph, since model tuning was not covered in detail in the introduction.
Line 642 and following: Curious why the effect of the interactive N cycle is discussed here but the other factors are not?
Code/data availability: There are no references to the code/data used in this manuscript (e.g., the simulation output, how to access the observational datasets, or the code used to generate the analysis and figures.)
Table 2: Suggest also grouping by variables, so it is easy for the reader to see which variables have multiple globally gridded and/or in situ sources. This relates to the calculation of benchmark scores in Lines 383-385 where “at least two sets of observation-based data for a given quantity” are needed.
Table 3: Suggest adding the dominant source(s) of spread for each variable (e.g., met forcing, land cover data, N cycle) to summarize that information across variables. Figure 12 does some of this, but it could be improved for presentation quality as noted below. In addition, there are 14 variables listed in this table, while Figure 12 includes 16 variables and the text mentions 19 variables used for benchmarking and calculating scores. Is there a reason for these differences?
Figure 3 (and following analogous figures): Suggest specifying the exact years shown in these timeseries plots. I believe the end years are different for the different meteorological forcings (e.g., 2016 vs. 2019) but it is difficult to see because the lines are very small. Also, please describe in the figure caption what the numbers are in the upper right part of the plot. What is the difference between the bold colored lines and the lighter/less bold lines?
Figures 3-5 (and A2-A5): The data in these figures for 1701-1900 is very repetitive, given the fact that these years use the meteorological forcing from 1901-1925 repeated. The timeseries plots could be shortened to show only 1900 onwards to focus on the most interesting parts of the historical timeseries.
Figure 9: Please add something about the TRENDY models / grey boxes in panel a) to the caption here.
Figure 10: Some additional explanation (in figure caption or text) on how the horizontal and vertical whiskers were calculated would be helpful.
Figure 11: The colors are confusing here. The caption says the model mean is in “dark purple” but it looks more like magenta/purple-red and the dark purple line looks like it is showing an observational dataset (e.g., GEOCARBON in panel a)). Line 577 lists additional colors in regard to this figure. Suggest adding dashes to the observational lines to better distinguish from the model and avoid relying on interpreting specific color choices. Please also explain the box plots in the figure caption.
Figure 12: This figure is a helpful summary of the results, but it needs improvement for presentation quality. For example, the “GLC2000-ESACCI”, etc. does not need to be repeated down the column and may not even need to be included since the orange shading denotes that section as “Effect of land cover”. The average scores could be incorporated visually to “provide context” (which was somewhat unclear from the figure caption). Please explain the error bars in the figure caption.
Technical corrections:
Line 87: Community Land Model should be capitalized.
Line 93: Tian et al. (2004) used CLM2 coupled to CAM2, whereas Lawrence and Chase (2007) used CLM3 within CCSM3. Suggest rephrasing this to clarify.
Line 107: Should be “simulated” instead of “simulate”.
Lines 353-354: Missing units for grid size (km?).
Line 407: Here the supplemental figures started being referred to as Figure SX instead of Figure AX – please adjust to match.
Lines 518-520: Move “or the net atmosphere-land CO2 flux” up to the first mention of NBP/Figure 9 since Figure 9 uses the latter term and not NBP.
Line 546: Should be “differences” instead of “difference”.
Lines 622-624: Check figure references here, I believe both are incorrect.
Line 624: Should be “20 years” instead of “20-year”.
Citation: https://doi.org/10.5194/egusphere-2022-641-RC2
AC2: 'Reply to Referee #2', Vivek Arora, 14 Sep 2022
Reviewer #2
We thank Referee #2 for their helpful comments. Our replies to their comments are shown in bold below.
Arora et al. evaluate land model uncertainty using an ensemble of simulations with different model structure, forcings, and observations. This type of model study is useful for understanding quantities like the land carbon sink in the context of these uncertainties. The results show that biogeophysical variables like runoff and sensible heat flux are most impacted by meteorological forcing, while biogeochemical variables like vegetation biomass are most impacted by having an interactive nitrogen cycle. This is not necessarily surprising, but useful to have summarized here. The results on net atmosphere-land CO2 flux being independent of land carbon state are interesting and could be highlighted more visually. The benchmarking is also useful, and hopefully the AMBER tool can be shared more widely.
Thank you for your overall positive feedback. Referee #1, in contrast, suggested that the conclusion that the net atmosphere-land CO2 flux is largely independent of the land carbon state is overstated. While our framework samples land cover uncertainty, it does not take into account land use change (LUC) uncertainty (as noted by referee #1). We would therefore like to clarify this caveat to our conclusions and, erring on the side of caution, not highlight this conclusion more visually.
The premise that each of the 8 simulations is “equally probable” (abstract) or “equally likely” (introduction) needs more explanation. For example, is the model simulation without carbon-nitrogen coupling as likely as the simulation with this coupling? Are the different datasets equally plausible representations given the details discussed in sections 3.1-3.2? Some discussion on these points would be helpful. While the NBP results (Figure 9) show that the simulations are all within the historical uncertainty range, it is unclear how you would know that a priori to determine which structure, forcings, and observations to sample in an ensemble like this.
If given the opportunity to revise our manuscript, we will expand on this. This is indeed the conundrum: it is not known a priori which model structure, forcing, and observations to sample. It is difficult to conclude which meteorological driving data are more reliable, which land cover is more realistic (as the large spread in the vegetated, tree, and grass areas from the TRENDY models illustrates), and which model version is indeed better. Hence the conclusion that an ensemble-based approach allows a more robust evaluation of a model.
In our case, some general conclusions can be made. For example, we have more confidence in the ESA-CCI based land cover than in the GLC2000 based land cover because the reclassification of the 37 ESA-CCI land cover classes to CLASSIC’s nine plant functional types has been more thoroughly vetted against high-resolution land cover data than that of the GLC2000. We will, however, remove the phrase “equally probable”, since it is difficult to defend, and include discussion around this when revising our manuscript.
There are also a large number of figures, and condensing or selecting the most salient results as main text figures would help with length and clarity. For example, do the full time-series plots of all variables need to be included in the main text? Especially since there are clear groupings of variables that show more sensitivity to, for example, meteorological forcing vs. N cycle. I also felt that some of the ensemble mean figures (i.e., Figures A2-A16) were more interesting than the time-series plots with all 8 simulations (i.e., Figures 3-9) because they nicely summarize the strongest effects for different variables. In general, the figure organization, number of figures, and placement of figures in the main text vs. the appendix could be improved to highlight the most interesting results.
Swapping figures in the appendix with those in the main text was also suggested by referee #1. We will swap these figures, condense the total number of figures by combining zonal and time series plots for a given variable, and drop some less interesting variables.
In general, the presentation quality needs improvement to better communicate the results before I can recommend this paper for publication. I have highlighted some areas in the specific comments below.
Specific comments:
Line 33 and following: Some useful references to include here would be:
Fisher and Koven 2020, https://doi.org/10.1029/2018MS001453
Kyker-Snowman et al. 2022, https://doi.org/10.1111/gcb.15894
Bonan and Doney 2018, https://doi.org/10.1126/science.aam8328
Thank you for pointing out these additional references, which we will include.
Lines 110-111: Is AMBER available to the community? Here it is listed as “open-source”, but following the link in Seiler et al. (2021b) leads to a dead end: https://cran.r-project.org/web/packages/amber/index.html. Suggest adding an updated link for AMBER to the Code/data availability section in this manuscript.
We have now replaced the CRAN link with the following link using Zenodo and will include this when revising our manuscript.
https://doi.org/10.5281/zenodo.5670387
The link provides the source code as well as the scripts required for reproducing the computational environment, which takes care of all dependencies with other R-packages.
Lines 177-180: There are other physical processes that could benefit from using a larger number of PFTs, for example, sensible and latent heat flux calculations.
Yes, in theory, a larger number of PFTs should allow the modelling of PFT-dependent processes more realistically. However, for CLASSIC (and for other LSMs too) latent heat flux is primarily a function of available energy and precipitation. In CLASSIC, large changes in leaf area index (LAI) do not change the total latent heat flux considerably, since the partitioning of evapotranspiration into its sub-components (transpiration, soil evaporation, and evaporation/sublimation of intercepted rain/snow) changes. For example, a decrease in transpiration and in evaporation of intercepted precipitation, due to a decrease in LAI, is compensated by an increase in soil evaporation. As such, biogeochemical processes benefit more than physical processes, in terms of realism, when the number of PFTs is increased. We will add this clarification around these sentences when revising our manuscript.
Section 3.1: Here I started to get a little confused with specific land cover datasets. Line 219 states “two observation-based data sets are used”, a remotely-sensed product (assuming that is ESA CCI) and the “LUH product as part of TRENDY”. The next paragraph describes the process of generating land cover data with the “older” GLC 2000 product, with some information from LUH. Figure 1 compares these three datasets and a fourth one which is based on ESA CCI. Then line 337 (section 3.4) states that the two land cover reconstructions used in the model simulations are GLC 2000 and ESA CCI. Please clarify in the text which datasets are used for land cover in this study and how they are used.
The methodology for generating land cover from 1700 to the present day requires a remotely sensed snapshot of present-day land cover, e.g. the GLC 2000 (which represents the year 2000) or ESA CCI (1992-2018) land cover product, whose 20-40 land cover classes need to be remapped/reclassified to the model’s nine PFTs, and an estimate of the change in crop area over the historical period (1700-2018) from LUH. These are the three data sets used in our study. Figure 1 in the manuscript shows the model’s total vegetated, tree, and grass areas when using the GLC 2000 (blue line) and ESA CCI (red line) land cover products, compared against Li et al. (2018) (dotted black line) and other TRENDY models (grey lines). We can see how this can be confusing and will clarify this when revising our manuscript.
Line 255: What land cover data is used for years prior to 1992 in the case of the simulations with ESA CCI?
As mentioned above, the 1992-2018 ESA CCI land cover provides a snapshot of the present-day land cover. Although there is some interannual variability between these years, overall the total vegetated area does not change much. We chose the year 1992 from these data, reclassified its 37 land cover classes to CLASSIC’s nine PFTs, replaced the crop PFTs for the present day with those from the LUH product, adjusted the natural PFTs accordingly, and then went back in time to 1700 using crop area from the LUH data. This yields a reconstruction of the historical land cover with CLASSIC’s nine PFTs, based on the ESA CCI land cover, with crop area changing over the historical period according to the LUH product. We will clarify this when revising our manuscript.
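A minimal sketch of this backcasting logic, in Python, is given below. It assumes, for illustration only (the actual CLASSIC procedure may differ in detail), that crop area released when going back in time is returned to the natural PFTs in proportion to their shares:

    import numpy as np

    def backcast_land_cover(pft_frac, crop_idx, luh_crop_by_year):
        """Walk a grid cell's PFT fractions back in time, following LUH crop area.

        pft_frac: present-day PFT fractions (1D array summing to 1).
        crop_idx: indices of the crop PFTs.
        luh_crop_by_year: {year: total crop fraction}, ordered from the
        present day back to 1700.
        """
        natural_idx = np.array([i for i in range(pft_frac.size)
                                if i not in crop_idx])
        frac = pft_frac.copy()
        history = {}
        for year, crop_target in luh_crop_by_year.items():
            released = frac[crop_idx].sum() - crop_target
            # shrink crop PFTs to the LUH target for this year ...
            if frac[crop_idx].sum() > 0:
                frac[crop_idx] *= crop_target / frac[crop_idx].sum()
            # ... and hand the released area back to the natural PFTs
            weights = frac[natural_idx] / frac[natural_idx].sum()
            frac[natural_idx] += released * weights
            history[year] = frac.copy()
        return history

Run from the present day back to 1700, this conserves total area, makes crop area follow the LUH trajectory, and keeps the natural PFT mix anchored to the chosen present-day snapshot.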
Lines 349-351: Do the different end dates for the simulations with different meteorological forcings affect the analysis?
No, they do not. However, we will make the time period the same for a consistent comparison.
Lines 367-368: Some justification for the doubled weighting of S_rmse would be helpful here, even if a brief sentence/reference. One could argue the other scores also have “importance”.
We agree that the decision to give twice as much weight to S_rmse is somewhat subjective. This follows Collier et al. (2018), but we will make a note of the subjectivity of this decision.
Collier, N. et al. 2018. “The International Land Model Benchmarking (ILAMB) System: Design, Theory, and Implementation.” Journal of Advances in Modeling Earth Systems 10 (11): 2731–54.
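For reference, the overall score in Collier et al. (2018) combines the individual scores as a weighted mean,

    S_overall = (S_bias + 2 S_rmse + S_phase + S_iav + S_dist) / (1 + 2 + 1 + 1 + 1),

so the doubled weight on S_rmse is a convention inherited from ILAMB rather than a derived result, and we will note this explicitly.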
Lines 450-452: This sentence should be rephrased/expanded on since it doesn’t add much on its own. Or it could be removed, as Table 3 summarizes the differences in cv values.
We will remove this sentence when revising our manuscript.
Line 529: Curious why the simulations show a land carbon source in the 1930s? Is that realistic?
In Figure 9a, before about 1970 the model simulates both a land carbon sink and source in response to interannual variability in the meteorological data.
Line 571-573: Is there a reason to include the SG250m dataset here since the model compares better with HWSD?
There are multiple observations available for certain variables. There are times when it is obvious which observation-based data set is better or more appropriate, but there are times when it is not. In the case of soil carbon, since CLASSIC and most land models do not include processes to represent peatland and permafrost carbon at high latitudes, it is clear that the HWSD data set is more appropriate for comparison with the model output. The idea behind using both data sets is to illustrate this concept, and we will clarify this.
Line 574 and following: More discussion of Figure 11 is needed – there are a lot of model/data comparisons here that are summarized very briefly as “the model is overall able to capture the latitudinal distribution of most land surface quantities”. For example, the aboveground biomass observations are very different from each other, and different from the model spread.
Thank you for pointing this out. We will discuss all panels of Figure 11 in the revised manuscript. In the context of aboveground biomass, the GEOCARBON data set uses two products, one for the extratropics and the other for the tropics to create a global aboveground biomass product. The Zhang product is based on 10 biomass maps. Both products are described in detail in section 2.3.3. of the following paper. We will include this information when revising our manuscript.
Seiler, C., et al. (2022) Are terrestrial biosphere models fit for simulating the global land carbon sink? Journal of Advances in Modeling Earth Systems, p.e2021MS002946. https://doi.org/10.1029/2021MS002946.
Line 599: The fact that the interactive N cycle degrades model performance for certain variables is an interesting result that merits some discussion. Some readers may be surprised that something that is essentially a model improvement for more realistic process representation doesn’t necessarily improve performance.
Yes, we will include a discussion about why the inclusion of the N cycle degrades the model performance for some variables. The inclusion of the N cycle changes the maximum photosynthetic rate (Vcmax) to a prognostic variable for each PFT, as opposed to being specified based on observations. This is analogous to running an atmospheric model with specified sea surface temperatures (SSTs) and sea ice concentrations (SICs) as opposed to using a full 3D ocean. Using a dynamic ocean allows future projections (since future SSTs and SICs are not known) but invariably degrades a model’s performance for the present day, since simulated SSTs and SICs have their own biases. Similarly, using an interactive N cycle allows future changes in Vcmax to be projected (based on changes in N availability) but also degrades CLASSIC’s performance for the present day, since simulated Vcmax has its own biases.
Lines 601-602: Thanks for including the full AMBER results. This sentence and link should probably be moved (or repeated) in the Code/data availability section.
We will move the AMBER link to the Code/data availability section.
Conclusions section: The first two paragraphs of this section could benefit from linking back to specific results in this study with the results placed in the context of other studies (as is done in the third paragraph). Especially for the second paragraph, since model tuning was not covered in detail in the introduction.
Thank you for this suggestion. We will cover model tuning early on in the manuscript and link it to the second paragraph of the conclusions.
Line 642 and following: Curious why the effect of the interactive N cycle is discussed here but the other factors are not?
The effect of the interactive N cycle is discussed explicitly to outline model limitations and room for improvement when the interactive N cycle is switched on. We will also include a discussion of meteorological forcings and land cover when revising our manuscript.
Code/data availability: There are no references to the code/data used in this manuscript (e.g., the simulation output, how to access the observational datasets, or the code used to generate the analysis and figures.)
The model code and documentation are available at https://cccma.gitlab.io/classic_pages/ and this is mentioned in the manuscript. Observation-based data sets are available from their respective sources. If providing model output and the scripts used for analysis is a requirement for Biogeosciences, we will upload these to Zenodo and provide a DOI.
Table 2: Suggest also grouping by variables, so it is easy for the reader to see which variables have multiple globally gridded and/or in situ sources. This relates to the calculation of benchmark scores in Lines 383-385 where “at least two sets of observation-based data for a given quantity” are needed.
Thanks for your suggestion. We will group by variable and reorganize Table 2 according to whether the data are globally gridded and/or in situ, or make note of this in an additional column.Table 3: Suggest adding the dominant source(s) of spread for each variable (e.g., met forcing, land cover data, N cycle) to summarize that information across variables. Figure 12 does some of this, but it could be improved for presentation quality as noted below. In addition, there are 14 variables listed in this table, while Figure 12 includes 16 variables and the text mentions 19 variables used for benchmarking and calculating scores. Is there a reason for these differences?
Indicating the dominant source of spread for each variable is a good suggestion. The reason for the different numbers of variables in Table 2 and Figure 12 is that the variables are bundled slightly differently. In Table 2 heterotrophic and autotrophic respiration are evaluated separately but in Figure 12 they are bundled in ecosystem respiration for comparison with observations. In addition, some variables are repeated in Figure 12 (e.g. ecosystem respiration and net ecosystem exchange) because their scores are statistically different when evaluating the effect of land cover, meteorological forcing, and an interactive N cycle. We will make note of this when revising our manuscript. The full list of 19 variables also includes the albedo and leaf area index. We did not include albedo since it shows very little variability across the eight simulations (cv=0.007) and the leaf area index is somewhat similar to vegetation biomass. We will add these two variables to Table 2 for clarity and completeness. In the current manuscript Figures A18 and 11 do compare zonally-averaged values of albedo and leaf area index, respectively, with observations. We will clarify this when revising our manuscript and when discussing Figure 11 in more detail.
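(For clarity: here and in Table 3, cv denotes the coefficient of variation across the eight simulations, cv = σ/μ, i.e. the ensemble standard deviation divided by the ensemble mean; cv = 0.007 for albedo therefore corresponds to a spread of less than 1% of the mean value.)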
Figure 3 (and following analogous figures): Suggest specifying the exact years shown in these timeseries plots. I believe the end years are different for the different metrological forcings (e.g., 2016 vs. 2019) but it is difficult to see because the lines are very small. Also, please describe in the figure caption what the numbers are in the upper right part of the plot. What is the difference between the bold colored lines and the lighter/less bold lines?
We will make the time periods consistent for comparison for GSWP3 and CRU-JRA driven runs, and make it explicitly clear that the GSWP3 and CRU-JRA end in different years in the figure caption. We will also mention in the figure caption that the thin lines are individual years and the thick line is their 10-year running mean. We will also clarify the numbers in the upper right part of the plot (which are the mean from 1700-1720, the mean over the last 20 years of the historical period, and the difference between these values).
Figures 3-5 (and A2-A5): The data in these figures for 1701-1900 is very repetitive, given the fact that these years use the meteorological forcing from 1901-1925 repeated. The timeseries plots could be shortened to show only 1900 onwards to focus on the most interesting parts of the historical timeseries.
Yes, we agree to make this change.
Figure 9: Please add something about the TRENDY models / grey boxes in panel a) to the caption here.
We will clarify in the figure caption that the grey boxes are the estimates based on the Global Carbon Project.
Figure 10: Some additional explanation (in figure caption or text) on how the horizontal and vertical whiskers were calculated would be helpful.
The vertical whiskers show the range of the eight model scores when a given variable from all eight model simulations is compared to an observation-based data set. The horizontal whiskers show the range when three or more observation-based data sets are compared to each other. When only two observation-based data sets are compared to each other there is only one benchmark score, and therefore no range. We will clarify this when revising our manuscript.
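As a concrete illustration of the counting involved: assuming each unordered pair of observation-based data sets yields one benchmark score, n data sets yield n(n-1)/2 scores, so two data sets give a single score and no range, while three data sets give three scores whose minimum and maximum define the horizontal whisker.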
Figure 11: The colors are confusing here. The caption says the model mean is in “dark purple” but it looks more like magenta/purple-red and the dark purple line looks like it is showing an observational dataset (e.g., GEOCARBON in panel a)). Line 577 lists additional colors in regard to this figure. Suggest adding dashes to the observational lines to better distinguish from the model and avoid relying on interpreting specific color choices. Please also explain the box plots in the figure caption.
We will modify these figures to make the colours more obvious and/or use lines with different patterns.
Figure 12: This figure is a helpful summary of the results, but it needs improvement for presentation quality. For example, the “GLC2000-ESACCI”, etc. does not need to be repeated down the column and may not even need to be included since the orange shading denotes that section as “Effect of land cover”. The average scores could be incorporated visually to “provide context” (which was somewhat unclear from the figure caption). Please explain the error bars in the figure caption.
We will modify Figure 12 to remove “GLC2000-ESACCI” and other similar wordings. We will also think about how best to incorporate the average scores visually in the figure. The error bars denote the 95% confidence interval; we will add this information to the figure caption.
Technical corrections:
Line 87: Community Land Model should be capitalized.
Line 93: Tian et al. (2004) used CLM2 coupled to CAM2, whereas Lawrence and Chase (2007) used CLM3 within CCSM3. Suggest rephrasing this to clarify.
Line 107: Should be “simulated” instead of “simulate”.
Lines 353-354: Missing units for grid size (km?).
Line 407: Here the supplemental figures started being referred to as Figure SX instead of Figure AX – please adjust to match.
Lines 518-520: Move “or the net atmosphere-land CO2 flux” up to the first mention of NBP/Figure 9 since Figure 9 uses the latter term and not NBP.
Line 546: Should be “differences” instead of “difference”.
Lines 622-624: Check figure references here, I believe both are incorrect.
Line 624: Should be “20 years” instead of “20-year”.
Thank you for noting these minor corrections. We will incorporate these when revising our manuscript.
Citation: https://doi.org/10.5194/egusphere-2022-641-AC2