Modeling soil organic carbon dynamics in temperate forests with Yasso07

. In a context of global changes, modeling and predicting the dynamics of soil carbon stocks (CSs) in forest ecosystems are vital but challenging. Yasso07 is considered to be one of the most promising models for such a purpose. We examine the accuracy of its prediction of soil carbon dynamics


Introduction
The current global carbon stock (CS) in soils, including forest litter and peatlands, is 1500 to 2400 GtC and thus greatly exceeds stocks in vegetation, found mainly in forests (350 to 550 GtC) and in the atmosphere (829 GtC in 2011;IPCC, 2014). Soils share a common interface with all the other spheres and play a key role in driving the global carbon cycle. Soil CS dynamics are directly related to the greenhouse gas emissions (notably carbon dioxide; CO 2 ) that are leading to the global warming effect (IPCC, 2014). An accurate estimation of soil CS dynamics would allow us to better understand the turnover rate and fate of soil carbon flux at both local and global geographical scales. In the context of global changes, accurate estimation is essential in evaluating the climate change mitigation potential of forests and supporting environmental policy decisions.
Significant challenges exist when attempting to accurately estimate changes in soil CSs. Current soil monitoring networks are generally not able to detect changes on timescales of less than 10 years (Saby et al., 2008). To obtain estimates for changes in soil CSs at shorter intervals, as is, for example, required for annual reporting to the United Nations Framework Convention on Climate Change and the Kyoto Protocol, using models is encouraged (IPCC, 2011). Numerous models have been elaborated to evaluate soil carbon dynamics (Manzoni and Porporato, 2009). The vast majority of terrestrial soil carbon models developed at the global or the plot scale (e.g., CENTURY in Parton et al., 1987;RothC in Coleman and Jenkinson, 1996;and ORCHIDEE in Krinner et al., 2005) assume that decomposition is the first-order decaying process, which accounts for the size of soil carbon pools. However, the assumption has been criticized, and it has been argued that a priming effect and the associated carbon pool interactions should also be considered in model algorithms (Wutzler and Reichstein, 2013). The dynamics of carbon pools depend on the quantity and quality of litter inputs and on temperature, soil moisture and other soil parameters, e.g., texture, structure, chemical richness, pH, etc. (Todd-Brown et al., 2012). Incorporating explicit mechanisms such as microbial activities or carbon protection by the soil matrix into soil carbon models has repeatedly been suggested in recent years (Schmidt et al., 2011;Lehmann and Kleber, 2015). However, for forest ecosystems, refined mechanistic input data often remain limited. Accordingly, the typical time step for litter input demanded by most forest soil carbon models is yearly, rather than monthly (but see RothC, Coleman and Jenkinson, 1996) or daily (but see Romul in Chertov et al., 2001;Didion et al., 2016). At this yearly timescale, it is common to consider microbial communities and processes to be relatively stable factors (Todd-Brown et al., 2012); in this case, the assumption that carbon dynamics are governed by first-order decay may therefore be reasonable. This is the choice made by the group who built the Yasso (Liski et al., 2005) and Yasso07 (Tuomi et al., 2009(Tuomi et al., , 2011a models. Yasso07 is an improved version of Yasso with more refined carbon pooling and abundant data for calibration. The model developers' intention was to make their models suitable for general forestry applications by taking into account the limited availability of forest soil and litter data (Liski et al., 2005). Yasso07 explicitly defines several pools of chemical compounds in litter carbon (Tuomi et al., 2011b) and possesses well defined, biologically meaningful and measurable parameters. Thanks to these qualities, Yasso or Yasso07 has been applied in more than 150 case studies (https://en.ilmatieteenlaitos.fi/yasso-publications, last access: 21 April 2019) in forest ecosystems in the Northern Hemisphere, with generally high satisfaction levels when compared with measured carbon values (e.g., Karhu et al., 2011;Rantakari et al., 2012;Ortiz et al., 2013;Lu et al., 2015;Wu et al., 2015). Yet so far most of these applications have been limited to local case studies, especially in cold forests with limited tree species diversity (e.g., boreal or montane forests). Rarely have previous studies validated Yasso07 based on data (i) from long-term observations (here defined as >10 years), (ii) from temperate forests with a much higher diversity of tree species or (iii) on changes in CSs (in tC ha −1 yr −1 ). This is partially due to the lack of extensive long-term soil carbon monitoring in forest ecosystems, which differs in climatic and soil conditions and species and stretches over large territorial scales. Nevertheless, Yasso07 is considered to be one of the potentially appropriate models for evaluating national and continental inventories of the forest carbon balance in Europe (Hernández et al., 2017). It is therefore of considerable interest to assess the ability of Yasso07 to reflect the carbon balance in different European forest ecosystems at large spatio-temporal scales. Moreover, as a carbon-pool-based model, Yasso07 shares certain principles with other prevailing soil carbon models in the same genre (e.g., RothC, CENTURY, etc.). Applying Yasso07 as an example model in this case study may also allow us to improve future carbon modeling for temperate forests in general.
The recorded field data for CSs and litter quantity dynamics from the RENECOFOR network (http://www.onf.fr/ renecofor/@@index.html, last access: 21 April 2019), National Forests Office (ONF), France, offered us a valuable opportunity for model validation. The 101 forest sites included in this study are located all over the French metropolitan territory and cover the most common forest types and tree species. For each site, annual measurements of litterfall were available in addition to two inventories of soil organic CSs with an average interval of 15 years (minimum of 12 years and maximum of 20 years). These data allowed us to use site-specific observed soil CSs and above-ground litterfall dynamics as model input data. Approximations in model input data have been identified as a major source of uncertainty for estimates in models for changes in soil CSs (Ortiz et al., 2013). By ensuring solid input data, we were able to min-imize this source of uncertainty and focus on the inherent model structure.
We hope to contribute to the further development of soil carbon modeling by (i) testing and characterizing the ability of Yasso07 to model soil CS dynamics for temperate forests, (ii) identifying limitations and providing suggestions for a better adaptation of the model for C dynamics in both deciduous and evergreen temperate forests, and (iii) discussing the perspectives based on the current state of the art in soil carbon modeling. Associated with the above aims, our null hypotheses are as follows: (i) Yasso07 predicts accurate and unbiased CS changes at the national scale, and (ii) the model's fit residuals (predicted data minus observed data) have null relationships with site characteristics (e.g., location, climate, forest type, soil type and initial CS).

The Yasso07 model
The Yasso07 dynamic soil carbon model is based on the general assumption that the soil CS is driven by the decomposition of different litter types, which may differ in quantity and quality and by climatic conditions. Litter carbon quality is represented by four chemical compound groups with different decomposition rates (Tuomi et al., 2009). Soil organic carbon is divided into these four relatively labile carbon pools and one recalcitrant pool called "humus" (H) (Fig. S1 in the Supplement). The five pools differ in specific mass loss rates and mass flows. As in many other pool-based models, the H pool is considered to be the oldest and most stable carbon pool, although recent studies have thrown doubt on its stability and even its physical existence (see Lehmann and Kleber, 2015). Some mass flows correspond to CO 2 release (microbial respiration). The mean residence time of carbon in these pools varies, lasting several months (i.e., water soluble compounds -W), a few years (i.e., acid-hydrolyzable compounds -A; non-polar solvent, ethanol or dichloromethane compounds -E), several decades (i.e., non-soluble and nonhydrolyzable compounds -N) or even several centuries (i.e., H).
Mathematically, the kernel equation of Yasso07 can be written as follows: where symbols in bold capital letters denote either vectors or matrices, while those in small letters in parentheses denote scalars, X (t) is the vector describing the masses of the five carbon pools (A, W, E, N and H) at time t,Ẋ(t) is the vector describing carbon mass changes in soil at time t, A p is the mass flow matrix describing carbon allocation among pools, K(c) is the decomposition matrix describing the decomposition rates as a function of climatic conditions (c), and I (t) is litter input to the soil and is equal to 0 for the last pool, since "H" does not exist in litter form (Eq. 1a) and can be expressed in a more detailed form. This form is where p F →T is the relative mass flow parameter between two pools (from F to T ; F and T can be any two pools among A, W, E, N and H) in the soil (dimensionless, p F →T ∈[0, 1]). Temperature and precipitation are assumed not to affect mass flow p but do influence mass loss rate k i (i = A, W, E, N or H) according to the following: where α i is the mass loss rate parameter of the chemical pool i; and β 1 , β 2 and γ are parameters related to temperature (T ; in • C) and precipitation (P a ; in mm).
To take into account the effect of litter size on the litter decomposition rate, k i was multiplied by a litter size factor (h s ), which makes it possible to distinguish between different types of litter (e.g., foliage, coarse woody debris, stems, etc.), differing in diameter (d; in mm): where ϕ 1 , ϕ 1 and r are parameters related to litter size. Yasso07 has 44 parameters calibrated according to the Markov chain Monte Carlo (MCMC) method with the Metropolis-Hastings algorithm (Tuomi et al., 2011a). Currently, several calibrated parameter sets for Yasso07 are available, including the two most recent sets published by Tuomi et al. (2011) and Rantakari et al. (2012). Compared with the Rantakari (2012) set, the Tuomi (2011) set was calibrated using a wider range of observed foliage and root decomposition data. It is based on a combination of three sources of data: (i) a global dataset (n>9000) of litterbags for mass loss of non-woody litter from approximately 100 sites in Europe and North and Central America covering a wide range of climate and soil conditions, forest types and tree species; (ii) a dataset (n>2000) for mass loss of decomposing woody litter measured in northern Europe; and (iii) measured accumulation rates of soil carbon pools in forest sites along a 5300-year soil chronosequence in southern Finland to determine the residence time of the H carbon pool. The Tuomi (2011) parameter set contains 10 000 parameter vectors (each vector contains the values of all 44 Yasso07 parameters), which are randomly generated to take into account the Figure 1. Geographical distribution of the sites in the RENECO-FOR network used to test Yasso07 performance (see also Jonard et al., 2017). Forested areas are represented in green. Each circle represents one site; the color represents the dominant tree species on the plot. The species abbreviation and number of sites per species are indicated in parentheses. One site (Picea abies 39b) was not used due to data incompletion. stochastic effect. In this study, we adapted the Tuomi (2011) set to the RENECOFOR dataset.

RENECOFOR network
The RENECOFOR network is part of the Level II network of the International Co-operative Programme on Assessment and Monitoring of Air Pollution Effects on Forests (ICP Forests). The 101 sites ( Fig. 1) considered in this study cover the most common types of forest ecosystems in France, including evenly aged forests in plains, pine plantations and unevenly aged mountain forests. They also host most of the tree species in France and central Europe, including Quercus robur, Quercus petraea, Pseudotsuga Menziesii, Picea abies, Fagus sylvatica, Pinus pinaster, Pinus sylvestris and Abies alba. At each forest site, annual woody and non-woody litter quantities are either directly measured or estimated based on existing dendrometric data.

Soil carbon and soil physical and chemical properties
At each site, soil CSs were measured twice at an interval of approximately 15 years (1993-1995 for the first assessment and 2007-2012 for the second one). At each site and for each assessment, soils were sampled to a depth of 0.4 m at five points selected in each of five subplots, and the samples were divided into different layers (0-0.1, 0.1-0.2 and 0.2-0.4 m), including both organic and mineral soil layers. The temporal changes in soil CSs to a depth of 0.4 m were analyzed by Jonard et al. (2017). Composite samples were produced for each layer and subplot then analyzed for mass, bulk density, soil organic carbon, and physical and chemical properties, including texture (proportion of clay, silt and sand; in %); pH value; total nitrogen stock (in t ha −1 ), the carbon : nitrogen ratio (dimensionless); total phosphorus stock (in t ha −1 ); and stocks of exchangeable aluminum (Al), calcium (Ca), potassium (K) and magnesium (Mg; in kmol ha −1 ). We used the soil physical and chemical property data measured during the first assessment (1993)(1994)(1995) for residual analyses (see Sect. 2.7) Regarding the CSs from 0.4-1.0 m in depth, only data from the first assessment (1993)(1994)(1995) were available. Soil samples were obtained from only one soil profile per site at two mineral layers (0.4-0.8 and 0.8-1.0 m). Bulk density and carbon concentrations measured at these layers were used to estimate soil CSs to a depth of 1.0 m. Table 2 summarizes the data source for each of the 101 sites in the RENECOFOR network (http://www.onf.fr/renecofor/sommaire/renecofor/ reseau/20090119-130815-828957/@@index.html, last access: 21 April 2019). More detailed information about each site and the soil sampling procedure is available in Supplement I (Table S1) and Jonard et al. (2017).

Climate data
The climate data required by Yasso07 include annual mean precipitation (mm) and annual maximum, mean and minimum temperatures ( • C). These measured data were obtained from the national Météo-France meteorological stations (http://www.meteofrance.com, last access: 21 April 2019) nearest to each RENECOFOR site.

Litter quantity
Litter input (in tC ha −1 yr −1 ) comes from several sources (see Table 2). We assumed a 0.5 conversion factor between biomass (dry matter) and carbon (Thomas and Martin, 2012).
Above-ground litter input from living trees includes leaves for broadleaves and needles for conifers, small branches, fruits and miscellaneous items (e.g., flowers, buds, etc.). Above-ground litterfall mass was measured annually between 1994 and 2008. For sites where litter quantity data from 1992-1993 and 2009-2012 were lacking, we used the mean litter quantity of all the other years at the same site. The observed branch size in this above-ground category was less than 2 cm (fine branches). Branches and stems bigger than 2 cm due to natural mortality were rare (since they can be salvaged) and were therefore not included in our calculations.
Coarse woody litter input from harvesting residue or storms were estimated from full inventories performed by the ONF since 1991. Missing years of litter input for this category were gap-filled with the average over the period. On average, 3 years are missing per site, though there are considerable differences amongst sites. The mode was 1 year, Biogeosciences, 16, 1955Biogeosciences, 16, -1973Biogeosciences, 16, , 2019 www.biogeosciences.net/16/1955/2019/ and six sites had 10-11 missing years. We assumed that the residues due to harvesting or storms would be coarse branches (> 4 cm in diameter, confirmed with the ONF) based on above-ground tree characteristics. The quantities were estimated on the basis of repeated stand inventory data and species-specific height-girth relationships and biomass. Litter input from stems was set to 0, since in most cases, stem wood was removed from the site after storm damage. Litter input from coarse woody roots was considered to be equal to total root biomass, which was estimated through metaanalysis-based allometric equations proposed by Cairns et al. (1997). More detailed information about forest inventories and storm events occurring at each site is available in Supplement I (Table S1). Litter input from fine roots (here defined as roots of ≤ 5 mm in diameter), especially the finest ones with a diameter ≤ 2 mm, can significantly contribute to carbon sequestration in soils (Brunner et al., 2013;Kögel-Knabner et al., 2002;Berg and McClaugherty, 2008). Fine root litter was assumed to be proportional to that of foliage, which was measured in the RENECOFOR sites (see the following paragraph). Jonard et al. (2017) suggested using the generic equation published by Raich and Nadelhoffer (1989) and, simultaneously, adopted the hypothesis that fine-root litter production represents about one-third of the carbon allocated to roots (Raich and Nadelhoffer, 1989): (4) where I fine root and I foliage are the litter input of fine roots and foliage, respectively (in tC ha −1 yr −1 ).
The relationship between fine-root and foliage litter inputs can be highly variable depending on tree species, stand characteristics and climate, and the generic equation may not reliably represent such variability. To counter this, we carried out a sensitivity analysis to investigate the response of the model fit to the choice of fine-root-to-foliage ratio varying from 0.1 to 4.0 (see Sects. 2.6 and 3.2). Yet, when applying the equation of Raich and Nadelhoffer (1989;Eq. 4) over all the RENECOFOR sites, we found that fine-root-to-foliage ratios had a median of 1.0 and a mean of 1.0-1.1 for both coniferous and broadleaved sites (Fig. S2). Hence, we chose to use the 1.0 ratio over all the RENECOFOR sites to present the outcomes of the model fit and residual analyses from the simulations (see Sect. 3.3). This facilitated our evaluation of site factors (e.g., dominant tree functional type, climatic and soil features) without adding a source of variability introduced by fine-root-to-foliage ratio.

Litter carbon quality
In the RENECOFOR network, there are no measured data for litter carbon quality, defined as the relative amount of litter carbon belonging to four different carbon pools (A, W, E and N). Therefore, we carried out a meta-analysis of the data collected in the literature where authors used chemical fractioning procedures or near-infrared spectroscopy (NIRS) techniques to measure litter carbon quality. The data were restricted to non-tropical areas. Chemical data on litter composed of coarse tree organs (e.g., stems, coarse branches) are relatively scarce, so we used the tree stem wood data compiled in Pettersen (1984), Rowell et al. (2005) and Rowell (2012). These three studies cover a wide range of temperate tree species from North America, Japan and Russia. Data on foliage and root litter carbon quality were taken either from networks such as CIDET (Trofymow, 1998) and LIDET (http://andrewsforest.oregonstate.edu/research/ intersite/lidet.htm, last access: 21 April 2019) or from independent studies in the Northern Hemisphere. The database we used for our meta-analysis is available in Supplement II. The root diameter or branching order can play a significant role in modifying the composition of soil chemical compounds (Fahay et al., 1988;Tingey et al., 2003;Guo et al., 2004). All the measurements included in our meta-analysis on roots refer to fine roots (diameter ≤ 5.0 mm), although in several studies, e.g., Aber et al. (1990), Aulen et al. (2011) and Stump and Binkley (1993), root size was not clearly indicated. Yet we still included the data from the latter studies, as data are less abundant for roots than foliage. The coarse-root data in the literature were too few for a meaningful metaanalysis; we therefore used stem wood values instead.
We then used the resulting litter carbon quality database to describe the quality of litter input at each site in our study. We portioned the litter input into biochemical classes in the following order of priority: (i) values for the target species, when available in the database; (ii) mean values of the species from the same genus, if data for the target species were absent; and (iii) mean values of the species from the same tree functional type (conifers versus broadleaves), if no data were available at either species or genus level for the target species (see Table 1).

Initialization of soil carbon quantity and quality
To initialize Yasso07, both the quantity and the quality of the soil carbon are required. Here, the initial CSs were fixed as the soil CSs measured during the first RENECOFOR soil carbon assessment (i.e., model input). Measurement uncertainties of the soil CSs were not considered to be a source of the stochastic effect when Yasso07 was fed, as we were more interested in the output uncertainties related to the model per se (i.e., the choice of the model's parameter set) and in carbon quality settings in the model initialization (see below).
Soil carbon quality, defined as the relative amount of soil carbon in pools A, W, E, N and H in relation to their sum, can be initialized in two ways: with a complete steady-state assumption or with a partial, or transient, steady-state assumption. The classical approach is based on the assumption that carbon quality at the initial state is identical to that at the complete steady state, which can be calculated with the analytical matrix inversion approach based on Eq. (1a). For Table 1. Litter carbon quality of the species present in the French RENCOFOR network, estimated based on the literature. In the column "Case," each number corresponds to one case of data availability in the literature: 1at least one dataset of complete chemical composition (i.e., for AWEN) exists at species level; 2at least one dataset of incomplete chemical composition (only for A, N and the sum of W and E) exists at species level, and in this case, the mean proportion of W and E at genus level is used; 3no data are available at species level, but at least one complete dataset of chemical composition exists at genus level; 4no data are available at species level, but at least one incomplete dataset of chemical composition exists at genus level, and in this case, the mean proportion of W and E at tree functional type level is used; 5no data are available at either species or genus level, and in this case, the mean AWEN composition at tree functional type level is used. Cases 1 to 5 are in descending order of priority. Note: Aacid-hydrolyzable compounds; Wwater soluble compounds; Enon-polar solvent, ethanol or dichloromethane compounds; NAnot available. Biogeosciences, 16,[1955][1956][1957][1958][1959][1960][1961][1962][1963][1964][1965][1966][1967][1968][1969][1970][1971][1972][1973]2019 www.biogeosciences.net/16/1955/2019/ Table 2.
A summary of the data used for Yasso07 simulations in the present study. In the "year" columns, the following abbreviations are used: M -measured data; E -estimated data based on measured data; and 0 -noted, but the contribution to litter is negligible. For soil CS measurements, zones in dashed line denote the inventory duration. For each year, each symbol (M and E) only accounts for the general case, hence it is possible that measurements were occasionally omitted at some sites. * Litter input caused by harvest or storms were included (after they had occurred), SD is standard deviation, and litter input is dry matter. Diameters used for defining each litter type are the following: ≤ 2 cm for fine branches, > 4 cm for coarse woody branches, > 5 mm for coarse woody roots and ≤ 5 mm for fine roots.
The estimated carbon quality of the steady-state CS X (t s ) to the depth of 1.0 m (also noted as CS steady-state ; in tC ha −1 ) was then applied to the observed CSs to split it into various carbon pools.
The complete steady-state assumption is commonly used in the literature despite considerable controversy, since the assumption does not take the difference in stabilization among the various pools into account (Elliot et al., 1996;Foereid et al., 2012). Soil carbon pools (especially those at sites that have undergone disturbances in recent centuries) may not have achieved a complete steady state but may still be in a transient or partial steady state. In such states, the slow-cycling pools can still be accumulating carbon, while the relatively rapid-cycling pools have already recovered a dynamic equilibrium (Wutzler and Reichstein, 2007). In this study, we equally adopted the partial steady-state assumption to mimic such a circumstance. More precisely, we assumed that the rapid-cycling pools such as A, W and E were at steady state at the first soil survey, while the slow-cycling N and H pools might not yet have reached the steady state. Accordingly, we directly considered the steady-state CS obtained from matrix inversion for A, W and E, but we revised amounts for the N and H pools by calculating the difference between estimated and observed CSs to a depth of 1.0 m. In most cases, the sum of steady-state A, W, E and N was lower than the observed CS; the revised H was then equal to the difference between the latter and the former. Very occasionally, the sum of steady-state A, W, E and N could be greater than the observed CS; the revised N was then calculated as the difference between the sum of steady-state A, W and E and the observed CSs and pool H was forced to zero. The new carbon quality, corresponding to the proportions among the steady-state A, W and E pools and the revised N and H pools, was used to split the observed CS into five pools in real simulations.

Sensitivity analyses on the impact of initial soil and litter settings on model output
It is important to gain a general idea of the magnitude of the impact on model output and fit of our choices for initial soil and litter settings in the process of model initialization. To this end, we carried out a sensitivity analysis to assess how assumptions on carbon quality (complete steady state versus partial steady state) and carbon quantity as a function of In addition, we conducted another sensitivity analysis to fully explore the effects of all the theoretically possible initial soil carbon quality distributions and that of simulated duration on model outputs. We created a virtual site where climatic conditions and litter input were constant and equal to the average values of all the RENECOFOR sites. By fixing its initial soil CS to 100 tC ha −1 , we permuted the initial percentage of the soil carbon pools, with minimum and maximum percentages fixed at 5 % and 80 %, respectively. We used four levels of simulated duration (1, 10, 100, 1000 and 10 000 years) for each combination of soil carbon quality distribution. Based on averaged soil and litter carbon data from the RENECOFOR sites, the simulations were carried out for both broadleaved and coniferous forest types. Here, we present only the results for the virtual broadleaved stands, as the results between conifers and broadleaves did not change much, especially over the long term.

Running Yasso07 and statistical analyses
We used the same FORTRAN code as Yasso07 version 1.0.1 in Didion et al. (2014) for all the model simulations. For each type of analysis (both RENECOFOR site-specific and sensitivity analyses), we conducted 10 simulations. In each simulation, one parameter vector was randomly chosen from the 10 000 parameter vectors.
For each site, we calculated ACC (in tC ha −1 yr −1 ), i.e., the difference in CSs between the two national RENECO-FOR assessments standardized by the temporal interval (t 2 − t 1 ), as follows: ACC obs = CS obs,t 2 − CS obs,t 1 / (t 2 − t 1 ) ACC sim = CS sim,t 2 − CS obs,t 1 / (t 2 − t 1 ) , where, CS sim,t 2 , CS obs,t 2 and CS obs,t 1 are, respectively, the simulated CS to a depth of 1.0 m at year t 2 and the observed CS at years t 2 and t 1 , which are around 1994 and 2010, depending on the site. To compute ACC sim (Eq. 7), some previous studies used a simulated CS at the starting year instead of an observed one (e.g., Ortiz et al., 2013). In such a case, it is of primary importance to judge a "steady-state year" prior to the starting year for which observed data are available. From the estimated steady-state year, a spin-up or real model simulation is then followed to obtain a simulated CS at the starting year. In our simulations, the observed soil CS at t 1 served as the model input to set initial soil quantity and to calculate ACC (Eq. 7). This allows avoiding such a judgement on steady-state year, which can be sometimes subjective. This allowed us to better focus on the effect of initialized soil carbon quality, for which we calculated both complete and partial steady-state assumptions (see Sect. 2.5).
Two reasons support our choice to compare ACC sim with ACC obs instead of comparing CS sim,t 2 with CS obs,t 2 . First, the parameter sets of Yasso07 were calibrated for a soil depth of 1.0 m, while CS data from the two RENECOFOR assessments were only available to 0.4 m (because no data from 0.4-1.0 m in depth were available from the second assessment). It is therefore reasonable to assume that the observed CS data are not comparable with Yasso07 estimates. However, focusing on carbon changes instead of CSs may largely erase this bias. Indeed, previous studies have evidenced that carbon dynamics are much less active at deep soil layers than at superficial layers Balesdent et al., 2018). Second, ACC indicates if a site is gaining or losing soil carbon and whether this information is sometimes more important than the site's CS value. Using a metric standardized by the year, such as ACC, can also facilitate comparing results in future studies. One exception was for our sensitivity analysis on the effect of initial soil carbon quality (Sect. 2.6), where we chose CS sim,t 2 instead of ACC sim , since the initial soil CS was fixed at 100 tC ha −1 . Despite our primary focus on ACC, we also compared the simulated steady-state CS (CS steady-state , in tC ha −1 ) obtained from the initialization procedure (see Sect. 2.5) with the CS obs,t 1 down to 1 m in depth; this was to check if Yasso07 was able to predict stocks to 1.0 m depth that indeed reached the level of observed stocks (see Fig. S4).
To test the performance of Yasso07 in estimating changes in soil carbon at the RENECOFOR sites, we used an analysis of variance (ANOVA) to analyze the residuals of the changes in carbon, here defined as the difference between the simulated and observed values. The following environmental and biological factors were tested: site geographical location (latitude, longitude and altitude), the climatic condition (temperature and precipitation), soil type, tree functional type and tree species. Before each ANOVA, we tested the normality of the data with a Shapiro-Wilk test. For the sensitivity analyses, we performed loess regressions (Fox and Weisberg, 2019) to characterize the variation in soil CSs as a function of the initial soil CS settings and simulated duration (1-10 000 years). Statistical analyses were performed with R 2.13.0 (R Core Team, 2013).

Litter carbon quality of northern temperate tree species
Our meta-analysis (Fig. 2) showed that the litter carbon quality, i.e., carbon composition, of northern temperate tree species significantly differed between tree organs. For woody litter (stem data alone), the A carbon pool reached up to 80 % of the total carbon pool; the sum of the A and N carbon  pools corresponded to at least 75 % and, in most cases, was greater than 90 %. W and E accounted for only small percentages of the carbon composition (Fig. 2a). Nevertheless, this dominance of A and N over W and E was much less pronounced in foliage and root litter types ( Fig. 2b and c). Generally, the different tree organs can be ranked according to the sum of the proportions of A and N as follows: woody debris (> 90 %) > roots (70 %-80 %) > foliage (60 %-70 %; Fig. 2d). The effect of tree functional type on litter carbon quality strongly interacted with that of the tree organ. For wood, broadleaves and conifers had clearly shifted point clouds for the relationship between A and N carbon pools: there was a greater proportion of A and a lower proportion of N in broadleaves than in conifers. In foliage and root litter, the effect of tree functional type on the proportions of A and W was less pronounced than for wood. The main difference be-tween broadleaves and conifers occurred in N rather than in A (Fig. 2d). Broadleaved litter had a smaller N proportion than coniferous litter, regardless of the tree organ (Fig. 2d). The proportions of A and N relative to those of E and W were quite stable between broadleaves and conifers regardless of the tree organ (Fig. 2d). Figure S3 shows the impact of different settings of litter and carbon quantity and quality on the model fit for the RENECOFOR sites. For soil carbon quality, the partial steady-state assumption ( Fig. S3c and d) achieved significantly better model fits (with lower model root-mean-squareerror) than the complete steady-state assumption ( Fig. S3a and b). Next, we found that model fits were better when an observed CS to 0.4 m in depth ( Fig. S3a and c) was used as the initial carbon quantity than when a CS to 1.0 m in depth was used ( Fig. S3b and d). Nevertheless, it remained more advantageous to use the CS to 1.0 m observed during the first assessment as the model input because Yasso07 is calibrated to predict the CS down to 1.0 m in depth (Rantaraki et al., 2012). Different choices of fine-root-to-foliage ratio for fine-root litter input also significantly influenced Yasso07's performance in predicting changes in soil C (Fig. S3). Ratios of 0.1-0.8 for broadleaves and 1.8-3.0 for conifers achieved the best fits between simulated and observed changes in the soil CS according to different scenarios (Fig. S3). Using a constant value of 1.0 for both broadleaved and coniferous sites seems to be an acceptable compromise between the two tree functional types, even though the choice is not optimal for each functional type taken individually.

Sensitivity analyses on the impact of initial soil and litter settings on model output
As a result of the above diagnoses, we only show fit and residual analysis results for the simulations based on the partial steady-state assumption, observed CS to 1.0 m and fineroot-to-foliage ratio of 1.0 (see Fig. S3d and Sect. 3.3). Figure S4 visualized all the theoretically possible final CSs by varying initial CSs and simulated duration (from 1 to 10 000 years). Initial soil carbon quality had a pronounced impact on final soil organic CSs at the annual and decennial scales. For example, when the initial proportion of the A pool increased from 0 % to 80 %, the final proportion of A could increase by +30 to +40 tC ha −1 (Fig. S4a) and the final total CSs could decrease by approximately −20 to −30 tC ha −1 (Fig. S4u) at annual and decennial scales. When simulations were performed for a millennium timescale, the initial soil carbon quality no longer impact final soil carbon quality. In other words, the same final soil carbon quality was obtained regardless of the initial soil quality (Fig. S4).

Simulated versus observed carbon data
Using only mean litter input, the theoretical CSs (CS steady-state ) simulated from the initialization method and the observed CS obs,t 1 to 1.0 m in depth shared the same order of magnitude and were quite comparable (Fig. S5). However, the CSs were overestimated for most coniferous stands and underestimated for broadleaved stands (Fig. S5).
When simulated ACCs were plotted against the observed ones, the point clouds were distributed around the 1 : 1 diagonal line despite fairly high dispersion (Fig. 3). The correlation between predicted and measured ACC remained weak (R 2 <0.1). The mean observed and simulated ACCs for all sites were, respectively, +0.34±0.06 tC ha −1 yr −1 (+0.20±0.06 tC ha −1 yr −1 for broadleaved stands and +0.48 ± 0.10 tC ha −1 yr −1 for coniferous stands) and +0.00 ± 0.07 tC ha −1 yr −1 (+0.28 ± 0.09 tC ha −1 yr −1 for broadleaved stands and −0.28 ± 0.11 tC ha −1 yr −1 for coniferous stands); 23 % of the broadleaved stands and 39 % of the coniferous stands showed significant differences between observed and simulated ACCs (Fig. 3a). In only approximately 17 % of the sites, ACCs were significantly different from 0 for both simulated and observed results (i.e., case 3 in Fig. 3b). Here, there was a significant effect of tree functional type on the observed and simulated values. The model tended to overestimate ACC in broadleaved stands but to underestimate ACC in coniferous stands. Approximately two-thirds of all the sites showed predicted and observed changes in CSs of the same trend (i.e., data points in zones I, III, IV and VI; Fig. 3), while approximately one-third of the sites were in the remaining zones (II and V) where the predicted trend was contrary to the observed trend. From the residual distribution, we also found that the model where carbon quality was set with the partial steady-state assumption (Fig. 3) had a better fit than the model set with the complete steady-state assumption (Fig. S6).
The simulated ACCs exhibited a negative linear relationship with the initial soil CSs (Fig. 4b), whereas this trend was not found for the observed ACCs (Fig. 4a). Storm damage and soil type could not clearly explain these trends in the residuals. For coniferous stands only, the residuals showed significant differences among the three major types of soil (n of sites > 5): cambisol > luvisol > podzol (Fig. S7). The coniferous stands tended to be younger than the broadleaved stands. Neither tree age nor the interaction between tree age and tree functional type had any significant effect on residuals. For all the sites together, the residuals became higher with increasing latitude, indicating that simulated ACCs were more overestimated in northern zones (analysis of covariance -ANCOVA; F = 11.2, P < 0.001). This pattern was particularly strong for broadleaved stands (Fig. S8a). No similar trend was found for coniferous stands (Fig. S8e). Both residual signs were generally present for all of the main species (Fig. S8b,c,d,f,g and h). Broadleaved and coniferous stands differed in their responses to environmental factors: for coniferous stands, neither temperature nor precipitation had much effect on residuals, while for broadleaves, precipitation was negatively correlated with residuals (AN-COVA, F = 10.8, P < 0.001).
Regarding soil physical and chemical properties, total soil nitrogen stocks were significantly correlated with residuals for both broadleaved and coniferous stands (Fig. 5). Soil texture (proportion of clay and sand) and exchangeable magnesium and potassium were significantly correlated with residuals only for broadleaved stands (Figs. 5 and S9; Table S2). The remaining tested variables, exchangeable aluminum and calcium, pH, total phosphorus and carbon-to-nitrogen ratio, showed no relationships with the residuals (Table S2). . Comparison between simulated and observed changes in annual carbon stocks (ACCs; in tC ha −1 yr −1 ). Circles and triangles represent sites dominated by broadleaves and conifers, respectively. The partial steady-state assumption was used when initializing CS quality to 1.0 m in depth. The fine-root-to-foliage ratio for broadleaves and conifers is 1.0. To facilitate readability, Roman numerals (I-VI) denote the six zones in which data points are distributed. In (a), error bars represent standard errors; hollow and filled points represent non-significant and significant differences, respectively, between simulated and observed ACCs according to the t test (at a 95 % confidence interval). In (b) the case of significance is as follows: 1 -no significant difference from 0 for either observed or simulated ACC; 2 -a significant difference from 0 for either observed or simulated ACC; and 3 -a significant difference from 0 for both observed and simulated ACC.

Agreement between simulated and observed annual changes in soil carbon stock
Testing widely popularized soil carbon models on a large dataset is highly meaningful work that enables researchers not only to assess the model's predictive ability over various climatic and ecosystem types but also to provide lessons and implications for future modeling work. Here, compared with observed CS data to 1.0 m in soil depth from the RENECOFOR network, we found that the simulated stocks (CS steady-state versus CS obs,t 1 ) to 1.0 m showed the same order of magnitude and validated Yasso07's ability to predict average CSs at the scale of the French territory. This solid performance at the national scale supports Yasso's aim to be generalizable and is consistent with previous studies (see Ortiz et al., 2013;Lehtonen et al., 2016;Hernández et al., 2017). Nevertheless, the observed CS to 1.0 m in depth at t 1 already exceeded the CS steady-state for most coniferous stands (Fig. 5S), suggesting, to some extent, that the model parameters were not adapted to the RENECOFOR dataset. Such inadaptability may simply be due to setting an overly high decomposition rate for the slow carbon pools in the model. As the coniferous stands were on average younger being af-forested more recently than the broadleaved stands (Jonard et al., 2017), the model may also not have been able to account for historic land use changes when the soil organic CS was calculated at the steady state. Figure S5 shows that for most broadleaved stands, observed stocks were lower than their CS steady-state , possibly indicating that steady-state equilibrium had not yet been reached at these sites. Furthermore, compared with the average observed ACCs over the 15-year interval between the two assessments, we found that the simulated ACCs were significantly biased for more than one-third of the French RENECOFOR sites. Particularly, Yasso07 generally overestimated the ACC in the broadleaved stands located in the north of France (Fig. S8ad); this overestimation was sometimes exacerbated by lower precipitation. On the other hand, Yasso07 tended to underestimate the ACCs in the coniferous stands. Nevertheless, we expected Yasso07 to perform slightly better in the coniferous stands than in the broadleaved ones, since the model's estimates have shown good correspondence to measurements (of stocks and/or changes) in coniferous forests, especially Nordic boreal forests (e.g., Karhu et al., 2011;Ortiz et al., 2013). Probably due to the younger age of the coniferous stands in our study, the observed ACCs in the coniferous stands were greater than those in the broadleaved stands ( Fig. 3; Jonard et al., 2017). Again, Yasso07 was unable to re- produce this observed effect of tree functional type on ACC, as the model does not take into account changes in land use history, as for the case of steady-state CSs mentioned above.
Except for tree functional type and geographical location (e.g., latitude, which is correlated with climatic variables), qualitative ecological variables that are assumed to be key factors influencing carbon sequestration processes (e.g., soil type -except for coniferous stands, storm damage and stand age range) showed limited ability to explain residuals. Note that these factors were not fully crossed for the 101 sites, making it difficult to test each single factor.
Simulated ACCs showed a strongly negative correlation with the observed initial soil CSs (CS obs,t 1 ), with an overestimation of ACC at sites with lower CS obs,t 1 and an underestimation at sites with higher CS obs,t 1 (Figs. 4 and S9). This is logical in view of the model's inherent mechanism. With in-creasing initial CS, there is an increase in the quantity of the easily decomposable compounds in the soil, i.e., A, W and E, which triggers a more substantial mass loss at a decennial scale. However, the data on observed changes in CSs did not support this trend.
Several quantitative soil physical and chemical properties showed clear correlations (especially for broadleaved stands) with ACC residuals (Fig. 5). Also, in the principle component analyses (Fig. S9), the arrows representing soil variables are slightly closer to the pivoting axis of "initial CS and ACC residuals" than those representing climatic and geographic variables, notably for broadleaved stands. These results highlight the potential interest of incorporating soil properties into new versions of the Yasso model, which currently lacks, or only implicitly incorporates, soil parameters. Indeed, there is considerable evidence that soil physical and chemical properties can greatly influence soil carbon dynamics and storage capacity (Beare et al., 2014;Dignac et al., 2017;Rasmussen et al., 2018).
Despite Yasso07's significant prediction bias at a number of sites, it is unreasonable to simply attribute the bias to the model per se, since multiple uncertainties affecting the quality of the model's input data can be identified (see ). These uncertainties can occur not only with Yasso07 but also with other prevailing models, highlighting large knowledge gaps in ecology and soil carbon modeling.

Setting soil carbon quality for model initialization: a recurrent challenge in soil carbon modeling
Great uncertainty is associated with model initialization in terms of soil carbon quality, as it is usually estimated, not measured, for example, through matrix inversion with the assumption that the litter input has been the same for decades. Compared to measuring total soil CSs, measuring soil carbon quality is much more labor-intensive and time-consuming. Moreover, soil carbon quality data from different sources may be partly or totally incompatible due to the use of different chemical pools or fractionation protocols (Blair et al., 1995). Therefore, measured data for soil carbon quality are generally lacking at the worldwide scale. This lack of information is a recurrent issue for soil carbon dynamics modeling (see Elliot et al., 1996, who have discussed the issue of "measuring the modelable"). Many prevailing soil carbon models require setting carbon quality in addition to carbon quantity, e.g., Romul (Chertov et al., 2001), RothC (Coleman and Jenkinson, 1996), CENTURY versions (Parton et al., 1987;Metherell et al., 1993) and CBM-CFS3 (Kurz et al., 2009). Setting carbon quality in models inappropriately may greatly change CS predictions (Wutzler and Reichstein, 2007;Carvalhais et al., 2008Carvalhais et al., , 2010. In this study, soil carbon quality data were unavailable at the French RENECOFOR sites. We therefore tested both complete and partial steady-state assumptions for setting the initial carbon quality. Compared to the complete steady-state Biogeosciences, 16, 1955Biogeosciences, 16, -1973Biogeosciences, 16, , 2019 www.biogeosciences.net/16/1955/2019/ assumption, the partial steady-state assumption made it possible to account for slow-cycling pools, which could still be accumulating carbon, and fast cycling pools in equilibrium (Wutzler and Reichstein, 2007). We did not use the precise method proposed by Wutzler and Reichstein (2007) to estimate initial carbon quality due to a lack of information necessary for the decomposition-accumulation dynamics of the H pool. Instead, while we followed the same partial steadystate assumption, we revised the proportions of the N and H pools and assumed that the A, W and E pools were in equilibrium and equal to the simulated values. We also assumed that the sum of all pools at t 1 was equal to the observed stock. We found that our partial steady-state assumption gave rise to generally better model fits than the complete steady-state assumption ( Fig. S3; see also Figs. 3 and S6), indicating its good suitability to the RENECOFOR sites. When plotting CS stead-state against CS obs (Fig. S5), we found a discrepancy: while the CS obs values of most of the broadleaved stands were smaller than CS stead-state , the CS obs of most of the coniferous stands were greater than CS stead-state . This discrepancy was brought into the ACC fit when the complete steady-state assumption was adopted (Fig. S6). Nevertheless, the partial steady-state assumption can, to some extent, mitigate such a discrepancy. For broadleaved stands, the revised proportions of the A + W + E pools became higher than those at the complete steady state ( Fig. 6; with 70 % of stands above the steady-state line), thus reducing the model's overestimation of ACC. For coniferous sites, the proportions of the A + W + E pools were often compressed ( Fig. 6; with < 50 % of the stands above the steady-state strip), thus reducing the model's underestimation of ACC at the steady state. For future work, it would definitely be worthwhile to compare both assumptions for several prevailing carbon models (e.g., Yasso07, RothC, Century, etc.), as studies comparing Figure 6. Distribution of estimated carbon quality based on the partial steady-state assumption (box plots) versus those based on the complete steady-state assumption (whose ranges are all very narrow and are expressed with strips in color: 13 %-15 % for the sum of A, W and E -cyan; 49 %-53 % for N -brown; and 33 %-36 % for H -purple). For each box plot, the low and top edges of the box correspond to the 25th and 75th percentile data points, respectively; the line inside the box represents the median, and there are no outlier points in this case. Br. -broadleaved stands; Co. -coniferous stands.
initialization assumptions still remain scarce compared to those on model comparisons.
In order to gain a global overview of Yasso07's sensitivity to initial soil carbon quality, we also conducted a sensitivity analysis that computed the final soil CSs for all pos-sible chemical-pool compositions. This sensitivity analysis confirmed the high influence of initial soil carbon quality on soil CS estimates (Fig. S4), notably at short temporal scales (i.e., yearly and decennial). This result is in line with previous CS modeling studies Kelly et al., 1997;Smith et al., 2009;Foereid et al., 2012), confirming that initialization is a crucial step for all chemical-pool-based carbon models. Our sensitivity analysis further showed that the effect of initial CS composition would gradually vanish with increasing length of time, especially in the case of several centuries or millennia. Our analysis provides new insights on the sensitivity of CS estimates to the method and assumptions used in model initialization. This analysis can be transposed to other carbon models to test their theoretical performance and robustness at different temporal scales and to compare models.
Finally, testing different initialization assumptions and performing sensitivity analyses are not enough to solve the predictability issues related to uncertainties in soil carbon quality. Based on ground truth data, Balesdent et al. (2018) showed that carbon age strongly reflects soil depth and ecosystem type. It appears to be highly necessary for future modeling work to capture better indicators of carbon stabilization mechanisms in the model initialization procedure. Yasso07's particular model configuration, i.e., based on measurable chemical pools, may make it possible to use measured soil carbon quality data for model initialization instead of steady-state assumptions. Future measurements of radiocarbon age for soil organic matter at the RENECOFOR sites may offer an ideal opportunity to compare the impact of initial soil carbon quality on Yasso07's predictions.

A precise estimation of root litter quantity helps improve Yasso07 predictions
An important source of uncertainty in the estimates of litter quantity at the RENECOFOR sites concerned fine-root litter input. Many studies have revealed that fine roots are a major source contributing to total litter quantity due to their fast turnover rates (Brunner et al., 2013;Kögel-Knabner et al., 2002;Berg and McClaugherty, 2008). In some forest ecosystems, the proportion of fine-root litter is even comparable to that of foliage (Freschet et al., 2013;Xia et al., 2015). However, estimating fine-root litter input is, again, a timeconsuming and challenging task. For this reason, to our best knowledge, probably no nationwide forest inventory projects have ever incorporated direct measurements of the dynamics of fine-root litter input (and this information is also lacking for the RENECOFOR network). Fine root turnover for forest species varies depending on climate, tree species and management scenarios (Kögel-Knabner et al., 2002;Litton et al., 2003;Mokany et al., 2006), and this makes choosing model input values highly subjective and difficult. By testing variable fine-root-to-foliage ratios of litter input, we observed a significant shift in the ACCs predicted by Yasso07 (Fig. S3). This finding not only highlights the importance of precisely quantifying fine-root litter input but also suggests that broadleaves and conifers may have a different fineroot litter input ratio with regard to that of foliage, although we chose the same ratio for both broadleaved and coniferous stands in this study. It should be noted that using one ratio per tree functional type (conifers versus broadleaves) can only change the overall prediction baseline and cannot reduce data dispersion. Consequently, it is of great interest to estimate fine-root litter input quantity at the species level through direct measurements and then couple the specific data with Yasso07. Potentially important litter input may also come from the shrubby and herbaceous understory species, which we did not take into account in this study due to data unavailability. The herb and shrub layers are typically not included in forest inventories, though they can contribute significantly to the annual litter production in forests (e.g., de Wit et al., 2006;Gilliam, 2007;Lehtonen et al., 2016). Muukkonen and Mäkipää (2006) estimated that the carbon input from herbaceous and shrub vegetation in Finnish forests was from 0.50 to 0.66 tC ha −1 yr −1 . This is quite high, as the value represents of the mean total tree litter input for all the RENECO-FOR sites combined (Table 1). This is in line with data from Didion et al. (2018), who estimated that understory vegetation at the sites of the Swiss National Forest Inventory contributed on average 10 %-15 % to the total observed annual non-woody carbon turnover with a range from a few percent to more than 50 %.
Finally, Yasso07's parameter set was calibrated based on one of the richest litterbag datasets in the world in terms of number of observations. The state of the art of soil carbon modeling assumes that litter input and decomposition processes are the driving forces in soil carbon accumulation. However, other important sources of biological carbon input exist, e.g., soil fauna and rhizodeposition; unfortunately, our ability to take them into account in modeling processes remains poor. Whether, and to what extent, the bias found in our Yasso07 results is related to these alternative sources of biological carbon input is unknown.

Suggestions for future modeling improvements
First, we found the Yasso07 model structure and algorithm solid, clear and simple to operate, in agreement with the positive remarks in the literature (Rantakari et al., 2012;Didion et al., 2014;Lu et al., 2015;Wu et al., 2015). Regarding its mass flow parameters, Fig. S1 only shows the mass flows that are statistically significant in the case of the Tuomi (2011) parameter set. Yasso07 keeps all the theoretical mass flow possibilities in the A p matrix in (Eq. 1b). However, a mass flow parameter with a statistical significance does not signify that it is biologically meaningful. For example, we can quote the flow N → A in the model (Fig. S1), for which the modeler assigned an astonishingly high percentage: p N→A = 83 %.
Biogeosciences, 16,[1955][1956][1957][1958][1959][1960][1961][1962][1963][1964][1965][1966][1967][1968][1969][1970][1971][1972][1973]2019 www.biogeosciences.net/16/1955/2019/ This quantity is disputable in light of soil biochemistry, because lignin, the major component constituting the N carbon pool, is not likely to pass into the A pool but would instead probably condense with other nearby phenols, peptides or saccharides (Burns et al., 2013). As a model for predicting soil carbon dynamics, Yasso07 is still overly simple in the description of some soil variables that are known to strongly impact decomposition processes. For example, soil mineralogy or aggregation is yet to be accounted for in Yasso07. Indeed, the model has often been applied on soils fairly rich in organic matter (e.g., Karhu et al., 2011), where the consideration of soil mineral properties was not particularly relevant and where the authors' assumption that litter quantity is a good proxy for soil properties was reasonable. In addition, when Yasso, Yasso07's prototype, was published in 2005 (Liski et al., 2005), information on mineral soil properties in the various forest soil horizons was not commonly available. Nowadays, however, it is easier to obtain, although there is still not enough detailed data for consistent application across large regions or at the national scale (Didion et al., 2016).

Conclusions
We tested the performance of the Yasso07 soil carbon model on decennial-scale French nationwide forest data collected through the RENECOFOR network. We also compiled a meta-analysis database for litter carbon quality and carried out sensitivity analyses to characterize the effect of initial litter input and soil carbon quality on the model's predictions. We showed that, while the model's estimates of CS to 1.0 m in depth and of ACC stayed within the same order of magnitude as observed values, the accordance between the observed and simulated ACCs at the site scale remained weak. There was a bias in the model's predicted trends for changes in CS at more than one-third of the French sites. As we have shown for Yasso07, the performance of soil carbon models should be examined before their application to management guidelines and policymaking for forest ecosystems at any scales.
Biases can be attributed to multiple factors concerning model input, such as (i) uncertainty in the measurement data for soil CSs and changes, (ii) a lack of information on initial soil carbon quality at the site level, and (iii) a lack of information on below-ground litter production. These factors are valid for the state-of-the-art soil carbon modeling, regardless of the model that one uses. Our sensitivity analyses explicitly confirmed the importance of factors (i) and (ii) above. Appropriately setting soil carbon quality is one of the most crucial steps to guarantee the model's fit. We found that the partial steady-state assumption gives rise to a significantly better model fit than does the complete steady-state assumption, when setting soil carbon quality. Some of the model's parameters governing the transfer among soil pools are statistically derived and not directly measured and thus may poorly represent actual biochemical decomposition processes. Residual analysis also suggests a potentially important role of physical and chemical soil properties in explaining the model's prediction ability.
Our findings allow us to provide modelers, users and policymakers with the following suggestions: -We suggest that Yasso07 modelers keep the current model structure, algorithm and parameters but incorporate some more refined biochemical processes: for example, that they (i) revise certain mass flows to achieve both statistically and biologically meaningful processes (especially the N → A flow), (ii) refine the decomposition process (i.e., the residence times between the A, W and E soil carbon pools); and, possibly, (iii) explicitly incorporate easily measured soil parameters to better represent biophysical and biochemical interactions in soil carbon cycling.
-We suggest that Yasso07 users work in conjunction with modelers in order to better reduce the uncertainties in model initialization for soil CSs. We also suggest measuring forest carbon quality and quantity and belowground fine-root litter to better feed the model.
-We suggest that policymakers remain prudent toward diagnoses based on a single carbon model, especially when a long-term trend is predicted. Predictions from multiple models should be cross-validated for both global and local areas.
This study, involving decennial observations at sites spread over a large spatial scale and covering different ecosystems, provides a good opportunity to facilitate future model calibration, improvement and reassessment. Finally, with Yasso07 as an example, this work highlighted the bottleneck in soil carbon modeling caused by the lack of knowledge or data on soil and litter carbon quality and on fine-root litter quantity, which creates high uncertainties for model initialization. Simultaneously, we demonstrated methodologies for testing other soil carbon models via sensitivity analyses to better enable us to understand the limits of the model and of the input data and to plan future improvements in soil organic carbon modeling. In this study, we used the model structure and parameters published in Tuomi et al. (2011a) without any modifications. Further work on sensitivity analyses incorporating modifications in both the carbon quality and litter input settings and Yasso07's configuration and parameters is needed to confirm the reliability of the current diagnoses.
Data availability. The plant and soil data used for model fit and validation are those published in Jonard et al. (2017). These data can be accessible via contacting the RENECOFOR network (http: //www.onf.fr/renecofor (Office national des forêts, 2019)., last access: 21 April 2019).

Name Meaning
Carbon stock (CS) Quantity of organic carbon stocked in the soil (in tC ha −1 ) Carbon stock change Increment (positive value) or decrement (negative value) of organic carbon stocked in the soil between year t 1 and year t 2 (in tC ha −1 ) Annual carbon stock change (ACC) Change in CSs per year (in tC ha −1 yr −1 ) Carbon pools The Yasso07 model contains a series of organic compounds involved in decomposition processes differing in solubility and mean residence time: water soluble compounds (W), acid-hydrolyzable compounds (A), non-polar solvent, ethanol or dichloromethane compounds (E), and non-soluble and non-hydrolyzable compounds (N). For soil, there is an additional recalcitrant pool called "humus" (H). Note that in this paper, "N" only denotes non-soluble and non-hydrolyzable compounds; nitrogen is spelled in full when mentioned.
Coarse woody litter Litter originating from either coarse above-ground residues due to either harvests or storms (including coarse branches of > 4 cm in diameter and miscellaneous residues) and coarse roots of >5 mm in diameter Fine non-woody litter Litter originating from either natural above-ground litterfall (leaves, small branches) or fine-root activities Litter carbon quality Litter carbon (in %) belonging to the A, W, E and N carbon pools (see "carbon pools" above) Author contributions. DD, MD and LSA designed the study as well as the structure of the paper. ZM, MD, MN and LSA collected or provided the data. JL provided technical help for model use. ZM performed model simulation, data analysis and wrote the first draft. DD, MD, TE and LSA participated in data analysis and result interpretation. All the co-authors participated in paper preparation.
Competing interests. The authors declare that they have no conflict of interest.