Information content in time series of litter decomposition studies and the transit time of litter in aridlands

Plant litter decomposition stands at the intersection between carbon (C) loss and sequestration in terrestrial ecosystems. During this process, organic matter undergoes chemical and physical transformations that affect the decomposition rates of distinct components with different transformation fates. However, most decomposition studies fit only one-pool models, which treat organic matter in litter as a single homogeneous pool and do not incorporate litter transformations and transfers in their framework. To address this, compartmental dynamical systems can be used: sets of differential equations that represent both the heterogeneity in decomposition rates of organic matter and the transformations it can undergo. Further, a metric that can be used to compare models with different structures is the transit time, that is, the mean age of particles when they are released from a compartmental system. In this study, we first asked which model structures are most appropriate to represent decomposition in a publicly available database of decomposition studies in aridlands: aridec. For this purpose, we fit one- and two-pool decomposition models with parallel and series structures and used model averaging as a multi-model inference approach. We then asked what the potential ranges of the median transit times of C in aridlands are and how they relate to environmental and chemical variables. Hence, we calculated transit times for those models and explored patterns in the data with respect to mean annual temperature and precipitation, solar radiation, the Global Aridity Index, and one litter chemistry trait, the initial lignin content. Median transit time was 1.9 years for the one-pool model and the two-pool model with parallel structure, and five years for the two-pool series model. The information in our datasets supported all three models in a relatively similar way, which motivated our decision to use a multi-model inference approach.
After model averaging, median transit time across all datasets was around three years. Exploring patterns of transit time in relation to environmental variables yielded weak correlation coefficients, except for mean annual temperature, which showed a moderate negative correlation. Overall, our analysis suggests that time series from litter decomposition studies often hold little information on the heterogeneity of litter and its transformation rates. Nevertheless, the multi-model inference framework proposed here can help to reconcile theoretical expectations with the information content of field studies, and can further help to design field experiments that better represent the complexity of the litter decomposition process.

compartmental dynamical systems, which are sets of differential equations that can be used to represent both the heterogeneity of organic matter chemical quality and the transformations plant residues can undergo. This is achieved by including different pools that decompose at different rates. This makes it possible to model the dynamics of labile C compounds that are more readily available for microbial consumption, like sugars, and of other compounds that persist longer in the litter pool, like tannins or lignin. Additionally, it is possible to include interactions between these pools, such as C transfers from one pool to another. This mass transfer between pools represents the transformation of molecules in litter without the actual loss of C from the litter system (Prescott and Vesterdal, 2021). Compartmental models of decomposition have been successfully applied for decades (Parton et al., 1987; Tuomi et al., 2009), and they have repeatedly been shown to improve on the traditional one-pool model (Adair et al., 2008; Cornwell and Weedon, 2014; Manzoni et al., 2012).

Despite the richness of information that can be learned from compartmental models, there are still limitations to their widespread application. One main limitation is parameter identifiability. More complex models usually have more parameters and, in some cases, the information contained in time series of litter mass loss may not be enough to estimate those parameters unambiguously (Brun et al., 2001). Depending on the resolution and length of the time series, it might be possible to estimate different numbers of parameters from the available data (Sarquis et al., 2022a; …). Consequently, studies carried out with different methodologies and sampling schemes may provide information on different model structures. Further, this limits the application of compartmental models to data from extensive heterogeneous databases, since not all parameters might be identifiable for all datasets (Sarquis et al., 2022). It is common to compare model parameters such as the decomposition constant when the same model has been applied to many datasets. However, models with different structures cannot be compared in the same way, because the decomposition constant of a single homogeneous pool is not comparable to the decomposition constants of specific pools, such as those in compartmental models. Thus, a metric that can be used to compare models with different structures is the transit time of mass in a complex heterogeneous system. Transit time represents the age of particles when they are released from a system (Sierra et al., 2017). In the context of litter decomposition studies, transit time tells us how long it takes C to exit the litter from the start of an experiment. Transit time is a random variable with its own probability distribution, and thus mean and median transit times can be calculated (Sierra et al., 2018).
Unlike a single decomposition rate, transit time can be calculated for the bulk litter when using compartmental models. Transit time integrates information from all C compartments (Lu et al., 2018), which makes it a more useful metric for comparing models with different structures. In this study we used the aridec database, an open-access database of published decomposition studies in aridlands from around the world (Sarquis et al., 2022a). The focus of this database on aridlands stems from how widespread they are: around 41% of the land surface is classified as arid to some extent (Safriel and Adeel, 2005). This large area spans a wide range of diverse ecosystems with many shared functional characteristics. For instance, aridlands are usually more sparsely vegetated (Guttal and Jayaprakash, 2007), which shifts the relative importance of decomposition drivers compared with humid ecosystems. Plant litter under these conditions is more susceptible to solar radiation (Austin and Vivanco, 2006) and desiccation by wind (D'Odorico et al., 2019). Further, water sources other than rain can become more relevant when mean annual precipitation is low (Evans et al., 2020). These distinctive traits of arid ecosystems probably explain why decomposition rates are not correlated with mean annual precipitation in these systems (Austin, 2011), contrary to what was proposed in the traditional literature (e.g., Meentemeyer, 1978). Furthermore, aridland processes are expected to become more widespread in the future because of aridland expansion (Feng and Fu, 2013) and drought intensification in humid ecosystems (Grünzweig et al., 2022). Hence, we used the aridec database to address the following questions: given the information content in time series of litter decomposition studies, which model structures are most appropriate to represent decomposition in arid ecosystems?
From the set of fitted models, what are the potential ranges of the median transit times of C? Moreover, what are the potential relationships between median transit time and environmental variables? We fit one- and two-pool decomposition models with parallel and series structures and used model averaging as a multi-model inference approach. We further calculated transit times for those models and explored patterns in the data in relation to environmental and litter chemical variables.

First, we used the aridec database to fit a group of candidate decomposition models. The aridec database is a publicly available database of decomposition studies from aridlands across the world (Sarquis et al., 2022a). This database contains bulk litter mass loss data, but it lacks the mass loss dynamics of the different litter organic matter pools that decompose at different rates (e.g., soluble carbohydrates, cellulose, lignin). Because of this, we took an inverse-modelling approach that allowed us to estimate the parameters of these unobserved pools by fitting the models to mass loss data. This model calibration procedure constitutes a non-linear optimization problem, where the objective is to find parameter values that minimize a measure of badness of fit, such as a weighted sum of squared residuals (Soetaert and Petzoldt, 2010). Following this procedure, we obtained a set of parameters for each dataset and the fitted mass loss dynamics of the different pools. We did this with the SoilR (Sierra et al., 2012) and FME (Soetaert and Petzoldt, 2010) packages in R (R Core Team, 2020). SoilR is a modelling framework that contains a wide set of functions and tools to model soil organic matter decomposition within the R computing platform. Organic matter decomposition in SoilR is represented by systems of linear differential equations that generalize most compartment-based models.
A simple general structure to represent litter decay with no inputs follows Equation 1:

$$\frac{d\mathbf{C}(t)}{dt} = \mathbf{A}\,\mathbf{C}(t), \qquad (1)$$

where C(t) is an m × 1 vector of litter mass in the m pools at time t, and A is a square m × m matrix that contains the decomposition rates (k1, …, km) of each pool and the transfer rates (aij) between them. These pools may correspond to different ways in which litter quality is expressed in different studies. For example, they may correspond to different compounds obtained from a specific extraction method (e.g., water-soluble sugars or acid detergent lignin), or they can be defined by general decay classes such as fast- and slow-decaying compounds.
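To make Eq. (1) concrete, the following Python sketch integrates dC/dt = A C for any number of pools with a fixed-step fourth-order Runge-Kutta scheme. It is our own illustration, not the SoilR solver used in the study, and the function names and parameter values are hypothetical.

```python
import math


def mat_vec(A, c):
    """Multiply an m x m matrix (list of rows) by an m-vector."""
    return [sum(a_ij * c_j for a_ij, c_j in zip(row, c)) for row in A]


def simulate(A, C0, t_end, dt=0.01):
    """Integrate dC/dt = A C from C(0) = C0 up to t_end (no litter inputs).

    Uses the classic fourth-order Runge-Kutta scheme with a fixed step dt;
    t_end should be a multiple of dt. Returns C(t_end) as a list.
    """
    c = list(C0)
    for _ in range(int(round(t_end / dt))):
        k1 = mat_vec(A, c)
        k2 = mat_vec(A, [x + 0.5 * dt * d for x, d in zip(c, k1)])
        k3 = mat_vec(A, [x + 0.5 * dt * d for x, d in zip(c, k2)])
        k4 = mat_vec(A, [x + dt * d for x, d in zip(c, k3)])
        c = [x + dt / 6.0 * (d1 + 2 * d2 + 2 * d3 + d4)
             for x, d1, d2, d3, d4 in zip(c, k1, k2, k3, k4)]
    return c


# Hypothetical one-pool example: k = 0.05 per month, 100 % initial mass,
# compared with the analytical solution 100 * exp(-k t).
remaining = simulate([[-0.05]], [100.0], t_end=12.0)
print(round(remaining[0], 3), round(100 * math.exp(-0.05 * 12), 3))
```

Off-diagonal entries of A carry mass between pools, so the same function handles parallel (diagonal A) and series (lower-triangular A) structures.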

The linear dynamical system represented by Eq. (1) has many different solutions, but we are only interested in the one that satisfies

$$\mathbf{C}(t = 0) = \mathbf{C}_0,$$

where C0 is an m × 1 vector with the initial litter mass content of each of the m compartments (the solution is then C(t) = e^{At} C0). We set total initial C0 to 100% for this analysis, so the resulting parameters p1, …, pm are the initial proportions of litter in each of the m pools. Before fitting the models, we ran a collinearity test following the procedure of Soetaert and Petzoldt (2010); the results are presented in Sarquis et al. (2022). Briefly, when parameters are functionally related, changes in one of them can be compensated by changes in others. This produces different parameter sets that describe the data equally well, so it is impossible to determine a single parameter set for a model (Brun et al., 2001; Sierra et al., 2015). From this analysis, we selected three models: a one-pool model, a two-pool parallel model, and a two-pool series model (Fig. 1). The one-pool model represents mass loss data as a single homogeneous mass compartment and has a single parameter, the decomposition rate k. The two-pool model with parallel structure considers litter mass as two distinct compartments that decompose at different rates. Hence, its parameters are the two decomposition rates (k1 and k2) and the initial proportion of litter mass in pool 1 (p1; the proportion in pool 2 follows as p2 = 1 − p1). Finally, the two-pool series model is similar to the parallel model, but it incorporates the transfer of matter from pool 1 to pool 2 after its transformation. This transfer is represented by the parameter a12 (i.e., the transfer rate from pool 1 to pool 2).
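The three model structures can be written in closed form for the total litter mass remaining, which helps to see how they differ. This Python sketch assumes that the series model's matrix is A = [[-k1, 0], [a12, -k2]]. Under that reading, pool 1 loses mass at rate k1, and the part a12 (≤ k1) moves to pool 2 instead of leaving the litter. This is our interpretation of a12, not necessarily SoilR's internal parameterization. All parameter values are hypothetical.

```python
import math


def one_pool(t, k, c0=100.0):
    """Litter mass remaining (% of initial) in the one-pool model."""
    return c0 * math.exp(-k * t)


def two_pool_parallel(t, k1, k2, p1, c0=100.0):
    """Two independent pools; p1 is the initial fraction in pool 1, p2 = 1 - p1."""
    p2 = 1.0 - p1
    return c0 * (p1 * math.exp(-k1 * t) + p2 * math.exp(-k2 * t))


def two_pool_series(t, k1, k2, a12, p1, c0=100.0):
    """Two pools in series, assuming A = [[-k1, 0], [a12, -k2]].

    Pool 1 loses mass at rate k1; the part a12 (0 <= a12 <= k1) is
    transferred to pool 2 instead of leaving the litter. Needs k1 != k2.
    """
    c10, c20 = p1 * c0, (1.0 - p1) * c0
    c1 = c10 * math.exp(-k1 * t)
    c2 = (c20 * math.exp(-k2 * t)
          + a12 * c10 * (math.exp(-k1 * t) - math.exp(-k2 * t)) / (k2 - k1))
    return c1 + c2


# Hypothetical monthly parameters, evaluated after 12 months
for label, value in [
    ("one-pool", one_pool(12.0, k=0.03)),
    ("parallel", two_pool_parallel(12.0, k1=0.08, k2=0.005, p1=0.8)),
    ("series", two_pool_series(12.0, k1=0.08, k2=0.005, a12=0.02, p1=0.8)),
]:
    print(f"{label}: {value:.1f} % remaining")
```

Setting a12 = 0 recovers the parallel model, and setting p1 = 1 in the parallel model recovers the one-pool model. This nesting is why the three structures can be compared on the same data.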
https://doi.org/10.5194/egusphere-2023-90 Preprint. Discussion started: 30 January 2023 © Author(s) 2023. CC BY 4.0 License.

Figure 1: Decomposition models fitted in this study. C0: total initial carbon content in litter samples; C10: initial carbon content of the fast-decomposing pool; C20: initial carbon content of the slow-decomposing pool; k, k1, k2: decomposition rates of the total carbon pool and of the fast- and slow-decomposing carbon pools, respectively; a12: carbon transfer coefficient from the fast-decomposing pool to the slow-decomposing pool.
Specifically, for the two-pool series model, our collinearity analysis showed that only 20.1% of the datasets produced identifiable results, and only when we restricted parameter p1. Restricting or fixing parameters to known values is a way of avoiding collinearity issues. For this purpose, we used initial litter lignin content as a proxy for p2 (the initial proportion of mass in pool 2), which is complementary to p1 (p1 + p2 = 1). We were limited by the number of datasets in aridec that reported initial lignin values. We completed lignin data for only three datasets with information from the TRY database (Kattge et al., 2020). We then filled in some of the remaining missing values by averaging lignin data of the same litter types already present in aridec. With all the data ready, we fit the models described above. All time variables were converted to months to allow consistent comparisons.
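The data preparation steps in this paragraph can be sketched in Python as follows. The record layout (keys `litter_type` and `lignin`), the 365.25/12-day average month, and all function names are hypothetical. The text does not state how time units were converted, and the TRY look-ups are not shown.

```python
from statistics import mean

DAYS_PER_MONTH = 365.25 / 12  # assumption: average month length in days


def days_to_months(days):
    """Convert sampling times from days to months."""
    return [d / DAYS_PER_MONTH for d in days]


def fill_lignin(records):
    """Fill missing initial lignin (%) with the mean of the same litter type.

    `records` is a list of dicts with hypothetical keys "litter_type" and
    "lignin" (None when missing). Returns new dicts; a value stays None when
    no other record of that litter type reports lignin.
    """
    by_type = {}
    for r in records:
        if r["lignin"] is not None:
            by_type.setdefault(r["litter_type"], []).append(r["lignin"])
    filled = []
    for r in records:
        value = r["lignin"]
        if value is None and r["litter_type"] in by_type:
            value = mean(by_type[r["litter_type"]])
        filled.append({**r, "lignin": value})
    return filled


def fixed_pool_fractions(lignin_percent):
    """Lignin fraction as a proxy for p2; returns (p1, p2) with p1 + p2 = 1."""
    p2 = lignin_percent / 100.0
    return 1.0 - p2, p2


records = [
    {"litter_type": "grass", "lignin": 10.0},
    {"litter_type": "grass", "lignin": 20.0},
    {"litter_type": "grass", "lignin": None},
]
print(fill_lignin(records)[2]["lignin"])  # mean of the two grass values
```

Fixing p1 in this way removes one free parameter from the series model, which is what makes the remaining parameters identifiable for more datasets.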

Transit time
For each model, we calculated the litter mass transit time (Sierra et al., 2017). This concept represents the mean age of the particles when they are released from the bulk litter. Another way to interpret it is as the time it takes particles to transit the litter system from the beginning of the experiment. We used a modified version of the mean transit time (MTT) of Sierra et al. (2017) without new litter inputs:

$$\mathrm{MTT} = -\mathbf{1}^{\top}\mathbf{A}^{-1}\boldsymbol{\beta}, \qquad \boldsymbol{\beta} = \frac{\mathbf{C}_0}{\lVert \mathbf{C}_0 \rVert},$$

where 1ᵀ is a row vector of ones and β is the vector of initial proportions of litter mass in each pool. For both two-pool models, we used the function transitTime in the SoilR package. This function calculates the mean and median of the transit time distribution as well as its other quantiles. The median transit time is interpreted as the time it takes for half of the litter mass in a sample to decompose. As a special case, for the one-pool model the MTT can be calculated simply as

$$\mathrm{MTT} = \frac{1}{k},$$

while the median transit time (mTT) can be calculated as

$$\mathrm{mTT} = \frac{\ln 2}{k}.$$

Some of the mTT values from the two-pool series models were extremely high and did not make biological sense. Because of this, we set a cutoff at an mTT of 14.5 years. This value came from fitting this model to the longest dataset in aridec, which is 10 years long and corresponds to average data of different species at the Central Plains Experimental Range in Adair et al. (2017) (S1). We excluded from this study the datasets that exceeded this median transit time cutoff. Finally, after accounting for collinearity, the availability of initial litter lignin data, and the mTT cutoff, we were left with 128 datasets from 12 aridec entries.
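As a stand-in for SoilR's transitTime, the sketch below computes mean and median transit times for a no-input two-pool system. It assumes the same matrix A = [[-k1, 0], [a12, -k2]] with 0 ≤ a12 ≤ k1. The mean is −1ᵀA⁻¹β written out for that matrix. The median is found by bisection on the remaining-mass fraction, because without inputs the share of initial mass released by time t is the cumulative transit time distribution. The function names, the cutoff helper, and the parameter values are hypothetical.

```python
import math


def remaining_fraction(t, k1, k2, p1, a12=0.0):
    """Fraction of initial litter mass still present at time t.

    Assumes A = [[-k1, 0], [a12, -k2]] with 0 <= a12 <= k1; a12 = 0 gives
    the parallel model and k1 = k2 with p1 = 1 gives the one-pool model.
    A nonzero a12 requires k1 != k2 in this closed form.
    """
    p2 = 1.0 - p1
    s = p1 * math.exp(-k1 * t) + p2 * math.exp(-k2 * t)
    if a12:
        s += a12 * p1 * (math.exp(-k1 * t) - math.exp(-k2 * t)) / (k2 - k1)
    return s


def mean_transit_time(k1, k2, p1, a12=0.0):
    """-1^T A^{-1} beta for the assumed 2 x 2 matrix, with beta = (p1, 1 - p1)."""
    return p1 / k1 + p1 * a12 / (k1 * k2) + (1.0 - p1) / k2


def median_transit_time(k1, k2, p1, a12=0.0, tol=1e-10):
    """Time at which half of the initial mass has left the litter (bisection)."""
    lo, hi = 0.0, 1.0
    while remaining_fraction(hi, k1, k2, p1, a12) > 0.5:
        hi *= 2.0  # widen the bracket until less than half remains
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if remaining_fraction(mid, k1, k2, p1, a12) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


CUTOFF_MONTHS = 14.5 * 12  # 14.5-year mTT cutoff expressed in months


def passes_cutoff(mtt_months):
    """True when a median transit time (months) is within the cutoff."""
    return mtt_months <= CUTOFF_MONTHS


# Hypothetical monthly parameters for a two-pool series model
median_tt = median_transit_time(0.08, 0.005, 0.8, a12=0.02)
print(mean_transit_time(0.08, 0.005, 0.8, a12=0.02), median_tt, passes_cutoff(median_tt))
```

For the one-pool case (k1 = k2 = k, p1 = 1), these functions reduce to MTT = 1/k and mTT = ln 2/k, matching the special case above.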

Model selection and multi-model inference
As a first attempt at model selection, we calculated the bias-corrected Akaike Information Criterion, which is used for small sample sizes (AICc; Burnham and Anderson, 2002). We used the formula from Shumway and Stoffer (2017):

$$\mathrm{AICc} = \ln \hat{\sigma}_k^2 + \frac{n + k}{n - k - 2},$$

where σ̂²_k is the variance of the model (in this case the mean squared residuals, i.e. the sum of squared residuals divided by the sample size, MSR hereafter), k is the number of parameters in the model, and n is the sample size, i.e. the number of points in each time series. We counted the variance as one of the parameters in the formula, as Burnham and Anderson (2002) recommend. A common way of choosing the model with the best fit is to select the one with the lowest AICc value. We did this using the akaike.weights function from the qpcR package. Additionally, we calculated the difference in AICc between the model with the lowest AICc and the other two candidate models (ΔAICc). Since we did not have enough information to choose a single model structure based on AICc (see Results section), we followed a multi-model inference approach (Burnham and Anderson, 2002). We first calculated Akaike weights for each model using the function Weights from the MuMIn R package. Akaike weights can be interpreted as the probability that model j is the best of all R candidate models (here R = 3) given the data (Lukacs et al., 2010), and are calculated as:

$$w_j = \frac{\exp(-\Delta_j / 2)}{\sum_{i=1}^{R} \exp(-\Delta_i / 2)},$$

where Δ_j is the ΔAICc of model j. We then calculated model-averaged estimators of the mean and median transit times as:

$$\hat{\bar{\theta}} = \sum_{j=1}^{R} w_j \hat{\theta}_j,$$

where θ̂_j is the estimate of the quantity of interest θ (the mean or the median transit time) from model j. This results in estimators of the mean (avgMTT) and median (avgmTT) transit times averaged across models for each database entry. We also calculated the unconditional variance of each averaged estimator (Burnham and Anderson, 2002; Lukacs et al., 2010) as:

$$\widehat{\mathrm{var}}\left(\hat{\bar{\theta}}\right) = \left[\sum_{j=1}^{R} w_j \sqrt{\widehat{\mathrm{var}}\left(\hat{\theta}_j \mid g_j\right) + \left(\hat{\theta}_j - \hat{\bar{\theta}}\right)^2}\right]^2,$$

where g_j denotes model j. Finally, we estimated 95% confidence intervals as:

$$\hat{\bar{\theta}} \pm cv \sqrt{\widehat{\mathrm{var}}\left(\hat{\bar{\theta}}\right)},$$

where cv stands for the critical value of a t-distribution for a particular number of degrees of freedom.
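The selection and averaging chain in this paragraph can be sketched in plain Python. It covers AICc in the Shumway and Stoffer form, Akaike weights, the model-averaged estimate, the unconditional variance, and the confidence interval. This is not the qpcR or MuMIn code used in the study. The input values are hypothetical. The critical value cv is passed in by hand because the standard library has no t quantile function (`scipy.stats.t.ppf` is one option).

```python
import math


def aicc(ssr, n, k):
    """AICc = ln(SSR / n) + (n + k) / (n - k - 2), as in the text.

    k must count the variance as a parameter; requires n > k + 2.
    """
    return math.log(ssr / n) + (n + k) / (n - k - 2)


def akaike_weights(aicc_values):
    """w_j = exp(-delta_j / 2) / sum_i exp(-delta_i / 2)."""
    best = min(aicc_values)
    rel = [math.exp(-0.5 * (a - best)) for a in aicc_values]
    total = sum(rel)
    return [r / total for r in rel]


def model_average(estimates, weights):
    """Weighted average of per-model estimates."""
    return sum(w * e for w, e in zip(weights, estimates))


def unconditional_variance(estimates, variances, weights):
    """Burnham and Anderson (2002) unconditional variance of the average."""
    avg = model_average(estimates, weights)
    return sum(w * math.sqrt(v + (e - avg) ** 2)
               for w, e, v in zip(weights, estimates, variances)) ** 2


def confidence_interval(avg, var, cv):
    """avg +/- cv * sqrt(var)."""
    half = cv * math.sqrt(var)
    return avg - half, avg + half


# Hypothetical inputs for one time series with n = 10 observations:
# SSR, parameter counts (including the variance), mTT (months) and variances.
ssr = [120.0, 95.0, 90.0]
n_par = [2, 4, 5]
mtt = [23.0, 22.5, 60.0]
mtt_var = [4.0, 5.0, 30.0]

w = akaike_weights([aicc(s, 10, k) for s, k in zip(ssr, n_par)])
avg = model_average(mtt, w)
var = unconditional_variance(mtt, mtt_var, w)
print(avg, confidence_interval(avg, var, cv=2.26))  # cv is illustrative
```

The variance term inflates the interval when the models disagree, so model-selection uncertainty is carried into the averaged transit time.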

We plotted avgmTT against environmental variables to explore potential relationships and calculated correlation coefficients (r). We used data already available in aridec, such as mean annual temperature, mean annual precipitation, and one litter quality variable, initial lignin content. We additionally used the Global Aridity Index as calculated in Sarquis et al. (2022) for aridec entries and annual solar radiation from the TerraClimate database (Abatzoglou et al., 2018).
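The text does not name the correlation coefficient. Assuming Pearson's r, the calculation for one environmental variable can be sketched as below. The site values are hypothetical and are not taken from aridec.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical site values: mean annual temperature (deg C) and avgmTT (months)
mat = [5.0, 9.0, 14.0, 18.0, 24.0]
avg_mtt = [55.0, 41.0, 38.0, 20.0, 9.0]
print(round(pearson_r(mat, avg_mtt), 2))
```

A negative r here would mean shorter median transit times at warmer sites.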

Further, to test whether the data are consistent with an exponential distribution, we calculated the ratio between avgmTT and ln 2 × avgMTT. For an exponential distribution, the median equals ln 2 times the mean. So, if the ratio between the median from our models (avgmTT) and the median implied by an exponential distribution with the same mean (ln 2 × avgMTT) equals 1, both medians are equal, which is consistent with an exponential transit time distribution. All calculations, modelling and figures were made in R (R Core Team, 2020).
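The ratio check is easy to state in code. In this Python sketch, `exponential_ratio` and `share_near_exponential` are hypothetical helpers. The [0.9, 1.0] band mirrors the one used in the Results. A ratio of 1 is necessary but not sufficient for an exponential distribution, because other distributions can also have median = ln 2 × mean.

```python
import math


def exponential_ratio(median_tt, mean_tt):
    """avgmTT / (ln 2 * avgMTT); equals 1 for an exponential transit time."""
    return median_tt / (math.log(2) * mean_tt)


def share_near_exponential(ratios, low=0.9, high=1.0):
    """Fraction of ratios that fall inside [low, high]."""
    return sum(low <= r <= high for r in ratios) / len(ratios)


# One-pool check: mTT = ln 2 / k and MTT = 1 / k give a ratio of 1.
k = 0.03
print(round(exponential_ratio(math.log(2) / k, 1 / k), 12))
```

In a two-pool model, a slow pool stretches the right tail of the distribution and raises the mean more than the median, which pushes the ratio below 1.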

Results
We fit three different candidate models to 128 time series of decomposition, which totaled 384 models (see Table S2 for model fit). The information in our datasets supported all three models in a similar way. The one-pool model most often had the lowest AICc value, but in close to one-third of the cases the two-pool series model had the best fit according to AICc (Fig. 2). Our ΔAICc values were very low (third quartile of ΔAICc: 1.515), which reinforced the idea that the available information was not enough to choose a single best-fitting model. We therefore opted for a multi-model inference approach using model averaging, which left us with 128 averaged models (see Table S3 for model variance and confidence intervals). Median transit time of plant litter in aridlands after model averaging was within the range of the original models (Fig. 3). In this analysis, we only used data from litter decomposed under ambient conditions (without manipulative treatments). The one-pool and two-pool parallel models had similar mTT (23.27 ± 9.28 and 23.04 ± 9.65 months, respectively; mean ± standard deviation). The two-pool series model had nearly three-fold higher mTT values of 60.21 ± 45.80 months. After model averaging, mTT (i.e., avgmTT) was 36.15 ± 22.20 months. The avgmTT values alone showed the wide range of time that litter takes to decompose in arid ecosystems (Fig. 4). Exploration of patterns of transit time in relation to environmental variables yielded weak correlation coefficients, except for mean annual temperature, which showed a moderate correlation (r = −0.56). Values of avgmTT at the coldest end ranged between 37 and 65 months, while the warmest site showed values of 8 months (Fig. 4a).

This suggests that plant litter in warmer aridlands decomposes faster than in colder sites.

Figure 4: … each site. Correlation coefficients (r) are displayed.
Calculating the ratio between avgmTT and ln 2 × avgMTT showed contrasting results (Fig. 5). Forty-two percent of the models in this analysis had values near zero, which suggests that those models did not follow an exponential distribution. Only 15% of the models had values between 0.9 and 1.0, which suggests that these models had a nearly exponential distribution.

Discussion

We asked as our first question: which model structures are most appropriate to represent decomposition in aridlands? After fitting three different models to the data in aridec, we found that there was not enough information to choose a unique model based on their AICc values (Fig. 2), so instead we took a multi-model approach (Burnham and Anderson, 2002). This allowed us to incorporate the dynamics of all three models in our results by using AICc weights (Lukacs et al., 2010). In this way, our predictions of transit time in aridlands account for differences in litter chemistry and their effects on decomposition, instead of only considering the bulk litter as a homogeneous C pool. Information-theoretic approaches such as model averaging are not novel, but they are still underused in ecological studies (Grueber et al., 2011). However, before fitting complex compartmental models, researchers should consider the issue of collinearity. In a previous study, we found that most of the time series in the aridec database could only be fitted with simpler models with fewer than three parameters (Sarquis et al., 2022a). This was because the information contained in those time series of litter decomposition was not sufficient to inform more complex models, for example, models with three distinct C pools and transfer coefficients between them. This lack of information in the data caused collinearity between parameters, which in turn made it impossible to identify a single set of parameters for each model (Brun et al., 2001; …).
Some of these limitations probably come from the small number of sampling points in most decomposition studies (Sarquis et al., 2022a), which lowers the available degrees of freedom and limits our capacity to model complex organic matter dynamics. The fact that complex models cannot be obtained from the data suggests that we should pay more attention to designing field experiments that better inform model structures consistent with our current understanding of litter heterogeneity and transformations (Prescott and Vesterdal, 2021). Our second question was: what are the potential ranges of the median transit times of C in litter in aridlands?

This part of our study yielded some new insights into the biogeochemistry of arid environments. Median transit times from the one- and two-pool decomposition models without interactions were similar and showed that half of the C in litter is lost after almost 2 years in the field (Fig. 3). However, median transit times from the two-pool model with series structure were almost three times higher. This is explained by the C transfer from the fast-decomposing pool to the slow-decomposing pool, which slows down C release from litter. After model averaging, we obtained intermediate median transit times of around three years (Fig. 3). Interestingly, this connects back to the problem of model parameter identifiability. Most decomposition studies carried out in aridlands last for only a year (Sarquis et al., 2022a). But our results show that decomposition of litter in arid environments can take on average six times longer until all C exits the system. This means that most field decomposition studies are not capturing the entire dynamics of C release through time. In turn, this has consequences for future research, because information that is not contained in the data cannot be retrieved by modelling techniques (Brun et al., 2001). If we aim to incorporate field data into complex Earth system models, we need to consider study length so as to capture the entire decomposition process. We acknowledge this might seem excessive, given that academic timelines are usually much shorter than litter decomposition in aridlands. However, successful long-term litter decomposition projects exist and can be a potential solution to this issue (e.g., LIDET; Gholz et al., 2000). We asked as our third question: what are the relationships between median transit time and environmental and litter chemical variables?
Of the five variables that we used to explore these relationships, only mean annual temperature showed a moderate correlation with median transit time from averaged models (Fig. 4a). The importance of temperature as a climatic driver of decomposition is well documented (Zhang and Wang, 2015), both through its positive effects on microbial activity (Sinsabaugh et al., 1991) and through its enhancement of photochemical emissions (Day and Bliss, 2020). In contrast, the correlation with mean annual precipitation was weak (Fig. 4b). This was rather expected, since it has long been known that precipitation fails to explain patterns of decomposition rates in aridlands (Austin, 2011). As a final remark, we explored what transit time can teach us about the transit time distributions of decomposition models.

After calculating the ratio between the median transit time and ln 2 times the mean transit time from averaged models, we found that only 15% of the models had values close to one (Fig. 5). This indicates that in most cases the models did not follow an exponential distribution, since the median of an exponential distribution equals ln 2 times the mean. The negative exponential model of decomposition has been the standard for litter and soil organic matter decomposition studies for at least five decades (Olson, 1963). This connects back to our first results, where the one-pool exponential model was not consistently selected by our information-theoretic approach (Fig. 2). Previous studies found similar results, where the negative exponential one-pool model did not rank first for all the datasets considered (Adair et al., 2008; Cornwell and Weedon, 2014; Manzoni et al., 2012). Testing beforehand which distribution best fits the data is essential, and future decomposition studies should test whether the single-pool negative exponential model is actually the best model to fit.

Conclusions
Although our theoretical understanding of the litter decomposition process is based on the assumption that plant litter is chemically and physically heterogeneous and undergoes multiple transformations, time series from litter decomposition studies contain relatively little information on litter heterogeneity and its transformation rates. However, we have shown that a multi-model inference approach helps to reconcile theoretical understanding with the information content of observed litter decomposition datasets. In particular, combining AIC model averaging with a metric that is independent of model structure, the transit time, provides an inference framework that is useful for understanding decomposition dynamics. This framework could give us better insight into the chemical transformations of organic matter in litter and soil, and into how soil organic matter responds to changes in the environment. We recognize that some limitations for modelling these complex structures arise from field study designs that do not capture the entire decomposition process. This limits the quantity and quality of the information contained in the data. We recommend that future field decomposition studies incorporate in their designs some strategy to better capture the dynamics of different organic matter pools in litter. This could be done either by measuring the proportion of each compound through time, or by increasing the number of sampling times and the study length. The latter two can help achieve a better fit and avoid collinearity when using an inverse-modelling approach as in this study. We further encourage researchers to fit models other than the one-pool model, when possible.

Author contribution
CAS supervised the study. CAS and AS conceptualized the study. AS curated the data and carried out the analysis. AS wrote the original draft of the manuscript. CAS and AS revised and edited further versions of the manuscript.

Competing interests
The authors declare that they have no conflict of interest.

Acknowledgments
We thank Mina Azizi-Rad for early discussions on model fitting and selection. We also thank Ignacio Siebenhart for providing annual solar radiation data. We thank Amy Austin, Soledad Méndez and Ignacio Siebenhart for