A Bayesian sequential updating approach to predict phenology of silage maize

. Crop models are tools used for predicting year to year crop development on field to regional scales. However, robust predictions are hampered by factors such as uncertainty in crop model parameters and in the data used for calibration. Bayesian calibration allows for the estimation of model parameters and quantification of uncertainties, with the consideration of prior information. In this study, we used a Bayesian sequential updating (BSU) approach to progressively incorporate 10 additional data at a yearly time-step to calibrate a phenology model (SPASS) while analysing changes in parameter uncertainty and prediction quality. We used field measurements of silage maize grown between 2010 and 2016 in the regions of Kraichgau and Swabian Alb in southwestern Germany. Parameter uncertainty and model prediction errors were expected to progressively reduce to a final, irreducible value. Parameter uncertainty reduced as expected with the sequential updates. For two sequences using synthetic data, one in which the model was able to accurately simulate the observations, and the 15 other in which a single cultivar was grown under the same environmental conditions, prediction error mostly reduced. However, in the true sequences that followed the actual chronological order of cultivation by the farmers in the two regions, prediction error increased when the calibration data was not representative of the validation data. This could be explained by differences in ripening group and temperature conditions during vegetative growth. With implications for manual and automatic data streams and model updating, our study highlights that the success of Bayesian methods for predictions 20 depends on a comprehensive understanding of inherent structure in the observation data and model limitations.


Introduction
The effects of climate change are already being felt, with increasing global temperature and frequency of extreme events (Porter et al., 2015), which will have an impact on food availability.In order to mitigate risks to food security, suitable adaptation strategies need to be devised which depend on robust model predictions of the productivity of cropping systems (Asseng et al., 2009).Soil-crop models, which are able to predict changes in crop growth and yield, as a consequence of changes in model inputs like weather, soil properties, and cultivar-specific traits, are considered suitable tools to plan for a secure future.However, achieving robust model predictions is challenging.This is because there is uncertainty in the model inputs, parameters and process representation, as well as in the observations used to calibrate these models (Wallach and Thorburn, 2017).It is therefore essential to quantify these uncertainties.
Different interpretations of the underlying soil-crop processes have led to different representations in models of varying complexity (Wallach et al., 2016).Process model equations have parameters that represent physiological processes, but are often based on empirical relationships.These relationships describe system processes which cannot be further resolved with reasonable effort.While most parameters, that represent physiological aspects of plant growth and development, can be determined in dedicated experiments (Craufurd et al., 2013), many others still need to be estimated through model calibration.However, the measured parameters and state variables used for model calibration are uncertain due to errors in the measuring device or technique, as well as the natural variability of the system due to processes occurring at different spatial or temporal scales.Given the different sources of uncertainty, it is important to set up adequate workflows to enable uncertainty quantification and protocols for reporting them, especially when they influence decision-making (Rötter et al., 2011).
For this, the Bayesian approach is an elegant framework to propagate uncertainty from measurements, parameters, and models to prediction.One advantage of Bayesian inference is the use of prior information (Sexton et al., 2016).The posterior probability distribution obtained by conditioning on one dataset can then be used as a prior distribution for the next dataset in a sequential manner (Hue et al., 2008).This approach, called Bayesian sequential updating (BSU), would be more computationally efficient than having to re-calibrate the model to all previous datasets, every time new data are available.It has been applied to big data studies in which large datasets were split to reduce computational demand and the information was sequentially incorporated (Oravecz et al., 2017).Cao et al., (2016) used BSU to analyse the evolution of the posterior parameter distribution for soil properties by incorporating data from different types of experiments.Thompson et al. (2019) applied this approach to estimate species extinction probabilities where species-siting data were sequential in time.While there are numerous examples of Bayesian methods being applied in crop modelling for uncertainty quantification and data assimilation (Alderman and Stanfill, 2017;Ceglar et al., 2011;Huang et al., 2017;Iizumi et al., 2009;Makowski, 2017;Makowski et al., 2004;Wallach et al., 2012;Wöhling et al., 2013Wöhling et al., , 2015)), to the best of our knowledge, the BSU method has not yet been evaluated in the field of crop modelling.In this study we assessed whether crop model predictions progressively improve as new information is incorporated using the BSU approach.This ascertains whether the model and parameters are both temporally and spatially transferable for a particular crop species, an important aspect for large-scale and long-term predictions.Our study was focused on modelling crop phenological development.
Plant phenology is concerned with the timing of plant developmental stages like emergence, growth, flowering, fructification, and senescence.It is controlled by environmental factors such as solar radiation, temperature, water availability, and depends on intrinsic characteristics of the plants (Zhao et al., 2013).Phenological development is a crucial state variable in soil-crop models, since it controls many other simulated state variables like yield, biomass and leaf area index by influencing the timing of organ appearance and assimilate-partitioning. Phenology is not only species-specific but can also differ between cultivars of the same species (Ingwersen et al., 2018).Model parameters that influence phenology could vary depending on the cultivars (Gao et al., 2020) and possibly also on environmental conditions (Ceglar et al., 2011).
Since parameter uncertainty is a major source of prediction uncertainty (Alderman and Stanfill, 2017;Gao et al., 2020), it impacts prediction quality.
To this end, we assessed the impact of sequentially incorporating new observations with the BSU approach, on prediction quality of phenological development.For this, we modelled phenological development of silage maize grown between 2010 and 2016 in Kraichgau and Swabian Alb, two regions in southwestern Germany with different soil types and climatic conditions.We monitored the changes in parameter uncertainty and evaluated prediction quality by performing model validation in which simulated phenological development was compared with observations for datasets that were not used for calibration.We hypothesized that: (1) Parameter uncertainty decreases and quality of prediction improves with the sequential updates in which increasing amount of data are used for model calibration.
(2) For the first few sequential updates, the quality of prediction is variable, until the calibration samples become representative of the population.
(3) The prediction error then progressively drops to an irreducible value that represents the error in inputs, measurements, model structure and variability due to spatial heterogeneity that is below model resolution.
We tested these hypotheses by applying BSU in two modelling cases that represent ideal and real-world conditions.In the first case, we applied BSU to two synthetic sequences: an ideal sequence of observations wherein the model is able to simulate the observations accurately, and a controlled cultivar-environment sequence of observations which represent different growing seasons of a single cultivar grown under the same environmental conditions.In the second case, we applied the BSU to two true sequences that follow the actual chronological order in which different cultivars of silage maize were grown in the two regions under different environmental conditions.
With this study, we explicitly deal with a well-known problem in regional modelling, which carries particular weight in the case of maize.On a regional scale, maize cultivars may differ considerably in their phenological development, but cultivar information will rarely be available.Even if data on cultivars grown were available, phenological data on all relevant cultivars in a particular region will rarely be at hand.Consequently, model parameters are typically estimated for the crop species and not for the individual cultivars.Also, the maize cultivars of our study represent only a small subset of cultivars grown in Kraichgau and the Swabian Alb.We therefore grouped the maize cultivars into ripening groups for analysis of prediction quality.

Study sites and measured data
The data used for the study consist of a set of measurements taken at three field sites (site 1, site 2, site 3) in Kraichgau and two field sites (site 5 and site 6) on the Swabian Alb, in southwestern Germany, between 2010 and 2016 (Fig. 1i) (Weber et al., 2021).The main crops in rotation were winter wheat, silage maize, winter rapeseed, and cover crops like mustard and phacelia.Additionally, spelt, spring and winter barley were also grown on the Swabian Alb.Amongst others, continuous measurements of meteorological conditions, soil temperature and moisture were taken.Soil profiles were sampled at the sites for characterization of soil properties.
Kraichgau and Swabian Alb represent climatologically contrasting regions in Germany.Kraichgau is situated 100 to 400m above sea-level and characterized by a mild climate with a mean temperature above 9℃ and mean annual precipitation of 720 to 830mm.It is one of the warmest regions in Germany.The Swabian Alb is located at 700 to 1000m above sea-level with a mean temperature of 6 to 7℃ and mean annual precipitation of 800 to 1000mm.Kraichgau soils have often developed from several metres of Holocene loess, underlain by limestones.They are predominantly Regosols and Luvisols.The Swabian Alb has a karst landscape with clayey loam soils, often classified as Leptosols.Soils may be less than 0.3m thick in some areas.While the soils at the sites in Kraichgau are alike, they vary across the sites on the Swabian Alb (Wizemann et al., 2015).At every study site, which had an area of around 15 ha, replicate observations were made by assessing phenological development stages from maize plants in five subplots of 2m × 2m each.Ten maize plants were chosen from each subplot.We used the BBCH growth stage code (Meier, 1997) to define the development stages.The BBCH value of 10 marks emergence and the start of leaf development, 30 stands for stem elongation, 50 for inflorescence, emergence or heading, 60 for flowering or anthesis, 70 for development of fruit, 80 for ripening, and 90 for senescence (Fig. 1ii).In the following sections, the individual growing seasons for silage maize are denoted by the site and year of growth i.e. the site-year (Table 1).For example, silage maize grown at site 2 in Kraichgau in the year 2012 is referred to as '2_2012'.The different cultivars used in the study can be grouped into three ripening or maturity groups, based on their timing of ripening.Mid-early (ME) and late (L) ripening cultivars were grown in Kraichgau, and early (E) and mid-early (ME) ripening cultivars were grown on the Swabian Alb.

Soil-crop model
To simulate the soil-crop system, we used the SPASS crop growth model (Wang, 1997).SPASS is implemented in the Expert-N 5.0 (XN5) software package (Heinlein et al., 2017;Klein et al., 2017;Priesack, 2006).In XN5, the SPASS crop model is coupled to the Richards equation for soil-water movement as implemented in the model Hydrus-1D model (Šimůnek et al., 1998).The routine uses van Genuchten-Mualem hydraulic functions (van Genuchten, 1980;Mualem, 1976) and the heat transfer scheme from the Daisy model (Hansen et al., 1990)

Selection of model parameters
Parameters were first pre-selected (Hue et al., 2008;Makowski et al., 2006) based on expert knowledge.ParameterThe prior default values were elicited from an expert and uncertainty ranges are given in Table 2.A global sensitivity analysis using the Morris method (Morris, 1991) was then carried out to identify the sensitive parameters to be estimated through Bayesian calibration (supplementary material 1).The sensitive parameters identified for calibration were: effective sowing depth (SOWDEPTH), which influences the emergence rate, and parameters affecting development in the vegetative phase (PDD1, TMINDEV1, DELTOPT1, and DELTMAX1).Parameter DELTOPT2, from the temperature response function during the reproductive phase, was estimated during calibration even though it was less sensitive.The choice of using this parameter during calibration was based on knowledge of model behaviour, so as to reduce the calibration error in the reproductive phase (Lamboni et al., 2009).Thus, out of eleven pre-selected parameters (Table 2) six were estimated in BSU, while the remaining parameters were fixed at their expecteddefault values.

Bayesian sequential updating
In the Bayesian sequential updating (BSU) approach, Bayesian calibration is applied in a sequential manner.New data are used to re-calibrate the model, conditional on the prior information from previously gathered data.We describe the details of this approach below.
Bayes theorem states that the posterior probability of parameters θ given the data Y, P(θ|Y), is proportional to the product of the joint prior probability of the parameters P(θ) and the probability of generating the observed data with the model, given the parameters P(Y|θ).The term P(Y|θ) is referred to as the likelihood function and is defined as the likelihood that observation Y , that is, observed phenological development in this study, is generated by the model using the parameter vector θ.The posterior probability distribution is obtained by normalizing this product by the prior predictive distribution (Gelman et al., 2014) or Bayesian Model Evidence (Schöniger et al., 2015) P(Y), which is obtained by integrating the product over the entire parameter space.
Hence, we write: where Equation ( 2) can become intractable, especially with a large number of parameters as this involves integrating over high dimensional space (Schöniger et al., 2015).Instead, sampling methods like Markov Chain Monte Carlo (MCMC) are used to estimate the posterior distribution.
For one site-year sy 1 and corresponding observation vector Y sy 1 , the posterior parameter probability distribution is: where P(θ) represents the initial prior probability distribution that could be based on expert knowledge.The posterior parameter distribution P(θ|Y sy 1 ) can now be used as a prior distribution for the next site-year sy 2 .Thus, for site-year sy n with an observation vector Y sy n , the posterior parameter probability distribution is: With the aim of making the computations tractable, we deviate slightly from this pure BSU approach as we do not strictly use the posterior from the previous site-year as the prior for the next, but sequentially calibrate the model to data from increasing number of site-years instead.The reason for this deviation is that, in applying BSU, where the posterior parameter distribution is estimated by sampling methods, a probability density function needs to be approximated from the sample, so that it can be used as a prior probability for the subsequent site-year.This approximation introduces additional errors.Since joint inference is known to be better than sequential inference using posterior approximations (Thijssen and Wessels, 2020), Eq. ( 4) can be re-written, under the assumption that the phenology observations from all site-years are independent and identically distributed (Gelman et al., 2014), as follows: Thus, we use Eq. ( 5) to sequentially update the probability distribution of parameters by increasing the dataset size at each step through the addition of one site-year worth of new data Y x to the previous dataset Y x−1 .
After each inferential step, the phenological developmentprobability of observing a certain phenology at the next site-year sy n+1 is predicted by: where P(Y sy n+1 |Y sy n ) is the posterior predictive distribution (Gelman et al., 2014).We refer to the current methodology as BSU, although it is not strictly so, for reasons of simplicity and the formal similarity of our approach.All calculations and the BSU was carried out using the R programming language (R Core Team, 2020).
In the following sections, we describe the components of Bayes formula in detail.

Likelihood function
Let θ = (φ 1 , φ 2 , φ 3 , … φ j ) represent a vector of the model parameters to be estimated in this study (Table 2).Suppose ) is a vector of the means of observed phenological development at different days during the growing season for a particular site-year.The mean observation y ̅ d on day d for the site-year is given by: where y r,p,d represents the r th replicate of observed phenological development, measured at subplot p on day d for a particular site-year, R is the total number of replicates at subplot p, and P is the total number of subplots per field.
If we assume that all replicates R in all subplots P are independent, the standard deviation of the replicate observations on /(P × R) .This is one source of observation error that represents the spatial variability at the study site which is below the spatial resolution of the model.We also assume an additional source of error in identification of the correct phenological stage and its exact timing of occurrence.We assume that this error is within a standard deviation of 2 BBCH (σ ident,d = 2 for each observation day d).This assumption was made, since 2 is the most common difference between development stages in the phenological development of maize on the BBCH scale.Assuming that the error from replicate observations (σ r,p,d ) and error in the identification of phenological stages are additive, the total observation error is The model residual y d − f(θ) d is the difference between the observed y d and the model simulated f(θ) d phenological stage and is represented by the likelihood function.Assuming normally distributed residuals, it is given by: The likelihood values for all the observations are combined by taking the product of the likelihoods per day of observation, under the assumption of independent and identically distributed model residuals.The combinedThus, the joint likelihood function is given by: 220 where Y x is the observation vector for site-year x.

Prior probability distribution
As prior information, we used a weakly informative probability distribution function (pdf) to ensure that the posterior parameter distributions are mainly determined by the data that are sequentially incorporated.For this, we used a platykurtic prior probability distribution that is a convolution of a uniform and a normal distribution (Fig. D-1) of the form: 225 where xφ j is a value of the j th model parameter φ j in the parameter vector θ, a and b are the minimum and maximum limit for the parameter, respectively, μ is the mean (expecteddefault value in Table 2), and σ the standard deviation.The normalization constant c is used to ensure that the area under the curve is unity as required for probability density functions.
The joint prior pdf was calculated by P(θ) = ∏ P(φ j ) J j=1 and the model parameters were assumed to be uncorrelated, i.e. the off-diagonal elements of the variance-covariance matrix are zero..The parameters a, b, c, σ, μ, of P(φ j ) was based on expert knowledge (Table 2).

Posterior probability distribution
The posterior parameter distribution was sampled using the Markov Chain Monte Carlo method -Metropolis algorithm (Metropolis et al., 1953) (for details refer to Appendix B: Posterior sampling using MCMC Metropolis algorithm).Appendix B: Posterior sampling using MCMC Metropolis algorithm).Three chains were run in parallel.A symmetrical transition kernelnormal distribution was chosen as the jump distributiontransition kernel.The jump size was adapted so that the acceptance rate would be between 25% and 35% (Gelman et al., 1996;Tautenhahn et al., 2012).For each sequential update calibration case, when a new site-year was added to the calibration sequence, the three chains were re-initialized and the transition kernel was re-tuned.A preliminary calibration test case, in which the model was calibrated to site-year 6_2010, was used to generate the starting points of the chains for each of the calibration cases.The starting points were randomly sampled from the posterior parameter range of the calibrated test case.This was done to reduce the time to convergence.For the test case calibration, the starting points of the chains were randomly sampled from the prior range.The number of iterations for adapting the transition kernel varied between the different calibration cases.This number was low for some of the calibration cases because we set the initial pre-adaptation value for the standard deviation of the transition kernel, so that the acceptance rate would be between 25% and 35%.This initial value was based on knowledge gained from preliminary calibration test simulations.Convergence of the chains after jump adaptation was checked using the Gelman-Rubin convergence criteria (<= 1.1)diagnostic (Brooks and Gelman, 1998;Gelman and Rubin, 1992).The total number of samples of the posterior distribution in each calibration case was dependent on when the Gelman-Rubin diagnostic was <=1.1, while ensuring a minimum of 500 accepted samples per chain, that is, a minimum of 1500 samples across the three chains.In effect, the total number of samples per calibration case was greater than 1500.The burn-in was variable and depended on the jump-adaptation.Only the iterations from the jump-adaptation step were discarded as burn-in.Parameter mixing was evaluated using trace-plots.
For model validation, the posterior predictive distribution was used to simulate phenological development and compare with observations at site-years that were not included in the calibration sequence.

Performance metrics
Bias and normalized root mean square error (NRMSE), as defined in Eq. ( 12) and ( 13), for site-year sy were calculated to assess the calibration and prediction performance.
Here, θ i is the i th parameter vector, D is the total number of observation days for the particular site-year, f(θ i ) d is the is also reported in some plots.
The prediction quality is good when NRMSE is low and bias is zero.Prediction performance is classified as good, moderate, or poor depending on the median NRMSE of the predictions for a site-year.We use the following categories: good performance for median NRMSE <=1, moderate for 1< median NRMSE<=2, poor for 2<median NRMSE<=3 and very poor for median NRMSE>3.
We estimated the information entropy of the posterior parameter distributions after each sequential update using the redistribution estimate equation (Beirlant et al., 1997) (supplementary material 2).A change in entropy with sequential updates indicates a change in uncertainty of the parameters, where higher information entropy indicates greater uncertainty in the posterior parameters.In line with our hypotheses, we expect the entropy to decrease with sequential updates.

Modelling cases
The BSU approach described above and the subsequent analysis using the performance metrics were applied to two synthetic sequences and two true sequences of site-years.The synthetic sequences were used to demonstrate the application of the BSU approach in ideal conditions, while the true sequences were used to extend the application to real-world conditions.
Figure 2 shows the four sequences and the site-years used for calibration and validation.

Synthetic sequences
We set up two synthetic sequences, namely ideal and controlled cultivar-environment.In each synthetic sequence, we used 10 sequential updates wherein one through 10 site-years were used in calibration.After each sequential update, the calibrated model was validated against a different set of 10 synthetic site-years (Fig. 2).Note here that the 10 site-years used for validation were the same across the sequential updates.Data from the 10 site-years used for calibration (red box-plots) and the 10 site-years used for validation (blue box-plots) for the two synthetic sequences are shown in Fig. 3. Site-year 6_2010 was used to generate data for the synthetic sequences, as described below.290 The ideal sequence represents a case in which the model is able to accurately simulate the observations.The only sources of difference between site-years are from the spatial variability at the field site which is below model resolution, and the incorrect identification of phenological stages during field observations.For generating the ideal sequence of site-years, we first calibrated the model to phenology at 6_2010.The parameter set θ MAP corresponding to the maximum a posteriori probability (MAP) estimate was used to simulate phenology and generate the synthetic dataset.To introduce inter-site-year differences, noise was added to simulated phenology f(θ MAP ) d at observation day d, where the noise was equal to the total observation uncertainty σ d at that day for site-year 6_2010.Thus, for each synthetic site-year on observation day d, the phenological development was sampled from the range of total observation uncertainty σ d at 6_2010, around simulated

Results
In this section, we first describe the results for one example of Bayesian calibration using the data from site-year 6_2010 (section 3.1 Bayesian calibration results).Here we examine the resulting simulated phenology after calibration as well as the posterior parameter distributions.We then look at the results from the synthetic and true sequences.We first evaluate the evolution of the posterior parameter distributions with sequential updates.As an example, we analyse the marginal distributions of the individual parameters and entropy of the joint parameter distributions for the true sequences (section 3.2 Parameter uncertainty).Lastly, we discussreport the prediction quality results for the synthetic and true sequences (section 3.3 Prediction quality).

Bayesian calibration results
By way of example, Fig. 4 shows the Bayesian phenological model calibration results for silage maize at the first site-year 6_2010.Cross-plots of the posterior parameters (Fig. 4i) show weak negative correlation between PDD1 and TMINDEV1 and between PDD1 and DELTOPT1, while a weak positive correlation is observed between PDD1 and DELTMAX1.The observed mean phenological development falls within the range of simulations after calibration (Fig. 4ii).The marginal posterior parameter distributions are narrower than the initial prior distributions (Fig. 4iii).A shift in parameter distribution to the margins of the prior ranges is also notednoteworthy.

Parameter uncertainty
We analysed the change in posterior parameter distribution with the sequential updates.Figure 5(i) shows the marginal initial prior and posterior parameter distributions for the Swabian Alb and Kraichgau true sequences.The x-axis from left to right indicates the initial prior parameter distribution followed by the sequential calibration of the model to an increasing number of site-years.The distributions for the six estimated parameters are compared after each sequential update.The width of each box with whiskers represents the uncertainty in the parameter values.There is a clear narrowing of parameter distributions after the first sequential update from the initial prior.However, with the exception of DELTOPT2, the remaining parameters do not show a noticeable and consistent narrowing in range with sequential updates.However, informationInformation entropy of the joint posterior parameter distributions in Fig. 5(ii) decreases with sequential updates.There and there is a large reduction in entropy afterwith the first sequential update.In the Swabian Alb sequence (Fig. 5ii-a), entropy continues to reduce until the model is calibrated to 6_2010, 5_2011 and 5_2012, after which there is no significant reduction.In the Kraichgau sequence (Fig. 5ii-b), the inclusion of 1_2014 during calibration, results in further uncertainty reduction.Similar observations were made for the synthetic sequences (supplementary material 56).

Synthetic sequences
In the synthetic sequences, we assessed the prediction quality after applying BSU to 10 synthetic site-years, while excluding model structural error and inter-site-year-differences in cultivar and environmental conditions in the ideal sequence and controlled cultivar-environment sequence, respectively.In both sequences we account for identification uncertainty and spatial variability within the modelled site.Figure 6 shows the trend in median NRMSE and bias with the sequential updates from one to 10, for the two synthetic sequences.While the bias and NRMSE were calculated for all parameter vectors in the posterior sample derived from the MCMC sampling method, only the median values are plotted and analysed for simplicity.
In the ideal sequence (Fig. 6i), the overall median NRMSE (Fig. 6i-a) and bias (Fig. 6i-b) are low, with many site-years exhibiting a drop in the median NRMSE below a value of 1.However, after a few sequential updates no further reduction is observed.In the controlled cultivar-environment sequence (Fig. 6ii), although most individual site-years showed a reduction in median NRMSE with the sequential updates, there were some that exhibited an increase in median NRMSE (site-years ss2_12 and ss2_15 in Fig. 6ii-a).These site-years were also characterised by low initial median prediction bias, followed by an increase in the absolute bias with sequential updates (Fig. 6ii-b).The lines and points correspond to the 10 synthetic validation site-years: ss1_11-ss1_20 from the ideal sequence and ss2_11-ss2_20 from the controlled cultivar-environment sequence.

True sequences
As a fewer number of site-years were used for validation in the true sequence as compared to the synthetic sequence, we analysed the prediction quality for each validation site-year individually, with the sequential updates.Figure 7 shows the 400 prediction quality (i.e.NRMSE and bias for all the posterior predictive samples) of the model after BSU was applied to the true sequence of site-years in Kraichgau (Fig. 7i-iii) and on the Swabian Alb (Fig. 7iv-ix).For each site-year, we plot the quality of prediction, after calibration to all preceding site-years.For example, Fig. 7(vi) shows the performance metric for site-year 6_2013 after the model was calibrated first to 6_2010, then to 6_2010 and 5_2011 and finally to 6_2010, 5_2011 and 5_2012, respectively (blue box-plots from left to right).As a reference, the performance metric derived from calibrating the model to the target site-year, namely 6_2013 in Fig. 7(vi), is shown as the leftmost result (redgrey box-plot) of each sequence.It is clear that this calibration always yields the best performance metrics for the given data.While the NBias was calculated for all parameter vectors in the posterior MCMC sample, only the median values of the absolute NBias are also plotted to compare the trends between NRMSE and NBias with the sequential updates.
The NRMSE is expected to decrease with the inclusion of more site-years for calibration.This holds true in the case of Kraichgau, where mid-early cultivars were grown (Fig. 7ii, iii), but in hardly any case on Swabian Alb (Fig. 7iv-ix).We also expected the prediction quality to improve when a calibration sequence is made up of the same cultivar or ripening group.Note, however, the poor prediction quality in Fig. 7(iv) and the increase in NRMSE with the inclusion on 5_2011 in the calibration sequence in Fig. 7(ix).Additionally, the prediction quality for the early cultivar at 5_2016 (Fig. 7viii) deteriorates on the inclusion of the same cultivar grown at 5_2015 in the calibration sequence.In all predictions, the absolute NBias follows a similar trend as the NRMSE.Note that there is a difference in the performance metrics between the different siteyears when the model is directly calibrated to the target site-year (redgrey box-plots in Fig. 7).The three site-years in Kraichgau and site-years 5_2011, 5_2012, 5_2015, and 6_2016 in Swabian Alb exhibit good to moderate calibration quality, while 6_2013 and 5_2016 have moderate to poor calibration quality.

Discussion
In this study, we aimed to analyse whether progressively incorporating more data through Bayesian sequential updating (BSU) reduces model parameter uncertainty and produces robust parameter estimates for predicting phenology of silage maize.

Parameter uncertainty
Bayesian calibration resulted in reduced posterior parameter uncertainty in comparison to the initial prior ranges that were guided by expert knowledge (Fig. 4iii).The uncertainty in parameter DELTOPT2, reduced as seen from the narrowing of the marginal posterior distributions (Fig. 5).The remaining parameters did not show a consistent progressive reduction in uncertainty with the sequential updates.They also had a relatively higher correlation to the other parameters (Fig. 4i).The lack of uncertainty reduction may be due to equifinality, meaning that multiple parameter combinations produce the same output (Adnan et al., 2020;He et al., 2017b;Lamsal et al., 2018).The reduction in information entropy of the posterior parameter distributions after the sequential updates (Fig. 5ii) confirms the reduction in overall parameter uncertainty.
The optimum temperatures for vegetative (TOPTDEV1= TMINDEV1 + DELTOPT1) and reproductive (TOPTDEV2 = 8 + DELTOPT2) development are lower than our prior belief.The effective sowing depth (SOWDEPTH) is higher than the actual sowing depth of 3-5cm as the model cannot capture slow emergence (as discussed in the Appendix A: SPASS phenology model).In Kraichgau, the posterior distributions for SOWDEPTH and minimum temperature for vegetative development (TMINDEV1) did not change significantly as compared to the prior, indicating that the model did not learn much from the data.These parameters, however, show a change from the prior in the Swabian Alb.Kraichgau is warmer than the Swabian Alb.On most days, temperatures in Kraichgau are above the minimum temperature for vegetative development (TMINDEV1), resulting in limited learning.A similar reasoning applies to SOWDEPTH which is a proxy parameter that impacts emergence rate.Emergence occurs only above a certain threshold temperature which is hard-coded in the model.Temperatures in Kraichgau are mostly above this threshold temperature for emergence, resulting in limited learning and insignificant change from the prior distribution.In the Kraichgau sequence (Fig. 5i-b), PDD1 and DELTMAX1 decrease when site-year 1_2014 is added to the calibration sequence.Both parameters cause a faster development rate during the vegetative phase.This faster vegetative development results in earlier initiation of the reproductive phase, as seen in the mid-early ripening cultivar 1_2014 as compared to the late cultivars 3_2011 and 2_2012.In the Swabian Alb sequence (Fig. 5i-a), inclusion of early cultivars at 5_2012 and 5_2016 results in shallower SOWDEPTH and consequently, faster emergence.However, whether this early emergence is truly a feature of early cultivars or a consequence of the timing of first observations in the growing season cannot be satisfactorily distinguished with the available data.The physiological development days at optimum vegetative phase temperature (PDD1) were also lower than our initial prior belief.We, however, interpret these results with caution as parameters may compensate for model structural errors and some parameters are correlated (Alderman and Stanfill, 2017).

Prediction quality
We analysed synthetic sequences to assess whether a consistent reduction in prediction error is achieved when more siteyears are available for calibration, in the absence of model structural errors (ideal sequence), and in the absence of inter-siteyear-differences due to cultivars and environmental conditions (controlled cultivar-environment sequence).For the ideal sequence we used simulated phenology and added a random noise term that represents spatial variability and identification error.For the controlled cultivar-environment sequence we used the observations instead of simulated phenology to generate the dataset.Hence, in the latter sequence, there is not only random noise but also a model structural error component.As the noise and model error components cannot be resolved, the estimated model parameters compensate for both, leading to larger prediction errors (Fig. 6ii).
In the ideal sequence, the model was able to accurately simulate the observations, the only source of between-site-yearvariability being within-site spatial variability and identification uncertainty.The overall initial prediction quality was moderate to good, indicating that when there was no model structural error, the calibrated model was able to predict for the individual site-years represented by redgrey box-plots in Fig. 7, show that the model is able to simulate some siteyears better than others.Residual analysis (supplementary material 3) showed that the model was unable to capture the slow development during the vegetative phase for these site-years with poorer calibration quality.This could be due to model limitations (that is, model equations or hard-coded parameters) and could explain the correlation between temperature and prediction quality.
The single site-year predictions showed that the mid-early cultivar grown at 1_2014 and 2_2014 were the best predictors of each other and their prediction by the late cultivar at 3_2011 was poorer.Therefore, in case of the Kraichgau sequence (Fig. 7ii-iii) we observed a decrease in prediction error as we progressively calibrated the model to 3_2011, to 3_2011 and 2_2012, and to 3_2011, 2_2012 and 1_2014.In the Swabian Alb sequence (Fig. 7iv-ix) where mid-early and early cultivars are grown, the effect of different ripening groups and temperatures caused an increase in prediction error.
In real-world conditions represented by the true sequences, the prediction quality thus depends on the interplay between model limitations and inherent data structures presented in the differences between maturity group and cultivars.Since the model calibration and prediction quality varies with environmental factors, it highlights the need to better account for the influence of these environmental drivers in the model.This would increase model transferability to other sites.This could be best achieved by improving the process representation in the model and by including the uncertainty in forcings during calibration.An alternative approach would be to define separate cultivar-and environment-specific parameter distributions.
It is common practice to determine cultivar-specific parameters in crop modelling (Gao et al., 2020).He et al., (2017b) found that data from different weather and site conditions are required to obtain a good calibrated parameter set for a particular cultivar.Improved crop model performance has been reported upon the inclusion of environment-specific parameters in calibration (Coelho et al., 2020).Cultivar-or genotype-and environment-specific parameters already exist in some models (Jones et al., 2003;Wang et al., 2019).However, these genotype parameters have also been found to vary with the environment, indicating that they may represent genotype × environment interactions and not fundamental genetic traits (Lamsal et al., 2018).Further analysis of calibrated model parameters and model performance metrics with respect to environmental variables would provide insights into areas for model improvement.Nonetheless, the cultivar and environmental-dependency of parameters is a major drawback for large-scale model applications and long-term predictions, as information on crop cultivars is usually not available on regional scales and specific characteristics of future cultivated varieties are currently not known.While the collectionunknown.Collection of cultivar and maturity group information in official surveys is essential.Furthermore, other Bayesian approaches, such as hierarchical Bayes, which allow for the incorporation of this information during calibration, should be explored.Model calibration in a Bayesian hierarchical framework would enable inherent data structures, represented by the cultivars within ripening groups of a particular species, to be accounted for.Additionally, differences in environmental conditions can also be represented.On regional scales, where information about maturity groups and cultivars areis unavailable, accounting for environmental effects alone may still prove to be beneficial.A Bayesian hierarchical approach could potentiallyeven be applied to predict the growth of current as well as future cultivars.

Limitations
We would like to draw attention to the three assumptions in the current study which might cause an underestimation of uncertainties.First, the standard deviation of the likelihood model was not estimated, but assumed to be known and equal to the sum of observed spatial variability and identification error.It represents the minimum error and is equal to the total error only whenif there are no differences in environmental conditions and cultivars across the site-years.Second, the likelihood model was assumed to be centred at zero, which only holds true when there are no structural errors.In most cases, however, model structural errors and other systematic errors canwill exist, which couldmay result in much larger errors than what was assumed.Third, the errors are assumed to be independent and identically distributed.A violation of this assumption can lead to underestimation of uncertainty in the parameters and the output state variable (Wallach et al., 2017).In the residual analysis of the sequential updates with 3 or more site-years, a slight deviation from a Gaussian distribution was observed (supplementary material 3).This skewness was caused due to model limitations, that is, its inability to capture the slow development observed during the vegetative phase in some site-years.Autocorrelation of errors can exist for state variables like phenology that are based on cumulative sums.However, based on the limited dataset an autocorrelation in the errors could not be substantiated and an in-depth analysis is far beyond the scope of this study.
We observed that the posterior parameter distributions were at the margins of the initial prior distribution ranges, for which this study now provides a basis to update this prior belief.This considerable update of the parameter prior indicates that either the prior ranges are not suitable for the cultivars in this study, or that the parameters are compensating for structural limitations of the model.Further in-depth investigation of their potential contributions could only be achieved onwith datasets that are much larger datasets than the one at handemployed here.

Conclusions
Through a Bayesian sequential updating (BSU) approach, we extended a classical application of Bayesian inference through time to analyse its effectiveness in the calibration and prediction of a crop phenology model.We assessed whether BSU of the SPASS model parameters, based on new observations made in different years, progressively improves prediction of phenological development of silage maize.
We applied BSU to synthetic sequences and true sequences.As expected, the parameter uncertainty reduced in all sequences.
The prediction errors reduced in most cases in the synthetic sequences, where we had an ideal model that was able to accurately simulate observations, and where the model could contain structural errors but the dataset contained only a single maize cultivar grown under the same environmental conditions.In the ideal synthetic sequence, the prediction quality was variable for the first few sequential updates.The prediction error then reduced in both synthetic sequences until it approached an irreducible value.In the true sequences however, thatwhich included cultivars from different ripening groups and environmental conditions, the prediction quality deteriorated in most cases.Differences in ripening group and temperature during the vegetative phase of growth between the calibration and prediction site-years influenced prediction quality.
With increasing amount of data being gathered and improvements in data-gathering techniques, there is a drive to use all available data for model calibration.However, our study shows that a simplistic approach of updating the model parameter estimates without accounting for model limitations and inherent differences between datasets can lead to unsatisfactory predictions.To obtain robust parameter estimates for crop models applied on a large scale, the Bayesian approach needs to account for differences not only in maturity groups and cultivars but also environment.This could be achieved by applying Bayesian inference in a hierarchical framework, which will be a subject of future work.A-7 As the cardinal temperatures are phase-specific, the temperature response function is also phase-specific.For f T,v , the cardinal temperatures are TMINDEV1 , TOPTDEV1 and TMAXDEV1 , while for f T,r , the cardinal temperatures are TMINDEV2, TOPTDEV2 and TMAXDEV2.To explain the spread in prediction NRMSE within ripening groups, we examined the relationship between NRMSE and the 700 difference in average temperature between the site-year used for calibration and the predicted or target site-year.The   2 was used in eq. ( 10) to obtain the prior probability distribution for the estimated parameters.

Figure 1 :
Figure 1: i) Location of the sites in Kraichgau (site 1, site 2 and site 3) and the Swabian Alb (site 5 and site 6) in the state of Baden-Wuerttemberg, Germany (© Google Earth 2018 modified from Eshonkulov et al., 2019) ii) Observations of phenological development (expressed in BBCH growth stages) of silage maize at site 6 are plotted against the day of the year in 2010.The red labels indicate important phenological development stages.The red points are means of the observations while the box and whiskers represent the range of replicate observations.Length of the box represents the inter-quartile range (IQR), whiskers extend from the box up to 1.5 × IQR and values beyond this range are plotted as points.Each of the boxes and whiskers are based on 50 points corresponding to observations made on the same day i.e. 10 maize plants at 5 subplots within site 6 for one day in 2010.In site-year 6_2010, observations were made on 6 days during the growing season.
θ|Y sy (n−1) ) P(Y sy n |θ) ∫ P (θ|Y sy (n−1) ) P(Y sy n |θ) dθ θ (4) This equation defines the Bayesian sequential updating (BSU) approach in which the model is calibrated in a sequential manner.New data from a site-year (Y sy n ) is used to re-calibrate the model, conditional on the prior information from previous site-years.The posterior distribution obtained from the previous Bayesian calibration P(θ|Y sy (n−1) ) is used as prior probability for calibration to the next site-year.
simulated phenological development, y d is the mean observed phenological development and σ d is the standard deviation of the observations (as defined in section 2.4.1 Likelihood function) on day d.Under the assumption of normally distributed error, the natural logarithm of the likelihood probability is inversely proportional to the normalized mean square error: ln (P(Y sy |θ i )) ∝ NRMSE sy 2 −NRMSE sy 2 .The normalized bias NBias sy =

Figure 2 :
Figure 2: The site-years used for calibration and validation in each sequential update for the two synthetic sequences namely, ideal and controlled cultivar-environment, and the two true sequences for Kraichgau and the Swabian Alb are shown.In the synthetic sequences, a total of 10 updates were done by sequentially adding 1 through 10 site-years to the calibration dataset.After each update, prediction quality was analysed for a set of 10 validation site-years.A total of 3 sequential updates in Kraichgau and 6 sequential updates in the Swabian Alb true sequences were analysed.In the sequential updates for the true sequences, a site-year was included for calibration, following the actual chronological order of growth.The remaining site-years grown in the region were then used for validation.

Figure 3 :
Figure 3: Synthetic site-year observations used for calibration and prediction in (i) the ideal and (ii) controlled cultivarenvironment synthetic sequences.The red boxpink boxes and whiskers represent the range of values for the 10 synthetic site-years used for calibration while the blue boxboxes and whiskers represent the range of values for the 10 site-years used for validation.Length of the box represents the inter-quartile range (IQR), whiskers extend from the box up to 1.5 × IQR and values beyond this range are plotted as points.

Figure 4 :
Figure 4: Results of Bayesian calibration of the model to phenological development (BBCH stages) in site-year 6_2010.(i) Crossplot of the six posterior samples of the six estimated parameters.Red represents high density and blue low density.(IDPmisc package in R (Locher, 2020)) (ii) Observed and simulated phenological development after calibration, plotted against the day of the year.The red points are the mean observations, while the black error bars indicate +/-3 standard deviations.The mean simulation is indicated by the continuous black line.The greenblue bands represent the different percentiles of simulated phenology.Note that the simulated phenology bands only represent the uncertainty in model parameters and does not include the noise term.(iii) Prior (white) and posterior (salmon) marginal parameter distributions for the six estimated parameters.

Figure 5 :
Figure 5: (i) Marginal initial prior and posterior parameter distributions of the 6 estimated parameters plotted against the calibration site-years, after BSU was applied to a true sequence (a) on the Swabian Alb and (b) in Kraichgau.The SPASS model was calibrated to observed phenological development (BBCH).(ii) Information entropy of the joint posterior parameter distributions plotted against the calibration site-years, after BSU was applied to the true sequences.The x-axis labels from left to right indicate the initial prior parameter distribution followed by the sequential calibration of the model to an increasing number of site-years.The '+' symbol before the site-year label on the x-axis indicates the new site-year that was included in the sequential calibration.Length of the box in (i) represents the inter-quartile range (IQR), whiskers extend from the boxes up to 1.5 × IQR and values beyond this range are plotted as points.

Figure 6 :
Figure 6: (a) Median NRMSE and (b) median bias of prediction for the 10 validation site-years, after BSU was applied to the ideal (i) and controlled cultivar-environment (ii) synthetic sequences.The number of site-years used for calibration is shown on the xaxis and represents the sequential updates from one to 10.The SPASS model was calibrated to phenological development (BBCH).

Figure 7 :
Figure 7: Performance metrics for site-years in Kraichgau (i-iii) and on Swabian Alb (iv-ix), after applying BSU to the two true sequences.The SPASS model was calibrated to observed phenological development (BBCH).NRMSE and bias are plotted against the site-years used in calibration.In each sub-plot, the redgrey box-plot represents the calibration performance metric i.e. when the model is calibrated to site-year of interest.The blue box-plots represent the prediction performance metrics when the model is calibrated (from left to right) to an increasing number of preceding site-years.L, ME and E indicate the maturity group of the cultivars: late, mid-early, and early, respectively.The '+' symbol before the site-year label on the x-axis and before the maturity group label indicates the new site-year that was included in the sequential calibration.Length of the box represents the interquartile range (IQR), whiskers extend from the box up to 1.5 × IQR and values beyond this range are plotted as points.The zero bias is indicated by a red dashed line in the bias plots.The median values of the absolute NBias are represented by red horizontal linesasterisks (*) in the NRMSE plots.
f T,v (T, TMINDEV1, TOPTDEV1, TMAXDEV1) is the temperature response function (TRF) for the vegetative phase.TMINDEV1, TOPTDEV1, and TMAXDEV1 are the minimum, optimum and maximum temperatures (°C) of the vegetative development phase, respectively.The photoperiod factor is expressed as:f(h php ) = 1 − exp(−4 × (h php − dlmin)/(DLOPT − dlmin))where dlmin = DLOPT + 4/PDL A-5 h php (h) is the photoperiod length, that is, the amount of time between the beginning of the civil twilight before sunrise and the end of the civil twilight after sunset (the time when the true position of the centre of the sun is 4° below the horizon), PDL (-) is the photoperiod sensitivity and DLOPT (h) is the optimum daylength for a particular cultivar.The development rate in the generative or reproductive phase (R dev,r ) (d -1 ) only depends on temperature such that: R dev,r = R dev,r,max f T,r (T, TMINDEV2, TOPTDEV2, TMAXDEV2)A-6where R dev,r,max = 1/PDD2 is the maximum development rate in the reproductive phase ( d -1 ), PDD2 is the number of physiological development days from anthesis to maturity (d) and f T,r (T, TMINDEV2, TOPTDEV2, TMAXDEV2) is the temperature response function (TRF) for the reproductive phase.TMINDEV2 , TOPTDEV2 , and TMAXDEV2 are the minimum, optimum and maximum temperatures (°C) of the reproductive development phase, respectively.The temperature response function f T has cardinal temperatures: minimum temperature, TMINDEV (°C), optimum temperature, TOPTDEV (°C), and maximum temperature, TMAXDEV (°C): f T = 2(T − TMINDEV) α (TOPTDEV − TMINDEV) α − (T − TMINDEV) 2α (

Figure C- 1 :
Figure C-1: Median NRMSE ratio for prediction-target site-years after single site-year calibration of the SPASS model to observed phenological development (BBCH).The median NRMSE ratio on the y-axis is the ratio between the median NRMSE of temperature was averaged over an interval of 40 to 100 days after sowing (i.e.approximate vegetative phase of development).For the mid-early ripening cultivars (Fig. C-2i), the median NRMSE shows a clear correlation.Albeit tested with a limited number of site-years, early-ripening cultivars (Fig. C-2ii) show a similar trend.

Figure
Figure C-2: A cross-plot between the performance metric median NRMSE and the absolute difference in temperature between the site-year used for calibration and the prediction-target site-year, averaged over 40 to 100 days after sowing, for (i) mid-early and (ii) early ripening cultivars.Colours of the best-fit lines and points indicate the prediction-target site-year.Median NRMSE points at 0°C on the x-axis are calibration performance metrics for the target site-year while the remaining are prediction

Figure
Figure D-1: An example of the platykurtic probability density function that was used as a prior for the model parameters.The default, minimum, maximum and standard deviation values for the parameter are used to define this function.

Table 1 : Early (E), mid-early (ME), and late (L) ripening cultivars of silage maize, with their sowing and harvest dates, grown at the study sites in Kraichgau (sites 1, 2 and 3) and the Swabian Alb (sites 5 and 6) between 2010 and 2016. Region Year Site Site- year Cultivar Maturity/ Ripening group Sowing date (DD/MM/YYYY) Harvest date (DD/MM/YYYY)
The soil horizons in the model were based on these soil profile descriptions.Initial values of soil volumetric water content were based on measurements.The simulations for each site-year were started on the harvest date of the preceding crop in the crop rotation at that site.This ensured adequate spin-up time prior to the simulation of silage maize, which was sown in between April and May.