Global biogeochemical ocean models help to investigate the present and potential future state of the ocean, its productivity and cascading effects on higher trophic levels such as fish. They are often subjectively tuned against data sets of inorganic tracers and surface chlorophyll and only very rarely against organic components such as particulate organic carbon or zooplankton. The resulting uncertainty in biogeochemical model parameters (and parameterisations) associated with these components can explain some of the large spread of global model solutions with regard to the cycling of organic matter and its impacts on biogeochemical tracer distributions, such as oxygen minimum zones (OMZs). A second source of uncertainty arises from differences in the model spin-up length as, so far, there seems to be no agreement on the required simulation time that should elapse before a global model is assessed against observations.

We investigated these two sources of uncertainty by optimising a global biogeochemical ocean model against the root-mean-squared error (RMSE) of six different combinations of data sets and different spin-up times. Besides nutrients and oxygen, the observational data sets also included phyto- and zooplankton, as well as dissolved and particulate organic phosphorus (DOP and POP, respectively). We further analysed the optimised model performance with regard to global biogeochemical fluxes, oxygen inventory and OMZ volume.

Following the optimisation procedure, we evaluated the RMSE for all tracers located in the upper 100 m (except for POP, for which we considered the entire vertical domain), regardless of their consideration during optimisation.
For the different optimal model solutions, we find a narrow range of the RMSE, between 14 % of the average RMSE after 10 years and 24 % after 3000 years of simulation.
Global biogeochemical fluxes, global oxygen bias and OMZ volume showed a much stronger divergence among the models and over time than RMSE, indicating that even models that are similar with regard to local surface tracer concentrations can perform very differently when assessed against the global diagnostics for oxygen.
Considering organic tracers in the optimisation had a strong impact on the particle flux exponent (Martin

In conclusion, calibration against observations of organic tracers can help to improve global biogeochemical models even after short spin-up times; here, observations of deep particle flux in particular could provide a powerful constraint. However, a large uncertainty remains with regard to global OMZ volume and its evolution over time. OMZ volume can show very dynamic behaviour during the model spin-up, which renders temporal extrapolation to a final equilibrium state difficult if not impossible. Given that the real ocean shows variations on many timescales, the assumption that observations represent a steady-state ocean may require some reconsideration.

Global biogeochemical ocean models, especially when combined with data assimilation techniques, serve as useful tools to generate spatially and temporally consistent global fields of dissolved and particulate ocean tracers, such as nutrients, oxygen or organic constituents, from sparse observations.
More importantly, when embedded in Earth system models, they can be used to investigate the present state of the ocean and its biogeochemistry, including its productivity

The relevance and prospects of marine biogeochemical model applications seem obvious, but challenges remain in finding unambiguous solutions for the global distribution and flux of mass and in quantifying the uncertainties of respective model estimates

During model calibration, the calibration bias can have significant effects on optimal parameter estimates and may affect model performance with regard to simulated biogeochemical fluxes such as primary and secondary production.
For instance, amongst different model configurations that yield equally good fits to global nutrients and oxygen concentrations (with differences between root-mean-square errors

Another important aspect of model uncertainty relates to the models' spin-up time. Global models are started from some observed or assumed distributions of simulated tracers. However, their inherent assumptions, as expressed through the model components, equations, constants, forcing and boundary conditions, may diverge from the real world. When the model is integrated forward in time, these assumptions translate into a specific simulated tracer distribution, which will likely diverge from the observations (unless the model is perfect). The discrepancy between model and observations will change over the simulation (spin-up) time until, under climatological, seasonally varying forcing, the model finally reaches an equilibrium or steady state. This equilibrium is characterised by a steadily repeating annual cycle, in which the tracer concentrations at all locations change from year to year by only a negligible amount. In equilibrium, the model output then reflects only the assumptions that went into the model and is usually independent of the initial tracer distribution and, in the case of tracer exchange through open boundaries, of the initial tracer inventory.
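The equilibrium criterion described above (negligible year-to-year change at all locations) can be expressed as a simple drift check on consecutive annual means. The following is a minimal sketch, assuming tracer fields are flattened NumPy arrays on the model grid with matching grid-box volumes; the function names and the tolerance `rel_tol` are hypothetical and are not taken from this study.

```python
import numpy as np

def annual_drift(prev_year, this_year, volume):
    """Largest absolute year-to-year change at any grid box, relative to the
    volume-weighted mean concentration of the current year."""
    prev_year = np.asarray(prev_year, dtype=float)
    this_year = np.asarray(this_year, dtype=float)
    volume = np.asarray(volume, dtype=float)
    mean_conc = np.sum(volume * this_year) / np.sum(volume)
    return float(np.max(np.abs(this_year - prev_year)) / mean_conc)

def is_equilibrated(annual_means, volume, rel_tol=1e-4):
    """True if the last two annual means differ by less than rel_tol everywhere.
    rel_tol is a hypothetical threshold, not a value used in the paper."""
    return annual_drift(annual_means[-2], annual_means[-1], volume) < rel_tol
```

Normalising by the domain mean rather than by each local concentration avoids division by near-zero values, e.g. in oxygen minimum zones.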

Because of the slow ocean overturning circulation, it requires millennia of numerical integration to reach an equilibrated biogeochemical state on a global scale

We here investigate the driving factors that contribute to potential biases in parameter tuning and analyse their effect on global model performance with regard to several combinations of metrics, parameters to be calibrated and simulation timescales.
Initially, we optimised four parameters that determine the turnover of organic matter in the euphotic zone of a global biogeochemical ocean model and two parameters related to oxygen consumption and particle flux (with two different ranges of potential parameter values).
The choice of the six parameters was partly motivated by the results from earlier optimisations

By including all simulated tracers in the misfit function in the two initial optimisations, we aim to avoid the calibration bias mentioned above. In particular, we seek to prevent any tendency of the optimisation procedure to reduce misfits in inorganic tracer concentrations at the expense of (otherwise unconstrained) organic components, such as DOM or plankton biomass. In three further experiments, we then successively omitted observations of organic constituents and reduced the number of parameters to be optimised, thereby investigating the impact of different data types on optimisation performance.

Because changes in plankton parameters mainly affect model performance at the surface, which adjusts on timescales of years

All model simulations and optimisations apply the transport matrix method

The biogeochemical model describes the cycling of phosphorus, nitrogen and oxygen in a stoichiometrically consistent manner

To assess model skill for all seven simulated tracers, we have compiled a data set of corresponding observations, as detailed in Appendix

We note that, in contrast to the inorganic tracers and phytoplankton, the data sets for organic components (particularly zooplankton, POP and DOP) are much more sparse in space and time (see also Appendix

Model misfit (i.e. the cost function applied during optimisation) was calculated as the root-mean-square error (RMSE) between simulated and observed annual mean tracers mapped onto the respective three-dimensional model geometry
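The structure of such a misfit function can be illustrated with a short sketch. The exact normalisation of the paper's equation is not reproduced in this section, so the example below assumes that each tracer's volume-weighted RMSE is divided by its volume-weighted mean observed concentration and that these dimensionless terms are averaged over tracers. Observations are assumed to be mapped onto the model grid, with NaN marking boxes without data; restricting the evaluation to the upper 100 m would amount to setting deeper observations to NaN. All names are hypothetical.

```python
import numpy as np

def weighted_rmse(model, obs, volume):
    """Volume-weighted RMSE over grid boxes that hold an observation (obs not NaN)."""
    model, obs, volume = (np.asarray(a, dtype=float) for a in (model, obs, volume))
    has_obs = ~np.isnan(obs)
    w = volume[has_obs]
    resid = model[has_obs] - obs[has_obs]
    return float(np.sqrt(np.sum(w * resid**2) / np.sum(w)))

def normalised_misfit(tracers, volume):
    """Mean over tracers of the RMSE divided by the volume-weighted mean observation.

    tracers: dict mapping tracer name -> (model_field, obs_field), both on the
    model grid, with NaN where no observation exists (hypothetical layout).
    """
    volume = np.asarray(volume, dtype=float)
    terms = []
    for model, obs in tracers.values():
        obs = np.asarray(obs, dtype=float)
        has_obs = ~np.isnan(obs)
        obs_mean = np.sum(volume[has_obs] * obs[has_obs]) / np.sum(volume[has_obs])
        terms.append(weighted_rmse(model, obs, volume) / obs_mean)
    return float(np.mean(terms))
```

Dividing by the mean observation makes tracers with very different magnitudes (e.g. oxygen and DOP) comparable, so that no single tracer dominates the sum purely because of its units.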

In the different optimisations, we considered different combinations of data sets and spatial domains for the evaluation of the misfit function

Firstly, we calculated Eq. (

Secondly, we calculated several performance statistics such as the volume-weighted RMSE, the global bias
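The performance statistics used here and in the Taylor diagrams are linked by two identities: the squared RMSE splits into the squared bias plus the squared centred (pattern) RMSE', and RMSE' relates to the standard deviations and the correlation via the law of cosines that underlies the Taylor diagram. The sketch below computes these statistics with volume weights; it is an illustration under that weighting assumption, not the authors' code.

```python
import numpy as np

def taylor_stats(model, obs, volume):
    """Volume-weighted bias, RMSE, centred RMSE', correlation and normalised SD."""
    m, o, w = (np.asarray(a, dtype=float) for a in (model, obs, volume))
    w = w / w.sum()                      # normalised weights
    m_mean, o_mean = np.sum(w * m), np.sum(w * o)
    bias = m_mean - o_mean
    rmse = np.sqrt(np.sum(w * (m - o) ** 2))
    # centred RMSE: misfit after removing each field's mean (pattern error)
    rmse_c = np.sqrt(np.sum(w * ((m - m_mean) - (o - o_mean)) ** 2))
    sd_m = np.sqrt(np.sum(w * (m - m_mean) ** 2))
    sd_o = np.sqrt(np.sum(w * (o - o_mean) ** 2))
    corr = np.sum(w * (m - m_mean) * (o - o_mean)) / (sd_m * sd_o)
    return {"bias": bias, "rmse": rmse, "rmse_centred": rmse_c,
            "corr": corr, "sd_obs": sd_o, "norm_sd": sd_m / sd_o}
```

Because RMSE² = bias² + RMSE'², an optimisation can lower the RMSE substantially by removing the bias alone while leaving the pattern error, and hence the correlation, nearly unchanged.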

Using MOPS coupled to ECCO TMs,

Starting from observed inorganic tracer distributions

Because of the short spin-up time and our focus on the adjustment of parameters related to surface processes, calculations of the models' fits to nutrient and oxygen concentrations were restricted to the upper 100 m.

These five optimisations differed with respect to (i) their combinations of organic tracers considered in the misfit function and (ii) the number, type and variational ranges of parameters to be optimised (Table

In our initial optimisation, we allow the detritus sinking speed, expressed through the particle flux exponent
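One common way to connect a depth-dependent sinking speed with a particle flux exponent b, assumed here purely for illustration, is to let detritus sink at a speed that increases linearly with depth, w(z) = a z (z positive downward), while it is remineralised at a constant rate r. In steady state and ignoring other sources and sinks below a reference depth z₀, the flux F = wC then follows a power law (Martin curve):

```latex
\frac{\partial C}{\partial t} = -\frac{\partial (w C)}{\partial z} - r\,C = 0,
\qquad F \equiv w\,C,\quad w(z) = a\,z
\;\Longrightarrow\;
\frac{\mathrm{d}F}{\mathrm{d}z} = -r\,C = -\frac{r}{a\,z}\,F

F(z) = F(z_0)\left(\frac{z}{z_0}\right)^{-b},
\qquad b = \frac{r}{a}
\;\Longleftrightarrow\;
w(z) = \frac{r}{b}\,z
```

Under this assumption, a smaller b corresponds to faster sinking for a given remineralisation rate and therefore to more organic matter reaching the deep ocean.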

This setup is similar to S6

Based on S6-All, we here exclude DOP data from Eq. (

For experiment S4-SO, we reduce the number of parameters for optimisation simply by adopting the best estimates of the DOP parameters

Optimisation S4-Org tests the impact of organic tracer data and how they can affect optimal parameter estimates and model performance in general.
The setup of S4-Org is similar to S4-SO, but it excludes observations of organic tracers everywhere, which leaves only surface data (0–100 m) of nutrients and oxygen to enter Eq. (

We finally investigate the impact of deep nutrient and oxygen concentrations on the optimal model solution by extending the model spin-up time to 3000 years and then considering global nutrients and oxygen in the evaluation of the misfit function.
Otherwise, the setup is the same as for S4-SO, i.e. with DOP and phytoplankton data excluded south of 40

The six model setups with the respective optimised parameter sets were analysed at two time slices, namely after 10 and 3000 years of spin-up. With these cross-validation model runs, we investigate the impact of spin-up time on model performance on various timescales. In particular, we extend all model runs of the optimal S scenarios to 3000 years; likewise, we also analyse the model performance of L4-SO after 10 years of simulation. For a consistent comparison of model performance after optimisation, we always include all tracers in the different metrics, regardless of whether they contributed to the misfit during optimisation.

Experimental setup of different model runs and optimisations: range of parameters and data sets included in Eq. (

Optimisations were carried out using an evolutionary estimation of distribution algorithm, namely the covariance matrix adaptation evolution strategy
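To illustrate how CMA-ES searches a parameter space, the sketch below implements the basic (μ/μ_w, λ) update loop with the default constants from Hansen's CMA-ES tutorial: sample candidates from a multivariate normal distribution, rank them by misfit, move the mean towards the best candidates, and adapt the covariance matrix and step size from the evolution paths. It is a minimal, self-contained sketch and not the implementation or settings used in this study; in particular, it omits parameter bounds (the optimised parameters here are confined to prescribed ranges), restarts and parallel evaluation. In a calibration, `f` would stand for the misfit obtained from a full model spin-up with the candidate parameter set.

```python
import numpy as np

def cma_es(f, x0, sigma0, max_gen=500, ftarget=1e-12, seed=0):
    """Minimal (mu/mu_w, lambda)-CMA-ES with tutorial default constants (no bounds)."""
    rng = np.random.default_rng(seed)
    n = len(x0)
    lam = 4 + int(3 * np.log(n))                 # offspring per generation
    mu = lam // 2                                # parents used for recombination
    weights = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    weights /= weights.sum()
    mueff = 1.0 / np.sum(weights ** 2)
    # learning rates and damping (default values from the CMA-ES tutorial)
    cc = (4 + mueff / n) / (n + 4 + 2 * mueff / n)
    cs = (mueff + 2) / (n + mueff + 5)
    c1 = 2 / ((n + 1.3) ** 2 + mueff)
    cmu = min(1 - c1, 2 * (mueff - 2 + 1 / mueff) / ((n + 2) ** 2 + mueff))
    damps = 1 + 2 * max(0.0, np.sqrt((mueff - 1) / (n + 1)) - 1) + cs
    chi_n = np.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n ** 2))  # E||N(0, I)||

    mean = np.asarray(x0, dtype=float)
    sigma = float(sigma0)
    C, B, D = np.eye(n), np.eye(n), np.ones(n)   # C = B diag(D^2) B^T
    pc, ps = np.zeros(n), np.zeros(n)
    best_x, best_f = mean.copy(), f(mean)

    for gen in range(max_gen):
        z = rng.standard_normal((lam, n))
        y = (z * D) @ B.T                        # rows distributed as N(0, C)
        x = mean + sigma * y
        fx = np.array([f(xi) for xi in x])
        order = np.argsort(fx)
        if fx[order[0]] < best_f:
            best_f, best_x = fx[order[0]], x[order[0]].copy()
        if best_f < ftarget:
            break
        y_sel = y[order[:mu]]
        y_w = weights @ y_sel                    # weighted recombination step
        mean = mean + sigma * y_w
        # evolution path for step-size control uses C^(-1/2) y_w
        ps = (1 - cs) * ps + np.sqrt(cs * (2 - cs) * mueff) * (B @ ((B.T @ y_w) / D))
        hsig = float(np.linalg.norm(ps) / np.sqrt(1 - (1 - cs) ** (2 * (gen + 1)))
                     / chi_n < 1.4 + 2 / (n + 1))
        pc = (1 - cc) * pc + hsig * np.sqrt(cc * (2 - cc) * mueff) * y_w
        # covariance update: rank-one (pc) plus rank-mu (selected steps)
        C = ((1 - c1 - cmu) * C
             + c1 * (np.outer(pc, pc) + (1 - hsig) * cc * (2 - cc) * C)
             + cmu * (y_sel.T * weights) @ y_sel)
        sigma *= np.exp((cs / damps) * (np.linalg.norm(ps) / chi_n - 1))
        C = (C + C.T) / 2                        # enforce symmetry
        eigvals, B = np.linalg.eigh(C)
        D = np.sqrt(np.maximum(eigvals, 1e-20))
    return best_x, best_f
```

Because each generation evaluates λ independent candidates, the expensive model runs of one generation can be carried out in parallel, which is one reason why the number of generations is a natural measure of optimisation cost.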

We first evaluate the performance of the optimisation procedure and compare the optimised model solutions against the best solution obtained with ECCO

The different optimisation setups generate parameter estimates that are all distinct from those originally derived for the reference ECCO

Results of different model optimisations: number of generations

The efficiency of the optimisation procedure can be deduced from the number of generations required for convergence (

Regardless of the optimisation setup, values of the minima of the normalised misfit function

To summarise, considering the trade-off between simulation time (S vs. L setups), number of iterations required (

Improved fits to surface nutrients are achieved by the optimisations, which exhibit a substantial reduction in

Components of

The large remaining misfit can be explained by a lack of spatial correlation between observed and modelled patterns, as is evident from Taylor diagrams in Fig.

Taylor diagrams for annual mean oxygen (circles), phosphate (squares), nitrate (diamonds), phytoplankton (stars), zooplankton (triangles), DOP (pluses) and POP (inverted triangles) after 10 years

Simulated organic tracers exhibit a spatial variability

All metrics (

Compared to S4-SO, optimisation L4-SO combined two changes to the misfit function, namely a change of spin-up length before evaluation of the misfit function and the consideration of deep inorganic tracers.
The combination of these two changes was necessitated by the long timescales of deep-ocean processes and circulation, which require a long spin-up to induce effects on deep tracers.
To disentangle the effects of these changes, one could either (i) consider deep inorganic tracers after a short spin-up time or (ii) spin up the model over 3000 years and apply the same metric as in S4-SO (surface nutrients and organic tracers).
For the first case (i), we note that, after a short spin-up time, simulated deep tracers would still be very close to the model's initial condition, resulting in a misfit that is less informative and not very sensitive to changes in model parameters.
Considering the alternative case (ii), an extended spin-up time applied with the same metric as in S4-SO would likely result in surface metrics similar to those obtained with a short spin-up time. This is indicated by Figs.

Figure

Summary of optimal model setups after 10 years of simulation for different metrics (

The metrics for surface nutrients obtained in our experiments agree well with those of other global model studies (see Table S3 in the Supplement), in particular with regard to the high correlation coefficient (between 0.93 and 0.96), small bias and a normalised variance around 1.
Global model studies sometimes also report model performance with respect to chlorophyll, but not always in the same way.
Some studies report a correlation of log-transformed chlorophyll between 0.6 and 0.7
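Chlorophyll concentrations span several orders of magnitude and are approximately log-normally distributed, which is why correlations are often computed on log-transformed values. The sketch below shows this calculation; the lower floor `floor`, which guards against taking the logarithm of zero or negative values, is a hypothetical choice and not a value from the studies cited.

```python
import numpy as np

def log_chl_correlation(chl_model, chl_obs, floor=1e-3):
    """Pearson correlation of log10-transformed chlorophyll (mg m-3)."""
    m = np.log10(np.maximum(np.asarray(chl_model, dtype=float), floor))
    o = np.log10(np.maximum(np.asarray(chl_obs, dtype=float), floor))
    return float(np.corrcoef(m, o)[0, 1])
```

A model that is uniformly too high by a constant factor still yields a perfect log-transformed correlation, so this metric characterises the pattern and must be complemented by a bias measure.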

For organic tracers other than chlorophyll, an evaluation of model skill has been carried out less often.

After fitting a global model against observed DOC, DON and DOP,

In summary, our model representations of surface nutrients and DOP benefit most from optimisation, yet much of the improvement, especially for DOP, is due to an extensive reduction of the bias.
The pattern error, as expressed through RMSE' or

A potential solution to this problem could be to apply optimisation to a coupled physical–biogeochemical model that more realistically resolves the physical environment at high spatial and temporal scales. However, such a model could hardly be simulated at a global scale over more than a few decades. In addition, even a model that resolves mesoscale features well may not exactly reproduce the position and timing of individual eddies and filaments, again with consequences for biogeochemical variables and hence for pattern-matching statistics such as RMSE' or correlation.

A misfit function that involves point-wise, local data–model residuals, such as the

After a spin-up of 10 years, all experiments except S4-Org exhibit lower global primary production than the observational estimates (Fig.

Global annual biogeochemical fluxes (Pg C yr

The high maximum grazing rates (Table

In contrast to primary production and grazing, global export production is similar in all model simulations (Fig.

Global particle flux at 2000 m simulated by our different model setups depends strongly on the optimal value of

The range in particle flux profiles applied in the different model setups also affects the (particle) transfer efficiency TE, as calculated by dividing the simulated particle flux at 1000 m by the particle flux at a depth of 100 m.
Dividing global mean particle flux at these two depths, our model simulations with
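The two ways of obtaining TE mentioned in this section can be contrasted in a short sketch: for a pure power-law (Martin) profile F(z) = F(100 m)(z/100 m)^(-b), the nominal TE between 100 and 1000 m is simply 10^(-b), whereas a TE diagnosed from model output divides the flux values at the two depths. The sketch assumes a tabulated flux profile and linear interpolation between model depths; the function names are hypothetical, and b = 0.858 is used only because it is the commonly cited value of Martin et al. (1987).

```python
import numpy as np

def martin_flux(z, flux_ref, b, z_ref=100.0):
    """Power-law particle flux profile normalised at z_ref (depth in m)."""
    return flux_ref * (np.asarray(z, dtype=float) / z_ref) ** (-b)

def transfer_efficiency(depth, flux, z_shallow=100.0, z_deep=1000.0):
    """TE diagnosed from a tabulated flux profile by linear interpolation."""
    f_shallow = np.interp(z_shallow, depth, flux)
    f_deep = np.interp(z_deep, depth, flux)
    return float(f_deep / f_shallow)

depth = np.array([100.0, 250.0, 500.0, 1000.0, 2000.0])
te = transfer_efficiency(depth, martin_flux(depth, 1.0, 0.858))  # equals 10**-0.858
```

For a simulated profile that also contains contributions from processes other than particle sinking, the diagnosed TE will in general deviate from the nominal value 10^(-b).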

We note that, when diagnosing TE from simulated model fluxes, a considerable fraction of the transfer of particulate organic matter is due to processes other than particle sinking.
While theoretically it seems straightforward to derive the (nominal) TE directly from the applied particle flux profile (for example,

Given the many assumptions that go into observed global estimates of deep particle flux and the potentially strong effect of this flux on transfer efficiency and carbon storage, we additionally examined simulated particle flux against three different data sets of sediment trap observations (see Sect.

Taylor diagrams for simulated particle flux after a spin-up of 10 years

Thus, for model setups with

On long timescales, nutrient and oxygen concentrations in the deep ocean are mainly determined by the value assigned to the particle flux exponent

Taylor diagrams for annual mean phosphate

In addition to tracer distributions, the global nitrate and oxygen inventory may also be affected by changes in model parameters

The bias range found in our study is similar to that of many models analysed by

Oxygen bias

Differences in global oxygen bias between the model setups (Fig.

Oxygen bias

The global OMZ volume bias is more sensitive to parameter changes and spin-up length than global oxygen bias.
After 3000 years and for a criterion of 50 mmol O
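OMZ volume and its bias can be diagnosed by summing the volumes of all grid boxes whose oxygen concentration falls below a chosen threshold. The sketch below assumes that simulated and observed oxygen are given on the same grid (in mmol m⁻³) and defines the bias as the simple difference of the two volumes; this definition, and the function names, are assumptions for illustration.

```python
import numpy as np

def omz_volume(oxygen, volume, threshold=50.0):
    """Total volume of grid boxes with O2 below threshold (NaN boxes are excluded)."""
    o2 = np.asarray(oxygen, dtype=float)
    v = np.asarray(volume, dtype=float)
    return float(np.sum(v[o2 < threshold]))

def omz_volume_bias(o2_model, o2_obs, volume, threshold=50.0):
    """Simulated minus observed OMZ volume for a given oxygen threshold."""
    return omz_volume(o2_model, volume, threshold) - omz_volume(o2_obs, volume, threshold)
```

Because a grid box either counts fully or not at all, small oxygen changes near the threshold can shift large volumes in or out of the OMZ, which helps to explain why OMZ volume reacts more sensitively to parameter changes and spin-up length than global mean oxygen.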

Like global average oxygen, the metric pattern of the model setups does not resemble any of the other metrics such as RMSE or global biogeochemical fluxes (compare Fig.

After 3000 years, the OMZ bias seems to depend mostly on the oxygen demand of remineralisation

Yet, even after 3000 years, some models have not reached equilibrium with respect to the OMZ volume bias (Fig.

As shown above, the optimised models exhibit a considerable spread among parametric model setups and also differ in

To obtain a common scale, the range of each diagnostic has been normalised by the respective average before evaluating the minimum and maximum.
This approach allows us to evaluate the potential variability (with regard to parametric setup and time) of each diagnostic in comparison to other diagnostics (as indicated by the rectangles' position on the
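The normalisation described above can be sketched as follows, assuming that each diagnostic is arranged as an array of model setups by time slices (here after 10 and 3000 years) and that "average" refers to the mean over all setups and time slices; both assumptions, and the function name, are for illustration only.

```python
import numpy as np

def normalised_ranges(values):
    """values: array (n_setups, n_times) of one diagnostic, e.g. export production.

    Returns overall min/max after division by the ensemble average, the spread
    among setups at each time slice, and the change of each setup over time.
    """
    v = np.asarray(values, dtype=float)
    scaled = v / v.mean()
    return {
        "min": float(scaled.min()),
        "max": float(scaled.max()),
        "setup_spread": scaled.max(axis=0) - scaled.min(axis=0),  # per time slice
        "time_change": scaled.max(axis=1) - scaled.min(axis=1),   # per setup
    }
```

Expressing all diagnostics as fractions of their own average places fluxes, oxygen inventory and OMZ volume on a common dimensionless scale, so that the spread due to parameters and the change due to spin-up length can be compared directly.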

Graphic summarising the different sources of variability in model results caused by model setup (parameter sets) and spin-up time before model analysis for biogeochemical fluxes, oxygen inventory and OMZ volume.
The

The spread between parametric model setups (see also

The sensitivity to spin-up length (lower and upper borders of the rectangles in Fig.

As discussed above, the very large value of

In contrast to the biogeochemical fluxes, the maximum variation in OMZ volume over time remains the same because the reduced ensemble still includes setup S4-SO, which shows the largest amplitude of the model trajectory (Fig.

An even larger variability exists for global models that also differ in circulation and/or model complexity (

Our analysis of the different sources of model variability has so far focused on biogeochemical fluxes and oxygen.
It remains to be investigated whether this analysis also applies to tracers with different boundary exchanges and residence times, such as those of the carbon cycle, which is driven by an atmospheric signal.
Yet, at least after 250 years, tracers related to the carbon system may still show a considerable drift that is in the range obtained for oxygen and nitrate, which warrants a careful consideration and potential correction when analysing and comparing model results

We applied different optimisation strategies to calibrate a global biogeochemical ocean model against observed inorganic and organic tracers.
Optimal parameter sets diverge strongly with regard to the particle flux exponent

Despite the large range of some optimal model parameters, the resulting model solutions yield similar values of the normalised, volume-weighted root-mean-squared error (

Since the root-mean-squared error combines bias and pattern error information, major improvements in model performance may not be well reflected by this metric during optimisation.
For example, errors in circulation can cause a large pattern error, especially for sparse and episodic observations of organic tracers.
These errors cannot be further reduced by any adjustment of biogeochemical parameter values.
One way to examine the effects of circulation errors on the metric's ability to resolve optimal model parameters would be to calibrate the model against synthetic data derived from an earlier simulation (an identical-twin experiment).
Such an approach could provide deeper insight into the importance of organic tracers for model calibration and, if combined with different sampling strategies of the pseudo-data, also into the impact of data sparsity.
However, the pseudo-data will typically be representative of one particular model setup and may not reflect the full misfit sensitivity over a range of model characteristics.
A second possible approach could involve the development and application of an alternative misfit function.
Such a metric could include an assessment of the tracers' observed and simulated statistical properties within specified ocean regions instead of calculating local point-wise residuals as in the root-mean-squared error
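The principle of an identical-twin experiment can be illustrated with a toy problem instead of the full three-dimensional model: pseudo-observations are generated from a known "true" parameter, sampled sparsely and perturbed with noise, and the parameter is then re-estimated by minimising the RMSE. The sketch below uses a one-dimensional power-law particle flux profile with a known exponent and a simple grid search; the profile, sampling depths, noise level and search range are all hypothetical choices for illustration.

```python
import numpy as np

def martin_profile(z, b, f100=1.0):
    """Toy 'model': power-law particle flux normalised at 100 m."""
    return f100 * (z / 100.0) ** (-b)

def twin_experiment(b_true=0.858, n_obs=15, noise=0.05, seed=1):
    """Recover b from sparse, noisy pseudo-data generated with b_true."""
    rng = np.random.default_rng(seed)
    z_obs = np.sort(rng.uniform(100.0, 4000.0, n_obs))          # sparse sampling
    truth = martin_profile(z_obs, b_true)
    pseudo = truth * (1 + noise * rng.standard_normal(n_obs))   # relative noise
    candidates = np.linspace(0.3, 1.8, 1501)                    # search range for b
    rmse = [np.sqrt(np.mean((martin_profile(z_obs, b) - pseudo) ** 2))
            for b in candidates]
    return float(candidates[int(np.argmin(rmse))])
```

Repeating such an experiment with different sampling patterns (`n_obs`, depth ranges) or noise levels indicates how data sparsity degrades the recovery of the true parameter, although in a real model the pseudo-data would still reflect only one particular model setup.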

Diagnostics such as global primary production and grazing, global and local particle flux, oxygen bias, and OMZ volume show a large divergence among the models and over time that is not reflected by differences in RMSE.

Global primary production and grazing exhibit a similar pattern of model performance after 10 and 3000 years of spin-up, but the difference between the model setups increases over the long spin-up, eventually becoming almost twice as high as the observational uncertainty of 40 Gt C yr

Global export production is very similar among the optimal model setups, likely because all model runs presented here applied the same circulation and because of the antagonistic effects of particle sinking and nutrient supply from subsurface waters. The variation of 14 % to 17 % found in our study is less than one-fourth of the spread among other GCMs (which differ in many aspects, such as circulation) and is much lower than the spread of observational estimates.

Owing to the wide range of optimal estimates for parameter

Global average oxygen changes by up to 12 % over time for the individual models, and the difference among the model setups is strongly amplified after 3000 years of simulation, when it reaches almost 22 % of the ensemble mean.
A large fraction of this variation can be attributed to values of particle flux parameter

Global OMZ volume, when defined by O

The dependence of simulated global biogeochemical fluxes on tuning strategy and the resulting model parameters can have consequences for models of higher trophic levels, such as fish, that often rely on primary production and/or export of organic matter to the mesopelagic and deep ocean.
The interactions between biogeochemistry and fish may be further complicated through feedback effects of fish on biogeochemical features such as oxygen distributions

Overall, the best performance with regard to oxygen and OMZ volume was obtained by tuning strategies that either apply long (millennial) timescales of simulation or include observations of organic tracers in the misfit function. Given the computational expense of long-term simulations and the likely dependence of global oxygen distribution on particle flux to the deep ocean, we speculate that well-constrained observational estimates of particulate organic matter flux to the ocean interior may help to constrain global models, even when these are spun up for only a few decades or centuries.

For model calibration of phytoplankton, we used chlorophyll data derived from remote sensing

For model calibration of zooplankton we used the MAREDAT data set of mesozooplankton

There is no direct observational equivalent to simulated detritus; the nearest type of observations are probably those of particulate organic phosphorus (POP), nitrogen (PON) or carbon (POC).
However, due to the methods applied, these observations also contain phytoplankton and possibly a fraction of smaller zooplankton.
For model evaluation, we downloaded the data set by

Most observations of dissolved organic phosphorus (DOP) have been compiled by Angela Landolfi.
They include data from cruises 36N, AMT10, AMT12, AMT14, AMT15, AMT16 and AMT17

Observational data: number of observations over the full model domain and for the upper 100 m only, minimum and maximum concentration, and average concentration over full domain and upper 100 m.
See Sect.

The data set by

The basic TMM and MOPS code used for the ocean biogeochemical simulations are available to download from

The supplement related to this article is available online at:

IK designed and carried out the experiments. AL provided the data for DOP. IK prepared the paper with contributions from all the co-authors.

The contact author has declared that none of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is a contribution to the BMBF joint project CO2Meso (project no. FKZ 03F0876A) and to RESCUE (EU grant agreement no. 101056939). Parallel supercomputing resources have been provided by the North-German Supercomputing Alliance (HLRN). The authors wish to acknowledge use of the Ferret programme of NOAA's Pacific Marine Environmental Laboratory for the analysis and graphics in this paper. We thank Jörg Schwinger and an anonymous reviewer for their very helpful and constructive comments on the paper.

This research has been supported by the Bundesministerium für Bildung, Wissenschaft, Forschung und Technologie (grant no. 03F0876A) and the European Commission, HORIZON EUROPE Framework Programme (grant no. 101056939). The article processing charges for this open-access publication were covered by the GEOMAR Helmholtz Centre for Ocean Research Kiel.

This paper was edited by Marilaure Grégoire and reviewed by Jörg Schwinger and one anonymous referee.