One size ﬁts all? - Calibrating an ocean biogeochemistry model for different circulations

. Global biogeochemical ocean models are often tuned to match the observed distributions and ﬂuxes of inorganic and organic quantities. This tuning is typically carried out “by hand”. However, this rather subjective approach might not yield the best ﬁt to observations, is closely linked to the circulation employed, and thus inﬂuenced by its speciﬁc features and even its faults. We here investigate the effect of model tuning, via objective optimisation, of one biogeochemical model of intermediate complexity 5 when simulated in ﬁve different ofﬂine circulations. For each circulation, three of six model parameters have adjusted to characteristic features of the respective circulation. The values of these three parameters – namely, the oxygen utilisation of remineralisation, the particle ﬂux parameter and potential nitrogen ﬁxation rate — correlate signiﬁcantly with deep mixing and ideal age of NADW and the outcrop area of AAIW and SAMW in the Southern Ocean. The clear relationship between these parameters and circulation characteristics, which can be easily diagnosed from global models, can provide guidance when 10 tuning global biogeochemistry within any new circulation model. The results from 20 global cross-validation experiments show that parameter sets optimised for a speciﬁc circulation can be transferred between similar circulations without losing too much of the model’s ﬁt to observed quantities. When compared to model intercomparisons of subjectively tuned, global coupled biogeochemistry-circulation models, each with different circulation and/or biogeochemistry, our results show a much lower range of oxygen inventory, OMZ volume and global biogeochemical ﬂuxes. Export production depends to a large extent 15 on the circulation applied, while deep particle ﬂux is mostly determined by the particle ﬂux parameter. Oxygen inventory, OMZ volume, primary production and ﬁxed nitrogen turnover depend more or less equally on both factors, with OMZ volume showing the highest sensitivity, and residual variability. These results show a beneﬁcial effect of optimisation, even when a biogeochemical model is ﬁrst optimised in a relatively coarse circulation, and then transferred to a different, ﬁner resolution circulation model. Parallel supercomputing resources the North-German Supercomputing (HLRN). We Biogeochemical Modelling GEOMAR discussions many helpful Ferret program for analysis and in this We two very helpful


Introduction
Global models of marine biogeochemistry are applied to prognostic problems, such as the future exchange of CO 2 between ocean and atmosphere, the evolution of oxygen minimum zones (OMZs) under a changing climate, or future primary production, which is the ultimate food source for fish. Unfortunately, in steady state, these models vary greatly in their representation et al., 2014), it would be desirable to apply optimisation directly to the more highly resolved models applied in prognostic simulations. Yet, to date this approach has proven to be prohibitive, due to the large computational demand mentioned above. 60 Therefore, we currently must accept a trade-off between finely resolved representation of physical transport processes, and well-tested and objectively optimised biogeochemical models. The calibration of model biogeochemistry in one computationally cheap circulation may elucidate the model's behaviour in its (biogeochemical) parameter space, and indicate a best set of parameters consistent with observed tracer fields. If the mean transport simulated by models were independent from model resolution, one could then transfer these parameters into the more expensive, high-resolution model. However, like biogeochemical models, physical parameterisations also reflect an idealised system, which can introduce errors in small-and large-scale patterns and processes. Biogeochemical model calibration is affected by these physical errors -the resulting optimal parameters can thus strongly depend on the circulation applied (Löptien and Dietze, 2019). So far it is not clear how model dynamics and performance will change, once these calibrated parameters are transferred to a different circulation, that resolves physical processes in more detail. 70 To investigate the mutual effects of circulation, biogeochemical model parameters, and model performance we have tested the effect of five different circulations on the objectively optimised parameters of a biogeochemical model. The ocean models differ in resolution as well as physical forcing and dynamics. Biogeochemical model calibration against nutrients and oxygen was carried out using a quasi-evolutionary algorithm, which carries out a dense scan of a six-dimensional, biogeochemical parameter space. Differences in optimal parameters are discussed before the background of large-scale physical properties. 75 In portability experiments we examine model performance when parameters optimal for one circulation are transferred into another circulation. We finally quantify the effects of changes in parameter sets vs. those of circulation on global quantities such as oxygen inventory, OMZ volume, and global biogeochemical fluxes.
2 Models, experiments, and optimisations 2.1 Circulation and physical transport 80 All model simulations and optimisations apply the Transport Matrix Method (TMM; Khatiwala, 2007Khatiwala, , 2018 as an efficient "offline" method for ocean passive tracer transport. The TMM represents advection and mixing in the form of transport matrices that have been calculated from an ocean circulation model simulation prior to the biogeochemical simulations performed here. For our model simulations we apply monthly mean transport matrices (TMs), as well as monthly wind speed, temperature and salinity for air-sea gas exchange. One set of TMs and forcing have been derived from a 2.8 global configuration of the MIT 85 ocean model with 15 vertical levels (Marshall et al., 1997, see also Table 1). Using this rather coarse spatial grid and a time step length of 1/2 day for tracer transport, a model setup with seven tracers can be integrated for 3000 years in ⇡ 1-1.5 hours on 4 nodes of Intel Xeon Ivybridge at the North-German Supercomputing Alliance (www.hlrn.de). This circulation is hereafter referred to as MIT28. For the second physical model configuration (hereafter referred to as ECCO) we apply TMs derived from a circulation of the Estimating the Circulation and Climate of the Ocean (ECCO) project, which provides circulation fields that 90 yield a best fit to hydrographic and remote sensing observations over the 10-year period 1992 through 2001 with a horizontal resolution of 1 ⇥ 1 and 23 levels in the vertical (Stammer et al., 2004). A full spinup (3000 years) of the coupled model requires about 9 hours on 16 nodes of Intel Xeon Ivybridge.
Finally, three sets of transport matrices have been derived from version 2.9 of the University of Victoria Earth System Climate Model (UVic ESCM; hereafter called "UVic"; Weaver et al., 2001), a coarse-resolution (1.8 ⇥3.6 ⇥19 vertical layers) ocean- 95 atmosphere-biosphere-cryosphere-geosphere model. TM extraction was carried out as described by Kvale et al. (2017). One set of TMs is identical to that described in Kvale et al. (2017), and includes tidal mixing and a high-mixing scheme in the Southern Ocean, as well as an increased low latitude isopycnal diffusivity (configuration "UHigh"). This configuration utilizes a vertical diffusion coefficient of 0.43 cm 2 s 1 to stabilise meridional overturning in its linear, 3 rd order upwind-biased advection scheme (UW3, Holland et al., 1998;Griffies et al., 2008). This vertical diffusion coefficient is more than double the "standard" value of 100 0.15 cm 2 s 1 , typically used with the UVic ESCM configured with the default 1 st order flux corrected transport (FCT, Weaver and Eby, 1997) advection scheme. Reasons for the change in advection scheme for the application of the TMM to UVic are given in Kvale et al. (2017), but it is important to note here that the UW3 configuration has not benefitted from the more than two decades of careful parameter adjustments that users of the FCT configuration appreciate. The annual maximum global meridional overturning strength in the UHigh configuration is 18.5 Sv, but other physical features of the circulation have not 105 been previously assessed in detail. Two further sets of UVic ESCM TMs do not include regional adjustments to mixing. They have been tuned to a maximum annual average global overturning circulation of either 20 Sv (named U20) or 17.5Sv (named U17.5). This tuning was achieved by adjusting the vertical diffusion coefficient (0.409 cm 2 s 1 in U17.5, and 0.4179 cm 2 s 1 in U20). The physical circulation parameterisation is otherwise identical to UHigh; utilising UW3 advection and the same tidal mixing scheme. Differences arising in the calibrations between UVic ESCM TMs therefore reflect both differences in the 110 application of regional mixing "corrections" (UHigh versus U17.5 and U20), as well as in global overturning strengths and secondary effects from changed values of the vertical diffusion coefficient. We note that none of the circulation configurations, aside from UHigh, have been previously evaluated against the most commonly used UVic ESCM FCT configuration (e.g., Weaver et al., 2001;Schmittner et al., 2005;Somes et al., 2013).

Properties of circulation models
mixed-layer depths derived from observed temperature and salinity. On the other hand, only this circulation exhibits deep mixing in the Labrador Sea, which is in agreement with observations. All configurations of UVic exhibit a too large area of deep mixing in the Southern Ocean, while ECCO tends to underestimate mixing in this area.
Differences between circulations are also reflected in the global distributions of ages diagnosed from the models. MIT28, U20 and U17.5 show very old (> 1400 years) waters in the deep northern North Pacific ( Figure 3). Here, ECCO and UHigh 130 contain much younger waters, mostly below 1400 years. In MIT28 the age increases rapidly with depth in the Southern Ocean (up to more than 800 years). Especially ECCO, but also UVic exhibit much younger waters below 2000 m in this region.
Finally, the U20 and U17.5 configurations result in too old deep waters in the northern North Atlantic (Khatiwala et al., 2012, their Fig. 4). In general, ECCO and UHigh agree much better with mean age constrained with transient (radiocarbon, CFCs) and hydrographic (temperature, salinity, nutrients, oxygen) tracer observations .

135
To summarise, our applied offline circulations for MIT28 and ECCO differ strongly with respect to resolution and many global physical properties, with the UVic configurations in between these two. As a data-constrained circulation, ECCO shows the best overall agreement with observations of all five circulations.

Derived indicators of circulation
The underlying circulation models, from which the TMs and forcing were extracted, differ in many aspects, such as parame-  Tuerena et al., 2015). Given that the models differ so strongly with respect to the surface density in the Southern Ocean (Figure 1), we evaluated the outcrop area of waters defined by a density of 26.5  ✓ < 27.5 and 27.5  ✓ in both the Southern Ocean and the northern North Atlantic (defined as above). In the Southern Ocean the first criterion approximately 165 reflect SAMW and AAIW combined, and the second criterion Circumpolar Deep Water (CDW) (similar to the definitions used by Palter et al., 2010;Iudicone et al., 2011;Rafter et al., 2012Rafter et al., , 2013. North Atlantic waters defined by densities of 26.5  ✓ < 27.5 and ✓ 27.5 coincide mainly with the region between 40 N-60 N and the Greenland Sea, respectively ( Figure 1). Age of water masses: We use the concept of water mass (or ventilation) age as a diagnostic for the combined effects of 170 ocean circulation, mixing and ventilation on the time elapsed since a water parcel has been isolated from the atmosphere.
We distinguish between average ideal age in three different water masses by applying the criteria of Matsumoto et al. (2004).
According to their water mass definitions, North Atlantic Deep Water (NADW) comprises all waters in the North Atlantic between 0 and 60 N and 1500-2500 m depth. North Pacific Deep Water (NPDW) is defined for a region between 0 and 60 N and 1500-5000 m depth. Finally, Circumpolar Deep Water (CDW) consists of all waters south of 45 S, for a depth between 175 1500-5000 m (see also Figure 3). We note that, using these region definitions, the average age of NADW is influenced by waters of the Eastern Tropical Atlantic (ETA), which are quite old in ECCO circulation (> 400y), and young in UHigh (Figure 3, left panels). One reason for this could be different rates of overturning, which is between 13 to 14 Sv in ECCO (together with a weak western boundary current; Wunsch and Heimbach, 2006), while the UVic configurations are characterised by higher overturning around 17.5 to 20 Sv. Finally, because the eastern tropical Pacific (ETP) is the main region of fixed nitrogen loss 180 in the models (Kriest and Oschlies, 2015), but the water age in this region varies strongly among the circulations applied in our study ( Figure S2), we also calculated average age in the Eastern Equatorial Pacific (ETP) between ±20 latitude, east of 160 W, and within 150-500 m depth as a fourth potential indicator.

The biogeochemical model
For all model simulations we apply the Model of Oceanic Pelagic Stoichiometry ("MOPS"), which simulates the biogeochem-185 ical cycling among phosphate, phytoplankton, zooplankton, dissolved organic phosphorus (DOP) and detritus. The model is described in detail in Kriest and Oschlies (2015), and we here only give a brief overview on model structure, with focus on parameters (and processes) affected by optimisation (see below Suboxic remineralisation (denitrification) also follows a saturation curve for the oxidant nitrate, defined by the half-saturation constant for nitrate, K DIN [mmol m 3 ]. The model assumes immediate coupling of the different processes involved in nitrate reduction to dinitrogen, following the stoichiometry derived by Paulmier et al. (2009) . With a constant degradation rate r = 0.05 d 1 , in equilibrium this is equivalent to a depth-dependent particle flux curve, corresponding to a power law: Kriest and Oschlies, 2008). Depending on the rain rate to the sea floor, a fraction of detritus deposited at the bottom of the deepest vertical box is buried in some hypothetical 200 sediment. Non-buried detritus is resuspended into the deepest box of the water column, where it is treated as regular detritus.
The global annual burial of organic phosphorus and nitrogen is resupplied in the next year via river runoff (when simulated with MIT28 or ECCO TMs) or at the ocean surface (TMs derived from UVic). We note that in uncalibrated models (e.g., Kriest and Oschlies, 2015) this creates differences of a few percent in the global inventory of nitrate and oxygen. Differences in the regional distribution of nutrients and oxygen are largely comparable in magnitude to those caused by the numerical sinking 205 scheme of detritus (Kriest and Oschlies, 2011). The effect of this process is subject to further research.

Optimisation algorithm
Optimisation of n = 6 biogeochemical model parameters (see section 2.7 for choice of parameters to be optimised) is carried out using an Estimation of Distribution Algorithm, namely the Covariance Matrix Adaption Evolution Strategy (CMA-ES; Hansen and Ostermeier, 2001;Hansen, 2006). The application of this algorithm to the coupled biogeochemistry-TMM frame-210 work has been presented in detail in Kriest et al. (2017), and we here only give a brief overview: In each iteration ("generation") the algorithm defines a population of 10 individuals (biogeochemical parameter vectors of length n), sampled from a multivariate normal-distribution in R n . Following the simulation of these 10 model setups over 3000 years to near-steady state, after which, for example, global oxygen and nitrate inventory change only by a small amount (see Figure 2 by Kriest and Oschlies, 2015), the misfit (cost) function (presented in section 2.6) is evaluated, and information of the current as well as 215 previous generations is used to update the probability distribution in R n such that the likelihood to sample solutions resulting in a good fit to observations increases. Therefore, the population (the number of model simulations per generation) in CMA-ES is smaller, and of lower computational demand than in classical evolutionary algorithms, making the algorithm applicable to this computationally expensive problem. On the other hand, with its quasi-stochastic sampling CMA-ES can still to a certain degree perform well with misfit functions characterised by a rough topography (i.e., many local optima; Kriest et al., 2017;220 Kriest, 2017).

Misfit (cost) function
As in Kriest et al. (2017) and Kriest (2017) the standard misfit to observations J is defined as the root-mean-square error (RMSE) between simulated and observed annual mean phosphate, nitrate, and oxygen concentrations (Garcia et al., 2006a, b), mapped onto the respective three-dimensional model geometry. Deviations between model and observations are weighted by 225 the volume of each individual grid box, V i , expressed as the fraction of total ocean volume, V T . The resulting sum of weighted deviations is then normalised to the global mean concentration of the respective observed tracer:

Parameter optimisations and cross-validation experiments
The optimisations presented here are based upon two successive optimisations presented by Kriest et al. (2017)  was calculated after a spin up of 3000 years, as also used in the present study.
In the first optimisation, Kriest et al. (2017) optimised four parameters related to plankton growth and loss terms, together with b and R O2:P . The optimal parameters of this first calibration led to a better agreement of simulated global biogeochemical fluxes to observations of primary and export production, zooplankton grazing, particle flux at 2000 m, and organic matter burial 240 at the sea floor . In a subsequent optimisation Kriest (2017) kept the four optimal plankton parameters fixed, and calibrated four parameters related to remineralisation and nitrogen fixation (namely K O2 , K DIN , DIN min and µ NFix described in section 2.4), together with b and R O2:P (see Table 1). This second optimisation by Kriest (2017) led to a better match to independent estimates of pelagic denitrification, without deteriorating the matches to observed primary and export production, zooplankton grazing, particle flux at 2000m and organic matter burial at the sea floor. It is hereafter referred to as 245 MIT28 ⇤ and serves as the starting point for the four additional optimisations presented in this paper.
In particular, we repeated optimisation MIT28 ⇤ against Equation 1 in circulations ECCO, UHigh, U20 and U17.5 described above. In the following we refer to these four additional optimisations and optimal parameter sets as ECCO ⇤ , UHigh ⇤ ,U20 ⇤ 3 Results

Performance of optimised models
When optimised for different circulations the coupled models show similar values of the misfit function J ⇤ ( Table 1). The 255 misfit decreases with more realistic circulation and physics (according to the criteria described in section 2.2), and is lowest for ECCO ⇤ . Global mean phosphate profiles are quite similar, and vary less then 10% of the observed concentrations at depths below 500 m ( Figure 4, panel (A) and Figure S3, black lines). The low variation of phosphate might be explained by the fixed phosphorus inventory of the model. Only at the surface, where nutrient concentrations become low, relative variation is larger.
However, despite optimisation some regional mismatches remain: for example, the model when optimised for the three UVic 260 circulations shows a considerable overestimate of deep (> 3000 m) phosphate in the Atlantic ( Figure S4). All optimal models further underestimate phosphate in the mesopelagic of the northern North Pacific.
Vertical nitrate profiles are also quite similar to each other ( Figure 4, panel (B) and Figure S3, red lines), although the nitrate inventory is allowed to adjust dynamically to the loss of fixed nitrogen during denitrification, and its balance through nitrogen fixation at the sea surface. Because optimisation also attempts to match nitrate observations, the differences between 265 the models are nevertheless quite small, and result in deviations of 1-2% of observed global mean nitrate (see Table 1). The global pattern of nitrate residuals ( Figure S5) generally corresponds to that of phosphate. However, the spatial distribution of fixed nitrogen gain and loss causes variations in the global distribution of the nitrate deficit in relation to phosphate, as expressed as N ⇤ = NO 3 16 ⇥ PO 4 ( Figure 5).
In agreement with observations, all optimised models show a large nitrate deficit in the Pacific Ocean, manifest in strongly 270 negative N ⇤ ( Figure 5). This lack of nitrate is caused by denitrification in the ETP, and is balanced by nitrogen fixation in tropics and subtropics (see Figure S6). In the Atlantic Ocean, where denitrification is largely absent, simulated N ⇤ is far less negative than in other areas, but it is never positive as suggested by the observations. This mismatch can be explained by the prerequisites of the biogeochemical model applied: In MOPS, nitrogen fixation relaxes nitrate to 16⇥phosphate with a time constant defined by µ NFix whenever N ⇤ is negative; otherwise it does not occur. Because of this process parameterisation, N ⇤ 275 is restricted to values  0. Finally, the Southern Ocean has moderate values of N ⇤ , owing to the mixing of water masses of different origin. Thus, although global average nutrient profiles match observations well, with little differences among optimal models, some regional biases remain, which differ among the models. Also, because of the different processes involved in nutrient turnover, phosphate and nitrate distributions are not exactly the same, with consequences for the nitrate deficit in different oceanic domains.

280
Global mean oxygen profiles of optimised models vary more strongly than those of nutrients, up to ⇡ 40 mmol O 2 m 3 in the deep ocean, and thus more than 20% the observed value ( Figure 4, panel (C) and Figure S3, blue lines). Overall, a finer resolution and a more realistic circulation improve the representation of this tracer, reducing the global oxygen bias, which ranges from 5.3 mmol O 2 m 3 (U17.5 ⇤ ) to almost zero (ECCO ⇤ ; Table 1). On a regional scale all models show some common biases: South of 40 S all optimised models overestimate zonal mean oxygen in subsurface waters above ⇡ 1500 m (MIT28 ⇤ ) 285 to ⇡ 2000 m (ECCO ⇤ ), or even further downward (UVic simulations; Figure S7). In the Pacific Ocean these too high oxygen concentrations propagate northward. Finally, in mesopelagic waters of the northern North Pacific all models overestimate oxygen, especially above 1000 m. Common to all models is further an underestimate of oxygen in subsurface waters (down to ⇡ 1000 m) in the subtropics and tropics of the southern hemisphere. The Atlantic Ocean is generally characterised by too low oxygen concentrations at greater depths. Here, the five optimal models differ: ECCO ⇤ exhibits too low mesopelagic 290 oxygen in the tropical and subtropical Atlantic. MIT28 ⇤ underestimates oxygen particularly in deep waters of the southern hemisphere, and north of 60 N. The optimal UVic configurations are biased low in deep waters of the northern hemisphere.
These low oxygen concentrations of the UVic configurations are accompanied by too high phosphate and nitrate in the deep North Atlantic ( Figure S5), and indicate too high remineralisation of organic matter in these depths. Together with circulation these result in a too strong accumulated signal of remineralisation.

295
Thus, while there are common features among the five optimal models, there are also some striking differences especially in the Atlantic. These differences can be explained with the large impact of the Pacific on the misfit function. Owing to its large volume, and the comparatively large impact of oxygen on the misfit function (see Kriest et al., 2017), optimisation will attempt to minimise especially oxygen misfits in this region, and tune the biogeochemical model parameters to compensate for potential errors of the respective circulation. These different parameters affect the oxygen distribution in the Atlantic, which 300 does not contribute so much to the global misfit. Different physical properties of the circulations then cause divergent patterns in the oxygen distribution of this region.

Best parameters of optimisations in different circulations differ
As presented and discussed by Kriest (2017) applied by Kriest and Oschlies, 2015), a low affinity of denitrification to nitrate (K DIN = 32 mmol m 3 ), and a low maximum nitrogen fixation rate (µ NFix = 1.19 µmol m 3 d 1 ; Table 1), which is only about half of the default value applied by Kriest and Oschlies (2015). The optimised oxygen affinity of remineralisation is very high, as indicated by a low value of K O2 (the half-saturation constant for oxygen). We note that K O2 also regulates the inhibition of denitrification by oxygen; when this 310 parameter becomes very low, denitrification is more strongly inhibited by oxygen. Hence, the optimal model configuration MIT28 ⇤ induces only moderate denitrification, and prevents a decline of the global nitrate inventory through this process (see also Kriest and Oschlies, 2015). The oxygen demand of remineralisation, R O2:P , remains close to the value derived from observations (170±10 mmol O 2 :mmol P; Anderson and Sarmiento, 1994). Finally, as noted by Kriest (2017), some parameters are only weakly constrained by the misfit function: for example, an almost ten-fold increase in K O2 results in a misfit function 315 not larger than 1% of the optimal misfit (see also Table 1). One reason for this low sensitivity of the misfit to a variation in K O2 , K DIN or DIN min is the small volume occupied by suboxic zones, where these parameters can play a role for dissolved inorganic tracer concentrations (Kriest, 2017).
Optimising the same set of parameters in either the three different UVic circulations or the ECCO circulation also results in a high threshold for the onset of denitrification, DIN min (Table 1). As for MIT28 ⇤ the dependencies of oxic and suboxic 320 remineralisation on oxygen or nitrate, expressed through K O2 and K DIN , may vary largely within the parameter space without having a large impact on the misfit function.
In contrast, R O2:P and b are constrained very well by the misfit function, as indicated by the narrow range of parameters that result in a good fit to observations. For example, all solutions of the ECCO ⇤ , which result in a misfit within 1% of the optimal fit require a b value between 1.4 and 1.5, and a stoichiometric demand for oxygen between 150 and 154 mol O 2 : mol P (Table 1).

325
Optimal R O2:P decreases from 170 mol O 2 : mol P (MIT28 ⇤ ) over 162 mol O 2 : mol P (UHigh ⇤ ) to 151 mol O 2 : mol P (ECCO ⇤ ). The exponent determining the shape of the particle flux curve, b, also varies among the five optimisations, between 1.27 (UHigh ⇤ ) and 1.46 (ECCO ⇤ ). The range of good (within 1% of the optimal fit) values for b differs between ECCO ⇤ and UHigh ⇤ (b between 1.2-1.3). Also, the range for good values for R O2:P of ECCO ⇤ does not overlap with that of the other optimisations.
Therefore, to achieve a good fit to observations different circulations seem to require markedly different parameters for the 335 oxygen utilisation by remineralisation, (R O2:P ), the exponent determining the particle flux curve (b) and the potential rate of nitrogen fixation (µ NFix ). Other parameters vary little (DIN min ), or (as indicated by overlapping ranges of good parameters) the differences among them might not be relevant (K O2 and K DIN ) for a misfit function targeting on the global scale. It is important to note that the relevant parameters do not seem to be correlated with each other ( Figure S8). Apparently, different characteristics of each circulation influence the choice of the optimisation algorithm for the optimal values of R O2:P , b and 340 µ NFix .
To investigate the potential dependence of optimal parameters on circulation, we examined the area of dense-water outcrop and deep mixing in two different regions (see section 2.3 for definition). Together with average age of four different regions or water masses, we investigate 12 different diagnostics for each circulation for their influence on optimal biogeochemical parameter estimates. In most cases the optimal parameters are not correlated with physical properties (Table 2). However, the 345 oxygen demand of remineralisation, R O2:P is significantly correlated with the area of deep mixing in the northern North Atlantic for maximum mixed-layer depths of 200 and 400 m (p < 0.05). For both criteria R O2:P increases with increasing area of deep mixing ( Figure 6). In addition, R O2:P also correlates with a deep mixing area defined by > 200 m in the Southern Ocean, albeit not significantly (Table 2). In other words, more vigorous mixing in areas of deep, intermediate or mode water formation allows for a higher oxygen utilisation by remineralisation. Parameter b describing the particle flux curve 350 correlates significantly with the ideal age of NADW (Table 2), and increases with increasing age of this water mass ( Figure 6).
Finally, the maximum potential rate of nitrogen fixation µ NFix correlates with the outcrop area of waters with moderate density (26.5  ✓ < 27.5) waters in the Southern Ocean. An increase in the outcrop area of these waters -which correspond roughly to the sum of AAIW and SAMW -results in an increase of optimal maximum nitrogen fixation rate ( Figure 6). The age of mesopelagic waters in the ETP seems to play a small role, despite the fact that it is an important area in the global nitrogen 355 budget ( Figure S6). Possible reasons for the dependence of these three parameters on physical diagnostics will be discussed in section 4.1.

Cross-validation experiments: Can we transfer parameters optimal for one circulation to another circulation?
Given that three optimal parameters differ among the model circulations, we investigate model performance and dynamics when these parameter sets are swapped among circulations. Of course, every coupled model performs best (with respect to 360 J ⇤ of Equation 1) when simulated with parameters optimal for the respective circulation, as indicated by the lowest relative misfit along the main diagonal in panel (A) of Figure 7. When exchanging the biogeochemical parameters optimal for ECCO or MIT28 circulation with parameters optimised in any other circulation, the model performance with respect to respect to J ⇤ deteriorates. The misfit function changes less when the optimal parameters are swapped among the different UVic circulations.
In the model the global oxygen inventory adjusts dynamically to the combined effects of circulation and biogeochemical 365 parameters, causing a large impact of this tracer on the misfit function . Therefore, optimisation attempts to reduce the global oxygen bias, which is low for each optimal model configuration, When swapping parameter sets among the different configurations of the UVic circulation, the effect is much less pronounced.
Apparently, despite their different overturning and mixing, the UVic circulations are more similar to each other than those of MIT28 and ECCO. We note that these large differences in oxygen inventory arise mainly from deeper (27.5  ✓ ) layers, while 375 the oxygen inventories of waters lighter than ✓ = 27.5 are quite similar ( Figure S9).
OMZ volume is biased low for the parameter set of ECCO ⇤ , and in the MIT28 circulation ( Figure 7, panel (C)). This underestimate is likely caused by the very low R O2:P and high b of ECCO ⇤ , or the vigorous mixing in the MIT28 circulation, which both tend to increase subsurface oxygen concentrations. Otherwise, OMZ volume does not seem to be closely related to the parameter set or circulation, likely because this diagnostic is largely independent of the applied misfit function, and depends 380 on the local circulation pattern (see discussion by Sauerland et al., 2019

Effect of parameters and circulation on phosphate and oxygen concentrations in different water masses
Because of the regional biases of nutrients and oxygen in the North Atlantic, North Pacific and Southern Ocean (see section 3.1), and because the values of R O2:P and b selected in the calibration process correlate significantly on water mass properties of the NADW, NPDW and CDW (section 3.2), we here examine more closely how these parameters affect the large scale distribution of phosphate (as a conserved nutrient) and oxygen, which can adjust dynamically at the model's air-sea interface, and is thus a non-conservative tracer. Kriest et al. (2012) showed that b, the parameter determining the particle flux length scale, has a large influence on the distribution of phosphate along the "conveyor belt" (i.e., along waters of different age), in agreement with the results obtained by Bacastow and Maier-Reimer (1991) and Kwon and Primeau (2006). We here carry out an analysis similar to that by Kriest et al. (2012) and evaluate average phosphate and oxygen within the NADW, NPDW and CDW, with region definitions as In contrast to phosphorus, the oxygen inventory is not fixed, but regulated by the interplay of circulation, air-sea gas ex-405 change, and biogeochemical turnover. Because of this we find a pattern that is very different from that of phosphate when examining the distribution of average oxygen in different water masses. Now, within each circulation average oxygen in the NPDW increases almost linearly with average oxygen in the CDW and NADW (Figure 9), highlighting the role of these waters for the ventilation of the deep North Pacific. In most circulations, a large value of R O2:P results in a low oxygen content in all water masses. All optimised coupled model configurations suggest average oxygen between 220-250 mmol m 3 in the NADW.

410
For the NPDW all optimised models simulate average oxygen concentrations around 150 mmol m 3 , and thereby overestimate the observed value of 125 mmol m 3 by one fifth. Average oxygen in the CDW of optimal model simulations varies between ⇡ 210 240 mmol m 3 , encompassing the observed value of 215 mmol m 3 . Thus, given the quite wide range of potential parameter values, optimisation improves the global oxygen bias (see Table 1), but some residual regional bias especially in the NPDW remains. For example, to determine Circ for global average oxygen or OMZ volume (here denoted as X), for each parameter set i simulated with the five different circulations j = 1...5 we compute the difference between the maximum and minimum value X i = max(X i,j=1...5 ) min(X i,j=1...5 ), and then determine the maximum of these differences: Circ= max( X i ).
The computation of Par is done analogously. We also compute the maximum across all optimal model configurations 425 Opt= max(X i=j ) min(X i=j ), and across all 25 experiments presented in this study: All= max(X i,j ) min(X i,j ).
Using this approach, a value for Par close to All indicates that the model variability is mainly induced by the biogeochemical parameter set, whereas a relatively large value for Circ indicates a major impact of circulation. Table 3 and Figure 11 show the results of this comparison, and Figure 12 illustrates the variability for each circulation or parameter set, normalised by the average over all optimal models. Note that the longest horizontal or vertical line in Figure 12 corresponds to Par and 430 Circ in Table 3, while the width of the grey shaded square corresponds to All.
Firstly, biogeochemical parameters ( Par) as well as circulation ( Circ) play an about equally large role for the global average oxygen, which varies by ⇡ 24 mmol m 3 (Table 3 and Fig. 11), or about ±15% of the average optimal value (see also Figure 12). The variation decreases to less than one fourth of this value if we restrict our analysis to only optimal models ( Opt); as noted above, this strong decrease arises because all optimal models have adjusted R O2:P to account for different 435 ventilation in the high latitudes. Considering all 25 experiments, i.e., the accounting for variation induced by both biogeochemical and physical configurations ( All), leads to a variation six times as large as for the optimal configurations ( Opt).
The variability is much more pronounced when considering the OMZ volume as defined by two criteria, 50 mmol m 3 and 80 mmol m 3 . Again, both circulation and parameter set play an about equally large role; but the impact of changes in parameters or circulation varies across the different models (Fig. 12). For example, applying the parameter set of MIT28 ⇤ to a 440 different circulation causes a very strong increase in OMZ volume (vertical black lines in Fig. 12), while model MOPS coupled to the circulation of MIT28 is quite robust with respect to different parameters (horizontal black lines in Fig. 12). On the other hand, when coupled to the ECCO circulation the biogeochemical model is quite sensitive to the biogeochemical parameter set (horizontal red lines in Fig. 12), but its optimal parameter set ECCO ⇤ has a smaller effect when applied to other circulations (vertical red lines in Fig. 12). This diverging effect of parameters and circulation among the different models eventually causes 445 a large spread of 67.2 ⇥ 10 15 m 3 of global OMZ volume across all model experiments ( All in Table 3 and Fig. 11), i.e., more than 100% the observed volume. The effects are even more pronounced when considering a criterion of 80 mmol m 3 for OMZ definition (Table 3). The OMZ volume does not show any consistent trend with b, R O2:P , or circulation (Fig. 10, panel (B)), although models with high b and low R O2:P tend to result in a smaller OMZ volume. The circulation of MIT28 shows the lowest OMZ volume.

450
To summarise, oxygen inventory and OMZ volume are almost equally influenced by physics and biogeochemistry. Optimisation reduces the spread induced by either biogeochemistry or physics to about 30% percent for average oxygen, but less for OMZ volume, which varies strongly across all model experiments. We note that the individual contributions of Par and Circ for both diagnostics do not add up to All (Table 3), indicating that the effects of biogeochemical parameters and circulation are not linear and additive.

Effect of parameters and circulation on global biogeochemical fluxes
Oxygen and nutrient distributions are influenced by the production of organic matter in the euphotic zone, and its subsequent transport to the ocean interior by physical and biogeochemical processes. In addition, denitrification in combination with nitrogen fixation can affect the global nitrogen inventory, and the spatial distribution of the nitrate deficit (see section 3.1). We finally here investigate how these fluxes are affected by the two parameters R O2:P and b.

460
In our model experiments simulated global primary production depends slightly more on circulation ( Circ) than on biogeochemical parameters ( Par; Table 3, Figures 11 and 12). An increase in b (corresponding to slowly sinking particles) causes primary production to increase ( Figure 10, panel (D)), likely because of the higher nutrient retention in the euphotic zone, shallow remineralisation and enhanced entrainment of nutrients into the surface layers. Because the latter process depends on physical dynamics, we also find an influence of the circulation model on global primary production. Further, our optimisations 465 did not include parameters relevant for plankton dynamics at the surface, which can also explain the comparatively large impact of circulation. The variation across all optimal models of our study ( Opt) is much smaller (about one third) than the variation across all model experiments ( All).
Circulation also plays a large role for export production (particle flux through 100-130 m, depending on model grid), as it supplies new nutrients to the well-lit upper ocean which will, under steady state conditions, be exported again. Somehow 470 surprisingly, export production is not strongly determined by b (Fig. 10, panel (E)). This parameter affects directly the sinking of organic matter out of the euphotic zone. On the other hand, it determines the subsurface concentration of nutrients, as a source for upwelling and entrainment of nutrients. A large b, corresponding to slow sinking and shallow remineralisation, increases nutrients within and below the euphotic zone, and thus primary production as the ultimate source of export production; on the other hand, it prevents fast settling of organic particles out of the euphotic zone. Because the plankton parameters were not 475 changed during optimisation, global grazing follows almost linearly primary production (r = 0.95), and the statements and conclusions made with respect to the former flux largely apply to grazing (no figure). Therefore, the combined antagonistic effects of b on surface (and subsurface) nutrient turnover, subsurface nutrient concentrations (as a source of nutrient entrainment and mixing) and direct organic particle flux in the upper few hundred meters explain the relatively small variation caused by biogeochemical parameters on export production ( Par ; Table 3 and Figures 11 and 12), which is only about half as much the 480 variation due to circulation ( Circ).
Deep particle flux, on the other hand, is almost entirely determined by b, and circulation plays a negligible role for this flux ( Fig. 10, panel (F)). The large influence of this parameter is also reflected in its range over all model simulations, which is only slightly larger than the range of flux in optimally configured models (Table 3). Therefore, simulated organic matter supply to the deep ocean and deep nutrient concentrations will, to a large extent, depend on the prescribed particle flux profile, with 485 potential effects on the long-term storage of carbon dioxide.
The loss of fixed nitrogen through pelagic denitrification is tightly related to the extent of OMZs, and thus varies quite strongly among the different experiments, with no clear trend for either b, R O2:P (Figure 10, panel (C)) or µ NFix (no figure).
The range of variation due to parameters and circulation is about equally large (about 50% of the average optimal global flux; induce changes of about ±30% around the mean optimal flux for each model circulation or parameter set (Fig. 12). Global fixed nitrogen loss of the optimised models varies much less, likely because optimisation adjusts the parameters to match observed nitrate profiles.
To summarise, at least for this particular biogeochemical model circulation and biogeochemistry affect global biogeochemical fluxes in different ways. While primary production and fixed nitrogen loss are almost equally influenced by physics and 495 biogeochemistry, export production depends mainly on physics. Deep particle flux, on the other hand, is affected to a large extent by b. Optimisation reduces the spread induced by changing either biogeochemistry or physics to about 50% percent for fixed nitrogen loss and for primary production (compare Opt with Circ or Par). In contrast, there is no such reduction in model variability for export production (which is mainly determined by circulation) or deep particle flux (which is mainly determined by b). 500 4 Discussion

Why do different circulations require different parameters?
As we have seen in section 3.2, three optimal parameters depend significantly on three unique diagnostics that result from different features of the circulation model. These diagnostics are related to the northern North Atlantic and the Southern Ocean; the ETP seems to play a lesser role.

505
The strong correlation of R O2:P with the area of deep mixing clearly confirms that these two model properties (physics and biogeochemistry) regulate global ocean oxygen distribution and inventory in concert. The larger the area of deep mixing, the more oxygen can -or should -be respired in the model, in order to match observed oxygen concentrations. Despite optimisation, the optimal models show an average oxygen concentration of ⇡ 145 mmol m 3 in the NPDW (Figure 9), which is higher than the observed value of 125 mmol m 3 . Given that the models differ strongly in their physical properties, this 510 residual mismatch of all optimal models especially in the NPDW may point towards a deficiency of the biogeochemical model.
For example, the spatially homogeneous, and thus inflexible, particle flux profile may not be adequate to simulate the very dynamic response of ecosystem dynamics and particle size structure to regionally variable mixing and nutrient supply (e.g., Guidi et al., 2015;Marsay et al., 2015). Here, a more flexible model resulting in variable sinking speeds of particles (e.g., Gehlen et al., 2006;Niemeyer et al., 2019), or, more generally, spatially flexible remineralisation length scales (e.g., Weber 515 et al., 2016), might be of advantage.
The parameter determining particle flux, b, correlates with the ideal age of NADW ( Figure 6). This physical diagnostic comprises several aspects of circulation: a large area of deep mixing in the northern North Atlantic supplies this region with "young" waters. At the same time, a strong Atlantic Meridional Overturning circulation (AMOC) and/or confined Deep Western Boundary current (DWBC) can more quickly export the preformed properties to the southern parts of the basin. Depending 520 on the parameterisation of mixing and other physical processes, biogeochemical tracers are distributed more efficiently in the west-east direction, or mixed with deeper waters. When these combined properties of a model cause a long residence time of waters in the NADW, the resulting age of this water mass will be quite high, and vice versa.
The circulations applied in our study vary with regard to several aspects in this region: in contrast to all other models the circulation of MIT28 has a large area of deep mixing in the north, including the Labrador Sea (see Figure 2). There is also a 525 quite strong and wide transport of young waters in the western part of the North Atlantic via the DWBC, as indicated by the southward propagation of relatively young waters between 1500-2500 m in the western part of the basin (Figure 3). At the same time there is a strong lateral spreading of these young waters from the western part (see also Dutay et al., 2002). All processes combined lead to relatively young average age of NADW in MIT28. ECCO, in contrast, shows only comparatively shallow mixing in the northern North Atlantic (Figure 2), little southward transport of these waters in the DWBC (Wunsch and 530 Heimbach, 2006), and a large extent of older waters in the Eastern Tropical Atlantic (Figure 3). This circulation is characterised by the oldest average age of NADW.
Our optimisations suggest that models with old NADW adjust to a large b, or slow particle sinking (for example, b = 1.46 of ECCO ⇤ ). As ideal age becomes younger, optimal b decreases. Why is this the case? So far, we can not attribute this exclusively to the area of deep mixing (Table 2), or, for example, to the overturning circulation, which is quite low in ECCO
Instead, the average age of NADW, and resulting optimal b, likely reflects the combined effects of various model physical and biogeochemical parameterisations: the adjustment of b to smaller values decreases shallow production and remineralisation (see Figure 10). It also increases export of phosphorus to deep waters, and finally to the NPDW (see Figure 8). Circulation models with high physical turnover in the NADW (e.g., UHigh), as indicated by young NADW, can more easily resupply 540 nutrients to surface waters, and therefore balance the loss due to particle sinking in this region.
As shown in Figure 6, the maximum potential rate of nitrogen fixation µ NFix increases with area of surface waters defined 26.5  ⇥ < 27.5 in the Southern Ocean, i.e., waters reflecting the formation and ventilation of AAIW and SAMW. A broad view of large scale circulation and the spatial separation of fixed nitrogen loss and gain helps to understand the adjustment of maximum nitrogen fixation rate to physical processes in the Southern Ocean. Denitrification is a very localised process, 545 occurring mainly in the Eastern Tropical Pacific (ETP) ( Figure S6). On the other hand, simulated nitrogen fixation takes place throughout large parts of the tropical and subtropical regions of the Pacific, Atlantic and Indian Ocean. Even though nitrogen fixation in the Atlantic accounts only for a fraction of global fixed N gain (see also Marconi et al., 2017, for evidence from observations), in our models it nevertheless contributes to the stabilisation of the global fixed-nitrogen budget. A very negative N ⇤ , as arises from denitrification in the ETP, has to arrive in the Atlantic for nitrogen fixation to trigger a competitive advantage 550 of nitrogen fixation. These two regions in the Atlantic and Pacific Ocean are connected through large scale circulation, which transports N ⇤ on centennial to millennial time scales from areas of fixed nitrogen loss to areas of fixed nitrogen gain. When passing the CDW of the Southern Ocean, these waters can act as a "mixer of deep waters with distinct isotopic signatures and nutrient stoichiometry" (Tuerena et al., 2015); the resulting mixed properties provide the source of AAIW and SAMW. The subsequent transport via AAIW and SAMW then can trigger nitrogen fixation, e.g., in the Atlantic, and balance the nitrate 555 deficit arising mainly in the Pacific Ocean.
As shown in Figure 5, the nitrate deficit N ⇤ differs among the different models. For example, MIT28 ⇤ exports water with an N-deficit of ⇡ 3 mmol m 3 from the Southern Ocean to the low latitudes (promoting nitrogen fixation). This model adjusts to a low rate of maximum potential nitrogen fixation of 1.19 µmol m 3 d 1 . On the other hand, UHigh ⇤ simulates SAMW and AAIW that contain a lower N-deficit of ⇡ 2 mmol m 3 , which -depending on phosphate availability -will result in lower 560 nitrogen fixation. The optimal high parameter of UHigh ⇤ of almost 3 µmol m 3 d 1 can partially compensate for this. The effect of N ⇤ is, however, not consistent across all optimal models: U17.5 ⇤ also shows a small nitrate deficit in this region, but has a still relatively low maximum nitrogen fixation rate. Here, other effects might play a role, such as a stronger ventilation and consequently younger waters in the ETP ( Figure S2), which induce a smaller OMZ (Table 1), less denitrification in this region ( Figure S6), and thus a lower nitrate deficit in this area, that is to be eventually balanced by nitrogen fixation.

575
As shown in section 3.2, each circulation requires its own set of parameters for an optimal fit to dissolved inorganic tracers.
Optimisation facilitates the identification of these constants; on the other hand, it requires many model evaluations, so this approach is prohibitive for models of high resolution because of computational constraints. It would be desirable to optimise a biogeochemical model in a computationally cheaper circulation and then transfer the optimal parameters to a different model that includes more physical details, but is computationally more expensive. However, as shown in Sections 3.3, 3.5 and 3.6, 580 model performance can deteriorate when simulated with non-optimal parameters, and result in a considerable spread of the independent diagnostics. Is there any advantage of calibrating biogeochemical models in these rather coarse scale, simplified circulations, if the parameters are to be transferred to a different circulation? To answer this question, we here discuss the model variability before the background of earlier model studies and observed estimates.
The mean diagnostic across all 25 model experiments (Mean(All) of Table 3) differs only slightly from the mean across only 585 the optimal model configurations (Mean(Opt)), and is close to observed quantities (Table 3), in agreement with Kriest et al. (2017) and Kriest (2017), who found that optimisation against global nutrients and oxygen can help to improve global model performance. Further, the overall maximum variation across all 25 experiments ( All of Table 3) is usually less than 50% of that found by Bopp et al. (2013), who examined seven global models of different biogeochemical structure and circulation for their global average oxygen, OMZ volume, primary and export production. This indicates that optimisation can help to improve model performance and reduce its uncertainty, even if parameters were optimised in a different circulation.

Model uncertainty, oxygen inventory and OMZ volume
As shown in section 3.5 changes in circulation and biogeochemical parameters affect model performance with respect to global average oxygen about equally, resulting in an overall variation that is less than 25% of the observed value ( All of Table 3). In contrast, the global OMZ volume shows a large response to variations in circulation and model parameters, and varies by more 595 than 100% ( All of Table 3) to 200%  of the observed volume. To our knowledge, no global model study exists so far that systematically distinguishes between the effects of circulation and biogeochemistry on global OMZ volume; our study suggests that both are equally important. Even the range across optimal models ( Opt) is still quite large, which can be explained by the fact that the target of optimisation (the RMSE to nutrients and oxygen, as of Equation 1) is only weakly correlated to the fit to OMZs (Sauerland et al., 2019). Application of a revised misfit function, or multi-objective optimisation 600 as presented by Sauerland et al. (2019), can help to better constrain the relevant model parameters, and better represent OMZs.
Nevertheless, the mean of all models in our study deviates by less than 5% from the observed mean. Obviously, a good representation of nutrients and oxygen can improve the fit to OMZs to some extent.
However, many global circulation models suffer from a deficient representation of physical processes in the tropics and subtropics, for example in their representation of the equatorial undercurrent (Dietze and Loeptien, 2013), or from inadequate 605 ventilation from the Southern Ocean and North Pacific (see Cabre et al., 2015, and citations therein). The first problem can possibly be cured by a higher resolution, which leads to a more realistic OMZ ventilation by equatorial and off-equatorial undercurrents (Duteil et al., 2014). Parameterisation of intermediate jets (Getzlaff and Dietze, 2013) can also lead to a better agreement of the models. Tuning of biogeochemistry before the background of inadequate physics could compensate for the physical errors, but also result in misleading model parameterisations ('overtuning'), with potential consequences for future 610 projections (Löptien and Dietze, 2019). Given the yet unexplored structural and parameter sensitivity of models employed in global assessments, and the large error with respect to OMZ volume and expansion (Table 3; Cocco et al., 2013;Bopp et al., 2013;Cabre et al., 2015), a careful analysis of different error sources (physical and biogeochemical) can help to determine the reasons for model divergence. The study presented here can serve as a first step towards this.

615
The loss of fixed nitrogen through pelagic denitrification is tightly linked to OMZs, and therefore also almost equally influenced by circulation and biogeochemical parameters. Our average optimal model estimates are in agreement with recent estimates by Eugster and Gruber (2012) and Somes et al. (2013). The variation due to biogeochemical parameters is lower than found by Somes et al. (2013), who varied the nitrate threshold for denitrification from 20 to 32 mmol m 3 , i.e. a wider range than identified by our objective parameter calibration (see Table 1).

620
Average global primary production of the models lies well within the range of observed estimates (Carr et al., 2006), and depends slightly more on circulation than on biogeochemical parameters, similar to the results obtained in the sensitivity study by Schmittner et al. (2005). That study included a wide range of sinking and mixing parameterisations, which might explain the larger variation compared to our results. Applying three different circulations to one biogeochemical model, Seferian et al. (2013) found a spread of primary production which comparable to our experiments.

625
Quite many global model studies analysed the impact of circulation on global export production. Najjar et al. (2007) found an effect of circulation more than ten times larger than in the present study, which can likely be explained by the nutrient-restoring approach applied in their simple model. Using a more complex biogeochemical model, export production in the study by Seferian et al. (2013) varied only by ⇡ 3 Pg C y 1, which is closer to the effects of circulation found in our study. The large effects observed by Schmittner et al. (2005) can again likely be ascribed to the wide range of parameterisations tested.

630
Using a biogeochemical model similar to the one applied by Najjar et al. (2007), Kwon et al. (2009) found an increase in export production of ⇡ 5 Pg C y 1 when increasing b from 1.1 to 1.4 (about the range tested in our study). Again, this can possibly be explained by the nutrient restoring approach, which does not account for the interplay between particle export and remineralisation in surface and subsurface layers. Therefore, our model experiments show a lower sensitivity of global export production on biogeochemical parameters or 635 circulation than the previous studies. Some part of this difference could be explained by the large variation in physical model setup (in the study by Schmittner et al., 2005), or by the very different structure of the biogeochemical model applied (Najjar et al., 2007;Kwon et al., 2009). The large sensitivity of export production in the study by Najjar et al. (2007) also reflects on the range of deep particle flux (as diagnosed from export production times 0.052), which is almost ten times higher than in the present study.

640
Our experiments suggest that even though circulation does play a large role for export production, deep particle flux is mainly determined by parameter b. As a consequence, the sensitivity of export production to circulation noted by Najjar et al. (2007) and in the present study does not necessarily imply an equally large sensitivity of deep particle flux (and resulting remineralisation and deep oxygen consumption) on physical model features, which is somehow in contrast to the conclusions drawn by Najjar et al. (2007).

The effect of model complexity
To summarise, in all cases studied here biogeochemical parameter optimisation can help to narrow down the model uncertainty induced by circulation and biogeochemical parameters. Even if parameters optimised in one circulation are later transferred to a different circulation the resulting spread is mostly around 50% that of the model intercomparison presented by Bopp et al. (2013). However, that study included models that diverged not only in physics, but also in the biogeochemical structure, which 650 might introduce another source of variability. To have a closer look at this we finally contrast the results of model MOPS, which we consider as a model of intermediate complexity, with an equivalent optimisation of a much simpler model "RetroMOPS" presented by Kriest (2017).
RetroMOPS is a four-component model that simulates only phosphate, nitrate, oxygen and dissolved organic phosphorus, but includes the same structural form for particle flux and remineralisation as model MOPS. When coupled to MIT28, and 655 optimised against the same data set and misfit function presented above, the performance and global fluxes of RetroMOPS are very similar to MOPS (in the same circulation; Table 3 and Kriest, 2017)). For primary production the difference between RetroMOPS and MOPS is about as large as when MOPS is simulated with different parameter sets within a given circulation.
One reason for this is the fact that the optimisation of RetroMOPS by Kriest (2017) aimed only at the parameters related to particle sinking and remineralisation, but not at parameters related to phytoplankton growth and loss terms; an additional 660 optimisation of these parameters may likely have produced smaller differences.
Therefore, after optimisation a simple model can perform quite well with respect to large-scale biogeochemical quantities, in agreement with earlier findings (Kriest et al., 2012;Kwiatkowski et al., 2014;Galbraith et al., 2015), illustrating the benefit of parameter optimisation: optimisation allows for a "fair" comparison of models of different complexity (after each model has been tuned to match some desired quantity best); it can therefore also help to decide about the necessary level of model 665 complexity.

Conclusions
Optimisation of a global biogeochemical ocean model coupled to five different circulations achieved a good fit to observed nutrients and oxygen with partly different biogeochemical parameters. We identified three parameters that depend significantly on characteristic features of circulation, as summarised in Figure 13. Areas of deep ventilation in the North Atlantic and in 670 the Southern Ocean determine how much oxygen is supplied to the ocean via air-sea gas exchange and subsequent mixing.
As a consequence, optimisation of the model in circulations with vigorous ventilation triggers a high oxygen demand of remineralisation during optimisation. Fast turnover and mixing of NADW, as expressed through low ideal age, affects the parameter responsible for the timescale and vertical extent of remineralisation of sinking particles. Here, models characterised by relatively young waters in the NADW adjust to deeper sinking and remineralisation, with consequences for the large scale 675 distribution of phosphate. Finally, the combined outcrop area of SAMW and AAIW determines the optimal maximum rate of nitrogen fixation. This may be explained with the role of the Southern Ocean as "mixer" for waters of different origin and nitrogen deficit (Tuerena et al., 2015). The extent and properties of waters originating from this region (and their fixed nitrogen deficit), in conjunction with denitrification within the OMZs then set the stage for the competitive advantage of cyanobacteria in tropical and subtropical waters. Models with a small outcrop area of these waters benefit from a low maximum rate of nitrogen 680 fixation, and vice versa. In conclusion, when aiming for a good fit to large-scale biogeochemical quantities, key properties of the underlying circulation model should be considered; depending on region and tracer of interest, these key properties may, however, be different from those of our study.
Cross-validation experiments showed that the optimal parameters could be swapped between the different circulations to a limited extent. Parameters affect biogeochemical model performance in different ways: while the stoichiometric demand of 685 oxygen during remineralisation affects, for example, the ocean oxygen inventory, the particle flux parameter b determines the large-scale distribution of nutrients, in line with earlier studies (Bacastow and Maier-Reimer, 1991;Kwon and Primeau, 2006;Kriest et al., 2012).
Compared to other intercomparisons of global coupled models tuned more subjectively, our overall variation of biogeochemical key properties is at least 50% smaller, with different contributions from circulation and biogeochemical parameters.

690
For example, export production seems to be mainly determined by circulation, while deep particle flux is determined almost entirely by the particle flux parameter b. Other biogeochemical diagnostics are affected more or less equally by circulation and biogeochemical parameters. Finally, OMZ volume is very sensitive to changes in circulation and biogeochemical parameters, and varies most strongly across all model experiments.
However, models considered in global intercomparisons usually differ also in their biogeochemical structure and complexity.

695
Our experiments suggest that after optimisation the differences due to model structure are much smaller than those due to model parameters or circulation. This indicates that a simpler model can perform as well as a more complex model (with respect to the metrics and diagnostics applied here), similar to the results obtained by Kriest et al. (2012), Kwiatkowski et al. (2014) and Galbraith et al. (2015). It also illustrates how biogeochemical parameter optimisation can aid model development: whenever new components or parameterisations are introduced to a global model, this new model has to be tuned in order to match 700 observations on global or regional scales. Often, the choice of appropriate parameters is not easy, and requires extensive testing and sensitivity analysis. Automatic (algorithmic) optimisation can make calibration more efficient, simplifying the search for a good model match to observed quantities. For a given misfit function it can also support decisions about the necessary level of model complexity.
Therefore, our experiments suggest that global biogeochemical ocean models benefit from optimisation, even if this was 705 carried out in a circulation differing from that of the "target" circulation. However, to date there is no guarantee that a model showing a good fit to observed quantities in steady state (i.e., when simulated in a preindustrial or present day, climatological physical forcing) will exhibit the correct response when applied to transient scenarios reflecting future climate change. As shown here and in earlier studies (Cocco et al., 2013;Cabre et al., 2015;Löptien and Dietze, 2019), OMZs seem to be particularly sensitive to both biogeochemical and physical parameters. Accounting for the match between simulated and observed 710 OMZs during optimisation can reduce the model spread in steady state (Sauerland et al., 2019). Extending the simulation of optimal models to future states could then inform us about their sensitivity to changes in circulation and forcing, and may provide a better constraint on their uncertainties. Thus, the study presented here serves as a first step to unravel the uncertainties associated with the divergence of global biogeochemical model performance and uncertainty.
Code and data availability. Model code employed for the experiments presented in this paper, as well as model output are available at Iudicone, D., Rodgers, K., Stendardo, I., Aumont, O., Madec, G., Bopp, L., Mangoni, O., Schmittner et al. (2005), and refer only to experiments 2, 8 and 12. For Par we report the difference between experiments 12 and 13, and for All the maximum spread across experiments 2 to 13. † We refer to Figure 2b of Kwon et al. (2009), but consider only a range of 1.1  b  1.4 for Par, to be comparable to our range of b. ‡ Calculated from export production ⇥(75/2000) 0.9 = 0.052.        OMZ volume (as percent of total ocean volume) bias. OMZs are defined by 50 mmol m 3 . The x-axis denotes the optimal parameter sets of MIT28 ⇤ , ECCO ⇤ , UHigh ⇤ , U20 ⇤ , U17.5 ⇤ and the y-axis the circulation. Pluses along the main diagonal indicate the optimal model simulation for each circulation (i = j), i.e., MIT28 ⇤ , ECCO ⇤ , UHigh ⇤ , U20 ⇤ , U17.5 ⇤ . values. Thin black lines extending from the stars denote the distance between observation and each optimal model configuration.  i denote different combinations of circulation j and parameter set i, andX ⇤ j the average of all optimal model configurations (see Table 3 Table 3. Symbols denote values from Bopp et al. (2013, large squares), Schmittner et al. (2005, small squares), Seferian et al. (2013, circles), Najjar et al. (2007, diamonds), Somes et al. (2013, plus) and Kwon et al. (2009, cross). OMZ volume defined by a concentration of < 50 mmol m 3 , (C) global fixed nitrogen loss, (D) global primary production, (E) export production and (F) organic particle flux at 2000 m. All diagnostics X are expressed as relative deviation to mean of the five optimal model simulations (Xi,j/X ⇤ j 1), where j and i denote different combinations of circulation j and parameter set i, andX ⇤ j is the average of all optimal model configurations (see Table 3 for values). Each panel shows the range of the normalised diagnostic when the parameter set is kept constant and circulation varied (vertical lines), and when circulation is kept constant and the parameter set is varied (horizontal lines). Symbols denote the optimal model configuration. Colour denotes circulation and optimal parameter set. Black: circulation MIT28 or parameter set of MIT28 ⇤ . Thick blue: UHigh/UHigh ⇤ . Medium blue: U20/U20 ⇤ . Thin blue: U17.5/U17.5 ⇤ . Red: ECCO/ECCO ⇤ . Symbols denote optimal model configurations (see also symbol legend in Fig. 6). The grey-shaded area shows total variation across all 25 model experiments, corresponding to All in Table 3. Note that the maximum variation due to parameter within each circulation ( Par of Table 3) is given by the longest horizontal line in each panel. Likewise, the maximum variation due to circulation within each parameter set ( Circ of Table 3) is given by the longest vertical line.