Potential predictability of marine ecosystem drivers

. Climate variations can have profound impacts on marine ecosystems and the socioeconomic systems that may depend upon them. Temperature, pH, oxygen (O 2 ) and net primary production (NPP) are commonly considered to be important marine ecosystem drivers, but the potential pre-dictability of these drivers is largely unknown. Here, we use a comprehensive Earth system model within a perfect modeling framework to show that all four ecosystem drivers are potentially predictable on global scales and at the surface up to 3 years in advance. However, there are distinct regional differences in the potential predictability of these drivers. Maximum potential predictability ( > 10 years) is found at the surface for temperature and O 2 in the Southern Ocean and for temperature, O 2 and pH in the North Atlantic. This is tied to ocean overturning structures with “memory” or inertia with enhanced predictability in winter. Additionally, these four drivers are highly potentially predictable in the Arctic Ocean at the surface. In contrast, minimum predictability is simulated for NPP ( < 1 years) in the Southern Ocean. Potential predictability for temperature, O 2 and pH increases with depth below the thermocline to more than 10 years, except in the tropical Paciﬁc and Indian oceans, where predictability is also 3 to 5 years in the thermocline. This study indicating multi-year (at surface) and decadal (subsurface) potential predictability for multiple ecosystem drivers is intended as a foundation to foster broader community efforts in developing new predictions of marine ecosystem drivers


Introduction
Marine organisms and ecosystems are strongly influenced by seasonal-to decadal-scale climate variations, challenging the sustainable management of living marine resources (Drinkwater et al., 2010;Lehodey et al., 2006).Anomalies in temperature, pH, O 2 and nutrients are important drivers of such climate-induced ecosystem variations (Gattuso et al., 2015;Gruber, 2011).Therefore, skillful predictions of these marine ecosystem drivers have considerable potential for use in marine resource management (Gehlen et al., 2015;Hobday et al., 2016;Payne et al., 2017;Tommasi et al., 2017).
The primary tools for investigating how marine organisms and ecosystems change on seasonal to decadal timescales are Earth system models, where prognostic equations are implemented for biogeochemical cycles.These models are capable of representing both natural variability and transient changes in the marine ecosystem drivers (Bopp et al., 2013;Frölicher et al., 2016).Recently, Earth system models have been used to explore and quantify the predictability of marine biogeochemical tracers.Most of the studies focus on predicting the ocean uptake of carbon (Li et al., 2016(Li et al., , 2019;;Lovenduski et al., 2019;Séférian et al., 2018).
To date, only a few studies have investigated the predictability of marine ecosystem drivers (Chikamoto et al., 2015;Park et al., 2019;Séférian et al., 2014a).An intriguing finding of these studies is that marine biogeochemical drivers may be more predictable than their physical counterparts.Séférian et al. (2014a), for example, showed that net primary productivity (NPP) has greater predictability than sea surface temperature (SST) in the eastern equatorial Pacific.They hy-T.L. Frölicher et al.: Potential predictability of marine ecosystem drivers pothesized that SST is strongly influenced by high-frequency surface fluxes, whereas NPP is more directly impacted by thermocline adjustment processes that determine the rate at which nutrients are brought into the ocean's euphotic layer.Thus, biogeochemical predictions may hold great promise and highlight the need for further investigation.Changes in ecosystem drivers have impacts not only on the surface ocean but also over upper ocean waters spanning the euphotic zone and below, making it important to understand more broadly how ecosystem drivers vary over a range of depths.To our knowledge there is no comprehensive assessment of potential predictability of marine ecosystem drivers at the global scale spanning multiple depth horizons and a comparison of the relative predictability among them.
In this study, we assess the potential predictability of the four marine ecosystem drivers using "perfect model" simulations of a comprehensive Earth system model.We address the following three questions: -To what extent are marine ecosystem drivers predictable at the global scale?
-What are the regional and depth-dependent characteristics of potential predictability?
-Which underlying physical and biogeochemical processes prescribe or limit the potential predictability of marine ecosystem drivers?
This study is organized as follows.First, we introduce the model and methods used to assess the potential predictability in marine ecosystem drivers.Subsequently, the temporal sequencing of potential predictability over global scales for the four marine ecosystem drivers are identified and evaluated for regional differences in potential predictability horizons.Both surface and subsurface manifestations are presented to assess the origin of potential predictability.Finally, we also identify the mechanistic controls on the limits to potential predictability and conclude with a discussion and summary section.

Earth system model: GFDL-ESM2M
For this study we conducted a new 240-member ensemble suite of simulations of 10-year duration each with the Earth system model ESM2M developed at the Geophysical Fluid Dynamics Laboratory (GFDL) of the National Oceanic and Atmospheric Administration (NOAA; Dunne et al., 2012Dunne et al., , 2013)).The GFDL-ESM2M is a fully coupled carbon cycleclimate model.The physical core of the model is based on the physical coupled model CM2.1 (Delworth et al., 2006).The atmospheric model AM2 has a horizontal resolution of 2 • latitude × 2.5 • longitude with 24 vertical levels (Anderson et al., 2004).The land model simulates land water, energy and the carbon cycle, and it has the same horizontal resolution as the atmospheric component.The ocean model MOM4p1 (Griffies, 2012) has 50 vertical levels of varying thickness and a nominal horizontal resolution of 1 • latitude × 1 • longitude, increasing towards the Equator to up to 1/3 • .The sea ice model includes full ice dynamics, three thermodynamic layers and five ice thickness categories and is defined on the same grid as the ocean model (Winton, 2000).
Ocean biogeochemistry and ecology is simulated by the Tracers Of Phytoplankton with Allometric Zooplankton version 2.0 (TOPAZ2; Dunne et al., 2013).TOPAZ2 represents 30 prognostic tracers to describe the cycles of carbon, phosphorus, silicon, nitrogen, iron, alkalinity, oxygen and lithogenic material as well as surface sediment calcite.TOPAZ2 includes three phytoplankton functional groups: small (mostly prokaryotic pico-or nanoplankton), diazotroph (fixing nitrogen from the atmosphere) and large phytoplankton.TOPAZ2 only implicitly simulates zooplankton activity.The growth of phytoplankton depends on the level of photosynthetically active irradiance, nutrients (e.g., nitrate, ammonium, phosphate and iron) and temperature (see Sect. 2.3.2 and Appendix A).

Perfect model framework
We estimated potential predictability within a perfect model experiment.By perturbing the initial conditions of the GFDL-ESM2M and quantifying the spread of initially close model trajectories, the limit of initial condition predictability was assessed.The underlying assumption is that we have a perfect model (e.g., the model accurately represents all physical and biogeochemical processes relevant to assess marine ecosystem drivers at adequate temporal and spatial resolution) and nearly perfect initial conditions and that we exclude the role for external forcing in determining or limiting predictability.Specifically, we first performed a 300-year preindustrial control simulation (black line in Fig. 1), which is branched off a preexisting quasi-steady-state 1000-year preindustrial control simulation.Using this 300-year preindustrial control simulation to provide initial conditions, six 40-member ensemble simulations of 10-year duration each are performed.Each ensemble simulation starts at different times in the control simulation: 1 January in years 22, 64, 106, 170, 232 and 295.The six distinct initialization dates for the individual large ensemble simulations were randomly selected from the 300-year preindustrial control simulation.This was intended to average across biases that may result from predictability being different across different phases of climate modes (e.g., different El Niño-Southern Oscillation phase states) within the preindustrial simulation.Note that the last ensemble exceeds the control simulation by 5 years.Each of the six ensembles consists of 40 ensemble members with micro-perturbations to oceanic initial states but with the same atmospheric, land, ocean biogeochemical, sea ice and iceberg initial conditions.Specifically, for each ensemble member, i = 1, 2, ..., 40, an infinitesimal temperature perturbation δ is added to a single grid cell in the Weddell Sea at 5 m depth, similar to the approach described in Wittenberg et al. (2014) and Palter et al. (2018): Thus, the range of perturbations is evenly spread from −0.002 to 0.002 • C with the unperturbed control case in the center with zero perturbation.As stated above, our model setup encompasses 240 ensemble members, each of 10-year duration and thus 2400 years of model integration in addition to the 300-year-long control simulation.While our perturbation method is in no way optimal in terms of, for example, sampling the likely range of atmospheric-oceanbiogeochemical errors, it is sufficient to generate ensemble spread on the timescales of interest.After just 4 d of simulation time subsequent to the micro-perturbations for each cluster of 40 starting points, the SSTs of all surface ocean grid cells are numerically different from the SST of the control simulation, underscoring the rapidity with which divergences due to nonlinearities in the model express themselves.
The method applied here mirrors that of Griffies and Bryan (1997a), Msadek et al. (2010), and Wittenberg et al. (2014) and emphasizes the amplitude (but not the phase) of perturbations to identify potential predictability.Our perturbation method produces ensemble experiments likely to give the upper limit of the model predictability, hence the term potential predictability.Nevertheless, it warrants mentioning here that studies have been published arguing that predictability in the real world for some variables may even be larger than estimated with the perfect modeling framework within an Earth system model in cases where the ratio of the predictable mode to model noise is underestimated (Eade et al., 2014;Kumar et al., 2014).

Analysis methods
We calculate the potential predictability for the four marine ecosystem drivers: temperature, pH, O 2 and NPP.In the following, NPP is always integrated over the upper 100 m, whereas temperature, pH and O 2 are analyzed at different depth levels.In addition to identifying the upper limits of predictability of these variables within the Earth system model, an equally important objective is to identify the relative predictability of the four variables under consideration.

Assessment of potential predictability
The prognostic potential predictability (PPP) is the main metric used in this study to assess predictability.The PPP is the ratio between the variance among the ensemble members at a given time t after the initialization and the temporal variance of an undisturbed control simulation.The PPP is calculated following Griffies and Bryan (1997b) and Pohlmann et al. (2004): where X ij is the value of a given variable for the j th ensemble and ith ensemble member, X j is the mean of the j th ensemble over all ensemble members, σ 2 c is the variance of the control simulation, N is the total number of different ensemble simulations (N = 6) and M the number of ensemble members (M = 40).The variance of the control simulation is calculated for each month of the year separately to exclude the seasonality from the natural variability, i.e., only the natural variability at that month in the seasonal cycle is considered.PPP equal to unity constitutes perfect predictability.An F test is applied to estimate a significant difference between the ensemble variance and the variance of the control run.With N = 6 and M = 40, predictability is achieved with a 95 % confidence level when PPP ≥ 0.183.
The predictability time horizon is defined as the lead time at which PPP falls below the predictability threshold (Fig. 1b).To calculate global means, all metrics are first calculated at each individual grid cell and then averaged with area weighting over the global ocean.

Taylor deconvolution method to identify mechanistic controls of predictability
To understand the processes behind the simulated predictability, we applied a first-order Taylor-series deconvolution method to decompose the normalized ensemble variance of pH, O 2 and NPP into contributions from their physical and biogeochemical driver variables: where σ denotes the standard deviation among the ensemble members of the different variables.Specifically, the Taylor deconvolution method is applied to decompose the normalized ensemble variance for f of pH, O 2 and NPP into the contribution from their physical and biogeochemical drivers by expressing the ensemble variance and the variance of the control run from Eq. (2) in terms of Eq. (3).The partial derivatives in Eq. ( 3) are calculated at the point p = x, where x is the mean value of the corresponding driver variables over the entire control simulation.The changes in pH are attributed to changes in temperature, salinity, total alkalinity (Alk) and total dissolved inorganic carbon (DIC).Here, we assume that variations in phosphate and silicate are negligible.
Dissolved oxygen (O 2 ) is decomposed into an oxygen solubility component O sol 2 and an apparent oxygen utilization (AOU) component using (e.g., Frölicher et al., 2009) O sol 2 is the solubility of oxygen, which depends nonlinearly on temperature and salinity (Garcia and Gordon, 1992).The difference between diagnosed O sol 2 and simulated O 2 is AOU.Variations in AOU reflect changes in oxygen consumption and ocean ventilation.Earlier studies demonstrated that changes in AOU are typically associated with changes in ventilation, as simulated changes in the remineralization rates of organic material and in associated O 2 consumption are relatively small (Gnanadesikan et al., 2012).
NPP can be decomposed into the contributions from the three phytoplankton groups simulated in the TOPAZ2 model: where NPP Sm , NPP Di and NPP Lg are the contributions from small, diazotroph and large phytoplankton, respectively.At any time t the NPP for all phytoplankton groups "phyto" is given by the phytoplankton stock P phyto times the phytoplankton growth rate µ phyto : The growth rate µ Sm of the small phytoplankton is parameterized using a maximum growth rate µ max , which is limited by nutrients N lim , light L lim and temperature T f (see Appendix A for further details): Note that grazing, sinking and other loss processes impact phytoplankton stock, but these processes in TOPAZ2 are only a function of steady-state growth and biomass implicit grazing formulation, and they exert no separate dynamic control.Therefore they do not require separate consideration.

Potential predictability at the ocean surface
The change in globally averaged annual PPP over time is very similar for all four marine ecosystem drivers at the surface, i.e., the PPP decreases exponentially over lead time for all four drivers (solid thick lines in Fig. 2).After 3 years, the PPP falls below the predictability threshold (dashed line in Fig. 2), indicating that the global predictability time horizon is about 3 years for all four ecosystem drivers.The seasonality in PPP (solid thin lines in Fig. 2) as well as the differences among the four drivers are very small at the global scale.At the regional scale, the predictability time horizon shows distinct structured patterns and also large differences between each of the four different marine ecosystem drivers (Fig. 3).In general, SST (Fig. 3a), surface pH (Fig. 3b  importance for resource management, given the high density of neighboring human populations. The NPP predictability time horizon pattern (Fig. 3d) is fundamentally different from the patterns of the other three ecosystem drivers.NPP has long predictability time horizons (6-10 years) in the midlatitudes, where the annual mean NPP is generally small (indicated with contour lines in Fig. 3d), but very short predictability time horizons of 0-1 years in the Southern Ocean, the North Atlantic and the Pacific, as well as short predictability time horizons of 1-3 years in the tropical oceans, where annual mean NPP is high (Fig. 3d).The spatial pattern of the predictability time horizon and the sequencing of predictability among the ecosystem drivers is very similar when using two other metrics for potential predictability, indicating that our results do not depend on the predictability metric used (Appendix B).
We further average the local potential predictability across 17 biogeographical biomes (Fig. 4) to highlight the pronounced seasonal cycle in predictability for some variables in particular biomes.The biomes capture patterns of largescale biogeochemical function at the basin scale and are defined by distinct SSTs, maximum mixed-layer depths, maximum ice fractions and summer chlorophyll concentrations (Fay and McKinley, 2014).As shown in Fig. 4, potential predictability exhibits strong seasonality for SST, surface O 2 and surface pH in the North Atlantic (biomes 8, 9, 10 and 11), in the Southern Ocean (biomes 15 and 16) and in the subtropical/subpolar gyre boundary region of the North Pacific (biome 3).In all these biomes, predictability is higher during the cold season (boreal and austral winter) and lower during the warm season.The biomes with high seasonality in PPP are also the regions which generally show larger predictability in the annual mean.The PPP values of SST and surface O 2 have almost identical seasonal amplitudes, while the seasonal amplitude of the surface pH is generally smaller compared to SST and surface O 2 seasonal amplitude.Interestingly, the PPP for NPP generally shows no large differences amongst the seasons, except in biome 8, which is influenced by seasonal sea ice retreat and growth.Figure 4 also reveals other interesting characteristics of PPP.For example, the changes in PPP over lead time are very small, but they fluctuate around the predictability threshold for NPP in biome 10 and for SST and O 2 in biome 8, making the predictability horizon in some biomes for some variables very sensitive to small changes in PPP.In addition, the PPP for NPP in the eastern equatorial Pacific (biome 6) shows large interannual variations with lead time, indicating that even more ensemble members are needed to robustly assess the predictability there.The PPP for SST in biome 17 (around Antarctica) is even negative, indicating a higher variance simulated in the ensemble simulations than simulated in the 300-year preindustrial control simulation.

The role of the subsurface ocean in the potential predictability of marine ecosystem drivers
Next, we assess the predictability time horizon for temperature, O 2 and pH in the top 1000 m (Figs. 5 and 6).In theory, the subsurface ocean should be expected to be predictable longer than the surface layer, as the subsurface is not directly coupled to the high-frequency and relatively unpredictable variability of the atmosphere.Indeed, the potential predictability for temperature, oxygen and pH rapidly increases with depth at the global scale (Fig. 5a-c).Below 300 m, the predictability time horizon of all three ecosystem drivers exceeds a decade; i.e., the PPP is still larger than the predictability limit (depth levels with no hatching in Fig. 5ac).Interestingly, the PPP at depth changes more rapidly with time for temperature than for O 2 and pH.In fact, the PPP for temperature is constant below 500 m for a given year; i.e., the PPP value does not change with depth.This is different for O 2 and pH, for which the PPP increases with all depth levels.Clearly, the overall increasing potential predictability with depth can be attributed to the increasing disconnection of the deeper ocean with the surface ocean (see also Sect.3.3).However, the biogeochemical processes lead to enhanced predictability below 500 m for O 2 and pH, relative to temperature.The global mean picture of Fig. 5a-c obscures some interesting seasonal features at the regional scale, which are highlighted in Fig. 5d-f for the North Atlantic.Even though the North Atlantic is among the regions with the largest potential predictability at the ocean surface, the predictability at 1000 m depth for pH and O 2 is smaller in the North Atlantic than the global average at the same depth (Fig. 5d-f), especially in boreal winter.For example, the PPP in winter of year 3 for pH is 0.6 at the global scale at 400 m depth (Fig. 5b) but only 0.3 in the North Atlantic (Fig. 5e).The  strong connection in the Atlantic between the ocean surface and the upper 1000 m in winter increases the predictability but at the same time decreases the potential predictability within the subsurface.Interestingly, this effect is also visible for temperature but confined to the upper few hundred meters.The reason is that anomalies from the ocean surface do not penetrate as deep for pH and O 2 as they do for temperature.
Figure 6 shows the spatial pattern of the predictability time horizon for ocean temperature, O 2 and pH at 300 m (panels a-c) and 1000 m (panels d-f) depth, respectively.Although the predictability time horizon is close to 10 years below 300 m on global average, there are specific regions with a reduced predictability time horizon.At 300 m, these regions are the tropical Pacific, the Indian Ocean and parts of the Southern Ocean (Fig. 6a-c).In the equatorial Pacific and Indian Ocean averaged over 20 • N and 20 • S, the predictability is 4 years for temperature and 7 years for O 2 and pH.For temperature and O 2 , the predictability time horizon drops to values lower than 5-6 years in the eastern equatorial Atlantic.At 1000 m depth (Fig. 6d), the spatial pattern of temperature predictability time horizon is similar to the one at 300 m.Large parts of the equatorial Pacific and the Indian Ocean still show relatively short predictability time horizons.This is not the case for O 2 and pH, for which the predictability time horizon largely increases at 1000 m depth compared to 300 m depth in the eastern equatorial Pacific and in the Indian Ocean as well as in the Southern Ocean, so that the predictability time horizon of both O 2 and pH is up to 10 years almost everywhere.Only the western equatorial Pacific (for pH) and the central equatorial Pacific (for O 2 ) are characterized by reduced potential predictability at 1000 m (predictability time horizons lower than 8 years).

Deconvolution into physical and biogeochemical control processes
The predictability patterns and timescales presented in the previous sections are investigated next for their underlying dynamical and/or biogeochemical controls.For SST, we compare our findings with previous studies that attributed SST predictability to particular processes.In order to understand the dynamical and biogeochemical control processes of O 2 , pH and NPP and to quantify their contribution, we apply a Taylor deconvolution method (see Sect. 2.3.2).It is important to note that a large contribution of a particular driver to the potential predictability of O 2 , pH and NPP does not imply a long predictability time horizon of that driver.In addition, the contribution of a process depends not only on its potential predictability (captured by the variance terms in Eq. 3) but also on the potential interaction with the other drivers (covariance terms in Eq. 3).

Sea surface temperature
The long predictability time horizon of SST in the North Atlantic between 40 and 70 • N (Fig. 3a) is consistent with previous findings (Boer, 2004;Collins et al., 2006;Griffies and Bryan, 1997a;Pohlmann et al., 2004).The SST in the North Atlantic experiences low-frequency variability that is linked to the Atlantic Meridional Overturning Circulation (AMOC; Buckley and Marshall, 2016).In GFDL-ESM2M, the AMOC experiences strong low-frequency variability, consistent with Msadek et al. (2010), and its predictability time horizon is about 9 years (Fig. C1).Similarly, the Southern Ocean surface waters are also strongly connected to the deep ocean (Morrison et al., 2015), and slow subsurface ocean processes there give rise to decadal predictability in SST (Marchi et al., 2019;Zhang et al., 2017).In CM2.1, the peak in the power spectrum of deep convection in the Weddell Sea is simulated to lie between 70 and 120 years (Zhang et al., 2017).
In the North Atlantic and the Southern Ocean, the potential predictability is enhanced during the winter period (Fig. 4), as the surface waters are especially well connected with the deep ocean during the cold season.The long SST predictability time horizon in the Arctic Ocean is due to the overall lowfrequency variability in SST there, because these waters are permanently covered by sea ice in the preindustrial ESM2M control simulation and cannot exchange heat (and carbon) with the atmosphere.This is not the case around the Antarctic continent, where sea ice almost vanishes during austral summer in ESM2M, allowing the surface ocean to exchange heat and carbon with the atmosphere.Therefore, the influence of high-frequency atmospheric variability is large, which leads to diminished predictability time horizons around Antarctica. Moderate predictability time horizons in SST of about 3 to 5 years are simulated in the tropical oceans associated with the coupled atmosphere-ocean system (Boer, 2004).

Dissolved oxygen
To understand the processes that give rise to the O 2 predictability pattern, we use a Taylor deconvolution method (see Sect. 2.3.2) to further split the O 2 predictability into respective O sol 2 and AOU contributions.Figures 7 and 8 show the predictability time horizon of O 2 (identical to patterns shown in Figs.3c and 6c), O sol 2 , AOU and their covariance (panels a, b, c and d) as well as their percentage contribution to the normalized ensemble variance (panels e, f and g) for the surface (Fig. 7) and 300 m depth (Fig. 8).The percentage contribution is defined as the value of a given variance term (first term on the right-hand side of the equal sign in Eq. 3) or covariance term (second term on the right-hand side in Eq. 3), divided by the sum of all absolute variance and covariance values.By combining the information from panels e, f and g (i.e., percentage contribution to total predictability) with the information from panels a, b, c and d (i.e., predictability time horizon), we can attribute the local predictability of O 2 to O sol 2 , AOU or the covariance.For example, if both the percentage contribution and the predictability time horizon of a particular variable are high, then the O 2 predictability is high.If the percentage contribution is generally low for a particular variable, then this variable does not contribute to the overall short or long predictability time horizon of O 2 .
The largest contribution to the normalized variance in O 2 at the surface stems from O sol 2 (Fig. 7) with a globally averaged contribution of 58 %, followed by AOU with 23 % and the covariance between O sol 2 and AOU contributing 19 %.Thus, the O sol 2 predictability time horizon pattern (Fig. 7b) is almost identical to the O 2 predictability time horizon pattern (Fig. 7a or Fig. 3c), i.e., long predictability time horizons in the North Atlantic, Southern Ocean and Arctic and short predictability time horizons in the midlatitudes.As O sol 2 at the ocean surface is mainly controlled by temperature (Garcia and Gordon, 1992), it is not surprising that the time horizon pattern of surface O 2 predictability (Figs.7a and 3c) is also almost identical to the time horizon pattern of SST predictability (Fig. 3a).In the Arctic Ocean and around Antarctica, however, AOU (Fig. 7f) is almost solely responsible for the normalized variance of O 2 .As a result, the predictability time horizon of O 2 (Fig. 7a) is similar to the AOU predictability time horizon (Fig. 7c) in these two regions.The covariance between O sol 2 and AOU overall plays a minor role (Fig. 7g).
The picture is quite different at 300 m depth (Fig. 8), where the largest contribution percentage-wise to the normalized variance of O 2 stems from AOU (64 % on global average), with minor contributions from O sol 2 (13 %) and the covariance between O sol 2 and AOU (23 %).Therefore, the pattern of the AOU predictability time horizon (Fig. 8c) is similar to the pattern of the O 2 predictability time horizon (Fig. 8a).Exceptions are found in the eastern equatorial Pacific, where the covariance dominates (Fig. 8g), and the northern North Atlantic, where O sol 2 dominates (Fig. 8e).The dominance of AOU in explaining subsurface O 2 predictability is also the reason why O 2 predictability generally increases with depth (Fig. 5c), which is not the case for temperature (Fig. 5a).

pH
The predictability characteristics of pH are decomposed into its primary drivers in the marine carbonate system, namely temperature, salinity, DIC and Alk (Fig. 9).Even though the total normalized ensemble variances from the Taylor deconvolution are only approximations of the total real ensemble variances due to nonlinearities in carbonate chemistry, the values of the Taylor deconvolution are always within ±2 % of the real values, giving us confidence in the appropriateness of the Taylor deconvolution method for pH.
At the surface, the largest contribution percentage-wise stems from the covariance between Alk and DIC (Fig. 9j; with 26 % globally averaged), followed by DIC (Fig. 9i; 22 %), Alk (Fig. 9h; 15 %), the covariance between SST and DIC (Fig. 9k; 14 %), and SST (Fig. 9g; 9 %).All other possible contributors such as sea surface salinity and its covariances (including the covariance between SST and Alk) are not discussed further, as their contributions are below 5 %.The pH predictability time horizon at the surface is mainly determined by Alk and DIC and to a lesser extent SST.The long predictability time horizon of pH in the North Atlantic, the Arctic Ocean and the eastern North Pacific and the short predictability time horizon in the tropical regions (Figs.9a  and 3c) are mainly determined by DIC and Alk and the covariance between DIC and ALK.SST plays a role for parts of the North Atlantic.The predictability of pH in the Southern Ocean is mainly determined by DIC, SST and their covariance.Even though SST exhibits enhanced predictability in the Southern Ocean in relation to pH, the short predictability www.biogeosciences.net/17/2061/2020/Biogeosciences, 17, 2061-2083, 2020 time horizon of DIC and the covariance of DIC and SST lead to the overall diminished predictability time horizon for pH relative to SST there.
The pH predictability time horizon at 300 m depth (Fig. 10a) is mainly determined by DIC (accounts for 44 % on a global scale; Fig. 10j) and to a lesser extent by the covariance between DIC and SST (19 %; Fig. 10k) and the covariance between Alk and DIC (15 %; Fig. 10j).Interestingly, the relatively short pH predictability time horizon of about 5 years in the western equatorial Pacific and the northern Indian Ocean is also mainly determined by DIC (Fig. 10d, i) and the covariance between DIC and SST (Fig. 10f, k).The short predictability time horizon of pH in the South Pacific is caused by the covariance between SST and DIC.Again, salinity plays a negligible role (not shown).

Net primary production
To understand the drivers that may set the upper limits of NPP predictability, we first split the NPP into the contributions from small-phytoplankton production (NPP Sm ), large-phytoplankton production (NPP Lg ) and production by diazotrophs (NPP Di ; see Sect.2.3.2 and Appendix A).The largest contribution (i.e., the most important driver of NPP potential predictability) stems from NPP Sm (65 % averaged globally; Fig. 11).The second most important contributor is the covariance between NPP Sm and NPP Lg (19 %) followed by NPP Lg (9 %).Diazotrophs and all other covariances have only a small impact on the predictability of NPP (less than 5 %; not shown in Fig. 11).The large dominance of NPP Sm is not unexpected as the small-phytoplankton production overall dominates the total phytoplankton production in ESM2M (Dunne et al., 2013;Laufkötter et al., 2015).NPP Sm accounts for 84 % of the total NPP at global scales, whereas NPP Lg and NPP Di only account for 14 % and 2 %, respectively.
On regional scales, NPP Sm determines the predictability of NPP almost everywhere (Fig. 11f).Exceptions are the eastern equatorial Pacific and the higher northern latitudes, where NPP Lg (Fig. 11e) and the covariance between NPP Lg and NPP Sm (Fig. 11g) also play a substantial role.Interest- ingly, the NPP Lg (Fig. 11b) has overall a longer predictability time horizon than NPP (Fig. 11a) and NPP Sm (Fig. 11c).
To understand the drivers of small-phytoplankton predictability, we further deconvolve NPP Sm into growth rate and small-phytoplankton stock (Fig. 12; Eq. 6 in Sect.2.3.2).The deconvolution suggests that the largest contribution to the potential predictability on a global scale stems from the small-phytoplankton stock (51 %) followed by the growth rate (31 %) and the covariance between stock and growth rate (18 %).Between 40 • S and 40 • N, the NPP Sm predictability is almost solely determined by the small-phytoplankton stock, with the exception of the eastern equatorial Pacific, where the growth rate is more important.Also, the short NPP Sm predictability time horizon in the North Atlantic mainly originates from the variance of the stock, indicated by the short predictability time horizons of the stock compared to the growth rate there.As we stated previously, NPP has a relatively short potential predictability time horizon over the Southern Ocean compared to the other ecosystem drivers (Fig. 3d).Our analysis shows that small phytoplankton (Fig. 11) and especially the growth rate of the small phytoplankton (Fig. 12) are important for setting this local minimum.
We further deconvolute the drivers of the surface growth rate predictability of small phytoplankton into their temperature, nutrient and light limiting factors (see Eq. 7 in Sect.2.3.2;Fig. 13).As the limiting factors are not saved routinely as three-dimensional fields, we focus here on the growth rate and its limiting factors at the surface.Note that the growth rate predictability time horizon at the surface (Fig. 13a) may differ from the growth rate predictability time horizon integrated over the top 100 m (Fig. 12c), especially in the Southern Ocean and the North Atlantic.At the surface and at the global scale, the largest contribution stems from the nutrient limitation term (50 %) followed by the temperature limitation term (25 %) and the covariance between the temperature and nutrient limitations (13 %).At the regional scale, the nutrient limitation term clearly dominates at midlatitudes (Fig. 13f).In GFDL-ESM2M, the subtropical gyres are mainly iron limited (hatching in Fig. 13f), and therefore iron fundamentally constrains the predictability of the growth rate of small phytoplankton there.Exceptions are the boundary region between the subtropical and subpolar gyre in the North Pacific (nitrate limited) as well as the tropical Atlantic (phosphate and nitrate) and the northern Indian Ocean (phosphate).GFDL-ESM2M's overall strong iron limitation is in contrast to the findings of Moore et al. (2013), who suggest that nitrogen is the limiting nutrient in the subtropical gyres.GFDL-ESM2M is a fully coupled Earth system model and assesses iron limitation through the ability to synthesize chlorophyll.In contrast, Moore et al. (2013)   Note that the terms that do not contribute to pH predictability such as sea surface salinity and the covariances between sea surface salinity and all other terms as well as the covariance between SST and Alk are not shown here.
iron limitation through nutrient uptake alone.The temperature limitation term is dominant in the higher latitudes and the eastern equatorial Pacific (Fig. 13g).The light limitation term only plays a substantial role (up to 20 %) around Antarctica and close to the Arctic sea ice edge (Fig. 13h).
The simulated long predictability time horizon for NPP in the midlatitudes can therefore be attributed to the long predictability time horizon of the nutrient limitation, especially given that the growth rate predictability at the surface is similar to the growth rate predictability integrated over the top 100 m in this region.At latitudes north of 40 • N and south of 40 • S, the temperature limitation is the most important con- tributor.Therefore, the predictability time horizon pattern of the growth rate strongly resembles the one for SST in these regions.In the Southern Ocean, however, the growth rate predictability time horizon at the surface is much longer than the growth rate predictability integrated over the top 100 m, indicating that a process other than temperature (e.g., light limitation) may limit predictability there.

Discussion and conclusion
We set out three goals for this study: (a) assessing the global characteristics of potential predictability for temperature, pH, O 2 and NPP, as a mean to identify an upper bound on our ability to predict conditions for marine ecosystems; (b) assessing regional and depth-dependent characteristics of potential predictability; and (c) identifying the potential mechanisms that limit or increase predictability for the different marine ecosystem drivers.This was pursued within a perfect modeling framework using a comprehensive Earth system model.The analysis revealed that on global scales the predictability time horizon of each variable is surprisingly similar, i.e., 3 years for all four marine ecosystem drivers (Fig. 2; first goal), despite the fact that the regional processes operating are different over a range of scales (second and third goal).This is unexpected, as the ocean processes that sustain the disparate divers should not be expected to have identical memory as pertains to predictability.For example the relatively long predictability time horizon identified for SST and surface O 2 over the subpolar North Atlantic (the SST to be consistent with Griffies and Bryan, 1997a, b;Boer, 2000;Collins et al., 2006;Keenlyside et al., 2008) and the Southern Ocean (consistent with Zhang et al., 2017 andMarchi et al., 2019) is not reflected in NPP.Likewise, the long predictability time horizon of NPP in the subtropical gyres is not simulated for other ecosystem drivers, and the short predictability time horizon of surface pH in the Southern Ocean is reflected in neither SST nor surface O 2 .
Our results suggesting the same global predictability time horizon for all four ecosystem drivers are not inconsistent with time of emergence diagnostics for transient climate warming scenarios where pH (early emergence) and NPP (late emergence) behave oppositely (Frölicher et al., 2016;Rodgers et al., 2015;Schlunegger et al., 2019).Time of emergence is defined as the ratio (large for pH and small for NPP) of the anthropogenic forced change to the background internal variability.Comparing our results with the time of emergence analysis is therefore complicated by the presence of the anthropogenic forced signal in scenario projections.In fact it is the presence of the large invasion flux for CO 2 that renders acidification the most rapidly emergent of the drivers under anthropogenic perturbations, in particular relative to NPP.The similarities between the analyses of predictability and emergence timescales lie in the noise, which is expected to include not only modes of climate variability such as El Niño-Southern Oscillation (ENSO) but also higher-frequency variability such as cloud cover that may impact NPP for both cases.
Our study complements earlier studies which suggested that marine ecosystem drivers may be predictable on multiannual timescales.In contrast to earlier studies (Chikamoto et al., 2015;Park et al., 2019;Séférian et al., 2014b), rather than focusing on a single ecosystem driver, we compare and contrast the potential predictability of four marine ecosystem drivers and also evaluate the processes behind their respective predictability limits.We find that in contrast to SST, these ecosystem drivers depend on a complex interplay between physical and biogeochemical underlying processes.For O 2 , the importance of subsurface AOU reveals a complex interplay between nonlocal circulation and biological consumption, whereas at the surface O 2 is mainly determined by the predictability of SST.For NPP, the growth rate of the small phytoplankton in the Southern Ocean is important for setting the local minimum in predictability time horizon there.The predictability time horizon of surface pH is mainly determined by a complex interplay between DIC and Alk predictability in the low latitudes and DIC, Alk and temperature predictability in high latitudes.Interestingly, we find longer predictability time horizons for SST than for NPP in the equatorial Pacific, which is in contrast to findings of Séférian et al. (2014a).Importantly, this may be indicative of a potential model dependency of the relationship between ecosystem driver predictability.Séférian et al. (2014b) attributed longer NPP predictability time horizons to the idea that the nutrient supply processes that modulate NPP are themselves regulated by thermocline wave adjustment processes, without sizable modulation by surface fluxes.This was framed as being in contrast to the case of SST, where airsea fluxes reflecting higher-frequency variations act to reduce the predictability of SST.In ESM2M, the predictability time horizon for SST in the eastern equatorial Pacific (biome 6 in Fig. 4) is approximately 3.5 years, modestly longer than the predictability time horizon for NPP of approximately 3 years.In ESM2M, NPP is only weakly correlated with changes in upwelling and nutrient supply in the eastern tropical Pacific (as was shown in Fig. 2 of Kwiatkowski et al., 2017)  is confirmed by our analysis showing that nutrient limitation is not the dominant term for explaining the predictability of NPP there.This indicates that less predictable processes occurring over shorter timescales, such as temperature and/or light level variations, influence NPP predictability.Even though we consider our conclusion to be robust, a number of potential caveats warrant discussion.These include the (i) ensemble design of the perfect model simulations (e.g., initialization and number of ensemble members) and (ii) the impact of model formulation and biases.For the first of these caveats, our simulations are all initialized with SST perturbations applied to a single grid cell in the Weddell Sea, and therefore a different spatial perturbation strategy may give different results.However, as the signal at the ocean surface spreads very rapidly (i.e., after 4 d all grid cells at the ocean surface are perturbed) our results are insensitive to the spatial initialization method, at least in the upper ocean.Second, all ensemble simulations start on 1 January of the corresponding simulation year.It has been shown that the forecast skill of seasonal predictions may depend strongly on the way the models are initialized.ENSO forecasts, for example, have a much lower predictability if they are initialized before and through spring (Webster and Yang, 1992).However, as our focus is on annual to decadal timescales, this effect is less important for our analysis.Third, we have employed only six starting points for our 40-member ensemble simulation.Even though all six ensemble simulations branched off at different El Niño-Southern Oscillation states of the preindustrial control simulation, our choice of six macro-perturbations may still introduce aliasing issues that could bias our results.Although the computing resources at our disposal for this study did not allow for expanding the number of starting points, we recommend that future studies with CMIP-class models should expand the number of initialization points to further explore the sensitivity of the results to the starting point of the ensembles.
The second caveat in our study is that we only used one single Earth system model and that our results might depend on the model formulation and resolution.Even though the GFDL-ESM2M model achieves sufficient fidelity in its preindustrial states (Bopp et al., 2013;Dunne et al., 2012Dunne et al., , 2013;;Laufkötter et al., 2015), it is well known that CMIP5generation models have imperfect representation of biogeochemical and physical processes as well as variability over a range of timescales, ranging from weather variability to ENSO variability (Frölicher et al., 2016;Resplandy et al., 2015) to decadal variability (England et al., 2014;McGregor et al., 2014).Different physical and biogeochemical parameterizations within a given model may change the length of the predictability time horizon.For example, TOPAZ2 represents a hypothetically optimal phytoplankton physiology; namely the model assumes that the fastest growing phytoplankton group always wins in all environments via the upper limit in growth rates.In addition, TOPAZv2 represents a steady-state ecosystem, such that there are no time lags between primary production and the grazing response.In the subsurface, the remineralization of particles is set to reproduce the vertical scale of the nutricline on the timescale of sinking particles, and the sinking particle velocity is fast.All three factors may tend to decrease the memory associated with the real-world surface ecosystem and minimize predictability.For the case of weather prediction, it has been argued that the inclusion of stochastic parameterizations increases potential predictability (Palmer and Williams, 2008).To our knowledge, this remains unexplored for marine biogeochemistry and ecosystem drivers.In any case, it would be necessary to repeat our predictability experiments with a set of different Earth system models including different parameterizations of biogeochemical and/or physical ocean processes to investigate the dependence of our result on the model representation (Séférian et al., 2018), in parallel with broader efforts to further evaluate noise characteristics of these models.Additionally, the ocean model resolution of GFDL-ESM2M is rather coarse and cannot represent the critical scales of small-scale structures of circulation.Predictability studies using high-resolution ocean models with improved process representations are therefore needed to explore potential predictability, especially at the local scale.However, it is currently impossible in many cases to con-strain the simulated variability in biogeochemical drivers, especially for the ocean subsurface, with observations due to limited data availability (Frölicher et al., 2016;Laufkötter et al., 2015).
Currently, no global coupled physical-biogeochemical seasonal to decadal forecast system is yet operational (Tommasi et al., 2017).However, our study suggests great promise that physical-biogeochemical forecast systems may have the potential to provide useful information to a wide group of stakeholders, such as, for example, for the management of fisheries (Dunn et al., 2016;Park et al., 2019).Our study therefore underscores the need to further develop integrated physical-biogeochemical forecast systems.Especially in regions with long predictability time horizons, such as the North Atlantic (for temperature, O 2 and pH), the Southern Ocean (for temperature and O 2 ) and midlatitudes (for NPP), installing and maintaining a spatially and temporally dense physical and biogeochemical ocean observing system would have the potential to significantly improve the effective predictability of marine ecosystem drivers.

Figure 1 .
Figure 1.Illustration of the model setup and the calculation of the predictability time horizon.(a) Simulated global mean SST of the 300-year reference control simulation (black line) and of the six 10-year-long 40 ensemble simulations (red lines).(b) Global mean SST anomaly (i.e., deviation from the control simulation) for the ensemble simulation starting in the year 170.The thick red line indicates the period over which SST is predictable (i.e., PPP ≥ 0.183), and thin red lines indicate the period over which SST is unpredictable (i.e., PPP < 0.183).The dashed horizontal lines indicate 1 standard deviation of the control simulation, and the vertical line indicates the predictability time horizon.
) and surface O 2 (Fig. 3c) share similar predictability time horizon patterns with short predictability time horizons (1-2 years) between 20 and 40 • in both hemispheres, intermediate predictability time horizons (3-5 years) in the tropical oceans, and long predictability time horizons (> 10 years) in the North Atlantic between 40 and 70 • N, in the Southern Ocean between 40 and 65 • S (except for surface pH) and in the Arctic Ocean.Interestingly, the potential predictability time horizon of surface pH is short relative to SST and surface O 2 in the Southern Ocean but longer over both the Caribbean and the eastern subtropical North Pacific relative to SST.The Caribbean and the eastern North Pacific are both regions of

Figure 2 .
Figure 2. Globally averaged prognostic potential predictability (PPP) for all four marine ecosystem drivers at the surface, except for NPP which is integrated over the top 100 m.Monthly mean (thin lines) and annual mean (thick lines) values of PPP are shown.The horizontal black dashed line represents the predictability threshold.If PPP is above (below) the predictability threshold, the driver is potentially predictable (unpredictable) as indicated with the arrows on the right-hand side.The PPP has first been calculated at each grid cell and then averaged globally.

Figure 3 .
Figure 3. Predictability time horizon for (a) SST, (b) surface pH, (c) surface O 2 and (d) NPP integrated over the top 100 m using PPP as a predictability measure.The red contour lines in panel (d) indicate the annual mean total nitrogen production in moles of nitrogen per kilogram per year averaged over the 300-year preindustrial control simulation to highlight regions with low and high NPP.In panel (d) regions north of 69 • N and south of 69 • S have been excluded since NPP is zero during wintertime there.

Figure 4 .
Figure 4. PPP for all four ecosystem drivers averaged over 17 different biomes at the surface, except for NPP, which is integrated over the top 100 m.Monthly means are shown as thin lines, and annual means are shown as thick lines.The horizontal dashed black lines in each panel represent the predictability threshold.The lower right panel shows the boundaries and the geographical location of biomes 1 to 17.

Figure 6 .
Figure 6.Spatial pattern of the predictability time horizon at (a-c) 300 m and (d-f) 1000 m depth for (a, d) ocean temperature, (b, e) pH and (c, f) dissolved oxygen.

Figure 7 .
Figure 7. Spatial pattern of the (a-d) predictability time horizons and (e-g) contribution of different terms to the predictability of oxygen at the surface.(a-d) Predictability time horizon for (a) O 2 , (b) O sol 2 , (c) AOU and (d) covariance between O sol 2 and AOU.(e-g) Percentage contributions of (e) O sol 2 , (f) AOU and (g) covariance between O sol 2 and AOU relative to the sum of all terms.Red shading in panels (e)-(g) represents positive absolute values of the variance and covariance terms.The percentage contributions are shown as averages over the entire 10 years of the simulations.The percentage contributions do not change substantially over the 10 years (always within ±5 % of the 10-year averages).

Figure 9 .
Figure 9. Spatial pattern of the (a-f) predictability horizons and (g-k) contribution of different terms to the predictability of pH at the surface.(a-f) Predictability time horizon for (a) pH, (b) SST, (c) Alk, (d) DIC, and the covariance between (e) Alk and DIC and (f) DIC and SST.(g-k) Percentage contributions of (g) SST, (h) Alk, (i) DIC, and covariance of (j) ALK and DIC and (k) DIC and SST relative to the sum of all terms.Red shading in panels (g)-(k) represents positive absolute values of the variance and covariance terms.The percentage contributions are shown as averages over the entire 10 years of the simulations.The percentage contributions do not change substantially over the 10 years (always within ±5 % of the 10-year averages).Note that the terms that do not contribute to pH predictability such as sea surface salinity and the covariances between sea surface salinity and all other terms as well as the covariance between SST and Alk are not shown here.

Figure 11 .
Figure 11.Spatial pattern of the (a-d) predictability horizons and (e-g) contribution of different terms to the predictability of NPP integrated over the top 100 m. (a-d) Predictability time horizon for (a) NPP, (b) large-phytoplankton production NPP Lg , (c) small-phytoplankton production NPP Sm , and (d) the covariance between NPP Lg and NPP Sm .(e-g) Percentage contributions of (e) NPP Lg , (f) NPP Sm , and (g) covariance of NPP Lg and NPP Sm relative to the sum of all terms.Red shading in panels (e)-(g) represents positive absolute values of the variance and covariance terms.The percentage contributions are shown as averages over the entire 10 years of the simulations.The percentage contributions do not change substantially over the 10 years (always within ±5 % of the 10-year averages).Note that the terms that do not substantially contribute to NPP predictability such diazotrophs (NPP Di ) and the covariances between NPP Di and all other terms are not shown here.

Figure 12 .
Figure 12.Spatial pattern of the (a-d) predictability horizons and (e-g) contribution of different terms to the predictability of smallphytoplankton production (NPP Sm ) integrated over the top 100 m. (a-d) Predictability time horizon for (a) NPP Sm , (b) small-phytoplankton stock, (c) growth rate of small phytoplankton, and (d) the covariance between the stock and the growth rate of small phytoplankton.(eg) Percentage contributions of (e) stock, (f) growth rate, and (g) covariance of stock and growth rate relative to the sum of all terms.Red shading in panels (e)-(g) represents positive absolute values of the variance and covariance terms.The percentage contributions are shown as averages over the entire 10 years of the simulations.The percentage contributions do not change substantially over the 10 years (always within ±5 % of the 10-year averages).

Figure 13 .
Figure 13.Spatial pattern of the (a-e) predictability horizons and (f-i) contribution of different terms to the predictability of the smallphytoplankton growth at the surface.(a-e) Predictability time horizon for (a) growth rate of small phytoplankton, (b) nutrient limitation, (c) temperature limitation, (d) light limitation, and (e) the covariance between the temperature and nutrient limitation.(f-i) Percentage contributions of (f) nutrient limitation, (g) temperature limitation, (h) light limitation, and (i) covariance between temperature and nutrient limitation relative to the sum of all terms.Red shading in panels (f)-(i) represents positive absolute values of the variance and covariance terms.The percentage contributions are shown as averages over the entire 10 years of the simulations.The percentage contributions do not change substantially over the 10 years (always within ±5 % of the 10-year averages).Note that the terms that do not substantially contribute to NPP predictability covariances between temperature and light and nutrients are not shown here.The hatching in panel (f) indicates the limiting nutrients as obtained from the 300-year-long preindustrial control simulation.