Climate-induced interannual variability of marine primary and export production in three global coupled climate carbon cycle models

. Fully coupled climate carbon cycle models are sophisticated tools that are used to predict future climate change and its impact on the land and ocean carbon cycles. These models should be able to adequately represent natural variability, requiring model validation by observations. The present study focuses on the ocean carbon cycle component, in particular the spatial and temporal variability in net primary productivity (PP) and export production (EP) of particulate organic carbon (POC). Results from three coupled climate carbon cycle models (IPSL, MPIM, NCAR) are compared with observation-based estimates derived from satellite measurements of ocean colour and results from inverse modelling (data assimilation). Satellite observations of ocean colour have shown that temporal variability of PP on the global scale is largely dominated by the permanently stratiﬁed, low-latitude ocean (Behrenfeld et al., 2006) with stronger stratiﬁcation (higher sea surface temperature; SST) being associated with negative PP anomalies. Results from all three coupled models conﬁrm the role of the low-latitude, permanently stratiﬁed ocean for anomalies in globally integrated PP, but only one model (IPSL) also reproduces the inverse relationship between stratiﬁcation (SST) and PP.


Introduction
Marine net primary productivity (PP) is a key process in the global carbon cycle, controlling the uptake of dissolved inorganic carbon (DIC) in the sunlit surface waters of the ocean and its transformation into organic carbon (OC). Subsequent gravitational sinking of detrital particulate organic carbon (POC) through the water column results in the export of POC (EP) from the surface into the ocean's interior, where it becomes partly or entirely remineralised and eventually transported back to the surface as DIC and nutrients. The export of organic matter leads to a depletion in DIC and nutrients in the surface and an enrichment in the deep. Without this biological cycle, surface water pCO 2 and consequently atmospheric CO 2 would be higher than observed (Volk and Hoffert, 1985). However, neither absolute values for global annual PP and EP nor their spatial and temporal variability are well known from direct observations. Changes in ocean circulation and nutrient cycling from climate change will impact PP and EP differently, requiring a better understanding of the controlling mechanisms.
Published by Copernicus Publications on behalf of the European Geosciences Union. Satellite measurements of ocean colour have been used to derive surface water chlorophyll concentrations (Chl), phytoplankton carbon biomass (C phyto ), and PP (Behrenfeld et al., 2006(Behrenfeld et al., , 1997Carr et al., 2006). These methods have the advantage in that they provide large spatial and temporal coverage of vast ocean areas. Reference measurements from shipbased observations, however, are still sparse. Complex algorithms lead stepwise from ocean colour measurements to Chl concentrations and C phyto , and then from Chl, C phyto , light, mixed layer depth, and temperature to PP, and sometimes even further to EP estimates. These steps include a number of assumptions concerning, for example, vertical and temporal resolution of the parameters to be determined, which increases the uncertainty for the results obtained after each step. For example, Carr et al. (2006) examined results from 24 different methods to determine PP from ocean colour and seven general circulation models (GCMs), finding a factor of two difference between global bulk estimates for PP, that range from 35 to 78 Gt C yr −1 . A similar spread is found for both types of methods, satellite colour algorithms (35 to 68 Gt C yr −1 ) and GCMs (37 to 78 Gt C yr −1 ). Nevertheless, patterns of spatial and temporal variability of PP are similar between different approaches, giving a first indication of the spatio-temporal variability of PP.
Export fluxes of organic carbon (OC) are even harder to constrain than PP. They are difficult to be measured directly and in some approaches have to be referred to a certain depth level, which is defined differently across studies (Dunne et al., 2005;Oschlies and Kähler, 2004;Laws et al., 2000;Schlitzer, 2000), complicating comparison. Observationbased estimates suggest that global POC export production is in the range of 11 to 22 Gt C yr −1 (Laws et al, 2000;Schlitzer et al., 2000;Eppley and Peterson, 1979).
Ocean circulation and mixing is an important governing factor for biological productivity and organic matter export. It controls the transport of nutrients into the euphotic zone and thus nutrient availability for marine biological production. Najjar et al. (2007) found that the global carbon export (POC and DOC) varied from 9 to 28 Gt C yr −1 in 13 different ocean circulation models using the same biogeochemical model (OCMIP-2). Those models who are able to realistically reproduce radiocarbon and CFC distributions (Matsumoto et al., 2004) yield POC export in a range of 6-13 Gt C yr −1 . The importance of realistic physics has also been highlighted by Doney et al. (2004) using the same 13 ocean models. The export of organic matter is, however, an important quantity to constrain as it describes the amount of OC that is transported from the surface ocean to depth, causing a vertical gradient of dissolved inorganic carbon (DIC) in the water column (Volk and Hoffert, 1985). Potential changes in export may alter the exchange and partitioning of carbon between oceanic and atmospheric reservoirs.
A quantitative understanding of the processes that control PP and EP and their implementation in coupled climate biogeochemical models is essential to project the effect of future climate change on marine productivity, carbon export fluxes (Bopp et al., 2005;Maier-Reimer et al., 1996) and their possible feedbacks on the climate system (Friedlingstein et al., 2006;Plattner et al., 2001;Joos et al.,1999). Unfortunately, productivity and export are not well constrained by direct observations, making it difficult to validate corresponding results from climate models. Productivity estimates from coupled models and satellite observations are largely independent in construction, and cross-comparison of the two approaches provides a promising way to assess their overall skill and identify the main underlying mechanisms that control PP and EP variability. To do so, this study investigates results from three fully coupled climate carbon cycle models (IPSL, MPIM, NCAR) that include interactions between the atmosphere, ocean circulation and sea-ice, marine biogeochemical cycles and the terrestrial biosphere. As all three models differ in their major components (atmosphere, ocean, terrestrial, and marine biospheres), the aim of this study is to give a description of the present day PP and EP as simulated by coupled models. We use coupled models, because it would not be sufficient to investigate the primary and export production in a (far cheaper) set of forced ocean-only model experiments, since climate in the fully coupled models (e.g. ocean currents and resulting nutrient distributions, cloud cover, and resulting insolation) will most likely differ from any reanalysed state.

Methodology
Modelled circulation fields are compared with observations of temperature (T ), salinity (S), mixed layer depth (MLD) and water mass transports of the Atlantic Meridional Overturning Circulation (AMOC). To assess the models' capability to reproduce El Nino Southern Oscillation (ENSO) variability, maximum entropy power spectra of sea surface temperatures (SST) from the equatorial Pacific are computed. The representation of marine biogeochemical cycles is assessed by comparing modelled with observed PO 3− 4 concentrations, which in the current study are fully prognostic in contrast to the former model simulations of the OCMIP-2 study, where PO 3− 4 was restored (Najjar et al., 2007). The evaluation of PP covers global annual mean fields, global integrals, seasonal cycle and interannual variability. We compare model results with PP derived from satellite measurements of ocean colour and explain the main mechanisms causing interannual variability of simulated PP. Global annual mean fields and global integrals for EP from the models are also compared with observation-based estimates. Thereby, we identify regions where EP reacts most sensitive to interannual climate variability. The present day situation of PP and EP from our results can be taken to estimate the impact of future climate change on marine PP and EP.  Collier and Durack, 2006;Conkright et al., 2002) to assess the reliability of modelled ocean circulation fields and biogeochemical cycles. Furthermore, the representation of the maximum mixed layer depth, which is a dynamically important variable for water mass formation, light limitation, and nutrient entrainment, is assessed by comparison with observations from de Boyer- Montégut et al. (2004). Modelled fields of PP are compared with PP derived from ocean colour (Behrenfeld et al., 2006, Behrenfeld andFalkowski, 1997) (http://web.science.oregonstate.edu/ ocean.productivity/onlineVgpmSWData.php) and EP distributions are compared with results from (inverse) modelling, which refer to a depth of 133 m (Schlitzer, 2000) and 100 m (Laws et al., 2000), respectively. As neither for PP nor for EP appropriate in-situ data are available, the latter is only a model comparison.

Models
All models used in this study are fully coupled 3-D atmosphere-ocean climate models that contributed to the IPCC Fourth Assessment Report (AR4; Solomon et al., 2007;Meehl et al., 2007). The models include carbon cycle modules for the terrestrial and oceanic components (Friedlingstein et al., 2006).

IPSL
The IPSL-CM4-LOOP (IPSL) model consists of the Laboratoire de Météorologie Dynamique atmospheric model (LMDZ-4) with a horizontal resolution of about 3 • ×3 • and 19 vertical levels (Hourdin et al., 2006), coupled to the OPA-8 ocean model with a horizontal resolution of 2 • ×2 • ·cosφ and 31 vertical levels and the LIM sea ice model (Madec et al., 1998). The terrestrial biosphere is represented by the global vegetation model ORCHIDEE (Krinner et al., 2005), the marine carbon cycle by the PISCES model (Aumont et al., 2003). PISCES simulates the cycling of carbon, oxygen, and the major nutrients determining phytoplankton growth (PO 3− 4 , NO − 3 , NH + 4 , Si, Fe). Phytoplankton growth is limited by the availability of nutrients, temperature, and light. The model has two phytoplankton size classes (small and large), representing nanophytoplankton and diatoms, as well as two zooplankton size classes (small and large), representing microzooplankton and mesozooplankton. For all species the C:N:P ratios are assumed constant (122:16:1; Anderson and Sarmiento, 1994), while the internal ratios of Fe:C, Chl:C, and Si:C of phytoplankton are predicted by the model. Iron is supplied to the ocean by aeolian dust deposition and from a sediment iron source. During biological production it is taken up by the plankton cells and released during remineral-isation. Scavenging of iron onto particles is the sink for iron to balance external input. There are three non-living components of organic carbon in the model: semi-labile dissolved organic carbon (DOC), with a lifetime of several weeks to years, as well as large and small detrital particles, which are fuelled by mortality, aggregation, fecal pellet production and grazing. Small detrital particles sink through the water column with a constant sinking speed of 3 m day −1 , while for large particles the sinking speed increases with depth from a value of 50 m day −1 at the depth of the mixed layer, increasing to a maximum sinking speed of 425 m day −1 at 5000 m depth. For a more detailed description of the PISCES model see Aumont and Bopp (2006) and Gehlen et al. (2006). Further details and results from the fully coupled model simulation of the IPSL-CM4-LOOP model are given in Friedlingstein et al. (2006).

MPIM
The Earth System Model employed at the Max-Planck-Institut für Meteorologie (MPIM) consists of the ECHAM5 (Roeckner et al., 2006) atmospheric model of 31 vertical levels with embedded JSBACH terrestrial biosphere model and the MPIOM physical ocean model, which further includes a sea-ice model (Marsland et al., 2003) and the HAMOCC5.1 marine biogeochemistry model (Maier-Reimer et al., 2005). The coupling of the marine and atmospheric model components, and in particular the carbon cycles is achieved by using the OASIS coupler.
HAMOCC5.1 is implemented into the MPIOM physical ocean model (Marsland et al., 2003) using a curvilinear coordinate system with a 1.5 • nominal resolution where the North Pole is placed over Greenland, thus providing relatively high horizontal resolution in the Nordic Seas. The vertical resolution is 40 layers, with higher resolution in the upper part of the water column (10 m at the surface to 13 m at 90 m). The marine biogeochemical model HAMOCC5.1 is designed to address large-scale, long-term features of the marine carbon cycle, rather than to give a complete description of the marine ecosystem. Consequently, HAMOCC5.1 is a NPZD model with two phytoplankton types (opal and calcite producers) and one zooplankton species. The carbonate chemistry is identical to the one described in Maier-Reimer (1993). A more detailed description of HAMOCC5.1 can be found in Maier-Reimer et al. (2005), while here only the main features relevant for the described experiments will be outlined.
Marine biological production is limited by the availability of phosphorous, nitrate, and iron. Silicate concentrations are used to distinguish the growth of diatoms and coccolithophorides: if silicate is abundant, diatoms grow first, thereby reducing the amount of nutrients available for coccolithophoride growth. The production of calcium carbonate shells occurs in a fixed ratio of the phytoplankton growth (0.2). The model also includes cyanobacteria that take up nitrogen from the atmosphere and transform it immediately into nitrate. Please note that biological production is temperature-independent, assuming that phytoplankton acclimate to local conditions. Global dust deposition fields are used to define the source function of bioavailable iron. Removal of dissolved iron occurs through biological uptake and export and by scavenging, which is described as a relaxation to the deep-ocean iron concentration of 0.6 nM. In the experiments used here, export of particulate matter is simulated using prescribed settling velocities for opal (30 m day −1 ), calcite shells (30 m day −1 ) and organic carbon (10 m day −1 ). Remineralisation of organic matter depends on the availability of oxygen. In anoxic regions, remineralisation using oxygen from denitrification takes place.
HAMOCC5.1 also includes an interactive module to describe the sediment flux at the sea floor. This component simulates pore water chemistry, the solid sediment fraction and interactions between the sediment and the oceanic bottom layer as well as between solid sediment and pore water constituents.

NCAR
The physical core of the NCAR CSM1.4 carbon climate model (Doney et al., 2006;Fung et al., 2005) is a modified version of the NCAR CSM1.4 coupled physical model, consisting of ocean, atmosphere, land and sea-ice components integrated via a flux coupler without flux adjustments (Bolville et al., 2001;Bolville and Gent, 1998). The atmospheric model CCM3 is run with a horizontal resolution of 3.75 • and 18 levels in the vertical (Kiehl et al., 1998). The ocean model is the NCAR CSM Ocean Model (NCOM) with 25 levels in the vertical and a resolution of 3.6 • in longitude and 0.8 • to 1.8 • in latitude . The water cycle is closed through a river runoff scheme, and modifications have been made to the ocean horizontal and vertical diffusivities and viscosities from the original version (CSM1.0) to improve the equatorial ocean circulation and interannual variability. The sea ice component model runs at the same resolution as the ocean model, and the land surface model runs at the same resolution as the atmospheric model.
The CSM1.4-carbon model includes a modified version of the terrestrial biogeochemistry model CASA (Carnegie-Ames-Stanford Approach) (Randerson et al., 1997), and a derivate of the OCMIP-2 (Ocean Carbon-Cycle Model Intercomparison Project Phase 2) ocean biogeochemistry model (Najjar et al., 2007). In the ocean model the biological source-sink term has been changed from a nutrient restoring formulation to a prognostic formulation, thus biological productivity is modulated by temperature, surface solar irradiance, mixed layer depth, and macro-and micronutrients (PO 3− 4 , and iron). Following the OCMIP-2 protocols (Najjar et al., 2007), total biological productivity is partitioned 1/3 into sinking POC flux, equivalent to EP, and 2/3 into the formation of dissolved or suspended organic matter, much of which is remineralised within the model euphotic zone. Total productivity thus contains both new and regenerated production, though the regenerated contribution is probably lower than in the real ocean, as only the turnover of semi-labile dissolved organic matter (DOM) is considered. While not strictly equivalent to primary production as measured by 14 C methods, rather net nutrient uptake, NCAR PP is a reasonable proxy for the time and space variability of PP if somewhat underestimating the absolute magnitude. For reasons of simplicity, net nutrient uptake times the C:P ratio of 117 (Anderson and Sarmiento, 1994) is considered here as PP, even though it is not essentially the same. The ocean biogeochemical model includes the main processes of the organic and inorganic carbon cycle within the ocean, and airsea CO 2 flux. A parameterisation of the marine iron cycle has also been introduced (Doney et al., 2006). It includes atmospheric dust deposition/iron dissolution, biological uptake, vertical particle transport and scavenging. The prognostic variables in the ocean model are phosphate (PO 4 ), total dissolved inorganic iron, dissolved organic phosphorus (DOP), DIC, alkalinity, and O 2 . The CSM1.4-carbon source code is available electronically (see http://www.ccsm. ucar.edu/working groups/Biogeo/csm1 bgc/) and described in detail in Doney et al. (2006).

Experiments
All results of the current study are obtained from simulations with the coupled climate-carbon cycle models explained above (IPSL, MPIM, NCAR). These models all simulate fully coupled interactions between the atmosphere, ocean circulation, sea-ice, marine biogeochemical cycles, and terrestrial biosphere. They are a subset of the models that contributed to the C4MIP project (Friedlingstein et al., 2006) and follow the C4MIP protocols. Two of the models (IPSL, MPIM) are only forced by the historical development of anthropogenic CO 2 emissions due to fossil fuel burning and land-use changes from preindustrial to 2000 and the SRES A2 emissions scenario from the year 2000 on. The NCAR model also includes CH 4 , N 2 O, CFCs, volcanic emissions, and changes in solar radiation, as described by Frölicher et al. (submitted). For spinup all models have been integrated for more than one thousand years. In particular, the IPSL biogeochemical model was integrated over 1000 years with a constant ocean circulation field starting with tracer distributions from former model simulations resulting in a total of more than 3000 years integration time for the biogeochemical tracers. In MPIM globally uniform tracer distributions were applied for initialization of a 1650 year simulation at coarse resolution, before another 300 years with final resolution were performed. In NCAR, where the model was started from modern tracer values, but atmospheric CO 2 was held constant at 278 ppm, representing preindustrial conditions, an acceleration technique for the deep ocean (Danabasoglu et al., 1996) was applied, so that a first 350 year integration corresponds to 17500 years for the deep ocean. From this another 1000 year control simulation was started. After these spinups, the fully coupled versions of all three models were integrated for one hundred years (MPIM 150 years), before starting the transient simulations over the industrial period from 1860 (IPSL, MPIM) and 1820 (NCAR) until the year 2100. Such long integration times for spinup and the use of other input fields than climatological data lets the models deviate from observed conditions, allowing for comparison of 3-D modeled fields with climatological values. During the time period investigated , the anthropogenic CO 2 emissions increase from about 7.5 to 8.6 Gt C yr −1 , resulting in a cumulative emission of about 170 Gt of carbon during this interval.
For a joint analysis of the model results and for comparison with observation-based estimates, all variables have been interpolated onto a 1 • ×1 • grid using a gaussian interpolation and climatological mean values have been computed over the period from 1985 to (NCAR: 1985. This study focuses on results from those two decades to describe the present day PP and EP as obtained from coupled model simulations and observation-based estimates.

Temperature and salinity
In a Taylor diagram, all three models show good agreement for 3-D global annual mean distributions of temperature ( Fig. 1). Global SST distributions including the seasonal cycle are even better reproduced, reaching correlation coefficients above R=0.98 and normalised standard deviations close to 1. Salinity distributions are known to be less well reproduced by coupled climate models, which is largely attributed to deficiencies in the hydrological cycle of the atmosphere models . This is also the case for the models of the current study. However, correlation coefficients for both the global annual mean 3-D salinity distributions and the sea surface salinity (SSS) values including the seasonal cycle are still on the order of R=0.78-0.90. In contrast to temperature, the agreement between modelled and observed salinity distributions is poorer when going from the global annual mean 3-D field to seasonal surface water values, which is due to a stronger influence of possible misfits from the atmospheric hydrological cycle, as mentioned above. While correlation coefficients remain similar, the normalised standard deviations are considerably higher than observed. Generally, in terms of T and S, all three models of the present study perform as well as the majority of models that contributed to the IPCC AR4 (Meehl et al., 2007;Schneider et al., 2007).  Collier and Durack, 2006;Conkright et al., 2002). For MPIM the normalised standard deviation of MLD compared to de Boyer-Montégut et al., (2004) is 6.0 and 4.7 for MLD max and seasonal MLD, respectively, and therefore not displayed in the diagram. Bottom: Taylor diagram showing the correspondence between model results and observation-based estimates for primary production (PP; squares), chlorophyll (CHLA; diamond) and particulate organic carbon (POC) export production (EP; circles and triangles). White symbols show results for annual mean 2-D fields, grey symbols include the seasonal cycle for PP. The angular coordinate indicates the correlation coefficient (R), the radial coordinate shows the normalised standard deviation (std model /std obs ). A model perfectly matching the observations would reside in point (1,1).

Mixed-layer depth
The mixed-layer depth (MLD) is a dynamically important variable for upper ocean water mass transport (Gnanadesikan et al., 2002), and especially the maximum (MLD max ) during winter time strongly affects the formation of mode, intermediate, deep, and bottom waters. Furthermore, ocean mixing plays an important role in surface ocean nutrient availability and thus biological productivity (Najjar et al., 2007). Different methods to determine MLD may yield different results, complicating the use of MLD for model-data intercomparisons. For example, there is a difference between MLD if either T and S (or density) are horizontally gridded first or if density is defined for individual profiles first and then interpolated in space (de Boyer-Montégut et al., 2004).
In the current study we use results from de Boyer-Montégut et al. (2004), where MLD is defined to be the depth level where surface water density is offset by 0.03 kg m −3 or potential temperature is by 0.2 • C lower than SST. These definitions, applied to local profiles first and then interpolated in space, are suggested to be the optimal solution for observation-based MLD. In IPSL MLD is the depth where in situ density is 0.01 kg m −3 higher than surface density. Similarly, in MPIM, it is the depth where in situ density exceeds surface water density by 0.125 kg m −3 . In the NCAR model the vertical mixing scheme is the K-profile parameterization (KPP) scheme of Large et al. (1994), where the mixed-layer depth depends on the depth of mixing due to turbulent velocities of unresolved eddies.
The MPIM model strongly overestimates MLD max in the Southern Ocean (Fig. 2), while the other models show too shallow mixing in this area as compared to climatological MLD observations (de Boyer-Montégut et al., 2004). In the low latitudes, all models correspond reasonably well to the observations, but in the intermediate and northern extratropics (20-70 • N) the models overestimate zonal average MLD max up to a factor of two. Spatial correlations of modelled and observed MLD max are weak and on the order of R=0.3-0.4. For MPIM the overestimation of MLD max in the Southern Ocean is known to be caused by reduced ice cover in the Weddell Sea (Marsland et al., 2003). This misfit leads to a normalised standard deviation of about 6.0 in a Taylor diagram (not shown).
A recalculation of MLD as the maximum of the first vertical derivative of sea water density was applied to T and S from the models and observations (WOA; Conkright et al., 2002). This analysis was performed to avoid inconsistencies that arise from the application of different MLD definitions for the models and the observations (de Boyer- Montégut et al., 2004). However, the results yield correlation coefficients (R) between 0.03 and 0.16 for MLD max and R=0.01 including seasonal MLD variability (small triangles in Fig. 1). This is even weaker than the comparison of MLDs based on different criteria, and also much worse than T and S distributions alone, on which their calculation is based. One possibility for the poor match is that errors in T and S may add up in the calculation of density and thus MLD.

Atlantic meridional overturning circulation
Another common diagnostic of simulated ocean circulation fields is to compare modelled and observed water mass transports Schmittner et al., 2005;Large et al., 1997). The strength of the Atlantic Meridional Overturning Circulation (AMOC) is an important measure for ocean circulation, quantifying the amount of deep waters formed in the North Atlantic, which is of high climatic relevance for the global redistribution of heat and energy by the ocean. In the current study AMOC is defined as the maximum meridional water mass transport below 300 m depth in the North Atlantic, given in Sverdrup (1 Sv=10 6 m 3 s −1 ). In MPIM the simulated water mass transport lies well within the range of observation-based estimates, while IPSL is at the low and NCAR at the high end (IPSL: 11.4±1.3 Sv; MPIM: 18.1±0.9 Sv; NCAR: 23.6±1.0 Sv, OBS: 11.5-24.3 Sv; Cunningham et al., 2007;Talley et al., 2003;Smethie and Fine, 2001).

El Nino southern ocean variability
The fully coupled carbon-cycle climate models used in the current study generate their own internal climate variability including coupled ocean-atmosphere modes such as El Nino Southern Oscillation (ENSO). To assess the models' capability to reproduce ENSO variability, the power spectra of SSTs averaged over the Nino3 Box (150 • W-90 • W, 5 • S-5 • N) are calculated (Fig. 3). For IPSL and NCAR there are maxima of the power spectra between two and seven years, which is the typical range of El Nino frequency (Randall et al., 2007), while MPIM does not show a maximum, but also strongly increasing power within this range. This may still be reasonable, as this analysis was performed over a 20 year time-period, which may be too short to exhibit representative ENSO dynamics. The short time interval of investigation is also the reason why no results for frequencies longer than 5 years are obtained.

Phosphate concentrations
Phosphate concentrations (PO 3− 4 ) are a suitable indicator to assess biogeochemical cycling in the models, as PO 3− 4 is not affected by air-sea gas exchange, which is treated differently in the models. Furthermore, in the current study PO 3− 4 is a fully prognostic tracer in contrast to the models contributing to the OCMIP-2 study, where surface PO 3− 4 was restored (Najjar et al., 2007). The comparison with climatological PO 3− 4 values (WOA; Collier and Durack, 2006;Conkright et al., 2002) reveals a reasonable reproduction of the annual mean PO 3− 4 spatial patterns by all models with high correlation coefficients around R∼0.80. IPSL and MPIM perform especially well with normalised standard deviations of 0.97 (Fig. 1). For all models the agreement gets weaker when going from the annual mean 3-D distribution of PO 3− 4 to surface water values including the seasonal cycle. Maps of surface water PO 3− 4 concentrations, averaged over the top 0-100 m, show that MPIM and NCAR strongly overestimate PO 3− 4 concentrations in the surface waters between 40 • N and 60 • S (Fig. 4), a deviation that is not captured by the Taylor diagram (Fig. 1).

Factors limiting phytoplankton growth
The factors limiting phytoplankton growth in the models are computed from Michaelis-Menten equations for nutrient concentrations, according to MM=N/(K+N), whereby MM is the Michaelis-Menten coefficient, N is the nutrient concentration (PO 3− 4 , NO − 3 , iron, and silicate) and K is the halfsaturation constant as used in each model. For the purpose of analysis, the nutrient yielding the lowest MM-value is considered to be the most limiting. Other factors (temperature and light) are assumed to be limiting when MM of the limiting nutrient is still above 0.7. In Fig. 5 the limiting factors are shown when the respective variable is limiting phytoplankton growth during at least one month of the year, showing that in some regions different variables may limit productivity at different times of the year. In the MPIM model, overall iron limitation is caused by the combination of too little dust deposition and, more importantly, a too high halfsaturation value for iron. The latter is computed via the halfsaturation value for PO 3− 4 (K PO4 =0.1µM) and an Fe:P ratio of 5 nM µM −1 , corresponding to K Fe = 0.5 nM. The NCAR model has a K Fe of 0.03 nM, but iron is known to be too strongly scavenged, especially in the subtropical Pacific, also resulting in too strong iron limitation. This excessive iron limitation in MPIM and NCAR causes stronlgy reduced surface water PO 3− 4 -uptake, resulting in elevated PO 3− 4 concentrations (Fig. 4).
In the IPSL model where K Fe values are 0.02 and 0.1 nM for nanophytoplankton and diatoms, respectively, iron is mostly limiting in the Equatorial East Pacific, the Southern Ocean, and the North Pacific, regions known to be high nitrate low chlorophyll areas (HNLC) that are strongly iron limited. Also in IPSL, NO − 3 is the most limiting factor in the intermediate to low latitudes. In the high latitudes also temperature and light are important for nanophytoplankton growth and diatom production is limited by the availability of silicate.
3.2 Annual mean and seasonal cycle of primary productivity and export production Vertical integrals of PP have been computed over the entire depth of the water column in order to compare with estimates from satellite observations of ocean colour and to derive global integrals of PP (Table 1). In MPIM PP is operationally restricted to 0-90 m and in NCAR to 0-75 m, while in IPSL it extents below 100 m water depth in the oligotrophic subtropical gyres. The global annual amount of PP ranges from 24 Gt C yr −1 (MPIM) to 31 Gt C yr −1 (IPSL), and is considerably lower than satellite-based estimates of around 48 Gt C yr −1 (Behrenfeld et al., 2006) and also lower than the range of 35-70 Gt C yr −1 obtained by Carr et al. (2006). Consequently, PP from the coupled models is still lower than the low-end satellite estimates. Despite some spread in results from the different PP algorithms, these can be used for comparison with results from climate models, because the satellite-based estimates show patterns of spatial and temporal variability that are similar between different approaches (Carr et al., 2006). The discrepancies between different algorithms largely reflect differences in the description of chlorophyll-specific carbon fixation efficiencies.
The spatial distributions of modelled PP agree only moderately with the pattern of observation-based PP (Figs. 1 and  6). In general, satellite-based and model PP are high in the equatorial upwelling regime, especially the Equatorial East Pacific and the Equatorial East Atlantic. PP is also elevated in the North Atlantic and the Southern Ocean north of the  polar front, while it is low in the subtropical oligotrophic gyres. High PP values as derived from observation-based estimates for the coastal zones are strongly underestimated by all three models, which is expected due to the coarse model resolution. For MPIM and NCAR the fairly low total PP is a result of strong iron limitation (Fig. 5). In the MPIM model this causes too high PP in the low-latitude Atlantic and too little elsewhere. Furthermore, dust input in MPIM is at the lower end of observed values (Timmreck and Schulz, 2004), leading to low iron deposition from the atmosphere, which is another factor increasing the iron stress for marine productivity. The lower PP values in the NCAR model also reflect the fact that the calculated productivity is intermediate between new and primary production, which partly explains the lower than observed PP values (Fig. 6).
The seasonal variability of PP is shown in Hovmöller diagrams with the zonally averaged PP plotted versus time from observations and models (Fig. 6). Areas of highest productivity are observed in the North Atlantic, where also seasonal variability is highest. In the oligotrophic gyres PP is low throughout the year and variability is also reduced. Along the equator, observed PP has moderate values with low seasonal variability, while in the southern intermediate latitudes (30-50 • S) a secondary maximum with high PP and high variability occurs. The models capture the general distribution of higher absolute PP values and higher seasonal variability in the intermediate to high latitudes. IPSL has too high PP values along the equator and too little meridional extension of the oligotrophic gyres. Also in the Southern Ocean, PP is overestimated and extending farther southwards than in the observations. In MPIM, the periods of high productivity in the higher latitudes are very short, indicating that here PP occurs in short (strong) pulses, while in the equatorial region PP is higher than the observations with some moderate seasonal variability. The pattern of spatio-temporal variability in NCAR is similar to the one from the observations, but like in IPSL, PP in the intermediate southern latitudes is biased high. The area of the oligotrophic gyres is reduced in the Atlantic, while in the Pacific the expansion of the gyres is too large (Fig. 6).
As mentioned above, the derivation of PP from satellite data entails some uncertainties. It is a stepwise approach starting from measurements of ocean colour from which chlorophyll concentrations (and sometimes organic carbon biomass) and finally, PP is derived. While the first steps consider concentrations, PP is a time-dependent rate, requiring even more complex assumptions about the underlying mechanisms. With each further step from ocean colour to PP, the errors propagate, increasing the uncertainty in the value for global annual PP. For IPSL, where chlorophyll concentrations are available, a comparison of modelled and satellitebased chlorophyll distributions shows a better agreement than PP values (Fig. 7). Although the model underestimates the surface Chl concentrations, especially in high (northern) latitudes and around the equator, the pattern and the order of magnitude of spatial variability for Chl are slightly closer to observations than those obtained for PP estimates (Fig. 1).
Export Production (EP) describes the amount of particulate organic carbon (POC) that is transported from the surface ocean to depth across a certain depth level (IPSL: 100 m; MPIM: 90 m; NCAR: 75 m). In this study only the material settling through gravitational sinking is regarded, while the total export also encompasses subduction and mixing of suspended particles and dissolved organic matter (DOM) due to water mass transports. The fields of annual mean EP have similar patterns to those of PP for all three models (not shown), i.e. as a first approximation those areas with higher PP also have higher EP. In NCAR this relation is prescribed by using a fixed ratio (1/3) of PP that is exported as POC, while for the other models the amount of EP and its ratio to PP is less straightforward, depending on particle flux and food-web dynamics. The global annual rates of modelled EP are given in Table 1, ranging from 5 Gt C yr −1 (MPIM) to 9 Gt C yr −1 (NCAR), being thus below observation-based estimates reaching from 11 to 22 Gt C yr −1 (Schlitzer, 2000;Laws et al., 2000;Eppley and Peterson, 1979).
Highest EP in IPSL and NCAR occurs in the intermediate to high latitudes between 40-60 • North and South, respectively (Fig. 8). A secondary maximum is situated around the equator, where upwelling of nutrient rich deep waters permits high PP (Fig. 6) and thus high EP. Between those areas, in the latitude bands of the oligotrophic subtropical gyres (∼10-40 • North and South) EP is low. In MPIM, zonal average EP increases from low values in the high latitudes to its maximum around the equator, however, in general EP seems to be underestimated everywhere except for the equatorial region if compared to the other models and the observation-based estimates. This distribution will also be caused by the strong iron limitation (Fig. 5).
In a Taylor diagram the correlation coefficients for the spatial distributions of annual mean EP from the models are rather poor with highest values for IPSL (R=0.35). These correlations are also lower than those determined for PP (Fig. 1). One problem for the comparison of different EP estimates are the different reference levels, depending on the definition of EP, as mentioned above. Consequently, the shallowest estimate (NCAR, 75 m) should give highest values and the deepest estimate (Schlitzer,133 m) lowest export fluxes, which, however, is not the case (Table 1). Another probable reason for the mismatch between modelled and observation-based EP is the uncertainty of observationbased estimates for EP. Similar to the PP estimates, they also vary by a factor of two, from global annual mean values of 11 to 22 Gt C yr −1 . Furthermore, the two estimates of Schlitzer (2000) and Laws et al. (2000), which predict about the same global amount of EP (11 Gt C yr −1 ), show very different spatial distributions (Fig. 8) and also spatial correlations between the two are not better than those from comparison with modelled EP (Fig. 1). Discrepancies between the two fields of observation-based EP estimates, especially in the northern hemisphere and the Southern Ocean around 60 • S (Fig. 8), can partly be explained by the fact that satel- Fig. 9. Left column: time series of observation-based (top) and modelled (others) primary production (PP) anomalies for the global (green; left scale) and the low-latitude, permanently stratified ocean (black; left scale), which has annual mean SSTs above 15 • C. The anomalies are calculated as the difference between the monthly PP value an the climatological mean of the corresponding month. Right column: time series of the observation-based (top) and modelled (others) PP anomalies for the permanently stratified ocean (black; scale on the left) overlaid by anomalies for stratification (red) and SST (blue). For the NCAR model results from the Nino3 Box are displayed (lower right panel), which yields PP anomalies which are an order of magnitude lower than for the global and the low-latitude cases. Please note that the scales for the latter two indices (SI, SST) have been inverted to better show the inverse relationships.
lite observations may have difficulties in coastal areas due to the high abundance of suspended matter and can thus overestimate Chl and consequently EP, whereas in the Southern Ocean deep Chl maxima are probably not captured by the satellite sensors (Schlitzer, 2002).  Primary production is known to be sensitive to climate impacts like for example El Nino Southern Oscillation (ENSO). Consequently, during an El Nino period, PP in the tropical Pacific is reduced, whereas at a La Nina situation it is enhanced (Behrenfeld et al., 2001;Chavez et al., 1999). From satellite observations it has also been shown that the global signal of interannual PP variability can largely be attributed to the permanently stratified, low-latitude oceans (Behrenfeld et al., 2006). What is more, such anomalies are highly correlated with shifts in the climate system in a way that stronger stratification and the resulting surface ocean warming, which correspond to more El Nino-like conditions, result in negative PP and chlorophyll anomalies over much of the tropics and subtropics. Stronger stratification results in less nutrient supply from deep waters, which in turn limits phytoplankton growth. Algorithms that derive PP from ocean colour make only indirect assumptions about nutrient concentrations via the temperature effect on PP. More detailed answers regarding the links between stratification, temperature, nutrients and biological productivity can be provided by ocean biogeochemical models. Therefore, anomalies of different parameters (PP, EP, SST, or stratification) are computed as the difference of the monthly value to the climatological average of the respective month. All three models examined in the current study follow the behaviour described by Behrenfeld et al. (2006), where the global signal of PP anomalies is largely controlled by the permanently stratified, low-latitude oceans that have annual average SSTs above 15 • C (Fig. 9). However, MPIM strongly overestimates the amplitude and frequency of interannual PP variability while in NCAR PP variability is slightly too low. It is interesting to see that for all models and the observation-based values, the percentage of PP in the low-latitude, permanently stratified ocean (PP strat ) to global PP (PP glob ) almost equals the fraction of the stratified areas to the global ocean surface area, while the anomalies of PP glob and PP strat are positively correlated (Table 1). An equivalent correlation between the globally integrated PP anomalies and those from the high northern or southern latitudes can not be found, which underlines the dominant role of the low-latitude ocean in setting the global signal of PP variability.
Observation-based PP strat anomalies are negatively correlated with changes in stratification and SST over the same area ( Fig. 9; Behrenfeld et al., 2006). Note that in interpreting Fig. 9, one should focus on the magnitude and frequency of the PP variability, not the phasing of specific PP events. Since the models are fully coupled and thus have no real time information other than from CO 2 emissions, except for NCAR that also includes other greenhouse gases and volcanic forcing (Frölicher et al., submitted), they each generate their own unique internal climate variability that can only statistically be compared with other models and observations. For the model results, a stratification index (SI) was calculated as by Behrenfeld et al. (2006), as the density difference between the sea surface and 200 m water depth. In the IPSL model, PP integrated over the entire domain of the low-latitude, stratified water is strongly anti-correlated with SST and SI. In the NCAR simulation, the biological-physical anti-correlations are more apparent when the analysis is restricted to the area of the Nino3 Box (150-90 • W, 5 • S-5 • N) (Table 2, Fig. 9). For MPIM no correlation can be found between the respective anomalies of PP and stratification (SI) or SST, neither for the whole low-latitude, permanently stratified domain nor for any other individual sub region. This may be because in MPIM next to overall iron limitation the equatorial Atlantic ocean is dominated by strongly oscillating predator-prey cycles, which lead to phase shifts in phytoplankton and zooplankton abundance. This effect also explains the high amplitude and frequency of interannual variability of PP simulated by MPIM (Fig. 9).
The slopes that can be derived from the anomalies of PP strat versus stratification (S-SLOPE) and SST (T-SLOPE), whereby the latter are also used as an indicator for stratification, are similar in IPSL and the observation-based Each data point represents a monthly value from the time series shown in Fig. 9. Slopes and coefficients of determination for the regression lines are given in Table 1. estimates (Fig. 10). In the satellite observations for the area of the permanently stratified oceans there is a PP anomaly of -876 Tg C month −1 per unit stratification increase (kg m −3 ) and a decrease of -151 Tg C month −1 per degree SST increase (Table 1). For IPSL, S-SLOPE is slightly weaker (-787 Tg C month −1 ) and T-SLOPE somewhat stronger (-246 Tg C month −1 ). In NCAR both slopes are weaker and correlations are insignificant for the entire low-latitude, stratified domain, but considerably higher correlations are found for the region of the Nino3 Box (Table 2). This area is, however, too small to explain the bulk of the PP variability for the permanently stratified ocean, which governs the global signal in the NCAR simulations.

Mechanisms of primary production variability
A further step is to examine the mechanisms behind the climate-productivity correlations as found in IPSL for the entire low-latitude ocean domain and in NCAR for the tropical East Pacific. To find out which mechanisms probably contribute to PP variability, maps of cross correlations of different anomalies averaged over the whole region (Fig. 11) versus the local PP variability are drawn, highlighting those regions in the models where PP reacts most sensitive to changes in other variables like stratification, SST and surface water nutrient concentrations. The coefficient of determination R 2 is multiplied by the sign of the regression slope to obtain R 2 * which shows next to its strength also the direction of the correlation (positive or negative). Displaying R 2 * instead of R, that would include the direction of the correla- tion (positive or negative), was chosen, as it permits a better distinction between areas of high and low correlation than R does. To isolate the large-scale climate impact on productivity, the anomalies of temperature, stratification and nutrient concentrations are averaged over the whole low-latitude, permanently stratified ocean, rather than showing the local signal on each grid-point. The pattern found for anomalies of PP and stratification (SI) in the IPSL model is similar to the one seen in the observation-based estimates with strongest anti-correlations (negative R 2 * -values) occurring at the borders of the equatorial tongue (Fig. 11). High positive coefficients of determination (R 2 * ) between PP and (NO − 3 ) concentrations in the same area indicate that stratification and SST changes in the equatorial Pacific result in changes in the upwelling of nutrients, which further leaves an imprint on PP. As the equatorial Pacific is known to be mainly iron limited (Fig. 5), one would assume to find also a strong impact of iron variability on PP towards the center of the high productivity tongue. The absence of such a signal in IPSL, however, may be caused by the formulation of the iron cycle, where iron concentrations are restored to a minimum value of 0.01 nM. This baseline concentration represents a non-accounted source of iron, which could arise from processes that are not explicitly taken into account in the model, like temperature or light effects on iron availability, iron released from ligands and dissolved or particulate matter, variable iron content in deposited dust, different ratios of bioavailable versus dissolved iron from recycling (e.g. microversus macrozooplankton), changes in phytoplankton size and/or physiology like half-saturation constants or iron demand. The iron restoring formulation allows to correctly represent the width of the equatorial tongue in chlorophyll and the location of the iron-to-nitrate limitation transition, thus yielding a better representation of nutrient co-limitations (Fig. 5). By doing so, the natural variability of iron is partly suppressed, dampening the signal that otherwise would be transferred into PP variability. Nevertheless, IPSL shows the best representation of interannual climate/PP variability both temporally (Fig. 9) and spatially (Fig. 11). This is due to the fact that next to the location of the iron-to-nitrate limitation transition, the impact of ENSO variability on the supply of NO − 3 is well reproduced. In NCAR highest anti-correlations for PP and SI anomalies are found in the equatorial Pacific (Fig. 11), although much weaker than in IPSL and the observations. Especially for the area of the Nino3 Box only, higher correlations between PP and SI/SST than for the entire low-latitude, stratified domain have already been mentioned above. The respective correlation coefficients are given in Table 2, showing also that in NCAR PP in the area of the Nino3 Box is indeed controlled by iron availability. There is no further correlation with PO 3− 4 concentrations, as due to iron limitation PO 3− 4 concentrations higher than observed occur. Since iron is supplied to the surface ocean by dust input and upwelling of remineralised iron from below, surface water iron concentrations are partly decoupled from changes in stratification (climate), however, (anti-) correlations between iron and PP (stratification) in the area of the Nino3 Box (Table 2) are still moderate to high. In MPIM, where PP is totally dominated by iron stress and predator-prey oscillations, there are no correlations between anomalies of PP and PO 3− 4 or NO − 3 concentrations.

Export production
The pattern of the sensitivity of PP to climate is transferred into export production (EP) variability. EP reacts most sensitive to changes in PP in those areas, where PP reacts most to climate variability (Fig. 12). In IPSL, variability in EP is strongly correlated with the average PP variability at the borders of the high productivity tongue in the equatorial Pacific, a pattern which is also reproduced by the EP correlations with NO − 3 anomalies (Fig. 12). In MPIM there are no considerable correlations between EP and PP or other climate and nutrient anomalies when regarding the entire lowlatitude ocean. In the NCAR model, EP shows the strongest correlation with PP variability in the North Atlantic, where PP (and thus EP) exhibits the strongest interannual variability. Coefficients of determination different from 1 are found in NCAR, even though EP is fixed to be 1/3 of PP, as local EP variability is correlated with PP variability integrated over the entire low-latitude, permanently stratified domain. This is also the reason why negative correlations in all models may occur. Surprisingly, neither IPSL nor NCAR show considerable correlations of EP with changes in the mixed layer depth. climate models (Meehl et al., 2007;Randall et al., 2007). In terms of biogeochemical cycling MPIM and NCAR have shown to be too strongly iron limited (Fig. 5) and iron cycling in the models has shown to be the central point in reproducing the observed climate-productivity relations.
Maps of surface water iron concentrations (Fig. 13) show that although the range of iron concentrations in all three models is similar, the patterns of distribution are very different. This can be explained partly by the different iron sources in the models, which are dust input and upwelling from below in MPIM and NCAR, and an additional sediment iron source in IPSL. Please note that there is no interannual or longer-term variability in the dust sources of all three models. The fact that relatively high iron concentrations are achieved in the HNLC Southern Ocean for MPIM and NCAR, suggests that in NCAR either the applied half-saturation constant (K Fe ) is too high or that limitation by temperature or light prevails (Fig. 5). In MPIM, where no explicit iron halfsaturation is used, but nutrient co-limitation is computed via K PO4 =0.1 µM and the Fe:P ratio of 5 nM µM −1 it may be a too high Fe:P ratio and/or K PO4 that result in too strong iron limitation. For IPSL contour lines in Fig. 13 mark those areas where iron concentrations during the 20-year period of investigation during at least one month reach the lower boundary value of 0.01 nM, below which iron is restored. The map shows that the determined relationships between climate, stratification, nutrient concentrations and finally PP (and EP), reproducing the observations very well (Fig. 11), are not directly influenced by this artificial iron source.
From the coefficients of determination (R 2 ) and the signs of the regression slopes of the variability of PP and SI (stratification index) versus different nutrient concentration anomalies, the chain of cause and effect from climate impacts to nutrient concentrations and further to PP and finally to EP, may be tentatively reconstructed (Table 1). Accordingly, in IPSL the global signal of PP variability is well explained by the behavior of the entire permanently stratified, low-latitude ocean. Even though in some cases there are moderate correlations between PP and iron variability the slope is negative, which means in the opposite direction than expected (Table 2). This shows the effect of restoring the iron concentrations on hiding the potential impact of iron limitation on PP variability. Instead, PP variability is dominated by NO − 3 and PO 3− 4 (Fig. 11, Table 2). Both nutrients show high correlations with climate indicators like SI and SST as well as with PP for the entire stratified, low-latitude domain and also for the Nino3 Box only (Table 2). In MPIM and NCAR PP variability is largely independent of PO 3− 4 and other nutrients (MPIM), because too strong iron limitation allows high macronutrient concentrations well above observed levels (Fig. 4). Another reason for the absence of a better correlation between the anomalies of PP and nutrients of the whole low-latitude area in NCAR may be the fact that model PP stands for net nutrient uptake, which includes processes like grazing and heterotrophic respiration. Under possible more El Nino-like conditions with surface ocean warming and stronger stratification in the lowlatitude ocean, nutrient supply to the surface ocean is expected to be reduced, resulting in lower PP and EP. This is supported by the IPSL simulation. As a consequence, in a future warming climate with conditions resembling a permanent El Nino, both PP and EP probably will change, which has already been shown by other climate model simulations Boyd and Doney, 2002;Bopp et al., 2001, Frölicher et al., submitted). T-SLOPE was determined in the present study from observation-based estimates and the IPSL model to quantify PP sensitivity to SST (Table 1, Fig. 10). As the model simulations were run until the year 2100 using the A2 scenario, results are available to check whether T-SLOPE also holds to predict the future impact of climate change on PP in the low-latitude ocean. Accordingly, one would expect PP (EP) in the low-latitude ocean to decrease by 3 Gt C yr −1 (0.6 Gt C yr −1 ) per 1 • C temperature increase. For this particular ocean area the IPSL model predicts an increase of the average SST of 1.6 • C from the years 1985-2005 until 2090-2099. This should cause a decrease in PP (EP) by 4.8 Gt C yr −1 (1 Gt C yr −1 ), representing a decline of 27%. However, the model predicts PP over the same time period and area to reduce only by 1.6 Gt C yr −1 (-9%) and EP by 0.7 Gt C yr −1 (-19%). This shows that T-SLOPE, which was derived from modelled interannual variability during the two decades between 1985 and 2005, can not straightforwardly be used for extrapolation into future climate conditions and also that the relation of EP versus PP will shift. Consequently, to estimate the impact of future climate change on marine productivity and carbon export further mechanisms and, of course, also the high latitude ocean will have to be considered. For example, Le Quéré et al. (2007) demonstrated the importance of an increase of near surface wind speeds over the Southern Ocean between 1984(Marshall, 2003 on the marine carbon cycle, a trend that is not reproduced by the models in the current study during the investigated time period. Next to a change in the air-sea CO 2 exchange (Le Quéré et al. 2007), such a shift may increase the supply of iron from below, which in turn might stimulate phytoplankton growth in the Southern Ocean. A more detailed study on the longterm shifts in marine biogeochemical cycles under climate change will be done in a complementary analysis that uses the continuation of the here presented model simulations in a scenario of future climate change (SRES A2) until the year 2100.

Conclusions
The current study has illustrated a strong link between marine productivity and climate variability in coupled climate carbon cycle models, which has already been observed from satellite records (Behrenfeld et al., 2006). A detailed examination of biogeochemical cycling in the models has revealed the importance of the modelled iron cycle on the impact of climate variability on marine productivity. Only one model (IPSL) is able to reproduce the observed relationship between climate (stratification, SST) and PP. The use of an iron restoring formulation in IPSL may be suitable for simulations of the present-day situation, as the areas with strong anti-correlations between productivity and stratification (SST) seem to be not directly influenced (Fig. 13). Remote effects by advection, however, can not be excluded. This has to be regarded with care when applying the model to simulations of future climate change.