Neural network-based estimates of Southern Ocean net community production from in situ O2/Ar and satellite observation: a methodological study



Introduction
The Southern Ocean plays an important role in the global carbon cycle. The current annual global ocean uptake of atmospheric carbon dioxide (CO2) is about 2 petagrams (Pg) of carbon, half of which is taken up by the vast Southern Ocean south of 30° S (Takahashi et al., 2012). Atmospheric CO2 absorbed by the ocean can be transferred from the surface to the deep ocean via various physical, chemical, and biological mechanisms associated with the solubility and biological pumps (Volk and Hoffert, 1985; Carlson et al., 2010). Biological carbon export from the ocean surface depends on several processes, including net community production (NCP), which reflects the metabolic balance between gross primary production (GPP) and community respiration (Codispoti et al., 1986; Minas et al., 1986). NCP describes the net rate at which CO2 is converted to particulate and dissolved organic carbon (POC and DOC). In the present study, we use NCP estimates derived from in situ measurements of the O2/Ar ratio. The O2/Ar method measures biological O2 supersaturation in the mixed layer (Craig and Hayward, 1987) and yields NCP estimates over the O2 residence timescale of one to two weeks (Reuer et al., 2007; Cassar et al., 2007, 2009, 2011). On this timescale, the NCP derived from this method is tightly linked to the export of organic carbon from the mixed layer at steady state, under the assumptions that both vertical mixing of O2-depleted waters from below and accumulation of POC and DOC in the mixed layer are negligible (Cassar et al., 2009, 2011; Jonsson et al., 2013). Although we use NCP and carbon export production interchangeably in this study, the steady-state assumption is violated under some circumstances (Hamme et al., 2012; Jonsson et al., 2013).
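The link between the O2/Ar measurement and NCP can be illustrated with the steady-state mass balance commonly used with O2/Ar data: the biological O2 supersaturation, Δ(O2/Ar) = (O2/Ar)meas/(O2/Ar)sat − 1, multiplied by the O2 saturation concentration and a gas transfer velocity gives the biological O2 flux to the atmosphere, which at steady state equals mixed-layer NCP. The sketch below assumes the gas transfer velocity (`k_o2_m_per_day`) and saturation concentration are supplied from elsewhere (e.g., a wind-speed parameterization and a solubility function, neither shown); the function names and example numbers are hypothetical.

```python
def o2ar_supersaturation(o2ar_measured: float, o2ar_saturation: float) -> float:
    """Biological O2 supersaturation Delta(O2/Ar), as a fraction (0.02 = 2 %)."""
    return o2ar_measured / o2ar_saturation - 1.0


def ncp_o2_from_o2ar(delta_o2ar: float, k_o2_m_per_day: float,
                     o2_sat_mmol_m3: float) -> float:
    """Mixed-layer NCP in mmol O2 m-2 d-1.

    Assumes steady state, negligible vertical mixing of O2-depleted water,
    and that the biological O2 flux to the atmosphere balances NCP.
    """
    return k_o2_m_per_day * o2_sat_mmol_m3 * delta_o2ar


# Hypothetical example: 2 % supersaturation, k = 4 m/d, [O2]sat = 330 mmol m-3
delta = o2ar_supersaturation(1.02, 1.00)
ncp_o2 = ncp_o2_from_o2ar(delta, 4.0, 330.0)  # about 26.4 mmol O2 m-2 d-1
```

The result is in O2 units; the study converts to carbon units with a photosynthetic quotient, as described in the data section.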

Published by Copernicus
While in situ O2/Ar measurements shed new light on the distribution and variability of NCP, the Southern Ocean remains seriously undersampled. Obtaining a large-scale picture of carbon export is difficult because carbon export cannot be measured directly from satellites. In addition, NCP is highly variable in space and time and cannot be derived by linear interpolation between in situ measurements. Field experiments also reveal that plankton ecosystem and CO2 flux variability are dominated not by a single mechanism but by a confluence of several processes whose relative importance shifts over time and space (Banse, 1996; Abbott et al., 2000, 2001; Cassar et al., 2011; Tortell et al., 2012); such processes are difficult to capture in biogeochemical models.
An alternative strategy is a data-driven modeling approach. A more comprehensive characterization of the temporal and spatial variability of NCP may be achieved by examining the statistical relationships between NCP and the physical and biogeochemical properties that potentially affect carbon export. In addition to mixed layer depth (MLD) and light, i.e., photosynthetically available radiation (PAR) (Cassar et al., 2011), POC, chlorophyll (Chl), sea surface temperature (SST), and sea surface height (SSH) are likely important factors regulating, or correlated with, NCP in the Southern Ocean. POC production is the dominant form of NCP in the Southern Ocean (Ogawa et al., 1999; Wiebinga and de Baar, 1998; Kaehler et al., 1997; Hansell and Carlson, 1998; Sweeney et al., 2000; Schlitzer, 2002; Ishii et al., 2002; Allison et al., 2010), and Chl concentration is commonly used to estimate net primary production (NPP) from satellites (Behrenfeld and Falkowski, 1997; Moore and Abbott, 2000; Campbell et al., 2002; Carr et al., 2006; Bissinger et al., 2008; Friedrichs et al., 2009; Saba et al., 2011; Friedland et al., 2012; Nevison et al., 2012; Olonscheck et al., 2013). SST has been used to derive export and export efficiency on the basis of its relationship with NPP and its influence on heterotrophic activity (Laws et al., 2000, 2011; Laws, 2004). SSH yields information on oceanic eddies, fronts, and nutrient transport, which are crucial to the spatial variation of biological activity (Abbott et al., 2000, 2001; Glorioso et al., 2005; Kahru et al., 2007; Gruber et al., 2011).
Advances in remote sensing and statistical algorithms now permit satellite data-driven modeling of NCP. Satellite-borne sensors have accumulated records of PAR, POC, Chl, SST, and SSH spanning a decade or longer, with sufficient spatial and temporal resolution and coverage. Southern Ocean MLD products have become available in recent years from Argo float profiles (Wong, 2005; Sallée et al., 2006; Schneider and Bravo, 2006; Dong et al., 2008) as well as from high-resolution ocean general circulation models (OGCMs) (Aoki et al., 2007a; Sterl et al., 2012). In this study, we combine in situ NCP measurements from 60 crossings spanning more than a decade with gridded datasets of NCP predictors (PAR, POC, Chl, SST, SSH, and MLD) to generate weekly gridded maps of NCP over the Southern Ocean from 1998 through 2009. We generate these NCP predictions using self-organizing map (SOM) analysis, a clustering approach that originated in the field of artificial neural networks (Kohonen, 2001). SOM analysis has gained popularity in the atmospheric and ocean sciences over the past decade, with applications in categorizing atmospheric teleconnection patterns (Reusch et al., 2005; Johnson et al., 2008; Johnson and Feldstein, 2010; Johnson, 2013) and in generating maps of pCO2 for the North Atlantic (Friedrich and Oschlies, 2009; Telszewski et al., 2009) and for the global ocean (Sasse et al., 2013).
In the present application, we follow the general approach of Friedrich and Oschlies (2009) and Telszewski et al. (2009), using the SOM for both cluster analysis and nonlinear, nonparametric regression between a set of predictors and NCP. Under this approach, described more thoroughly in Sect. 3, the data determine the potentially complex relationships between the predictors and NCP. The predictor-NCP relationships are thus unconstrained by preconceived, uncertain functional forms and are determined entirely from the observed data, in contrast to previous studies of Southern Ocean NCP. Nevertheless, our NCP estimates agree broadly with previous estimates while also providing additional information on temporal and spatial variability. The remainder of the paper is organized as follows. Section 2 describes the data used in the study. Section 3 describes the SOM methodology for generating weekly NCP maps and for calculating error estimates. Section 4 presents our results, noting some of the most salient features of the constructed NCP dataset. Section 5 provides a discussion and conclusions.

Data
We make extensive use of gridded data products and cruise measurements in the Southern Ocean domain poleward of 20° S for the period 1998-2009. The gridded and research cruise data are described below.

Gridded predictor data
We consider six gridded data products (PAR, POC, Chl, SST, SSH, and MLD) as potential predictors of NCP, both for the SOM analysis and for the generation of weekly NCP maps, as described more thoroughly in Sect. 3.
We use satellite PAR and POC from the Level-3 mapped product of the Moderate Resolution Imaging Spectroradiometer on the Aqua satellite (MODIS-Aqua), with temporal and spatial resolutions of 8 days and 9 km, respectively, for the period 10 July 2002-30 December 2009. Weekly averaged Chl fields are constructed from the daily 9 km maps of the Sea-viewing Wide Field-of-view Sensor (SeaWiFS), version 5.2, for the period 7 January 1998-26 December 2007 (O'Reilly et al., 1998). For SST, we use the NOAA Optimum Interpolation 0.25° daily SST analysis, version 2 (OI SST), which blends Advanced Very High Resolution Radiometer (AVHRR) and Advanced Microwave Scanning Radiometer (AMSR) data (Reynolds et al., 2007), for the period 7 January 1998-19 August 2009. Weekly SSH anomaly maps are obtained from Archiving, Validation and Interpretation of Satellite Oceanographic Data (AVISO) on a global irregular grid with about 1/3° spacing (Ducet et al., 2000) for the period 7 January 1998-22 July 2009. To obtain absolute SSH, we add the AVISO SSH anomaly to the sea level climatology of Niiler and Maximenko (Niiler et al., 2003; Maximenko et al., 2009). We choose this SSH climatology because of its high spatial resolution (Sokolov and Rintoul, 2007).
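Two of the preprocessing steps above can be sketched as array operations: averaging daily, cloud-gapped Chl maps into weekly means, and adding the AVISO anomaly to a mean sea level climatology to obtain absolute SSH. This is a minimal NumPy sketch assuming both inputs are already on a common grid, with NaN marking missing (e.g., cloud-covered) pixels; the function names and the choice to leave all-missing weeks as NaN are illustrative assumptions, not the processing used to build the published fields.

```python
import numpy as np


def weekly_means(daily: np.ndarray) -> np.ndarray:
    """Average a (n_days, ny, nx) stack into (n_weeks, ny, nx) weekly means.

    NaN pixels (e.g., cloud cover) are ignored; a pixel with no valid days
    in a week stays NaN. Trailing days that do not fill a week are dropped.
    """
    n_weeks = daily.shape[0] // 7
    weeks = daily[: n_weeks * 7].reshape(n_weeks, 7, *daily.shape[1:])
    valid = ~np.isnan(weeks)
    counts = valid.sum(axis=1)
    sums = np.where(valid, weeks, 0.0).sum(axis=1)
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)


def absolute_ssh(ssh_anomaly: np.ndarray, mean_sea_level: np.ndarray) -> np.ndarray:
    """Absolute SSH = anomaly + climatological mean sea level (same grid, same units)."""
    return ssh_anomaly + mean_sea_level
```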
Because the coverage of Argo float profiles is not homogeneous (Aoki et al., 2007a) and the available gridded Argo products have either coarser resolution or shorter time periods (http://www.argo.ucsd.edu/Gridded_fields.html), we use the MLD from the high-resolution OGCM for the Earth Simulator (OFES) (Masumoto et al., 2004; Sasaki et al., 2006, 2008). OFES is an eddy-resolving, quasi-global (75° N-75° S) ocean model based on the Geophysical Fluid Dynamics Laboratory Modular Ocean Model version 3 (GFDL MOM3), with 0.1° horizontal resolution and 54 vertical levels. It provides MLD at 0.1° spatial resolution every three days. The model captures realistic upper-ocean dynamics, including eddies and the heat balance (Sasaki and Nonaka, 2006; Taguchi et al., 2007; Scott et al., 2008; Zhuang et al., 2010; Yoshida et al., 2010; Sasaki et al., 2011; Chang et al., 2012), and has been used to investigate Southern Ocean dynamical variability (Aoki et al., 2007a, b, 2010; Sasaki and Schneider, 2008; Thompson et al., 2010; Thompson and Richards, 2011). In the present study, we use the MLD from the OFES simulation forced by QuikSCAT satellite winds from 28 July 1999 to 28 October 2009.
To interpolate the predictor data to the daily ship track locations, all gridded data are first interpolated to daily resolution. Even after this interpolation, subweekly variability is absent from predictors whose original temporal resolution is 7-8 days. For the generation of weekly NCP maps, all gridded predictor data are interpolated to a common 1° × 1° latitude-longitude grid poleward of 20° S at weekly temporal resolution.
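The two interpolation steps (in time to daily resolution, then in space to the ship position) can be sketched as follows. This assumes a regular, ascending latitude-longitude grid, linear interpolation in time between composite center dates, and bilinear interpolation in space; the text specifies linear interpolation but not the exact scheme, so these choices and the helper names are illustrative assumptions. Longitude wrapping across the grid edge is not handled.

```python
import numpy as np


def to_daily(t_composite: np.ndarray, values: np.ndarray, t_daily: np.ndarray) -> np.ndarray:
    """Linearly interpolate a composite time series (e.g., 8-day centers) to daily times.

    Times are in days; values outside the composite range are held constant
    by np.interp. NaNs in `values` are not skipped and propagate into the result.
    """
    return np.interp(t_daily, t_composite, values)


def bilinear(lat_grid, lon_grid, field, lat, lon):
    """Bilinear interpolation of field[lat, lon] at a single (lat, lon) point."""
    i = int(np.clip(np.searchsorted(lat_grid, lat) - 1, 0, len(lat_grid) - 2))
    j = int(np.clip(np.searchsorted(lon_grid, lon) - 1, 0, len(lon_grid) - 2))
    wy = (lat - lat_grid[i]) / (lat_grid[i + 1] - lat_grid[i])
    wx = (lon - lon_grid[j]) / (lon_grid[j + 1] - lon_grid[j])
    return ((1 - wy) * (1 - wx) * field[i, j] + (1 - wy) * wx * field[i, j + 1]
            + wy * (1 - wx) * field[i + 1, j] + wy * wx * field[i + 1, j + 1])
```

Because bilinear interpolation is exact for a field that is linear in latitude and longitude, such a plane is a convenient sanity check.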

Research cruise data
In the SOM analysis described below, the predictand is NCP, estimated from an extensive set of published data obtained on 41 research cruises in the Southern Ocean between 1999 and 2009 (Reuer et al., 2007; Cassar et al., 2007, 2011). Figure 1a shows the ship tracks, color-coded by the month of the cruise. The ship tracks mainly cover regions of high chlorophyll (see Fig. 2c) during the growing season between November and March. The histogram of the ship track NCP distribution is shown in Fig. 1b. On the basis of visual inspection, we exclude spuriously large NCP outliers exceeding 180 mmol C m−2 d−1. Figure 1c provides a detailed distribution of NCP below this outlier threshold. Because the ship track data are sampled unevenly in time, we calculate the daily mean NCP, latitude, and longitude from all available ship track data. We then linearly interpolate all available daily gridded predictor data to these daily ship track locations. Negative NCP values may result from net heterotrophy or from measurements contaminated by the upwelling of oxygen-undersaturated water. Because we are unable to estimate this potential bias, we exclude all days with negative NCP values prior to the SOM analysis. Overall, we retain 401 days of ship track data for the SOM analysis. All NCP and predictor data are standardized for the SOM analysis. Owing to the skewness of the NCP, Chl, MLD, and POC data, we apply a log10 transformation to these variables prior to standardization. As a result, the SOM analysis is applied to predictor and predictand data that approximately follow a Gaussian distribution with a mean of zero and a standard deviation of one.
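The ship track preprocessing described above (daily averaging, removal of negative values and of outliers above 180 mmol C m−2 d−1, log10 transformation of the skewed variables, and standardization) might look like the NumPy sketch below. Variable keys and function names are hypothetical; zero NCP is also dropped here because its logarithm is undefined, an assumption beyond the text. The standardization statistics are returned so that SOM output can later be transformed back to NCP units.

```python
import numpy as np

NCP_MAX = 180.0                            # outlier threshold, mmol C m-2 d-1
LOG_VARS = {"ncp", "chl", "mld", "poc"}    # skewed variables, log10-transformed


def daily_means(day: np.ndarray, values: np.ndarray):
    """Average unevenly sampled values within each day (day = integer day index)."""
    days = np.unique(day)
    return days, np.array([values[day == d].mean() for d in days])


def preprocess(columns: dict):
    """Filter on NCP, log10-transform skewed variables, and standardize all columns.

    Returns (standardized columns, {name: (mean, std)}); NaN marks missing predictors.
    """
    ncp = np.asarray(columns["ncp"], dtype=float)
    keep = (ncp > 0) & (ncp <= NCP_MAX)
    out, stats = {}, {}
    for name, x in columns.items():
        x = np.asarray(x, dtype=float)[keep]
        if name in LOG_VARS:
            x = np.log10(x)
        mu, sd = np.nanmean(x), np.nanstd(x)
        out[name] = (x - mu) / sd
        stats[name] = (mu, sd)
    return out, stats
```

A standardized SOM output z for a log-transformed variable would be mapped back with 10 ** (z * sd + mu).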
In this study, the growing season is defined as November through March. Unless otherwise noted, O2-based NCP values are converted to carbon units (mmol C m−2 d−1) by dividing by a molar photosynthetic quotient of 1.4 O2/CO2 (Laws, 1991).
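As a worked example of this conversion (the input value is hypothetical), with PQ denoting the photosynthetic quotient, an O2-based NCP of 28 mmol O2 m−2 d−1 corresponds to

```latex
\mathrm{NCP_C} = \frac{\mathrm{NCP_{O_2}}}{\mathrm{PQ}}
  = \frac{28\ \mathrm{mmol\,O_2\,m^{-2}\,d^{-1}}}{1.4\ \mathrm{mol\,O_2\,(mol\,CO_2)^{-1}}}
  = 20\ \mathrm{mmol\,C\,m^{-2}\,d^{-1}}
```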

Methodology
We construct weekly 1 … calculations, we assume that NCP has a potentially complex, nonlinear relationship with these six predictors:

$$\mathrm{NCP} = f(\mathrm{PAR}, \mathrm{POC}, \mathrm{Chl}, \mathrm{SST}, \mathrm{SSH}, \mathrm{MLD}) \qquad (1)$$

We recognize that some of the predictors are not independent and that the information they provide may be redundant. However, given the variable availability of the predictor data, as discussed below, such overlapping information is useful for compensating for missing predictors. To approximate this functional relationship, we use an artificial neural network approach, self-organizing maps (SOMs), similar to that used by Friedrich and Oschlies (2009) and Telszewski et al. (2009) to generate maps of North Atlantic pCO2. The SOM method combines elements of cluster analysis with nonlinear, nonparametric regression (Kohonen, 2001). This approach is advantageous for the present purpose because it does not assume a predefined functional form between predictor and predictand; instead, it relies on an unsupervised learning procedure, called training, whereby the potentially complex predictor-predictand relationships are determined entirely by the data used to construct the SOM. In addition, the method readily handles one or more missing predictors when generating NCP maps, a useful property given the limited coverage of satellite predictor data over the Southern Ocean during some periods. Our approach differs from previous SOM studies (Friedrich and Oschlies, 2009; Telszewski et al., 2009) in that we perform a thorough validation analysis to determine an optimal combination of SOM parameters and predictors and to provide error estimates for weekly NCP predictions. Below, we briefly describe the SOM methodology and the procedures for generating NCP maps and calculating error estimates. Additional discussion is found in the Supplementary Methods section of the supporting material, and a more thorough description of the SOM methodology can be found in the appendix of Johnson et al. (2008).

Self-organizing map methodology and NCP dataset construction
In the present application, the SOM is trained with the seven-dimensional daily ship track data (six predictors plus the predictand, NCP), where each daily observation is treated as a seven-dimensional data vector. The NCP mapping is accomplished in two steps: (1) SOM training with ship track data to determine the predictor-NCP clusters, and (2) assignment of weekly gridded predictor data to the best-matching SOM clusters and concomitant assignment of the associated cluster NCP values to the corresponding grid cells. The first step generates K clusters that describe prototypical combinations of predictor and NCP values; the user specifies the number K (the method for determining K is described in Sect. 3.2). In the second step, the available predictor data for each grid cell and week are combined into a data vector of up to six dimensions, and this vector is mapped to the best-matching SOM cluster on the basis of minimum Euclidean distance. The NCP value of that best-matching cluster, determined in step 1, is then assigned to that grid cell and week. This process is repeated for each available grid cell and week to construct the weekly NCP maps.
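Step (1) can be illustrated with a compact sequential (online) SOM written in NumPy. This is a sketch under stated assumptions rather than the configuration used in the study: it uses a rectangular lattice, a Gaussian neighborhood whose radius and learning rate decay linearly, random initialization from the training vectors, and it skips missing (NaN) components both when finding the best-matching neuron and when updating the prototypes. The function name and all default parameter values are hypothetical.

```python
import numpy as np


def train_som(data, n_rows, n_cols, n_iter=5000, radius_final=1.0,
              lr_start=0.5, lr_final=0.01, seed=0):
    """Train SOM prototypes on (n_samples, n_dim) data; NaN marks missing values.

    Returns an array of shape (n_rows * n_cols, n_dim): one prototype per
    neuron, i.e., one cluster per neuron.
    """
    rng = np.random.default_rng(seed)
    n_samples = data.shape[0]
    k = n_rows * n_cols
    # Lattice position of each neuron on a rectangular grid.
    r, c = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")
    coords = np.column_stack([r.ravel(), c.ravel()]).astype(float)
    # Initialize prototypes from random samples; missing -> 0 (the standardized mean).
    weights = np.nan_to_num(data[rng.integers(0, n_samples, size=k)], nan=0.0)
    radius_start = max(n_rows, n_cols) / 2.0
    for t in range(n_iter):
        frac = t / max(n_iter - 1, 1)
        radius = radius_start + (radius_final - radius_start) * frac
        lr = lr_start + (lr_final - lr_start) * frac
        x = data[rng.integers(0, n_samples)]
        observed = ~np.isnan(x)
        if not observed.any():
            continue
        # Best-matching unit: minimum Euclidean distance over observed components only.
        diff = weights[:, observed] - x[observed]
        bmu = np.argmin(np.sum(diff ** 2, axis=1))
        # Gaussian neighborhood on the lattice, centered on the BMU.
        h = np.exp(-np.sum((coords - coords[bmu]) ** 2, axis=1) / (2.0 * radius ** 2))
        # Move prototypes toward x, in the observed components only.
        weights[:, observed] -= lr * h[:, None] * diff
    return weights
```

Each row of the returned array is one cluster prototype; if the predictand is stored in one column, that column holds the cluster NCP value used in step (2).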
As mentioned above, the SOM approach has the advantage of readily handling data with one or more missing predictors during both the training and NCP mapping stages. Owing to limitations in satellite data coverage and differences in the start and end dates of the predictor datasets, most ship track days and weekly grid cells have at least one missing predictor value. In particular, the extensive cloud cover over the Southern Ocean, which typically exceeds 70 % south of 40° S during the growing season (Warren et al., 1988), significantly impairs satellite retrieval of POC and Chl. Table 1 shows the availability of each variable in both the ship track data used to train the SOM and the gridded weekly data used to construct the NCP maps. Some variables, such as SST, MLD, and SSH, have good spatial and temporal coverage, whereas others are sparser. Although POC and Chl are among the variables with lower availability, their coverage of 40-60 % is considerably higher than the extensive cloud cover (> 70 % on average) alone would suggest; this improvement results from interpolating the predictor data (7- or 8-day, 1° × 1°) onto the daily ship track locations and the weekly grids. Overall, only about 30 % of all ship track days have all six predictor values available. When one or more predictor values are missing, the SOM algorithm finds the best-matching cluster on the basis of minimum Euclidean distance, as in the usual case, except that all dimensions corresponding to missing data are ignored. When assigning NCP values to the weekly gridded data, the cluster dimension corresponding to NCP is always ignored, because NCP is not part of the predictor data. The NCP value of the best-matching cluster is then assigned to the corresponding grid cell. This application of SOM analysis thus essentially represents a method of imputation for missing data.
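Step (2), including the handling of missing predictors, can be sketched as a masked nearest-prototype search. The sketch assumes prototypes from a trained SOM (in standardized units) with NCP stored in column `ncp_dim`, and grid predictor vectors whose columns follow the remaining prototype columns in order; the names are hypothetical. The returned values are in standardized log10 units and would still need to be transformed back to mmol C m−2 d−1.

```python
import numpy as np


def assign_ncp(prototypes: np.ndarray, grid_predictors: np.ndarray, ncp_dim: int) -> np.ndarray:
    """Map each predictor vector (NaN = missing) to its best-matching prototype.

    The NCP dimension is never used for matching; missing predictor
    dimensions are skipped. Rows with no predictors at all return NaN.
    """
    pred_dims = [d for d in range(prototypes.shape[1]) if d != ncp_dim]
    p = prototypes[:, pred_dims]                    # (K, n_predictors)
    out = np.full(grid_predictors.shape[0], np.nan)
    for n, x in enumerate(grid_predictors):
        observed = ~np.isnan(x)
        if not observed.any():
            continue                                # nothing to match on
        d2 = np.sum((p[:, observed] - x[observed]) ** 2, axis=1)
        out[n] = prototypes[np.argmin(d2), ncp_dim]
    return out
```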

SOM parameter determination and error estimation
Each SOM analysis requires several specifications to be chosen beforehand, such as the type of neighborhood function, the type of lattice (usually hexagonal or rectangular), the number of rows and columns in the lattice (the total number of neurons equals the number of rows multiplied by the number of columns), and the final neighborhood radius, which describes how strongly the neurons are connected to their neighbors. … To determine an appropriate number of neurons, final neighborhood radius, and predictor combination, we consider several factors, including predictor availability, prior knowledge of Southern Ocean NCP, and a set of cross-validation tests. For the cross-validation tests, we specify K and the final neighborhood radius, partition the NCP and predictor data into training and validation sets, train the SOM with the training set, and then evaluate the NCP predictions against the withheld validation data. We evaluate a large number of SOM parameter combinations by calculating the mean absolute error (MAE), root-mean-square error (RMSE), and mean fractional error (MFE) of the predicted NCP. A more thorough description of the cross-validation tests is found in the Supplementary Methods of the supporting material.
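The cross-validation loop and the three error measures can be sketched as follows. The text does not define MFE or the partitioning scheme here (details are in the Supplementary Methods), so this sketch assumes MFE is the mean of |predicted − observed| / observed expressed as a percentage and uses random k-fold partitions; `fit` and `predict` are hypothetical callables standing in for SOM training and NCP assignment.

```python
import numpy as np


def error_metrics(pred, obs):
    """Return (MAE, RMSE, MFE in %); MFE uses an assumed definition."""
    pred, obs = np.asarray(pred, dtype=float), np.asarray(obs, dtype=float)
    err = pred - obs
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mfe = 100.0 * np.mean(np.abs(err) / obs)   # requires obs > 0 (negative NCP removed)
    return mae, rmse, mfe


def cross_validate(X, y, fit, predict, k=5, seed=0):
    """k-fold cross-validation: train on k-1 folds, predict the withheld fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    pred = np.empty(len(y), dtype=float)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        model = fit(X[train_idx], y[train_idx])
        pred[test_idx] = predict(model, X[test_idx])
    return error_metrics(pred, y)
```

In a SOM setting, the predictions would be back-transformed to mmol C m−2 d−1 before computing the errors, and the loop repeated for each candidate combination of K, final neighborhood radius, and predictor set.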
Several predictor-parameter combinations emerge from the cross-validation tests as candidates for the NCP construction, with errors at the lower end of the estimates: MAE ∼ 5.3-9 mmol C m−2 d−1, RMSE ∼ 8.4-13 mmol C m−2 d−1, and MFE ∼ 27-51 %. However, because these error estimates apply only to the ship track NCP, which has limited spatial coverage over the Southern Ocean, we also consider additional criteria based on prior expectations of NCP variability. We examine the temporal evolution of the monthly, area-integrated NCP south of 40° S and south of 50° S, as well as the spatial distribution of the growing season NCP climatology. Although the true temporal evolution of the area-integrated NCP south of 40 and 50 … We emphasize that this choice of predictor set does not imply that the other predictors are unimportant for NCP variability; rather, the redundancy of predictor information (e.g., positive correlations among POC, Chl, and SSH) combined with variations in data availability suggests that these other predictors do not add sufficient independent information to improve NCP predictions on weekly timescales. Interestingly, as discussed further below, the NCP climatology has a stronger relationship with the POC climatology than with the Chl climatology, even though POC is not included in the final SOM analysis. This strong correspondence between mean NCP and POC, despite the omission of POC as a final predictor, further supports the conclusion that POC plays a pivotal role in the spatial variability of NCP in the Southern Ocean.

Bootstrap NCP dataset constructions
In addition to measurement error and random NCP variability unexplained by the predictors, which are captured in the error estimates in Sect. 3.2, another important source of error for the long-term mean is the limited coverage of the data used to construct the SOM. Because we construct a gridded NCP dataset over a large domain from a limited number of research cruise measurements, a few measurements may have a disproportionate influence on the regional NCP constructions. To quantify how this limitation affects the uncertainty of the NCP climatology, we use a bootstrap approach to construct 100 additional NCP datasets. Across these 100 datasets, the variance of the NCP climatology at a given location indicates the sensitivity of the NCP estimates to the particular, limited ship track dataset used to train the SOM.
For each of the 100 bootstrap NCP datasets, we perform the NCP construction as described above, with one distinction: the SOM is trained with resampled ship track data. The resampling follows conventional bootstrap procedures. A bootstrap ship track dataset is constructed by randomly selecting research cruise numbers from 1 through 41 with replacement and placing the ship track data from each randomly chosen cruise into the bootstrap dataset. This process is repeated until the bootstrap ship track dataset has the same number of daily NCP and predictor observations as the original ship track dataset. The SOM is then trained on the bootstrap ship track data with the same parameter and predictor combination as discussed above, and the bootstrap NCP dataset is constructed from this modified SOM.
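The cruise-level resampling can be sketched as below. Resampling whole cruises rather than individual days keeps the days of each cruise together, so within-cruise correlation is preserved. The text does not say how the final draw is handled when it overshoots the target size; this sketch assumes it is truncated so that the bootstrap dataset has exactly the original number of daily observations. The function name and example cruise IDs are hypothetical.

```python
import numpy as np


def bootstrap_rows(cruise_ids, rng):
    """Row indices of one bootstrap ship track dataset, resampled by cruise.

    Cruises are drawn with replacement and all daily rows of each drawn cruise
    are appended until the original number of rows is reached (the last draw
    is truncated, an assumption).
    """
    cruise_ids = np.asarray(cruise_ids)
    cruises = np.unique(cruise_ids)
    n_target = len(cruise_ids)
    picked, total = [], 0
    while total < n_target:
        rows = np.flatnonzero(cruise_ids == rng.choice(cruises))
        picked.append(rows)
        total += len(rows)
    return np.concatenate(picked)[:n_target]


# Usage: 100 bootstrap index sets for a hypothetical cruise-ID column.
rng = np.random.default_rng(0)
ids = np.array([1, 1, 1, 2, 2, 3])
boot_sets = [bootstrap_rows(ids, rng) for _ in range(100)]
```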

Results
In this section, we first describe the basin-scale features of the constructed NCP dataset. In the absence of basin-scale NCP observations, we compare the spatial pattern of the constructed NCP with satellite-measured POC and Chl, because POC accounts for 80-90 % of NCP in the Southern Ocean (Hansell and Carlson, 1998; Allison et al., 2010) and Chl is often used to derive phytoplankton biomass and NPP. Second, we compare regional values of the constructed NCP with those reported in the literature. We also include the 95 % bootstrap confidence intervals and seasonal standard deviations to indicate uncertainty and temporal variability, respectively. For comparison with in situ data that do not lie on our grid, the constructed NCP is averaged over a 1-2° region surrounding the point of observation. Exact agreement is not expected, because the in situ NCP values used for comparison were obtained by various methods that resolve different temporal and spatial scales of carbon export and sometimes include different processes. Because our data are resolved on weekly timescales, we only perform comparisons with measurements of weekly or longer temporal resolution.
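The comparison with point observations can be sketched as an average of the gridded NCP over a small box around the observation site. The half-width (1° here, within the 1-2° range in the text), the cos(latitude) area weighting, and the skipping of missing cells are assumptions of this sketch; the function name is hypothetical.

```python
import numpy as np


def box_average(lat_grid, lon_grid, field, lat0, lon0, half_width=1.0):
    """Area-weighted mean of field[lat, lon] within +/- half_width degrees of (lat0, lon0).

    Longitudes are compared modulo 360, so boxes may straddle the 0/360 seam.
    Returns NaN if no valid cell falls inside the box.
    """
    lat_sel = np.abs(lat_grid - lat0) <= half_width
    dlon = (lon_grid - lon0 + 180.0) % 360.0 - 180.0
    lon_sel = np.abs(dlon) <= half_width
    sub = field[np.ix_(lat_sel, lon_sel)]
    w = np.repeat(np.cos(np.deg2rad(lat_grid[lat_sel]))[:, None], lon_sel.sum(), axis=1)
    valid = ~np.isnan(sub)
    if not valid.any():
        return np.nan
    return float(np.sum(sub[valid] * w[valid]) / np.sum(w[valid]))
```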
Third, we show a dominant basin-scale NCP distribution that has emerged from various models and discuss the discrepancies. In addition, because the biological pump is one of the main mechanisms driving atmospheric CO2 into the ocean (e.g., Volk and Hoffert, 1985; Carlson et al., 2010), we compare our NCP with the observed air-sea CO2 flux in the Southern Ocean. To convert annual means to a range of daily rates, we assume a growing season of 90 to 120 days (Heywood and Whitaker, 1984; Sweeney et al., 2000; Reuer et al., 2007; Racault et al., 2012), reflecting the large spatial and temporal variability in its duration in the Southern Ocean (Lizotte et al., 2001; Racault et al., 2012; Borrione and Schlitzer, 2013). The total area south of 50° S is approximated as 45.7 × 10^6 km^2 (Moore and Abbott, 2000).
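The conversion from an annual, area-integrated export (in Pg C yr−1) to a range of daily rates per unit area can be written as a small function. It assumes, as the text implies, that the annual export occurs entirely within a growing season of 90-120 days, and it uses the 45.7 × 10^6 km^2 area south of 50° S and a molar mass of carbon of 12.011 g mol−1; the function name is hypothetical.

```python
MOLAR_MASS_C = 12.011                 # g C per mol
AREA_SOUTH_OF_50S_M2 = 45.7e6 * 1e6   # 45.7 x 10^6 km^2 expressed in m^2


def annual_to_daily_range(pg_c_per_yr, area_m2=AREA_SOUTH_OF_50S_M2,
                          season_days=(90, 120)):
    """Return (low, high) daily export rates in mmol C m-2 d-1.

    Assumes the annual export occurs entirely within the growing season,
    so the longer season gives the lower daily rate.
    """
    mmol_c = pg_c_per_yr * 1e15 / MOLAR_MASS_C * 1e3   # Pg -> g -> mol -> mmol
    per_m2 = mmol_c / area_m2
    return per_m2 / max(season_days), per_m2 / min(season_days)


low, high = annual_to_daily_range(1.0)   # roughly 15.2 and 20.2 mmol C m-2 d-1
```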

Southern Ocean NCP climatology
Figure 2a shows the spatial distribution of the 12-year (1998-2009) growing season NCP climatology, superimposed with the major Southern Ocean fronts (Orsi et al., 1995). An elongated zonal band of high NCP (> 22 mmol C m−2 d−1) approximately follows the Subtropical Front (STF, ∼ 40° S), where macronutrient-rich subantarctic water converges with macronutrient-poor subtropical water (Takahashi et al., 2012). The band stretches from the southwest Atlantic across the south Indian Ocean to the western South Pacific, then splits along with the STF east of the dateline, at around 170° W, and turns southeastward to about 120° W. A sharp NCP gradient exists north of the front, with very low NCP throughout most of the subtropics except near large landmasses. Elevated NCP (≥ 20 mmol C m−2 d−1) is seen along the Southern Boundary (SBdy), the southernmost limit of the Antarctic Circumpolar Current (ACC), and along the Antarctic coast, including the Ross and Amundsen seas, where strong CO2 sinks have recently been observed (Arrigo et al., 2008; Tortell et al., 2012). Between the STF and SBdy, we also see high NCP … As discussed in the previous section, one limitation of this study is the limited availability and spatial coverage of the NCP observations used in the SOM analysis to generalize the relationships between NCP and each predictor. As a quantitative indication of how this limitation may affect the constructed NCP climatology, Fig. 3 shows the standard deviation, σ_boot, of the growing season mean NCP across the 100 bootstrap NCP datasets (Sect. 3.3), superimposed with the locations of each ship track. The largest uncertainties (σ_boot ∼ 6-7 mmol C m−2 d−1) are found over the Patagonian Shelf, where ship track coverage is lacking. In regions of dense ship track coverage, the uncertainty is generally lower (σ_boot < 3 mmol C m−2 d−1). Regions with a high standard deviation of the bootstrap NCP climatology indicate where targeted measurements in future studies may help reduce the uncertainty of Southern Ocean NCP estimates.
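Given the 100 bootstrap climatologies stacked along a leading axis, the bootstrap standard deviation σ_boot and a 95 % percentile confidence interval can be computed per grid cell as sketched below. The percentile interval and the sample standard deviation (ddof=1) are common choices assumed here; the text does not specify which variants were used. The function name is hypothetical.

```python
import numpy as np


def bootstrap_summary(boot_clim: np.ndarray):
    """boot_clim: (n_boot, ny, nx) growing season mean NCP from each bootstrap dataset.

    Returns (sigma_boot, ci_low, ci_high), each of shape (ny, nx). NaNs (e.g., land)
    are ignored; cells that are NaN in every member return NaN with a RuntimeWarning.
    """
    sigma_boot = np.nanstd(boot_clim, axis=0, ddof=1)
    ci_low, ci_high = np.nanpercentile(boot_clim, [2.5, 97.5], axis=0)
    return sigma_boot, ci_low, ci_high
```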
The climatologies of POC (2002-2009) and Chl (1998-2007) are shown in Fig. 2b and c for comparison. Chl is shown on a log scale because of its strong positive skewness. Overall, the regions of high mean NCP (Fig. 2a) correspond well with regions of high mean POC and Chl. The area-weighted, centered pattern correlation coefficients are 0.66 for climatological NCP versus POC and 0.33 for climatological NCP versus log10(Chl). On the basis of these pattern correlations, the climatological POC and Chl fields explain, in a linear sense, 44 and 11 % of the variance of the climatological NCP field, respectively. However, the correlations between NCP and POC and between NCP and log10(Chl) in the daily ship track data are only 0.20 and 0.23, respectively, which suggests that POC and Chl alone explain only a small fraction of NCP variability on daily and shorter timescales. The low correlation between NCP and Chl on shorter timescales is consistent with Reuer et al. (2007). Despite this weak linear relationship in the ship track data, a clear link emerges in the Southern Ocean climatology.
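The area-weighted, centered pattern correlation, whose square gives the explained-variance fractions quoted above (0.66² ≈ 0.44 and 0.33² ≈ 0.11), can be computed as sketched below. The sketch assumes cos(latitude) weights on a regular grid and uses only cells where both fields are valid; it illustrates the statistic, not necessarily the exact weighting used for the reported values. The function name is hypothetical.

```python
import numpy as np


def pattern_correlation(a: np.ndarray, b: np.ndarray, lat: np.ndarray) -> float:
    """Area-weighted, centered pattern correlation of two (ny, nx) fields.

    Weights are cos(latitude); weighted means are removed before correlating.
    """
    w = np.repeat(np.cos(np.deg2rad(lat))[:, None], a.shape[1], axis=1)
    valid = ~(np.isnan(a) | np.isnan(b))
    a, b, w = a[valid], b[valid], w[valid]
    a_anom = a - np.sum(w * a) / np.sum(w)
    b_anom = b - np.sum(w * b) / np.sum(w)
    cov = np.sum(w * a_anom * b_anom)
    return float(cov / np.sqrt(np.sum(w * a_anom ** 2) * np.sum(w * b_anom ** 2)))
```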

Regional NCP evaluation
In the following, we compare regional NCP from the constructed dataset with independent in situ estimates available in the literature. For NCP values reported for periods before our data coverage, we provide climatological values from our dataset at the sites and calendar days of the measurements. For values reported within our data period (1998-2009), we carry out a real-time comparison.
First, we compare our NCP climatology with that derived for the waters off the southeast coast of the South Island of New Zealand from the Munida time series (45.85° S, 171.5° E) using a 13C-based diagnostic box model (Brix et al., 2013). Their reported NCP climatology, 14.6-22.3 mmol C m−2 d−1, is in strong agreement with our constructed climatology of 22.1 mmol C m−2 d−1 (95 % bootstrap confidence interval of 16.7-25.2 mmol C m−2 d−1) for the identical … In the Indian and Pacific sectors of the Southern Ocean, most of the in situ derived NCP values, which span 1976 through 1997, are reasonably close to ours, given the uncertainty of the climatology (the bootstrap interval) and the temporal variability (the seasonal standard deviation) (Table 3a). The only exception is the high values previously reported over the southern and southwestern Ross Sea, derived with a seasonal dissolved inorganic carbon (DIC) budget approach by Sweeney et al. (2000). These regions are mostly located south of 75° S (Regions I and II in Sweeney et al., 2000). Our model may underestimate NCP in this region because of poor predictor coverage there. The MLD predictor covers only a quasi-global domain with its southern boundary at 75° S.
The other two predictors, Chl and PAR, have less than 60 % coverage in this area during the growing season. In Table 3b, … Iron originating from shallow sediments upstream and carried by ocean currents can fuel productivity in downstream waters, leading to phytoplankton blooms. Such an island mass effect has been recorded near the Kerguelen, South Georgia, and Crozet islands (Bakker et al., 2007; Jouandet et al., 2008; Jones et al., 2012). Here we determine whether our NCP data reproduce the island mass effect. Both upstream (outside the bloom) and downstream (inside the bloom) values are listed in Table 3c. Our data capture the upstream-downstream differences in all three island regions. However, the downstream (inside the bloom) values are smaller than those reported for Kerguelen and South Georgia, possibly owing in part to area averaging over the coarse grid of our dataset.

Model comparison
Figures 4 and 5b show basin-scale export rate estimates from two different models, one based on inverse modeling (GCM fitting to observations; Schlitzer, 2002) and the other a satellite NPP-export model calibrated to atmospheric O2/N2 measurements (Nevison et al., 2012). Because only the January climatology is available from Nevison et al. (2012), we include our January map in Fig. 5a for comparison. The spatial patterns of these two models broadly agree with the climatology of our data (Figs. 2a and 5a), including regions of high carbon export along the zonal band between 40 and 60° S, in the coastal upwelling zones off Chile and Namibia, and on the Patagonian Shelf. The main differences between the present study (Fig. 2a) and the inverse modeling result of Schlitzer (2002) (Fig. 4) are that in the latter the climatology is smoother, the zonal band of high export is displaced farther south, and the high-export region off the coast of Chile is more spread out. The broader and smoother features in Schlitzer (2002) are likely due to the coarser spatial resolution available at the time of that study.
The two January climatologies generally differ by less than 10 mmol C m−2 d−1 throughout the Southern Ocean, except for a large discrepancy of more than 100 mmol C m−2 d−1 over the Patagonian Shelf (Fig. 5c). Closer inspection reveals a wide range of NCP values in this region among models: 28-30 mmol C m−2 d−1 in our January map (Fig. 5a), 150-400 mmol C m−2 d−1 in that of Nevison et al. (2012) … The Patagonian Shelf is known for highly variable biological activity and is difficult to characterize because of uncertain relationships between phytoplankton communities and NCP (Schloss et al., 2007), complicated bathymetry, complex ocean dynamics (Bianchi et al., 2005; Romero et al., 2006; Rivas, 2006; Garcia et al., 2008), and multiple sources of iron, including atmospheric dust (Erickson et al., 2003; Gassó et al., 2010; Signorini et al., 2009; Boyd et al., 2012) as well as ocean upwelling, sediment resuspension, and shelf transport (Garcia et al., 2008; Signorini et al., 2009; Painter et al., 2010). However, the scarcity of in situ measurements over longer timescales has hindered progress in establishing a reliable regional climatology and has made model validation challenging.

Air-sea CO₂ flux
We now compare our NCP with the growing-season air-sea CO₂ flux from the monthly climatological maps of Takahashi et al. (2009). Figure 6a shows a zonal band of high CO₂ flux between 40 and 60° S, similar to the zonal belt reported for the February climatology (Takahashi et al., 2012). The flux climatology has a much coarser resolution (5° longitude × 4° latitude). Even so, its band of high CO₂ flux agrees well with the high-NCP band (see Fig. 2a), which is outlined in Fig. 6a by the NCP = 16 mmol C m⁻² d⁻¹ contour. The CO₂ flux is low compared with NCP. One possible cause is CO₂ outgassing driven by warming during the growing season, which would dampen the biologically driven CO₂ uptake (Takahashi et al., 2012).
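The warming argument can be made concrete with the widely used isochemical temperature sensitivity of seawater pCO₂ from Takahashi et al. (1993): ∂ln pCO₂/∂T ≈ 0.0423 °C⁻¹. The sketch below applies this factor to a hypothetical warming. The starting pCO₂ and the temperature change are illustrative values, not results of this study. The calculation also ignores any concurrent change in dissolved inorganic carbon.

```python
import math

# Temperature sensitivity of seawater pCO2 at constant DIC and alkalinity,
# d ln(pCO2)/dT ~ 0.0423 per deg C (Takahashi et al., 1993).
THERMAL_COEFF = 0.0423

def warmed_pco2(pco2_uatm, delta_t_c):
    """pCO2 (uatm) after warming by delta_t_c (deg C), carbon chemistry unchanged."""
    return pco2_uatm * math.exp(THERMAL_COEFF * delta_t_c)

# Illustrative only: 3 deg C of warming applied to water initially at 360 uatm.
p_new = warmed_pco2(360.0, 3.0)
print(round(p_new, 1))
```

A rise of roughly 4 % per °C shows how rapid spring warming could raise surface pCO₂ enough to offset part of the drawdown caused by NCP.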
Figure 6b shows, for the region south of 50° S from October to March, the monthly means of area-integrated NCP and CO₂ flux and of area-averaged SST. NCP rises steadily from 0.6 Pg C yr⁻¹ in October to a peak of about 1 Pg C yr⁻¹ in December, then declines gently from January to March. The CO₂ flux shifts from 0.2 Pg C yr⁻¹ out of the ocean in October to 0.2 Pg C yr⁻¹ into the ocean in December. It peaks at 0.4-0.5 Pg C yr⁻¹ in January and February, lagging the NCP peak by 1-2 months, and declines thereafter. The large difference between CO₂ flux and NCP coincides with the rapid SST increase from October to January. The difference narrows as the SST increase slows from January to March. Overall, this large imbalance early in the growing season suggests that warming-induced CO₂ outgassing dominates, but further investigation is warranted.
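Area-integrated rates in Pg C yr⁻¹, as plotted in Fig. 6b, are obtained from a gridded NCP map in mmol C m⁻² d⁻¹ using each cell's area and a few unit conversions. The sketch below shows one way to do this. It assumes:

- a spherical Earth of radius 6371 km,
- a 365-day year,
- cells described by their center latitude with fixed spacing,
- missing or land cells passed as None.

These choices and the function names are assumptions for illustration. They are not necessarily the exact procedure behind Fig. 6b.

```python
import math

EARTH_RADIUS_M = 6.371e6   # mean Earth radius, spherical approximation (assumed)
MG_C_PER_MMOL_C = 12.011   # molar mass of carbon
DAYS_PER_YEAR = 365.0

def cell_area_m2(lat_south, lat_north, dlon_deg):
    """Area (m^2) of a latitude-longitude cell on a sphere."""
    return (EARTH_RADIUS_M ** 2 * math.radians(dlon_deg)
            * (math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))))

def integrate_pg_c_per_yr(cells, dlat=1.0, dlon=1.0):
    """Area-integrate NCP given as (center_lat, NCP in mmol C m^-2 d^-1) pairs.

    Cells with NCP = None (land, sea ice, missing data) are skipped.
    Returns the integrated rate in Pg C yr^-1.
    """
    mmol_per_day = sum(ncp * cell_area_m2(lat - dlat / 2, lat + dlat / 2, dlon)
                       for lat, ncp in cells if ncp is not None)
    mg_per_year = mmol_per_day * MG_C_PER_MMOL_C * DAYS_PER_YEAR
    return mg_per_year / 1e18   # 1 Pg = 1e15 g = 1e18 mg
```

The cos(latitude) shrinkage of cell area is built into the sin-difference formula. For that reason, cells near Antarctica contribute much less area than cells of the same angular size near 50° S.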

Discussion and conclusions
In this study we have described the methodology and general features of a 1998-2009 Southern Ocean NCP dataset constructed with a neural network approach. This effort is the first attempt to construct such a dataset for the Southern Ocean, or for any large basin, entirely from observed relationships between NCP measurements and NCP predictors. The approach is based on a self-organizing map (SOM) analysis that assumes no parametric functional form between NCP and the predictors. We compared the constructed NCP with previously published, independent in situ NCP estimates of weekly or longer temporal resolution at different sampling sites, both in real time and climatologically (Tables 2-3). Overall, the two agree well. The exception is the region south of 75° S, where predictor coverage is poor (Sect. 4.2).
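For readers unfamiliar with self-organizing maps, the sketch below shows the general idea in a deliberately minimal form:

1. A small one-dimensional lattice of neurons is trained on predictor vectors, using a Gaussian neighborhood whose radius shrinks during training.
2. Each neuron is labeled with the mean NCP of the training samples it wins.
3. A new predictor vector receives the NCP of its nearest labeled neuron. Distance is computed only over the predictors that are available, which loosely mirrors weekly maps with only some predictors present.

The number of neurons (`n_neurons`) and the final neighborhood radius (`r_final`) are the settings to which SOM performance is described as most sensitive. The lattice shape, learning-rate and radius schedules, missing-value handling, and all names are hypothetical. This is not the configuration used in this study, which follows Liu et al. (2006).

```python
import math
import random

def masked_dist2(weight, x):
    """Squared Euclidean distance over the predictors present in x (None = missing)."""
    return sum((w - v) ** 2 for w, v in zip(weight, x) if v is not None)

def best_matching_unit(weights, x, candidates=None):
    """Index of the neuron (optionally restricted to candidates) closest to x."""
    idx = range(len(weights)) if candidates is None else candidates
    return min(idx, key=lambda j: masked_dist2(weights[j], x))

def train_som(samples, n_neurons=6, n_iter=2000, lr0=0.5, r0=3.0, r_final=0.5, seed=0):
    """Train a 1-D SOM lattice on complete, standardized predictor vectors."""
    rng = random.Random(seed)
    weights = [list(rng.choice(samples)) for _ in range(n_neurons)]
    for t in range(n_iter):
        frac = t / (n_iter - 1)
        lr = lr0 + (0.01 - lr0) * frac              # learning rate decays linearly to 0.01
        radius = r0 * (r_final / r0) ** frac        # neighborhood radius shrinks to r_final
        x = rng.choice(samples)
        bmu = best_matching_unit(weights, x)
        for j, w in enumerate(weights):
            h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))  # Gaussian neighborhood
            for k in range(len(w)):
                w[k] += lr * h * (x[k] - w[k])
    return weights

def label_neurons(weights, samples, ncp):
    """Mean NCP of the training samples won by each neuron (None if it wins none)."""
    sums, counts = [0.0] * len(weights), [0] * len(weights)
    for x, y in zip(samples, ncp):
        j = best_matching_unit(weights, x)
        sums[j] += y
        counts[j] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

def predict_ncp(weights, labels, x):
    """NCP of the nearest labeled neuron, using only the predictors available in x."""
    labeled = [j for j, lab in enumerate(labels) if lab is not None]
    return labels[best_matching_unit(weights, x, labeled)]

# Toy example with two standardized predictors (hypothetical values).
samples = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [1.0, 0.9], [0.9, 1.0], [0.95, 0.95]]
ncp = [5.0, 6.0, 5.5, 40.0, 42.0, 41.0]
weights = train_som(samples)
labels = label_neurons(weights, samples, ncp)
low = predict_ncp(weights, labels, [0.02, None])    # second predictor missing
high = predict_ncp(weights, labels, [0.97, 0.93])
```

Predictions are restricted to labeled neurons. Neurons stretched between clusters by the neighborhood function win no training samples and therefore carry no NCP value. Too many neurons or too small a final radius would let each neuron memorize a few samples, which is the overfitting risk described for the SOM analysis.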
The growing season climatology of our constructed NCP shows a pronounced zonal band of high NCP between 40 and 60° S. The band roughly follows the STF in the Atlantic, Indian, and western Pacific sectors and turns southeastward shortly east of the dateline (Fig. 2a). NCP is also elevated in several other regions:

- along the SBdy and the Antarctic coast,
- in the complex region of the Patagonian Shelf and the BMC zone,
- in the coastal upwelling zones off Chile and Namibia.

This elongated zonal band resembles the observed air-sea CO₂ flux (Fig. 6a). Early in the growing season, however, the CO₂ flux is generally smaller than NCP (Fig. 6b). This difference may result from the rapid warming of the upper ocean during this period. Warming reduces CO₂ solubility and may cause CO₂ outgassing that partially counters the NCP-driven CO₂ uptake (Sect. 4.3.2). This hypothesis, however, requires further investigation.
The NCP climatological pattern is generally consistent with the export climatologies from the inverse model of Schlitzer (2002) (Fig. 4) and the carbon export model of Nevison et al. (2012) (Fig. 5b), although there are significant regional differences. The largest discrepancy is over the Patagonian Shelf, where the estimated climatology ranges from 30 to 400 mmol C m⁻² d⁻¹ among models (Sect. 4.3.1). Additional field campaigns targeting NCP measurements in this region would help to reduce this uncertainty.
The climatological spatial distributions of NCP, POC, and Chl are clearly similar but show notable differences. The pattern correlation is 0.66 between NCP and POC but only 0.33 between NCP and log₁₀(Chl) (Fig. 2, Sect. 4.1). The low correlation between NCP and Chl may reflect the nonlinear relationship between Chl and phytoplankton biomass. Chl concentration depends on both phytoplankton biomass and cellular pigmentation, and pigmentation adjusts to growth conditions (Geider et al., 1996, 1997, 1998; Behrenfeld and Boss, 2003; Brown et al., 2003; Le Bouteiller et al., 2003; Behrenfeld et al., 2005; Armstrong, 2006; Schultz, 2008; Westberry et al., 2008; Wang et al., 2009). Alternatively, the standard ocean-color Chl algorithm may be poorly calibrated for the Southern Ocean, as recent studies have shown (Mitchell and Kahru, 2009; Kahru and Mitchell, 2010; Johnson et al., 2013).
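An area-weighted, centered pattern correlation of the kind quoted here can be sketched as follows. The sketch assumes:

- cos(latitude) area weights on a regular latitude-longitude grid,
- that a cell missing in either field is dropped,
- that Chl is log₁₀-transformed before the call.

These choices and the function name are assumptions, not necessarily the exact weighting used for Fig. 2. Squaring r gives the fraction of spatial variance explained in a linear sense. For example, 0.66² ≈ 0.44 and 0.33² ≈ 0.11.

```python
import math

def pattern_correlation(field_a, field_b, lats):
    """Area-weighted, centered pattern correlation of two gridded fields.

    field_a, field_b: one value per grid cell (None marks missing data)
    lats: latitude (degrees) of each cell, used for cos(lat) area weights
    """
    triples = [(a, b, math.cos(math.radians(lat)))
               for a, b, lat in zip(field_a, field_b, lats)
               if a is not None and b is not None]
    wsum = sum(w for _, _, w in triples)
    mean_a = sum(w * a for a, _, w in triples) / wsum   # weighted means ("centered")
    mean_b = sum(w * b for _, b, w in triples) / wsum
    cov = sum(w * (a - mean_a) * (b - mean_b) for a, b, w in triples)
    var_a = sum(w * (a - mean_a) ** 2 for a, _, w in triples)
    var_b = sum(w * (b - mean_b) ** 2 for _, b, w in triples)
    return cov / math.sqrt(var_a * var_b)

# Usage for Chl: pass [math.log10(c) for c in chl] rather than chl itself.
```

Centering removes each field's weighted spatial mean. The coefficient therefore measures only the co-variation of spatial anomalies, not agreement in absolute magnitude.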
The stronger resemblance between the NCP and POC climatologies is consistent with previous findings that POC production is the largest contributor to NCP in the Southern Ocean (Ogawa et al., 1999; Wiebinga and de Baar, 1998; Kaehler et al., 1997; Hansell and Carlson, 1998; Sweeney et al., 2000; Schlitzer, 2002; Ishii et al., 2002; Allison et al., 2010). To examine this similarity further, we multiply POC by MLD to obtain a quantity we define as the POC inventory (mmol C m⁻²) and compare it with NCP in Fig. 7a. For this calculation we use a monthly, 3° × 3° bin-averaged MLD product (2002-2009) derived from Argo float profiles with a temperature criterion (Kara et al., 2000) (http://apdrc.soest.hawaii.edu/). The overall pattern of the POC inventory is similar to the NCP distribution.
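The POC inventory is the product of surface POC concentration and MLD. The sketch below computes it together with a simple temperature-threshold MLD estimate. It makes two labeled assumptions:

1. Satellite POC is given in mg C m⁻³ and is converted to mmol C m⁻³ with 12.011 mg mmol⁻¹.
2. MLD is taken as the depth where temperature first falls ΔT = 0.8 °C below its value at a 10 m reference level.

The threshold and reference depth are assumed values. This simplified search stands in for the Kara et al. (2000) criterion and does not reproduce its full algorithm. The function names are hypothetical.

```python
MG_C_PER_MMOL_C = 12.011   # molar mass of carbon (mg per mmol)

def mld_temperature_threshold(depths_m, temps_c, delta_t=0.8, ref_depth_m=10.0):
    """MLD (m): depth where temperature first drops delta_t below its reference value.

    The reference is the first level at or below ref_depth_m; the crossing depth is
    linearly interpolated. Returns the deepest level if the threshold is never reached.
    Depths must increase monotonically.
    """
    ref_idx = next((i for i, z in enumerate(depths_m) if z >= ref_depth_m), None)
    if ref_idx is None:
        raise ValueError("profile does not reach the reference depth")
    target = temps_c[ref_idx] - delta_t
    for i in range(ref_idx + 1, len(depths_m)):
        if temps_c[i] <= target:
            z0, z1 = depths_m[i - 1], depths_m[i]
            t0, t1 = temps_c[i - 1], temps_c[i]
            return z0 + (target - t0) * (z1 - z0) / (t1 - t0)
    return depths_m[-1]

def poc_inventory(poc_mg_m3, mld_m):
    """POC inventory (mmol C m^-2) from surface POC (mg C m^-3) and MLD (m)."""
    return poc_mg_m3 / MG_C_PER_MMOL_C * mld_m

# Hypothetical profile: the 4.2 deg C threshold is crossed between 30 and 40 m.
mld = mld_temperature_threshold([0, 10, 20, 30, 40, 50], [5.0, 5.0, 4.9, 4.6, 4.0, 3.5])
inventory = poc_inventory(120.11, mld)
```

Multiplying by MLD assumes POC is uniform within the mixed layer. This is the same assumption implicit in scaling the satellite surface value to a column inventory.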
Although the NCP and POC climatologies correspond well, the POC-NCP relationship varies spatially. These variations may reflect real physical differences in the POC-NCP relationship. They may also reflect errors in the NCP estimates or in the satellite-derived POC estimates, or both (Gardner et al., 2003; Stramski et al., 2008). To explore this further, Fig. 7b plots NCP against POC for each Southern Ocean grid point, sorted by latitude band. NCP and POC are positively correlated, but the relationship does not appear simply linear, and it varies across latitude bands. For example, the POC-NCP relationship appears stronger at lower latitudes, where mean POC values are lower. It appears weaker at higher latitudes poleward of 60° S. If this variation is not a measurement artifact, the plot suggests that some regions have high mean POC but relatively low NCP, and vice versa.
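One simple way to check whether the POC-NCP relationship changes with latitude is to group grid points by latitude band, as in Fig. 7b, and fit each band separately. The sketch below computes an ordinary least-squares slope and Pearson correlation per band. The band edges, the minimum of three points per band, and the choice of OLS are assumptions. They are not necessarily what underlies Fig. 7b.

```python
import math
from collections import defaultdict

def band_stats(points, edges=(-80, -70, -60, -50, -40, -30)):
    """Per-latitude-band OLS slope and Pearson r of NCP regressed on POC.

    points: iterable of (lat, poc, ncp); edges: ascending band boundaries (deg).
    Returns {(south, north): (n, slope, r)} for bands with at least 3 points
    and non-zero variance in both variables.
    """
    groups = defaultdict(list)
    for lat, poc, ncp in points:
        for south, north in zip(edges[:-1], edges[1:]):
            if south <= lat < north:
                groups[(south, north)].append((poc, ncp))
                break
    stats = {}
    for band, pairs in groups.items():
        n = len(pairs)
        if n < 3:
            continue
        mx = sum(p for p, _ in pairs) / n
        my = sum(q for _, q in pairs) / n
        sxy = sum((p - mx) * (q - my) for p, q in pairs)
        sxx = sum((p - mx) ** 2 for p, _ in pairs)
        syy = sum((q - my) ** 2 for _, q in pairs)
        if sxx == 0 or syy == 0:
            continue
        stats[band] = (n, sxy / sxx, sxy / math.sqrt(sxx * syy))
    return stats
```

Comparing slopes across bands helps separate the two effects. Differing slopes indicate that NCP responds differently to POC. Differing r values indicate that POC explains more or less of the NCP scatter.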
Another possibility, however, is that the ship track NCP estimates contain errors in regions with strong vertical mixing of O₂-undersaturated waters to the surface, as pointed out by Reuer et al. (2007). We excluded all negative NCP estimates, which correspond to regions of upwelled, O₂-undersaturated water, from the SOM analysis. This vertical mixing effect may nevertheless persist in some of the retained positive estimates wherever biological O₂ production is strong enough to outweigh it. In such cases, some regions with low NCP but high POC may carry a negative NCP bias from O₂-undersaturated upwelled water. Future investigations are needed to determine how far the low-NCP, high-POC regions result from physical processes, from a bias, or from a combination of the two.
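Upwelled, O₂-undersaturated water biases NCP low because of the form of the steady-state O₂/Ar mass balance commonly used in this method (e.g. Reuer et al., 2007; Cassar et al., 2009):

NCP ≈ k_O₂ [O₂]_sat Δ(O₂/Ar), where Δ(O₂/Ar) = (O₂/Ar)_meas / (O₂/Ar)_sat − 1.

The result is then converted to carbon units. Any O₂ deficit mixed up from below lowers Δ(O₂/Ar) directly. The sketch below makes several assumptions:

- a photosynthetic quotient of 1.4 mol O₂ per mol C,
- illustrative input values,
- a single gas transfer velocity in place of the residence-time-weighted value used in practice.

The function names are hypothetical.

```python
def delta_o2_ar(o2_ar_measured, o2_ar_saturation):
    """Biological O2 supersaturation, Delta(O2/Ar), as a fraction."""
    return o2_ar_measured / o2_ar_saturation - 1.0

def ncp_from_o2_ar(k_o2_m_per_d, o2_sat_mmol_m3, delta, o2_to_c=1.4):
    """Mixed-layer NCP (mmol C m^-2 d^-1) from the steady-state O2/Ar balance.

    k_o2_m_per_d: O2 gas transfer velocity (m d^-1)
    o2_sat_mmol_m3: O2 saturation concentration (mmol m^-3)
    delta: Delta(O2/Ar) as a fraction (negative when O2 is undersaturated)
    o2_to_c: assumed photosynthetic quotient, mol O2 per mol C
    """
    return k_o2_m_per_d * o2_sat_mmol_m3 * delta / o2_to_c

def keep_for_training(ncp_values):
    """Drop negative NCP estimates before model training."""
    return [v for v in ncp_values if v >= 0.0]

# Illustrative values: 2 % biological supersaturation, k = 3 m/d, [O2]sat = 320 mmol/m^3.
ncp = ncp_from_o2_ar(3.0, 320.0, delta_o2_ar(20.4, 20.0))
```

Filtering out negative values removes the most obvious upwelling cases. A positive estimate, however, can still include an undetected upwelled O₂ deficit that has been partly offset by production.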
The strong correspondence between POC and NCP in the Southern Ocean on longer timescales suggests that longer satellite POC records could provide a direct view of carbon export variability with reasonable uncertainty. On shorter timescales, however, NCP and POC correspond less closely. The correlation is 0.20 in the daily ship track data (Sect. 4.1), and pattern correlations are below 0.5 in the monthly snapshots (Supplement). Cloud cover is another major obstacle to monitoring POC variability from satellites, because the Southern Hemisphere belt between 30 and 65° S is among the cloudiest regions on the planet (Haynes et al., 2011). Evaluating NCP variability across a range of timescales therefore requires considering the relationships between NCP and multiple variables, as in the present dataset.
These results show promise for understanding the mean state and variability of Southern Ocean NCP, but substantial uncertainty in the NCP construction remains. On weekly timescales, the dominant source of uncertainty is likely the NCP variance not explained by the predictors. Based on the ship track data, we estimated a mean absolute error (MAE) of 6.76 mmol C m⁻² d⁻¹ and a root-mean-square error (RMSE) of 11.4 mmol C m⁻² d⁻¹. For longer averaging periods (seasonal to decadal), errors in the NCP measurements and gaps in ship track coverage likely dominate. As discussed above, removing possible biases related to the vertical mixing of O₂-undersaturated water would reduce NCP measurement errors.
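MAE and RMSE summarize the prediction error differently. RMSE weights large errors more heavily, so RMSE ≥ MAE always holds. For zero-mean, normally distributed errors, RMSE/MAE ≈ √(π/2) ≈ 1.25. The ship track values give 11.4/6.76 ≈ 1.69, which may point to occasional large misses. A minimal computation with hypothetical toy values:

```python
import math

def mae_rmse(observed, predicted):
    """Mean absolute error and root-mean-square error of paired values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse

obs = [10.0, 20.0, 30.0, 40.0]     # hypothetical measured NCP
pred = [12.0, 18.0, 33.0, 30.0]    # hypothetical predicted NCP
mae, rmse = mae_rmse(obs, pred)
```

In this toy case a single 10-unit miss makes RMSE noticeably larger than MAE, even though the other three errors are small.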
The second source of uncertainty, limited ship track coverage, could be reduced by additional field campaigns to measure Southern Ocean NCP, particularly in several data-sparse regions. Using 100 overlapping bootstrap NCP datasets, we have quantified the variance in the NCP climatology that arises from limited ship track coverage. This analysis identified several regions where the bootstrap climatology variance is high but NCP observations are few or absent (see Fig. 3). Targeted measurements in these regions may therefore help to constrain the relationships between NCP and each predictor. This would reduce the uncertainty in both the climatology and the variability of Southern Ocean NCP.
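The bootstrap estimate of coverage-related uncertainty follows four steps:

1. Resample the training data with replacement.
2. Refit the statistical model on each resample.
3. Rebuild the climatology from each refitted model.
4. Take the per-cell standard deviation across resamples (σ_boot).

The sketch below makes two assumptions. First, it resamples whole cruises rather than individual measurements, to respect along-track autocorrelation. Second, `fit` and `predict_climatology` are hypothetical callables standing in for the SOM training and NCP construction steps.

```python
import random
import statistics

def bootstrap_sigma(cruises, fit, predict_climatology, n_boot=100, seed=0):
    """Per-cell standard deviation of climatologies built from bootstrap resamples.

    cruises: list of cruises, each a list of (predictors, ncp) training pairs
    fit: callable taking a list of training pairs and returning a model
    predict_climatology: callable taking a model and returning {cell: mean NCP}
    """
    rng = random.Random(seed)
    climatologies = []
    for _ in range(n_boot):
        resample = [rng.choice(cruises) for _ in cruises]    # cruises drawn with replacement
        training = [pair for cruise in resample for pair in cruise]
        climatologies.append(predict_climatology(fit(training)))
    cells = climatologies[0].keys()
    return {cell: statistics.stdev([clim[cell] for clim in climatologies]) for cell in cells}
```

Cells whose predicted NCP depends heavily on a few cruises receive a large σ_boot. Cells well constrained by many independent cruises receive a small one. This is how a map like Fig. 3 can point to where new measurements would help most.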
Notwithstanding these limitations, the dataset we present offers a new opportunity to investigate large-scale NCP variability and its connections to the Southern Ocean carbon cycle, which was not previously possible with an observation-based dataset. A recent study suggests that a global algorithm for determining NCP may not effectively capture regional NCP differences (Li and Cassar, 2013). Reproducing depth-integrated productivity in different ocean regions (Campbell et al., 2002; Emerson et al., 2008) has also challenged NPP models, particularly in regions of extreme Chl (Carr et al., 2006) and in coastal waters (Saba et al., 2011). Our data-driven approach may provide guidance for correcting biases in NPP models. The constructed dataset may also allow investigation of interannual NCP variability, albeit over only a little more than a decade (see the Supplement for a preliminary example). As more NCP measurements and validation data become available, this dataset will be continually refined, and we hope its applications will expand as errors are reduced.
The Supplement related to this article is available online at doi:10.5194/bg-11-3279-2014-supplement.
Publications on behalf of the European Geosciences Union. C.-H. Chang et al.: Neural network-based estimates of Southern Ocean NCP

Figure 1. (a) Ship tracks of the in situ NCP measurements used in this study, with the time of the research cruises color-coded by month. (b) Histogram of the ship track NCP distribution; the red dashed line marks NCP = 180 mmol C m⁻² d⁻¹. (c) Detailed distribution of NCP below this outlier threshold.

Figure 2a shows the spatial distribution of the 12-year (1998-2009) growing season NCP climatology, superimposed with the major Southern Ocean fronts (Orsi et al., 1995). An elongated zonal band of high NCP (> 22 mmol C m⁻² d⁻¹) roughly follows the Subtropical Front (STF, ∼40° S), where macronutrient-rich subantarctic water converges with macronutrient-poor subtropical water (Takahashi et al., 2012). The band stretches from the southwest Atlantic across the south Indian Ocean to the western South Pacific. Around 170° W, east of the dateline, it separates from the STF and turns southeastward to about 120° W. North of the front, NCP drops sharply and remains very low throughout most of the subtropics, except near large landmasses. NCP is also elevated (≥ 20 mmol C m⁻² d⁻¹) along the Southern Boundary (SBdy), the southernmost limit of the Antarctic Circumpolar Current (ACC). It is likewise elevated along the Antarctic coast, including the Ross and Amundsen seas, where strong CO₂ sinks have recently been observed (Arrigo et al., 2008; Tortell et al., 2012). Between the STF and the SBdy, high NCP (> 25 mmol C m⁻² d⁻¹) occurs in two further settings. The first is the complex region off southeastern South America between the Río de la Plata and the Falkland Islands, including the Patagonian Shelf and the Brazil-Malvinas Confluence (BMC) zone. The second is the vicinity of several islands: the Crozet Islands (48-60° E, 42-49° S), Kerguelen Island (67-95° E, 45-55° S), and South Georgia (34-42° W, 50-55° S).

As discussed in the previous section, this study is limited by the availability and spatial coverage of the NCP observations used in the SOM analysis to generalize the relationships between NCP and each of the predictors. To indicate quantitatively how this limitation may affect the constructed NCP climatology, Fig. 3 shows the standard deviation, σ_boot, of the growing season mean NCP across the 100 bootstrap NCP datasets (Sect. 3.3). The locations of each ship track are superimposed. The largest uncertainties (σ_boot ∼ 6-7 mmol C m⁻² d⁻¹) occur over the Patagonian Shelf, where ship track coverage is lacking. Where ship track coverage is dense, the uncertainty is generally lower (σ_boot < 3 mmol C m⁻² d⁻¹). Regions where σ_boot is high indicate where targeted measurements in future studies may help to reduce the uncertainty of Southern Ocean NCP estimates.

The climatologies of POC (2002-2009) and Chl (1998-2007) are shown in Fig. 2b and c for comparison. Chl is plotted on a log scale because of its strong positive skewness. Overall, regions of high mean NCP (Fig. 2a) correspond well with regions of high mean POC and Chl. The area-weighted, centered pattern correlation coefficients are 0.66 for climatological NCP versus POC and 0.33 for NCP versus log₁₀(Chl). These correlations correspond to r² ≈ 0.44 and 0.11. In a linear sense, the climatological POC and Chl fields therefore explain 44 and 11 %, respectively, of the spatial variance of the climatological NCP field. However, the temporal correlation between NCP

Figure 3. Standard deviation of the growing season mean NCP from the 100 bootstrap NCP datasets (σ_boot; mmol C m⁻² d⁻¹, in color), superimposed with the locations of each ship track (black lines).

we examine the Atlantic sector. Because the in situ derived NCP values were collected during 1998-2009, we provide the real-time NCP from our dataset for comparison. Our values agree well with previously reported values. For example, our study and previous measurements both show relatively low NCP near the Atlantic Polar Frontal Zone (PFZ) in March 2008 (middle rows). Both also show much higher values, around 37 mmol C m⁻² d⁻¹, in the Atlantic-Indian sector in December 2006 (bottom row).

Figure 6. Comparison with air-sea CO₂ flux. (a) November-March climatology of air-sea CO₂ flux (Takahashi et al., 2009) (mmol C m⁻² d⁻¹, in color), superimposed with the NCP = 16 mmol C m⁻² d⁻¹ contour. (b) Evolution of monthly mean area-integrated NCP (red), area-integrated CO₂ flux (blue), and area-averaged SST (green) south of 50° S from October to March. The left y axis corresponds to NCP and CO₂ flux (Pg C yr⁻¹), and the right y axis to SST (°C). The air-sea CO₂ flux is defined as positive into the ocean.


For these www.biogeosciences.net/11/3279/2014/ Biogeosciences, 11, 3279-3297, 2014
° × 1° NCP maps for the period between 1998 and 2009 over the Southern Ocean by calculating NCP from weekly maps of up to six of the gridded predictor variables described in the previous section.

Table 1. Percentage of predictor variables in the ship track and weekly gridded data used to generate NCP maps.

neighbors in the lattice at the end of training. Readers are referred to Liu et al. (2006) for a description of the neighborhood function and lattice. In practice, the performance of the SOM analysis tends to be most sensitive to the chosen number of neurons and to the final neighborhood radius. If there are too many neurons (i.e. clusters) or the final neighborhood radius is too small, the clusters may fit the training data too closely and the statistical model may overfit when predicting NCP. Conversely, if there are too few neurons or the final neighborhood radius is too large, the statistical model may not accurately capture the range of NCP variability.
exclude the candidates that show an increase of NCP from February to March.The rest of the candidate NCP constructions show similar climatological features, except for a few that produce unexpectedly high mean NCP in regions of relative minima in both POC and Chl.Because these regions are outside the ship track coverage, we believe the unexpectedly high NCP estimates to be the result of overfitting to the ship tracks, which target high NCP regions.

Table 2. Comparison of area-integrated NCP for the Southern Ocean south of 50° S. NCP (mmol C m⁻² d⁻¹)

Table 3a. Climatological comparison of independent in situ NCP measurements and the constructed NCP.

Table 3b. Real-time comparison of independent in situ NCP measurements and the constructed NCP.

Table 3c. Real-time comparison of island mass effect. HNLC denotes high-nutrient, low-chlorophyll.