Derivation of seawater pCO2 from net community production identifies the South Atlantic Ocean as a CO2 source

A key step in assessing the global carbon budget is the determination of the partial pressure of CO2 in seawater (pCO2 (sw)). Spatially complete observational fields of pCO2 (sw) are routinely produced for regional and global ocean carbon budget assessments by extrapolating sparse in situ measurements of pCO2 (sw) using satellite observations. Within this process, satellite chlorophyll a (Chl a) is often used as a proxy for the biological drawdown or release of CO2. Chl a does not, however, quantify carbon fixed through photosynthesis and then respired, which is determined by net community production (NCP). In this study, pCO2 (sw) over the South Atlantic Ocean is estimated using a feed forward neural network (FNN) scheme and either satellite derived NCP, net primary production (NPP) or Chl a, to compare which biological proxy is the most accurate. Estimates of pCO2 (sw) using NCP, NPP or Chl a were similar, but NCP was more accurate for the Amazon Plume and upwelling regions, which were not fully reproduced when using Chl a or NPP. A perturbation analysis assessed the potential maximum reduction in pCO2 (sw) uncertainties that could be achieved by reducing the uncertainties in the satellite biological parameters. This illustrated further improvement for NCP compared to NPP or Chl a. Using NCP to estimate pCO2 (sw) showed that the South Atlantic Ocean is a CO2 source, whereas if no biological parameters are used in the FNN (following existing annual carbon assessments), this region becomes a sink for CO2. These results highlight that using NCP improved the accuracy of estimating pCO2 (sw), and changes the South Atlantic Ocean from a CO2 sink to a source. Reducing the uncertainties in NCP derived from satellite parameters will further improve our ability to quantify the global ocean CO2 sink.
These results indicate that in regions where biological activity is important in controlling the variability in pCO2 (sw), the use of NCP, which is available from satellite data, is important for quantifying the ocean carbon pump, and for providing data in areas that are sparsely covered by observations such as the Southern Ocean.


Introduction
Since the industrial revolution, anthropogenic CO2 emissions have resulted in an increase in atmospheric CO2 concentrations (Friedlingstein et al., 2020; IPCC, 2013). By acting as a sink for CO2, the oceans have buffered the increase in anthropogenic atmospheric CO2, without which the atmospheric concentration would be 42-44 % higher (DeVries, 2014). The long-term absorption of CO2 by the oceans is altering the marine carbonate chemistry of the ocean, resulting in a lowering of pH, a process known as ocean acidification (Raven et al., 2005). Observational fields of the partial pressure of CO2 in seawater (pCO2 (sw)) are one of the key datasets needed to routinely assess the strength of the oceanic CO2 sink (Friedlingstein et al., 2020; Landschützer et al., 2014, 2020; Rödenbeck et al., 2015; Watson et al., 2020b). These methods are reliant on the extrapolation of sparse in situ observations of pCO2 (sw) using satellite observations of parameters which account for the variability of, and the controls on, pCO2 (sw) (Shutler et al., 2020). These parameters include sea surface temperature (SST; e.g. Landschützer et al., 2013; Stephens et al., 1995), salinity and chlorophyll a (Chl a) (Rödenbeck et al., 2015). SST and salinity control pCO2 (sw) by changing the solubility of CO2 in seawater (Weiss, 1974), whilst biological processes such as photosynthesis and respiration contribute by modulating its concentration.
Chl a is routinely used as a proxy for this biological activity (Rödenbeck et al., 2015), but it does not distinguish between carbon fixation through photosynthesis and the carbon respired by the plankton community. Net primary production (the net carbon fixation rate; NPP) is determined by the standing stock of phytoplankton, for which the Chl a concentration is used as a proxy, and modified by the photosynthetic rate and the available light in the water column (Behrenfeld et al., 2016). Photosynthetic rates are, in turn, modified by ambient nutrient and temperature conditions (Behrenfeld and Falkowski, 1997; Marañón et al., 2003). Elevated Chl a does not always equate to elevated NPP (Poulton et al., 2006), and for the same Chl a concentrations, NPP can vary depending on the health and metabolic state of the plankton community. All of these controls are captured by the net community production (NCP), which is the metabolic balance of the plankton community resulting from the carbon fixed through photosynthesis and that lost through respiration. When NCP is positive, the plankton community is autotrophic, which implies that there is a drawdown of CO2 from seawater (since the plankton reduce the CO2 in the water column). Where NCP is negative, the community is heterotrophic, implying a release of CO2 into the ocean (as the plankton produce or release CO2), which can then be released into the atmosphere (Jiang et al., 2019; Schloss et al., 2007). Using NCP rather than Chl a to estimate pCO2 (sw) should theoretically lead to an improvement in the derivation of pCO2 (sw).
Many studies have used satellite Chl a to estimate pCO2 (sw) at both regional (Benallal et al., 2017; Chierici et al., 2012; Moussa et al., 2016) and global scales (Liu and Xie, 2017). Chierici et al. (2012) attempted to use satellite NPP to estimate pCO2 (sw) in the southern Pacific Ocean, but there was no significant improvement over using satellite Chl a.
This is not surprising as NPP captures more of the biological signal, but still lacks any inclusion of respiration, which results in the release of CO2 into the water column. To our knowledge the use of satellite NCP to estimate pCO2 (sw) has not been attempted before and could be a means of improving estimates of pCO2 (sw) as long as satellite NCP observations are accurate (Ford et al., 2021b; Tilstone et al., 2015a). These satellite measurements may improve the estimation of pCO2 (sw) as NCP includes the full biological control on pCO2 (sw). This is particularly important in regions where in situ pCO2 (sw) observations are sparse and where interpolation and neural network techniques are therefore likely to struggle (Watson et al., 2020b).
The South Atlantic Ocean is under-sampled, with limited pCO2 (sw) observations (e.g. Fay and McKinley, 2013; Watson et al., 2020b). The region is varied and dynamic as it includes the seasonal Equatorial upwelling, high biological activity on the south-western (Dogliotti et al., 2014) and south-eastern shelves (Lamont et al., 2014), as well as the propagation of the

Methods

Surface Ocean Carbon Atlas (SOCAT) pCO2 (sw) and atmospheric CO2
SOCATv2020 (Bakker et al., 2016; Pfeil et al., 2013) individual fugacity of CO2 in seawater (fCO2 (sw)) observations were downloaded from https://www.socat.info/index.php/data-access/. Data were extracted from 2002 to 2018 for the South Atlantic Ocean (10º N-60º S, 25º E-80º W; Fig. 1b). The individual cruise observations were collected from different depths, and are not representative of the fCO2 (sw) in the top ~100 μm of the ocean, where gas exchange occurs (Goddijn-Murphy et al., 2015; Woolf et al., 2016). Therefore, the SOCAT observations were re-analysed to a standard temperature dataset and depth (Reynolds et al., 2002) that is considered representative of the bottom of the mass boundary layer.
This was achieved using the 'fe_reanalyse_socat' utility in the open source FluxEngine toolbox (Shutler et al., 2016), which follows the methodology described in Goddijn-Murphy et al. (2015). The reanalysed fCO2 (sw) observations were converted to pCO2 (sw), and gridded onto 1º monthly grids following SOCAT protocols (Sabine et al., 2013). The uncertainties in the in situ data were taken as the standard deviation of the observations in each grid cell or, where only a single observation exists, were set to 5 μatm following Bakker et al. (2016).
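The per-cell uncertainty rule described above can be illustrated with a minimal sketch. It assumes observations arrive as simple tuples and that grid cells are defined by flooring latitude and longitude to whole degrees; the function name `grid_monthly`, the tuple layout and the use of the sample standard deviation are illustrative assumptions, not part of the SOCAT protocols or FluxEngine.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def grid_monthly(obs, single_obs_uncertainty=5.0):
    """Bin observations onto a 1-degree monthly grid.

    obs: iterable of (year, month, lat, lon, pco2) tuples (hypothetical layout).
    Returns {(year, month, lat_cell, lon_cell): (mean, uncertainty, n)}.
    """
    cells = defaultdict(list)
    for year, month, lat, lon, pco2 in obs:
        # Assumption: a cell is identified by its south-west corner.
        key = (year, month, math.floor(lat), math.floor(lon))
        cells[key].append(pco2)

    gridded = {}
    for key, values in cells.items():
        if len(values) > 1:
            # Assumption: sample standard deviation of the cell's observations.
            uncertainty = stdev(values)
        else:
            # A single observation gets the fixed 5 uatm uncertainty.
            uncertainty = single_obs_uncertainty
        gridded[key] = (mean(values), uncertainty, len(values))
    return gridded
```

Cells with several observations carry their own spread as the uncertainty, while isolated observations fall back to the fixed value, so every gridded cell has an uncertainty attached.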

Moderate Resolution Imaging Spectroradiometer on Aqua (MODIS-A) satellite observations
4 km resolution monthly mean Chl a were calculated from MODIS-A Level 1 granules, retrieved from the National Aeronautics and Space Administration (NASA) Ocean Colour website (https://oceancolor.gsfc.nasa.gov/) using SeaDAS v7.5, and applying the standard OC3-CI Chl a algorithm (https://oceancolor.gsfc.nasa.gov/atbd/chlor_a/). In addition, monthly mean MODIS-A SST and photosynthetically active radiation (PAR) were also downloaded from the NASA Ocean Colour website. Mean monthly NPP were generated from MODIS-A Chl a, SST and PAR using the Wavelength Resolving Model (Morel, 1991) with the look up table described in Smyth et al. (2005). Coincident mean monthly NCP using the algorithm NCP-D described in Tilstone et al. (2015a) were generated using the MODIS-A NPP and SST data. Further details of the satellite algorithms are given in O'Reilly et al. (1998; 2019) and Hu et al. (2012) for Chl a, Smyth et al. (2005) and Tilstone et al. (2009) for NPP, and Tilstone et al. (2015a) for NCP. These satellite algorithms were the most accurate for the South Atlantic Ocean in an algorithm inter-comparison which accounted for the uncertainties in the in situ, model and input data (Ford et al., 2021b). All monthly mean data were generated between July 2002 and December 2018 and were re-gridded onto the same 1º grid as the pCO2 (sw) observations. The assessed uncertainties from the literature for each of the input parameters used are given in Table 1.

Feed forward neural network scheme
The South Atlantic Ocean was partitioned into 8 biogeochemical provinces (Fig. 1a), following Longhurst et al. (1995) and Longhurst (1998). The pCO2 (sw) observations in the eastern Equatorial Atlantic were sparse, and therefore the Equatorial region was merged into a single province. In each province the available monthly pCO2 (sw) observations were matched to temporally and spatially coincident pCO2 (atm), MODIS-A NCP and SST, to provide training data for the feed forward neural network (FNN). Observations in coastal regions (< 200 m water depth) were removed from the analysis, due to the increased uncertainty in ocean colour observations in these areas (e.g. Lavender et al., 2004). Due to constraints on the coverage of ocean colour data, no data were available in austral winter below ~50º S.
The coincident observations in each province were randomly split into 3 datasets: 1.) A training dataset (50 % of the observations) used to train the FNNs; 2.) A validation dataset (30 % of the observations) used to assess the performance of the FNN and to prevent the networks from overfitting; 3.) An independent test dataset (20 % of the observations) to assess the final performance of the FNN, with observations that are independent of the network training. The optimal split (ropt) method of Amari et al. (1997) was used to partition the input data into these three sets, as follows:

$$r_{opt} = 1 - \frac{1}{\sqrt{2m}}$$

where m is the number of input parameters. For our three input parameters, this gives an optimal split of approximately 60 % training data to 40 % validation data, from which we removed 10 % from each dataset to provide a further independent test dataset. A pre-training step was used to determine the optimum number of hidden neurons in the FNN (Benallal et al., 2017; Landschützer et al., 2013; Moussa et al., 2016), to provide the best fit for the observations, whilst preventing overfitting (Demuth et al., 2008).
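As a worked check of the split described above, the following sketch assumes the asymptotic form of the Amari et al. (1997) result, ropt = 1 - 1/sqrt(2m), for the training fraction; the function names are hypothetical and the unrounded fractions are shown deliberately.

```python
import math


def optimal_training_fraction(m):
    """Training fraction r_opt = 1 - 1/sqrt(2m) (assumed form) for m inputs."""
    if m < 1:
        raise ValueError("m must be a positive number of input parameters")
    return 1.0 - 1.0 / math.sqrt(2.0 * m)


def three_way_split(m, test_from_each=0.10):
    """Take a fixed share from the training and validation sets for testing."""
    train = optimal_training_fraction(m)
    validation = 1.0 - train
    return (train - test_from_each,
            validation - test_from_each,
            2.0 * test_from_each)


train, validation, test = three_way_split(3)
print(f"train={train:.3f} validation={validation:.3f} test={test:.3f}")
```

For m = 3 the training fraction is about 0.59, which rounds to the 60 % quoted in the text; removing 10 % from each side then gives roughly the 50 % / 30 % / 20 % partition.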
The FNNs consist of 1 hidden layer with between 2 and 30 nodes depending on the pre-training step and 1 output layer. The networks were trained using the optimum number of hidden neurons, in an iterative process until the Root Mean Square Difference (RMSD) remained unchanged for 6 iterations. The best performing FNN, with the lowest RMSD, was then used to estimate pCO2 (sw). The uncertainties in the input parameters were propagated through the FNN, using a Monte Carlo uncertainty propagation, where 1000 calculations were made perturbing the input parameters, using random noise for their uncertainty (Table 1). The output from the 8 province FNNs were then combined and weighted statistics, which account for both the satellite and in situ uncertainty, were used to assess the overall performance of the FNN (as used within Ford et al., 2021b). The combined 8 FNNs approach will hereafter be referred to as the SA-FNN.
The approach to training the FNNs was repeated replacing NCP with Chl a or NPP sequentially, to determine if there was an improvement by using NCP. Chl a and NPP estimates were log10 transformed before input into the FNN, due to their respective uncertainties being determined in log10 space (Table 2). A baseline SA-FNN with no biological parameters as input was trained using pCO2 (atm) and MODIS-A SST (SA-FNNNO-BIO-1). A second SA-FNN with no biological parameters (SA-FNNNO-BIO-2) was trained with the addition of sea surface salinity and mixed layer depth from the Copernicus Marine Environment Modelling Service (https://resources.marine.copernicus.eu/) global ocean physics reanalysis product (GLORYS12V1). This parameter combination (pCO2 (atm), SST, salinity and mixed layer depth) has recently been included within a neural network scheme to estimate global fields of pCO2 (sw) (Watson et al., 2020b).
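The Monte Carlo propagation step can be sketched as follows. This is an illustration only: the trained FNN is replaced by an arbitrary callable, the noise is assumed to be Gaussian with a standard deviation equal to each input's uncertainty, and the name `monte_carlo_uncertainty` is hypothetical.

```python
import random
from statistics import mean, stdev


def monte_carlo_uncertainty(model, inputs, uncertainties, n_draws=1000, seed=0):
    """Propagate input uncertainties through `model` by random perturbation.

    model: callable taking a list of inputs and returning one estimate
           (stand-in for a trained network).
    Returns (mean estimate, standard deviation of the perturbed estimates).
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_draws):
        # Assumption: independent Gaussian noise on every input parameter.
        perturbed = [rng.gauss(x, u) for x, u in zip(inputs, uncertainties)]
        outputs.append(model(perturbed))
    return mean(outputs), stdev(outputs)


def toy_model(v):
    """Toy linear stand-in for the FNN, used only to make the sketch runnable."""
    return 380.0 + v[0] - 2.0 * v[1]


estimate, spread = monte_carlo_uncertainty(toy_model, [10.0, 5.0], [0.5, 1.0])
```

For a linear stand-in the spread should approach the analytical value sqrt(0.5² + (2 × 1.0)²) ≈ 2.06, which gives a simple sanity check before applying the same loop to a non-linear network.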
Following these methods, a monthly mean time-series of pCO2 (sw) was generated in the South Atlantic Ocean, applying the SA-FNN approach using NCP (SA-FNNNCP), NPP (SA-FNNNPP), Chl a (SA-FNNCHLA) or no biological parameters (SA-FNNNO-BIO-1 and SA-FNNNO-BIO-2). The pCO2 (sw) fields were spatially averaged using a 3×3 pixel filter, but were not averaged temporally as in previous studies (Landschützer et al., 2016) because averaging temporally could mask features that occur within single months of the year. The uncertainties in the input parameters (Table 1) were propagated through the neural network on a per pixel basis, and combined in quadrature with the RMSD of the test dataset, to produce a combined uncertainty budget for each pixel, assuming all sources of uncertainty are independent and uncorrelated (BIPM, 2008; Taylor, 1997).
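Combining independent uncertainty terms in quadrature amounts to a root-sum-of-squares. A minimal sketch, with a hypothetical function name and illustrative numbers rather than values from this study:

```python
import math


def combined_uncertainty(propagated_input_uncertainty, test_rmsd):
    """Root-sum-of-squares of two independent, uncorrelated uncertainty terms."""
    return math.sqrt(propagated_input_uncertainty ** 2 + test_rmsd ** 2)


# Illustrative values only (uatm): a per-pixel propagated term and a test RMSD.
pixel_budget = combined_uncertainty(3.0, 4.0)  # 5.0
```

Because the terms add as squares, the larger term dominates the budget, so reducing the smaller term alone yields little improvement.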

Atlantic Meridional Transect in situ data
To assess the accuracy of the SA-FNN, coincident in situ measurements of NCP, NPP, Chl a, SST, pCO2 (atm) and pCO2 (sw), with uncertainties, were provided by Atlantic Meridional Transects 20, 21, 22 and 23 in 2010, 2011, 2012 and 2013, respectively. All the Atlantic Meridional Transect data described in this section can be obtained from the British Oceanographic Data Centre (https://www.bodc.ac.uk/). Chl a was computed following the methods of Brewin et al. (2016), using underway continuous spectrophotometric measurements, and uncertainties were estimated as ~0.06 log10(mg m⁻³) (Ford et al., 2021b). 14C-based NPP measurements were made based on dawn to dusk simulated in situ incubations, following the methods given in Tilstone et al. (2017), at 56 stations with a per station uncertainty. Uncertainties ranged between 8 and 213 mg C m⁻² d⁻¹ and were on average 53 mg C m⁻² d⁻¹. NCP was estimated using in vitro changes in dissolved O2, following the methods of Gist et al. (2009) and Tilstone et al. (2015a) at 51 stations with a per station uncertainty calculated. Uncertainties ranged between 5 and 25 mmol O2 m⁻² d⁻¹ and were on average 14 mmol O2 m⁻² d⁻¹.
Underway measurements of pCO2 (sw) and pCO2 (atm) were performed continuously, following the methods of Kitidis et al. (2017). SST was continuously measured alongside all observations (SeaBird SBE45), with a factory calibrated uncertainty of ±0.01 °C. The mean of underway pCO2 (sw), pCO2 (atm), SST and Chl a were taken ±20 minutes around each station where NCP and NPP were measured. These pCO2 (sw) observations (N≈200) were removed from the SOCATv2020 dataset so that the Atlantic Meridional Transect data remained independent from the training and validation datasets.

Perturbation analysis
Following the approach of Saba et al. (2011), a perturbation analysis was conducted, to evaluate the potential reduction in SA-FNN pCO2 (sw) RMSD that could be attributed to the input parameters. The analysis indicates the maximum reduction in RMSD that could be achieved if uncertainties in the input parameters were reduced to ~0. Each of the input parameters (NCP, SST and pCO2 (atm)) can have three possible values for each in situ pCO2 (sw) observation (original value, original ± uncertainty; Table 1), enabling 27 perturbations of the input data as input to the SA-FNN. For each in situ pCO2 (sw) observation, the 27 perturbations of SA-FNN pCO2 (sw) were examined, and the perturbation that produced the lowest RMSD and bias combination was selected. The RMSD and bias were calculated between all the in situ pCO2 (sw) and the selected perturbations. The percentage difference between this RMSD and the original RMSD when training the SA-FNN was calculated to indicate the maximum achievable reduction. This approach was conducted for two scenarios: (1) uncertainty in individual input parameters (NCP, SST and pCO2 (atm)) and (2) uncertainty in all input parameters together. The approach was conducted on all three training datasets, and on the Atlantic Meridional Transect in situ data. The analysis was repeated sequentially replacing NCP with Chl a and NPP, to determine if there was a greater maximum reduction in RMSD using NCP. The analysis was also conducted allowing for a 10 % reduction in input parameter uncertainties, to indicate the short-term reduction in pCO2 (sw) RMSD that could be achieved by reducing the input parameter uncertainties.
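The 27-member perturbation set and the resulting best-case RMSD can be sketched as below. This is a simplified illustration: the trained SA-FNN is replaced by an arbitrary callable, and the selection criterion is reduced to the smallest absolute difference from the in situ value for each observation, which is an assumption standing in for the combined RMSD and bias criterion described above. All function names are hypothetical.

```python
import math
from itertools import product


def perturbation_sets(values, uncertainties):
    """All combinations of (value - u, value, value + u) for each input."""
    options = [(v - u, v, v + u) for v, u in zip(values, uncertainties)]
    return list(product(*options))  # 3**n combinations, 27 for three inputs


def rmsd(estimates, targets):
    """Root mean square difference between paired estimates and targets."""
    return math.sqrt(
        sum((e - t) ** 2 for e, t in zip(estimates, targets)) / len(targets)
    )


def best_case_rmsd(model, observations):
    """observations: list of (inputs, uncertainties, in_situ_value)."""
    selected, targets = [], []
    for inputs, uncertainties, target in observations:
        candidates = [model(list(p))
                      for p in perturbation_sets(inputs, uncertainties)]
        # Simplification: keep the perturbation closest to the observation.
        selected.append(min(candidates, key=lambda est: abs(est - target)))
        targets.append(target)
    return rmsd(selected, targets)


def percent_reduction(original_rmsd, perturbed_rmsd):
    """Percentage difference relative to the original RMSD."""
    return 100.0 * (original_rmsd - perturbed_rmsd) / original_rmsd
```

Because the unperturbed inputs are always one of the 27 candidates, the best-case RMSD can never exceed the original RMSD, so the percentage reduction is bounded below by zero.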

Comparison of the SA-FNNNCP with the SA-FNNNO-BIO, SA-FNNCHLA, SA-FNNNPP and 'state of the art' data for the South Atlantic
The most comprehensive pCO2 (sw) fields to date are from Watson et al. (2020a, 2020b). The 'standard method' pCO2 (sw) fields within the Watson et al. (2020a, 2020b) data were produced by extrapolating the in situ reanalysed SOCATv2019 pCO2 (sw) observations using a self-organising map feed forward neural network approach (Landschützer et al., 2016), and will be referred to as 'W2020'. A time-series was extracted from the W2020 data, coincident with SA-FNNNCP, SA-FNNNPP, SA-FNNCHLA and the two SA-FNNNO-BIO variants. For the six methods, a monthly climatology referenced to the year 2010 was computed, assuming an atmospheric CO2 increase of 1.5 μatm yr⁻¹ (Takahashi et al., 2009; Zeng et al., 2014). The climatology should be insensitive to the assumed rise in atmospheric CO2 due to the reference year being central to the time series. The standard deviation of this climatology was also computed on a per pixel basis.
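One way to reference a time series to 2010 is to remove the assumed linear trend before averaging by calendar month. The sketch below takes that approach as an assumption; the exact normalisation used in the study is not spelled out here, and the function names and tuple layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def reference_to_year(pco2, year, ref_year=2010, trend=1.5):
    """Shift a value to the reference year using a linear trend (uatm per year)."""
    return pco2 - trend * (year - ref_year)


def monthly_climatology(series, ref_year=2010, trend=1.5):
    """series: iterable of (year, month, pco2) for one pixel.

    Returns {month: (climatological mean, standard deviation)}.
    """
    by_month = defaultdict(list)
    for year, month, pco2 in series:
        by_month[month].append(reference_to_year(pco2, year, ref_year, trend))
    return {
        month: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
        for month, vals in by_month.items()
    }
```

With the reference year near the centre of the record, the corrections applied to early and late years largely cancel in the mean, which is why the climatology is expected to be insensitive to the assumed rate.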
The stations (Fig. 1) are representative of locations from previous literature that analysed the variability of in situ pCO2 (sw) in the South Atlantic Ocean. For each station, the monthly climatology of pCO2 (sw), representing the average seasonal cycle of pCO2 (sw), and the standard deviation of the climatology, as an indication of the interannual variability, were extracted from the six approaches. The pCO2 (sw) value for each station was the statistical mean of the four nearest data points weighted by their respective proximity to the station coordinate. In situ pCO2 (sw) observations from the SOCATv2020 Flag E dataset were also extracted for stations A and B (Fig. 1a), and a climatology was generated. These observations represent data from the Prediction and Research Moored Array in the Atlantic (PIRATA) buoys at these locations (Bourlès et al., 2008).
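The proximity-weighted station value can be illustrated with an inverse-distance weighting of the four nearest grid points. The inverse-distance form of the weights and the use of straight-line distance in degrees are assumptions made for this sketch; the function name is hypothetical.

```python
import math


def idw_four_nearest(station, points, n_nearest=4):
    """Inverse-distance weighted mean of the nearest grid points.

    station: (lat, lon); points: iterable of (lat, lon, value).
    """
    ranked = sorted(
        ((math.hypot(lat - station[0], lon - station[1]), value)
         for lat, lon, value in points),
        key=lambda pair: pair[0],
    )[:n_nearest]

    # A grid point that coincides with the station is returned directly.
    for distance, value in ranked:
        if distance == 0.0:
            return value

    weights = [1.0 / distance for distance, _ in ranked]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, ranked))
    return weighted_sum / sum(weights)
```

When the station sits at the centre of four surrounding grid points, the weights are equal and the result reduces to their plain average.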
The station climatologies for the SA-FNNNO-BIO-1, SA-FNNNO-BIO-2, W2020, SA-FNNCHLA, and SA-FNNNPP were compared to the SA-FNNNCP, by testing for significant differences in the seasonal cycle and annual pCO2 (sw) (offset). The seasonal cycles (seasonality) were compared using a non-parametric Spearman's correlation and deemed statistically different where the correlation was not significant (α < 0.05). A non-parametric Kruskal-Wallis test was used to test for significant (α < 0.05) differences in the annual pCO2 (sw), indicating an offset between the two tested climatologies. The Southern Ocean station (station H) was excluded from the statistical analysis due to missing data in the SA-FNN.

Estimation of the bulk CO2 flux
The flux of CO2 (F) between the atmosphere and ocean (air-sea) can be expressed in a bulk parameterisation as:

$$F = k \left( \alpha_w \, pCO_{2\,(sw)} - \alpha_s \, pCO_{2\,(atm)} \right)$$

where k is the gas transfer velocity, and αw and αs are the solubility of CO2 at the base and top of the mass boundary layer at the sea surface, respectively. k was estimated from ERA5 monthly reanalysis wind speed (downloaded from the Copernicus Climate Data Store; https://cds.climate.copernicus.eu/) following the parameterisation of Nightingale et al. (2000). The parameter αw was estimated as a function of SST and sea surface salinity (Weiss, 1974) using the monthly Optimum Interpolated SST (Reynolds et al., 2002) and sea surface salinity from the Copernicus Marine Environment Modelling Service global ocean physics reanalysis product (GLORYS12V1). The αs parameter was estimated using the same temperature and salinity datasets but included a gradient from the base to the top of the mass boundary layer of -0.17 K (Donlon et al., 1999) and +0.1 salinity units. pCO2 (atm) was estimated using the dry mixing ratio of CO2 from the NOAA-ESRL marine boundary layer reference, Optimum Interpolated SST (Reynolds et al., 2002) applying a cool skin bias (0.17 K; Donlon et al., 1999) and sea level pressure following Dickson et al. (2007). Spatially and temporally complete pCO2 (sw) fields, which are representative of pCO2 (sw) at the base of the mass boundary layer, were extracted from the SA-FNNNCP, SA-FNNNPP, SA-FNNCHLA, SA-FNNNO-BIO-1, SA-FNNNO-BIO-2 and W2020.
The monthly CO2 flux was calculated using the open source FluxEngine toolbox (Shutler et al., 2016) between 2003 and 2018 for the six pCO2 (sw) inputs, using the 'rapid transport' approximation (described in Woolf et al., 2016). The net annual flux was determined for the South Atlantic Ocean (10° N-44° S; 25° E-70° W) using the 'fe_calc_budgets.py' utility within FluxEngine with the supplied area and land percentage masks. The mean net annual flux was calculated as the mean of the 15 year net annual fluxes. Positive net fluxes indicate a net source to the atmosphere, and negative net fluxes a sink.
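The bulk parameterisation and its sign convention can be illustrated with a short sketch. It assumes the form F = k (αw pCO2 (sw) - αs pCO2 (atm)), uses the Nightingale et al. (2000) wind speed relationship k600 = 0.222 U² + 0.333 U (cm h⁻¹, U in m s⁻¹) with the usual (Sc/600)^-0.5 Schmidt number scaling, and leaves unit conversion to the caller. It is not the FluxEngine implementation, and the function names are hypothetical.

```python
def nightingale_k(wind_speed, schmidt_number=600.0):
    """Gas transfer velocity (cm per hour) from 10 m wind speed (m per s)."""
    k600 = 0.222 * wind_speed ** 2 + 0.333 * wind_speed
    # Scale from a Schmidt number of 600 to the in situ Schmidt number.
    return k600 * (schmidt_number / 600.0) ** -0.5


def bulk_co2_flux(k, alpha_w, alpha_s, pco2_sw, pco2_atm):
    """Bulk air-sea flux; positive values are a source to the atmosphere.

    Units follow from the inputs: the caller must supply k, the solubilities
    and the partial pressures in mutually consistent units.
    """
    return k * (alpha_w * pco2_sw - alpha_s * pco2_atm)
```

With equal solubilities, the flux is positive when pCO2 (sw) exceeds pCO2 (atm) and negative otherwise, matching the source and sink convention used for the net annual fluxes.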

SA-FNN performance and perturbation analysis
The performance of the SA-FNN trained using pCO2 (atm), SST and NCP for the three training datasets is given in Fig. 2. The SA-FNNNCP had an accuracy (RMSD) of 21.68 μatm and a precision (bias) of 0.87 μatm, which was determined with the independent test data (N = 1300). Training the SA-FNN using Chl a or NPP instead of NCP resulted in a similar performance (Appendix A Fig. A1, Fig. A2). The RMSD for the independent test data was within ~1.5 μatm for Chl a (19.88 μatm), NPP (20.48 μatm) and NCP (21.68 μatm) and bias near zero.
The reduction in pCO2 (sw) RMSD that could be achieved if input parameter uncertainties were reduced to ~0 was assessed using the perturbation analysis (Table 2, Appendix A Table A1). This showed that across the three training and validation datasets, removing the satellite NCP uncertainties would lead to a 36 % reduction in pCO2 (sw) RMSD, compared with a reduction of 34 % for NPP and 19 % for Chl a. The bias remained near zero for all parameters, indicating good precision of the SA-FNN approach (not shown). Applying the Atlantic Meridional Transect in situ data as input to the SA-FNN and using the perturbation analysis, a decrease in pCO2 (sw) RMSD of 25 % for NCP, 13 % for NPP and 7 % for Chl a was observed.
The reduction in pCO2 (sw) RMSD from reducing input parameter uncertainties by 10 % was also assessed through the perturbation analysis (Table 3). This indicated a decrease in pCO2 (sw) RMSD of 8 % for NCP, 5 % for NPP and 2 % for Chl a, again indicating that improving NCP uncertainties has the largest impact on improving the estimated pCO2 (sw) fields.

Comparison between SA-FNNNCP and other methods
The monthly climatology of pCO2 (sw) generated using the SA-FNNNCP and referenced to the year 2010 showed differences with two published climatologies, especially in the Equatorial region (Appendix B). The monthly climatologies for 8 stations (Fig. 1) were extracted from the SA-FNNNCP, SA-FNNNPP, SA-FNNCHLA, SA-FNNNO-BIO-1, SA-FNNNO-BIO-2 and the W2020, to assess differences between the pCO2 (sw) estimates (Fig. 3). The SA-FNNNCP and SA-FNNNO-BIO-1 showed significant divergence in the Equatorial Atlantic (Figs. 3b, f, g; Fig. 4). At the eastern equatorial station, the interannual variability in pCO2 (sw) from the SA-FNNNCP was high and a minimum occurred between January and April, which gradually increased to a maximum in September and October (Fig. 3b). The SA-FNNNO-BIO-1 showed no seasonality in the pCO2 (sw), and was consistently below the SA-FNNNCP pCO2 (sw). The Gulf of Guinea station showed a similar variability in the SA-FNNNCP pCO2 (sw) except that the maximum was lower at this station (Fig. 3f). The SA-FNNNO-BIO-1 indicated pCO2 (sw) below the SA-FNNNCP throughout the year. The greatest divergence occurred near the Amazon plume (Fig. 3g) where SA-FNNNCP pCO2 (sw) was below or at pCO2 (atm) for all months and there was a large interannual variability in pCO2 (sw). The SA-FNNNO-BIO-1 displayed higher pCO2 (sw) and a lower interannual variability (Fig. 3g).
The SA-FNNNCP and SA-FNNNO-BIO-1 showed no significant difference in the seasonal patterns of pCO2 (sw) at stations south of 20 °S (Figs. 3c, d, e; Fig. 4). There was, however, a significant offset at some stations where the SA-FNNNCP generally exhibited lower pCO2 (sw) in austral summer and a higher interannual variation. The SA-FNNNCP was significantly different to the W2020 and SA-FNNNO-BIO-2 at similar stations to the SA-FNNNO-BIO-1 (Fig. 3, Fig. 4).
The SA-FNNNCP and SA-FNNCHLA showed significant differences in pCO2 (sw) values in the South Benguela and Amazon Plume. In the South Benguela (Fig. 3e; Fig. 4), the SA-FNNNCP has a pCO2 (sw) maximum in austral summer, whereas the SA-FNNCHLA maximum occurs in austral winter. In the Amazon Plume there was a significant offset between the two methods and the SA-FNNCHLA resulted in lower pCO2 (sw) compared to the SA-FNNNCP (Fig. 3g; Fig. 4). The SA-FNNNCP and SA-FNNNPP had a significant offset at the Eastern Equatorial station (Fig. 3c; Fig. 4), where the SA-FNNNPP indicated lower pCO2 (sw). For the other stations, no significant differences were observed.

Assessment of biological parameters to estimate pCO2 (sw)
In this paper, the differences in estimating pCO2 (sw) using FNNs with satellite derived NCP, NPP or Chl a were assessed. The SA-FNNNCP had an overall accuracy (21.68 μatm; Fig. 2) that is consistent with other approaches that have been developed for the Atlantic (22.83 μatm; Landschützer et al., 2013), and slightly lower than the published global result of 25.95 μatm. Training the SA-FNN using Chl a or NPP showed comparable broad-scale accuracy to NCP.
When the uncertainties in the input parameters were investigated, however, differences in the estimates of pCO2 (sw) were apparent. The perturbation analysis indicated that up to a 36 % improvement in estimating pCO2 (sw) could be achieved if NCP data uncertainties were reduced (Table 2). A similar improvement could be obtained if the NPP uncertainties were reduced (Table 2). Ford et al. (2021b) showed that up to 40 % of the uncertainty in satellite NCP is attributed to the uncertainty in satellite NPP, which is an input to the NCP approach. This suggests that improvements in estimating NPP from satellite data will lead to a further improvement in estimating pCO2 (sw) from NCP. These improvements could be achieved through advances in the water column light field (e.g. Sathyendranath et al., 2020), better estimation of the vertical variability of input parameters or assignment of photosynthetic parameters (e.g. Kulk et al., 2020), for example. For a discussion on improving satellite NPP estimates we refer the reader to Lee et al. (2015).
To uncouple the Chl a, NPP and NCP estimates and their uncertainties, the perturbation analysis was also conducted on Atlantic Meridional Transect in situ observations. This showed that reducing in situ NCP uncertainties provided the greatest reduction in pCO2 (sw) RMSD, which was three times the reduction achievable using Chl a (Table 2; Table 3). This indicates that the optimal predictive power of Chl a to estimate pCO2 (sw) has been reached, and that further improvements in estimates of pCO2 (sw), and reductions in the associated uncertainty, require the use of NCP.
A reduction of input uncertainties to ~0 is near impossible, but a reduction by 10 % could be feasible (e.g. NCP uncertainty reduced from 45 to 40.5 mmol O2 m⁻² d⁻¹; Table 1). A perturbation analysis conducted for this showed similar results, with NCP producing the greatest reduction in pCO2 (sw) RMSD of 8 % compared to 2 % for Chl a (Table 3). Thus reducing NCP uncertainties will provide a greater improvement in pCO2 (sw) compared to reducing the uncertainties in Chl a.
These improvements in estimating NCP could be achieved in several ways. Ford et al. (2021b) showed 40 % of satellite NCP uncertainties were attributed to in situ NCP uncertainties. The in situ bottle incubation measurements could be improved using the principles of Fiducial Reference Measurements (FRM; Banks et al., 2020), which are traceable to metrology standards, referenced to inter-comparison exercises, with a full uncertainty budget. This becomes complicated, however, when considering the number of different methods to measure NCP and the large divergence between them (Robinson et al., 2009). A review of these methods has already been conducted (Duarte et al., 2013; Ducklow and Doney, 2013; Williams et al., 2013). The methods broadly fall into the following categories: a.) in vitro incubations of samples under light/dark treatments (Gist et al., 2009) and b.) in situ observations of oxygen to argon (O2/Ar) ratios (Kaiser et al., 2005) or the observed isotopic signature of oxygen (Kroopnick, 1980; Luz and Barkan, 2000). All of these methods are subject to, but do not account for, the photochemical sink, which may lead to underestimation of in vitro NCP by up to 22 % (Kitidis et al., 2014). Independent ground measurements that use accepted protocols for the in vitro method are currently made on the Atlantic Meridional Transect; however, a community consensus should consider a consistent methodology for NCP.
Increasing the number of such observations for the purpose of algorithm development would further constrain the NCP, but also provide observations across the lifetime of newly launched satellites. The uncertainties on each in vitro measurement are assessed through replicate bottles, which could be used to calculate a full uncertainty budget for each NCP measurement when combined with analytical uncertainties. Serret et al. (2015) indicated that NCP is controlled by both the heterogeneity in NPP and respiration. The satellite NCP algorithm applied in this study accounts for some of the heterogeneity in respiration, through an empirical SST to NCP relationship (Tilstone et al., 2015a). Quantifying the variability in respiration could further improve NCP estimates when coupled with NPP rates from satellite observations.

Accuracy of SA-FNNNCP pCO2 (sw) at seasonal and interannual scales
The seasonal and interannual variability of pCO2 (sw) estimated using the SA-FNNNCP was compared with the SA-FNNNO-BIO, W2020 (Watson et al., 2020b), SA-FNNCHLA and SA-FNNNPP at 8 stations. The stations (Fig. 1) represent locations of previous studies into in situ pCO2 (sw) variability, allowing comparisons with literature values. Significant differences between the SA-FNNNCP and SA-FNNNO-BIO were observed at four stations (Fig. 4), especially in the Equatorial Atlantic. At 8° N 38° W (Fig. 3a), Lefèvre et al. (2020) reported pCO2 (sw) to be stable at ~400 μatm between June and August 2013, and to decrease in September to ~360 μatm, which is attributed to the Amazon Plume propagating into the western Equatorial Atlantic (Coles et al., 2013). Bruto et al. (2017) indicated, however, that elevated pCO2 (sw) of ~430 μatm existed from 2008 to 2011. The PIRATA buoy pCO2 (sw) observations (Fig. 3a) clearly highlight the differences between these years, but fewer than 4 years of monthly observations are available, and these do not resolve the full seasonal cycle. For the station in the Amazon Plume at 4° N 50° W (Fig. 3g), where the effects of the plume extend northwest towards the Caribbean (Coles et al., 2013; Varona et al., 2019), this region has been reported to act as a sink for CO2 (pCO2 (sw) < pCO2 (atm)), especially between May and July, coincident with maximum discharge from the Amazon River (Dai and Trenberth, 2002). Valerio et al. (2021) indicated that pCO2 (sw) varied at and below pCO2 (atm) at 4° N 50° W, consistent with the SA-FNNNCP. The interannual variability of pCO2 (sw) has been shown to be high in this region in all months. The SA-FNNNCP provided a better representation of the seasonal and interannual variability induced by the Amazon River discharge and associated plume at these two stations compared to the SA-FNNNO-BIO, although differences were small at 8° N 38° W.
The station in the Eastern Tropical Atlantic at 6° S 10° W (Fig. 3b) is under the influence of the equatorial upwelling (Lefèvre et al., 2008), which brings CO2 rich waters to the surface between June and September. Lefèvre et al. (2008) indicated that peak pCO2 (sw) of ~440 μatm was observed in September and remained stable until December, before decreasing to a minimum of ~360 μatm in May (Parard et al., 2010). It has also been shown, however, that the influence of the equatorial upwelling does not reach the buoy in all years, and in some years lower pCO2 (sw) is observed. The PIRATA buoy observations (Fig. 3b) clearly show this seasonality but also highlight the interannual variability in in situ pCO2 (sw). Further north at 4° N 10° W (Fig. 3f), Koffi et al. (2010) suggested that this region follows a similar seasonal cycle to the station at 6° S 10° W, but that pCO2 (sw) is ~30 μatm lower (Koffi et al., 2016). The interannual variability in SA-FNNNCP pCO2 (sw) clearly shows the influence of the equatorial upwelling at these stations, with latitudinal gradients in pCO2 (sw) during the upwelling period, but struggles to identify the elevated pCO2 (sw) between December and April shown by the PIRATA buoy observations (Fig. 3b). By contrast, the SA-FNNNO-BIO-1 indicated little influence from the equatorial upwelling and a depressed pCO2 (sw) during the upwelling season.
The two methods converge on the seasonal cycle at the remaining stations, although significant offsets in the mean annual pCO2 (sw) remain. The station at 35° S 18° W (Fig. 3c) has consistently been implied to be a sink for CO2. Lencina-Avila et al. (2016) showed the region to have a pCO2 (sw) of 340 μatm and to be a sink for CO2 between October and December. Similarly, Kitidis et al. (2017) implied that the region is a sink for CO2 during March and April. The region has depressed pCO2 (sw) due to high biological activity that originates from the Patagonian shelf and the South Subtropical Convergence Zone. The station at 45° S 50° W (Fig. 3d) has also been implied to be a strong, but highly variable, sink, where pCO2 (sw) can be between ~280 μatm and ~380 μatm during austral spring, and is constant at ~310 μatm during austral autumn (Kitidis et al., 2017).
The SA-FNNNCP and SA-FNNNO-BIO-1 methods reproduced the seasonal variability in the pCO2 (sw) at these two stations accurately, but only the SA-FNNNCP captured the magnitude of the depressed pCO2 (sw) at 45° S.
Within the southern Benguela upwelling system, pCO2 (sw) at station 33° S 17° E (Fig. 3e) is influenced by gradients in the seasonal upwelling (Hutchings et al., 2009). pCO2 (sw) has been shown to vary from ~310 μatm in July to ~340 μatm in December, with the region acting as a CO2 sink throughout the year. It has been suggested, however, that this CO2 sink is highly variable during upwelling events, and that recently upwelled waters act as a source (pCO2 (sw) > pCO2 (atm)) of CO2 to the atmosphere (Gregor and Monteiro, 2013). Arnone et al. (2017) indicated elevated pCO2 (sw) during austral spring and autumn at the station, with a ~40 μatm seasonal cycle amplitude. The SA-FNNNCP and SA-FNNNO-BIO-1 were able to reproduce the seasonal cycle, although the SA-FNNNCP correctly represented the seasonal magnitude in pCO2 (sw) reported in the literature (e.g. Arnone et al., 2017).
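The sink and source terminology used for the stations follows from the sign of the air-sea pCO2 difference: pCO2 (sw) < pCO2 (atm) indicates a sink and pCO2 (sw) > pCO2 (atm) a source. A minimal sketch is given below. The function name is hypothetical, and the atmospheric value of 390 μatm and the 450 μatm upwelled-water value are assumed purely for illustration; only the ~310 and ~340 μatm values are quoted in the text.

```python
def co2_direction(pco2_sw, pco2_atm):
    """Classify the air-sea CO2 exchange from the sign of pCO2(sw) - pCO2(atm)."""
    delta = pco2_sw - pco2_atm
    if delta < 0:
        return "sink"      # seawater undersaturated: ocean takes up CO2
    if delta > 0:
        return "source"    # seawater supersaturated: ocean outgasses CO2
    return "neutral"

# Assumed atmospheric pCO2 (uatm); illustrative only, not a value from the study.
PCO2_ATM = 390.0

july = co2_direction(310.0, PCO2_ATM)      # ~310 uatm quoted for July
december = co2_direction(340.0, PCO2_ATM)  # ~340 uatm quoted for December
upwelled = co2_direction(450.0, PCO2_ATM)  # hypothetical recently upwelled water
print(july, december, upwelled)
```

The sign of this difference sets only the direction of the exchange; the magnitude of the flux also depends on the gas transfer velocity and solubility, which are not considered in this sketch.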
In summary, for these stations, the SA-FNNNCP better represents the seasonality and the interannual variability of pCO2 (sw) in the South Atlantic Ocean compared to the SA-FNNNO-BIO-1, especially in the Equatorial Atlantic. The SA-FNNNO-BIO-2 also displayed significant differences to the SA-FNNNCP, indicating that the variability in pCO2 (sw) has a strong biological contribution which is not fully represented and explained by the additional physical parameters included in the FNN. The SA-FNNNO-BIO-2 and W2020 both displayed significant differences to the SA-FNNNCP at specific stations (Fig. 4). There are, however, methodological differences between these approaches. The SA-FNN method uses only in situ pCO2 (sw) observations from the South Atlantic Ocean to train the FNNs. The W2020 uses global in situ pCO2 (sw) observations to train FNNs for 16 provinces with similar seasonal cycles (Watson et al., 2020b). The W2020 will therefore be weighted towards pCO2 (sw) variability in regions of relatively abundant in situ observations (i.e. the Northern Hemisphere) and may not be fully representative of the South Atlantic Ocean. This would explain the differences between the SA-FNNNO-BIO-2 and W2020 when driven using the same input variables.
Comparing the SA-FNNNCP and SA-FNNCHLA, there were two significant differences (Fig. 4). The first was a difference in the seasonal cycle in the southern Benguela (Fig. 3e). The pCO2 (sw) minimum has been shown to occur in July and the maximum in December, consistent with the SA-FNNNCP and SA-FNNNPP, whereas the SA-FNNCHLA estimated the opposite scenario. Lamont et al. (2014) reported Chl a concentrations to remain consistent in May and October, but NPP rates were significantly higher in October, associated with increased surface PAR and enhanced upwelling. The disconnect between Chl a and NPP can also be observed in the satellite observations (Appendix C Fig. C1), limiting the ability of Chl a to estimate pCO2 (sw), which is highlighted by the failure of the SA-FNNCHLA to identify the seasonal pCO2 (sw) cycle.
A Chl a to NPP disconnect has also been reported in the Amazon Plume (Smith and Demaster, 1996), where Chl a concentrations can be similar but NPP rates significantly different due to light limitation caused by suspended sediments. A significant offset between the SA-FNNNCP and SA-FNNCHLA was observed in this region (Fig. 3g; Fig. 4). In situ pCO2 (sw) values have been reported to range from 400 ± ~10 μatm in January to ~240 ± ~70 μatm in May. Although the SA-FNNNCP January estimates are consistent, the May estimates are higher than these in situ measurements. These observations were made further north (6° N), where the turbidity within the plume has decreased sufficiently for irradiance to elevate NPP rates (Smith and Demaster, 1996), which decreases pCO2 (sw). Chl a remains relatively consistent across the plume (not shown), suggesting a disconnect between Chl a and NPP at 4° N 50° W, which would lead to lower pCO2 (sw) estimates by the SA-FNNCHLA where NPP rates are low due to light limitation (Chen et al., 2012; Smith and Demaster, 1996). Respiration would be elevated by the decomposition of riverine organic material, reducing NCP further (Cooley et al., 2007; Jiang et al., 2019). It is noted that the Amazon Plume is a dynamic region with transient, localised biological and pCO2 (sw) features (Cooley et al., 2007; Ibánhez et al., 2015; Valerio et al., 2021) that may be masked by the coarse resolution of estimates available using satellite data. The SA-FNNNCP, however, agreed with in situ pCO2 (sw) observations at 4° N 50° W, where pCO2 (sw) varied at or below pCO2 (atm) (Valerio et al., 2021).
Though the differences between the SA-FNNNCP and SA-FNNCHLA may appear small, the Amazon Plume and Benguela Upwelling have a higher intensity in the CO2 flux per unit area compared to the open ocean, and therefore make a larger contribution to the overall global CO2 sink than their small areal coverage implies (Laruelle et al., 2014). The differences in the pCO2 (sw) estimates result in a 22 Tg C yr-1 alteration in the annual CO2 flux for the South Atlantic Ocean (SA-FNNNCP = +14 Tg C yr-1; SA-FNNCHLA = -9 Tg C yr-1; Fig. 5f). This reinforces the use of NCP to improve basin scale estimates of pCO2 (sw), especially in regions where Chl a, NPP and NCP become disconnected.
Recent assessments of the strength of the global oceanic CO2 sink have been made using pCO2 (sw) fields estimated using no biological parameters as input (Watson et al., 2020b). Our results indicate that the SA-FNNNCP more accurately represented the pCO2 (sw) variability in the South Atlantic Ocean compared to the SA-FNNNO-BIO-2, which included additional physical parameters. Estimating the South Atlantic Ocean net CO2 flux with the SA-FNNNCP pCO2 (sw) produced a 14 Tg C yr-1 source, compared to a 10 Tg C yr-1 sink indicated by the SA-FNNNO-BIO-2 (Fig. 5f). The incremental inclusion of parameters to account for the biological signal, starting with Chl a (-9 Tg C yr-1), then NPP (-7 Tg C yr-1), then NCP (+14 Tg C yr-1), switched the South Atlantic Ocean from a CO2 sink to a source, which is driven by differences in the pCO2 (sw) estimates in regions that are biologically controlled. The 21 Tg C yr-1 difference between the SA-FNNNCP and SA-FNNNPP is due to additional outgassing in the Equatorial Atlantic provinces of the WTRA and ETRA (Fig. 1a; Fig. 5f). Compared to the in situ pCO2 (sw) observations at the Equatorial stations (Fig. 3a, b), it is likely that the outgassing is still underestimated by the SA-FNNNCP, although it does improve these estimates within the upwelling season (June to September).
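The sign convention and the arithmetic behind these flux comparisons can be made explicit with a short sketch. It uses only the rounded annual fluxes quoted above (positive values denote a source to the atmosphere, negative values a sink); the dictionary keys and function names are illustrative. Because the inputs are rounded, differences computed this way may deviate by about 1 Tg C yr-1 from values quoted elsewhere in the text.

```python
# Annual South Atlantic net CO2 flux (Tg C yr-1) as quoted in the text.
# Convention: positive = source to the atmosphere, negative = sink.
annual_flux = {
    "NO-BIO-2": -10.0,
    "CHLA": -9.0,
    "NPP": -7.0,
    "NCP": 14.0,
}

def sink_or_source(flux):
    """Label a net flux according to its sign."""
    if flux > 0:
        return "source"
    if flux < 0:
        return "sink"
    return "neutral"

def difference_from_ncp(name):
    """Flux of the NCP-based estimate minus that of another estimate."""
    return annual_flux["NCP"] - annual_flux[name]

labels = {name: sink_or_source(flux) for name, flux in annual_flux.items()}
ncp_minus_npp = difference_from_ncp("NPP")

print(labels)
print(f"NCP - NPP = {ncp_minus_npp:.0f} Tg C yr-1")
```

With these inputs, only the NCP-based estimate is labelled a source, and the NCP minus NPP difference reproduces the 21 Tg C yr-1 stated in the text.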
The W2020 identified the South Atlantic Ocean as a source for CO2 of 15 Tg C yr-1, which is consistent with the SA-FNNNCP (Fig. 5f). The SA-FNNNCP, however, indicated the Equatorial Atlantic (10° N to 20° S) as a 20 Tg C yr-1 stronger source and the region south of 20° S (20° S to 44° S) as a 20 Tg C yr-1 stronger sink. These differences indicate that biologically induced variability in pCO2 (sw) would not be captured by the W2020 and could reduce the variability in the global ocean CO2 sink. A further SA-FNN trained with pCO2 (atm), SST, salinity, mixed layer depth and NCP indicated a similar CO2 source of 12 Tg C yr-1 (data not shown) to the SA-FNNNCP for the South Atlantic Ocean, highlighting that additional physical parameters cannot fully account for the biological contribution to the variability in pCO2 (sw). This further confirms the importance of using NCP within estimates of the global ocean CO2 sink.

Conclusions
In this paper, we compare neural network models of pCO2 (sw) parameterised separately using either satellite Chl a, NPP or NCP as biological proxies. The results suggest that using NCP improved the estimation of pCO2 (sw). The differences between satellite Chl a, NPP and NCP were initially small, but a perturbation analysis to assess the uncertainties in these parameters showed that NCP has a greater potential uncertainty reduction of up to ~36 % of the RMSD, compared to ~19 % for Chl a. These results were verified using in situ observations from the Atlantic Meridional Transect, which resulted in a 25 % improvement in pCO2 (sw) RMSD when the in situ NCP uncertainties were reduced, compared to 7 % for Chl a and 13 % for NPP.
Monthly climatological estimates of pCO2 (sw) at 8 stations in the South Atlantic Ocean, calculated using satellite NCP, were compared with the NPP and Chl a approaches and with two approaches that do not use biological parameters. The NCP approach significantly improved on both approaches with no biological parameters at 4 stations in reconstructing the seasonal and interannual variability, compared to in situ pCO2 (sw) observations. At the remaining 4 stations, differences were also observed, although these were not statistically significant. In the upwelling region of the eastern Equatorial Atlantic, a significant difference between the NCP and NPP approaches was observed. Significant differences between the NCP and Chl a approaches were also observed in the Benguela upwelling and Amazon Plume, where pCO2 (sw) from Chl a suggested that photosynthetic rates were not solely controlled by Chl a. Using NCP to estimate pCO2 (sw), the South Atlantic Ocean was characterised as a net source of CO2, whereas methods that only include physical controls have indicated the region to be a small sink for CO2.
Sequentially using Chl a and then NPP to estimate pCO2 (sw) incrementally reduced the South Atlantic CO2 sink, and finally, using NCP, the area switched to being a source of CO2. These results indicate that in regions where biological activity is important in controlling the variability in pCO2 (sw), the use of NCP, which is available from satellite data, is important for quantifying the ocean carbon pump and for providing data in areas that are sparsely covered by observations, such as the Southern Ocean.

Appendices
Appendix A - Feed forward neural network training and perturbation analysis

Appendix B - Climatology comparison
A monthly climatology was generated from the SA-FNNNCP monthly timeseries (Fig. B1), referenced to the year 2010, assuming an atmospheric CO2 increase of 1.5 μatm yr-1 (Takahashi et al., 2009; Zeng et al., 2014). The standard deviation of the monthly climatology was computed as an indication of the interannual variations in the climatology. The ability of the SA-FNNNCP to estimate the spatial distribution of pCO2 (sw) was compared to two methods. Firstly, the SA-FNNNCP climatology was compared to the climatology from Woolf et al. (2019), produced following the statistical 'ordinary block kriging' approach described in Goddijn-Murphy et al. (2015), using the SOCATv4 reanalysed data. The method provides an interpolation uncertainty, which becomes larger in regions of sparse data. Fig. B2 shows that the methods produce similar climatological pCO2 (sw) values for the South Atlantic Ocean, with some clear differences along the African coastline and in the equatorial region. Secondly, the SA-FNNNCP was compared to a climatology calculated from the 'standard method', a Self Organising Map Feed Forward Neural Network presented in Watson et al. (2020b; W2020). Fig. B3 shows that the methods produce similar climatological pCO2 (sw) values for the South Atlantic Ocean; however, clear differences in the Equatorial region occur across all months. In the central South Atlantic Ocean, artefacts from the self organising map can be seen during January and February.
Figure caption (partially recovered): (Fig. 1a). Chl a and NPP scale on the left axis, and NCP on the right. Note the different axis limits on each plot.
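The referencing step described in Appendix B could be sketched as follows. This is one possible reading, assuming that each monthly value is shifted to the year 2010 by removing a linear trend of 1.5 μatm yr-1 before the monthly means and standard deviations are taken; the function name and the synthetic timeseries are hypothetical and are not the processing code used in this study.

```python
import numpy as np

def monthly_climatology(pco2, years, months, ref_year=2010, trend=1.5):
    """Monthly mean and standard deviation after referencing to ref_year.

    Each value is shifted by trend * (ref_year - year), an assumed linear
    correction for the long-term CO2 increase (uatm per year).
    """
    pco2 = np.asarray(pco2, dtype=float)
    years = np.asarray(years)
    months = np.asarray(months)
    adjusted = pco2 + trend * (ref_year - years)
    clim = np.array([adjusted[months == m].mean() for m in range(1, 13)])
    std = np.array([adjusted[months == m].std(ddof=1) for m in range(1, 13)])
    return clim, std

# Synthetic monthly series: a seasonal cycle plus a 1.5 uatm yr-1 trend.
years = np.repeat(np.arange(2003, 2019), 12)
months = np.tile(np.arange(1, 13), 2019 - 2003)
seasonal = 10.0 * np.sin(2.0 * np.pi * (months - 1) / 12.0)
pco2 = 380.0 + 1.5 * (years - 2010) + seasonal

clim, std = monthly_climatology(pco2, years, months)
print(np.round(clim, 1))
```

In this synthetic case the trend is removed exactly, so the standard deviation is close to zero; for real timeseries the remaining standard deviation reflects the interannual variability referred to in the text.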

Data Availability
Moderate Resolution Imaging Spectroradiometer on Aqua (MODIS-A) estimates of chlorophyll-a, photosynthetically active radiation and sea surface temperature are available from the National Aeronautics and Space Administration (NASA) ocean colour website (https://oceancolor.gsfc.nasa.gov/). Modelled sea surface salinity and mixed layer depth from the Copernicus Marine Environment Monitoring Service global ocean physics reanalysis product (GLORYS12V1) are available from https://resources.marine.copernicus.eu/. ERA5 monthly reanalysis wind speeds are available from the Copernicus Climate Data Store (https://cds.climate.copernicus.eu/). pCO2 (atm) data are available from v5.5 of the global estimates of pCO2 (sw) dataset (Landschützer et al., 2016, 2017). In situ observations of fCO2 (sw) from v2020 of the Surface Ocean CO2 Atlas (SOCAT) are available from https://www.socat.info/index.php/data-access/. In situ Atlantic Meridional Transect data can be obtained from the British Oceanographic Data Centre (https://www.bodc.ac.uk/). pCO2 (sw) estimates from the W2020 are available from Watson et al. (2020a). pCO2 (sw) estimates generated by the SA-FNNNCP, SA-FNNNPP, SA-FNNCHLA and SA-