Reconstruction of global surface ocean pCO2 using region-specific predictors based on a stepwise FFNN regression algorithm

Abstract: Various machine learning methods have been applied to the global mapping of surface ocean partial pressure of CO2 (pCO2) to reduce the uncertainty in the global ocean CO2 sink estimate caused by the undersampling of pCO2. In previous studies, the predictors of pCO2 were usually selected empirically based on the theoretical drivers of surface ocean pCO2, and the same combination of predictors was applied in all areas unless data coverage was lacking. However, differences between the drivers of surface ocean pCO2 in different regions were not considered. In this work, we combined a stepwise regression algorithm and a Feed Forward Neural Network (FFNN) to select predictors of pCO2 based on the mean absolute error in each of the 11 biogeochemical provinces defined by the Self-Organizing Map (SOM) method. Based on the predictors selected, a monthly global 1° ×


Introduction
As a net sink for atmospheric CO2, the global ocean is thought to have removed about one third of anthropogenic CO2 since the beginning of the industrial revolution (Sabine et al., 2004; Friedlingstein et al., 2019). However, owing to large uncertainty in estimates of surface ocean partial pressure of CO2 (pCO2), estimates of the long-term average global sea-air CO2 flux during 2001-2015 based on the sea-air pCO2 difference range from -1.55 to -1.74 PgC yr-1, and the maximum difference between estimates of the global sea-air CO2 flux in individual years reaches nearly 0.6 PgC yr-1 (Rödenbeck et al., 2014; Iida et al., 2015; Landschützer et al., 2014; Denvil-Sommer et al., 2019). The magnitude and direction of the flux are largely set by the air-sea pCO2 difference. When the pCO2 of surface water is higher than that of the overlying air, CO2 is released from the ocean to the atmosphere; when it is lower, the ocean absorbs CO2. The ocean in these two scenarios is known as an oceanic carbon source and an oceanic carbon sink, respectively. Sparse and uneven observations of surface ocean pCO2 in time and space have severely limited our understanding of the interannual variability of the oceanic carbon sink, and studies based on different methods have been carried out to overcome this barrier. In earlier studies, traditional univariate and multiple regressions between surface ocean pCO2 and its drivers were used to map surface ocean pCO2; these were limited to specific regions, and sometimes even to specific seasons, and had relatively high root mean square errors (RMSE) (Sarma et al., 2006; Takahashi et al., 2006; Shadwick et al., 2010; Chen et al., 2011; Marrec et al., 2015). Recent studies using artificial neural networks and other machine learning algorithms, such as the feed-forward neural network (FFNN) method (Zeng et al., 2014; Zeng et al., 2015; Moussa et al., 2016; Denvil-Sommer et al., 2019) and the self-organizing map (SOM) method (Friedrich and Oschlies, 2009; Telszewski et al., 2009; Hales et al., 2012; Nakaoka et al., 2013), have significantly reduced the bias of interpolation based on the relationships between surface ocean pCO2 and its drivers. In addition, approaches such as finding better predictors or combining SOM with other neural networks have been attempted to further decrease the pCO2 prediction error (Hales et al., 2012; Nakaoka et al., 2013; Landschützer et al., 2014; Chen et al., 2019; Denvil-Sommer et al., 2019; Zhong et al., 2020; Wang et al., 2021).
However, the selection of predictors in surface ocean pCO2 mapping has been largely empirical, focusing on the theoretical drivers of pCO2 and its variation. Sea surface temperature and salinity, which are related to the solubility of CO2 in seawater, were considered the most important and were used in almost all related studies (Landschützer et al., 2013; Nakaoka et al., 2013; Moussa et al., 2016; Laruelle et al., 2017; Zeng et al., 2017; Denvil-Sommer et al., 2019). Similarly, chlorophyll-a concentration, which is related to the uptake of CO2 by phytoplankton, is also widely used (Nakaoka et al., 2013; Landschützer et al., 2014; Laruelle et al., 2017; Zeng et al., 2017; Denvil-Sommer et al., 2019). Mixed layer depth appears frequently in related studies as a proxy for the vertical transport of dissolved carbon (Telszewski et al., 2009; Nakaoka et al., 2013; Landschützer et al., 2014; Zeng et al., 2017; Denvil-Sommer et al., 2019). Sampling information has also been used as predictors, including latitude and longitude (Friedrich and Oschlies, 2009; Jo et al., 2012; Zeng et al., 2015; Zeng et al., 2017; Denvil-Sommer et al., 2019) and sampling time (Friedrich and Oschlies, 2009; Zeng et al., 2015). In recent studies, the dry air mixing ratio of atmospheric CO2 (xCO2), which reflects the CO2 level in air, has also been used as a predictor of surface ocean pCO2 (Landschützer et al., 2014; Denvil-Sommer et al., 2019). Denvil-Sommer et al. (2019) used sea surface height, which was considered effective in improving the spatial pattern and accuracy of surface ocean pCO2 mapping at basin and regional scales, together with the monthly anomalies of the most widely used parameters mentioned above. In a study focused on surface ocean pCO2 mapping in coastal areas, bathymetry, sea ice and wind speed were also used as predictors (Laruelle et al., 2017).
In each of these studies, the same combination of predictors was applied to all areas of the global ocean, even though some of them divided the global ocean into several biogeochemical provinces. However, because the drivers of pCO2 are complex and differ between regions, a predictor that plays an important role in reconstructing surface ocean pCO2 in one region may not be a good predictor in other regions. Yet no widely recognized method for judging the importance of each predictor in surface ocean pCO2 mapping is available. We therefore constructed a stepwise FFNN algorithm to rank the importance of predictors and determine the optimal combination in each biogeochemical province defined by SOM, in order to decrease the prediction errors in surface ocean pCO2 mapping.

Data
The surface ocean fugacity of CO2 (fCO2) observations from the Surface Ocean CO2 Atlas fCO2 dataset version 2020 (SOCATv2020) (Bakker et al., 2016) were used to construct the nonlinear relationship between surface ocean pCO2 and the predictors. fCO2 was converted to pCO2 following Körtzinger (1999):
$$p\mathrm{CO_2} = f\mathrm{CO_2}\,\exp\!\left[-\frac{P\,(B + 2\delta)}{R\,T}\right]$$
where fCO2 and pCO2 are in microatmospheres (µatm), P is the total atmospheric surface pressure (Pa), taken from the National Centers for Environmental Prediction (NCEP) monthly mean sea level pressure product (Dee et al., 2011), T is the absolute temperature (K), and R is the gas constant (8.314 J K-1 mol-1). B (m3 mol-1) and δ (m3 mol-1) are virial coefficients (Weiss, 1974).
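A minimal Python sketch of this conversion is given below. It assumes the commonly used form pCO2 = fCO2 · exp[−P(B + 2δ)/(RT)] and the Weiss (1974) temperature polynomials for B and δ as they are usually quoted (in cm3 mol-1, converted here to m3 mol-1). The function names are hypothetical, and this is an illustration rather than the authors' processing code.

```python
import math

R = 8.314  # gas constant, J K-1 mol-1


def virial_coefficients(T):
    """Virial coefficients B and delta of CO2 in air (Weiss, 1974), returned in m3 mol-1.

    T is absolute temperature in K; the polynomials give cm3 mol-1, hence the 1e-6 factor.
    """
    B_cm3 = -1636.75 + 12.0408 * T - 3.27957e-2 * T**2 + 3.16528e-5 * T**3
    delta_cm3 = 57.7 - 0.118 * T
    return B_cm3 * 1e-6, delta_cm3 * 1e-6


def fco2_to_pco2(fco2, T, P):
    """Convert fCO2 (uatm) to pCO2 (uatm); T is in K and P (total surface pressure) in Pa."""
    B, delta = virial_coefficients(T)
    # P [Pa] * (B + 2*delta) [m3 mol-1] / (R [J K-1 mol-1] * T [K]) is dimensionless
    return fco2 * math.exp(-P * (B + 2.0 * delta) / (R * T))
```

SST in °C has to be converted to kelvin (T = SST + 273.15) and pressure to Pa before calling `fco2_to_pco2`. Because B + 2δ is negative at ocean temperatures, pCO2 comes out a few tenths of a percent higher than fCO2.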
In this work, a total of 33 indicators were used (Table S1). Of these, 21 indicators were chosen from previous studies of surface ocean pCO2 reconstruction based on machine learning methods: sea surface temperature (SST) and sea surface salinity (SSS) from the 1° × 1° gridded product (Cheng et al., 2016; Cheng et al., 2017; Cheng et al., 2020) at http://www.ocean.iap.ac.cn/, and their anomalies (SSTanom and SSSanom); chlorophyll-a concentration (CHL-a) and its anomaly (CHL-a anom) from a satellite-derived monthly product at 9 km resolution (NASA Goddard Space Flight Center, Ocean Ecology Laboratory, Ocean Biology Processing Group, 2018); mixed layer depth (MLD), sea surface height (SSH) and their anomalies (MLDanom and SSHanom) from the ECCO2 cube92 daily product (Menemenlis et al., 2008); the dry air mixing ratio of atmospheric CO2 (xCO2) and its anomaly (xCO2 anom) from the GLOBALVIEW marine boundary layer product (GLOBALVIEW-CO2, 2011); sea ice area fraction from the ECMWF ERA-Interim monthly product (Dee et al., 2011); 10 m wind speed from the ECMWF ERA-Interim monthly product (Dee et al., 2011); bathymetry from ETOPO2 (Commerce et al., 2006); year and month (represented by 1-12); the total number of months since January 1992 (Nmon); and the sine of latitude and the sine and cosine of longitude (sLat, sLon and cLon). In addition, 12 parameters that had been used only in similar previous studies focused on other parameters (Broullón et al., 2019; Broullón et al., 2020), or that were possibly related to the drivers of surface ocean pCO2 and its variability, were selected for testing. These were nitrate, phosphate, silicate and dissolved oxygen (DO) from the WOA18 monthly climatology product (Garcia et al., 2019a, b); sea level pressure (SLP) and surface pressure from ECMWF ERA-Interim (Dee et al., 2011); the vertical (W) velocity of ocean currents (Wvel) at 5, 65, 105 and 195 m depth from the ECCO2 cube92 3-day product (Menemenlis et al., 2008); the Oceanic Niño Index (ONI) (Huang et al., 2017); and the Southern Hemisphere Annular Mode index (SAM) (Marshall, G. J., 2003). Most of these products were retrieved at 1° × 1° resolution; products retrieved at higher resolution were regridded to 1° × 1°.
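The sampling-information indicators (year, month, Nmon, sLat, sLon, cLon) can be sketched as below. This is an illustration under stated assumptions: Nmon is taken as 0 for January 1992 (the text does not say whether the count starts at 0 or 1), angles are in degrees, and the function name is hypothetical. Encoding longitude by its sine and cosine keeps, for example, 179.5°E and 179.5°W numerically close, which a raw longitude value would not.

```python
import math

START_YEAR = 1992  # Nmon counts months since January 1992 (assumed 0-based)


def time_position_features(lat_deg, lon_deg, year, month):
    """Sampling-information indicators for one 1° x 1° grid cell and month."""
    nmon = (year - START_YEAR) * 12 + (month - 1)
    return {
        "year": year,
        "month": month,  # 1-12
        "Nmon": nmon,
        "sLat": math.sin(math.radians(lat_deg)),
        "sLon": math.sin(math.radians(lon_deg)),  # sine and cosine together remove
        "cLon": math.cos(math.radians(lon_deg)),  # the jump at the dateline
    }
```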

Biogeochemical provinces defined by the Self-Organizing Map
To apply different combinations of indicators in regions with different dominant drivers of pCO2 and its variability, the global ocean was divided into a set of biogeochemical provinces using a Self-Organizing Map (SOM) method. Coastal areas were not included in this study, and the coastal boundary was defined by the 200 m isobath. In addition, pCO2 maps based on SOM-defined provinces tend to be less smooth near the borders between provinces, where obvious boundary lines appear, and applying different predictors in different provinces may make this problem worse. To obtain a smoother distribution, we defined the area within five 1° × 1° grid cells of a province boundary as a 'boundary area'. Samples in the boundary area were used as training samples in all adjacent provinces (Fig. S1).
This definition does not change the actual spatial coverage of each province; it only adds training samples near province boundaries.
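The boundary-area rule can be sketched on a labelled province grid. The following assumptions are not stated in the text: "within five grid cells" is read as a square window of ±5 cells, longitude wraps around the globe while latitude does not, excluded coastal and land cells carry the label −1, and the function names are hypothetical.

```python
import numpy as np


def provinces_within(labels, i, j, radius=5):
    """Set of province IDs found within `radius` cells of grid cell (i, j).

    labels: 2-D integer array (lat x lon) of province IDs, -1 for excluded cells.
    """
    nlat, nlon = labels.shape
    rows = np.arange(max(0, i - radius), min(nlat, i + radius + 1))  # latitude is clipped
    cols = (j + np.arange(-radius, radius + 1)) % nlon               # longitude wraps
    window = labels[np.ix_(rows, cols)]
    return {int(p) for p in np.unique(window) if p >= 0}


def training_sets(sample_cells, labels, radius=5):
    """Map each province ID to the indices of samples used to train its FFNN.

    A sample in a boundary area is assigned to every province within the window;
    samples in excluded cells are skipped.
    """
    sets = {}
    for k, (i, j) in enumerate(sample_cells):
        if labels[i, j] < 0:
            continue
        for prov in provinces_within(labels, i, j, radius):
            sets.setdefault(prov, []).append(k)
    return sets
```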

Stepwise FFNN algorithm
To find a better combination of pCO2 predictors, a stepwise feed-forward neural network (FFNN) algorithm was constructed. The FFNN is composed of four main parts, namely the input, hidden, summation and output layers (Fig. 1). The input layer passes the inputs to the hidden layer, and its number of neurons equals the dimension of the input matrix p. The hidden layer of the FFNN model includes 25 neurons, with the tan-sigmoid function as the transfer function. The input p is multiplied by a matrix of weights (w1 in Fig. 1), and a bias vector (b1 in Fig. 1) is added to the result to form the input of the transfer function in the hidden layer. In the summation layer, the output of the hidden layer is multiplied by another matrix of weights and summed, and the transfer function f2 is a pure linear function. All weights and biases were randomly initialized at the beginning of FFNN training. We set a constant random number stream in MATLAB, so the random initialization of the weights and biases was reproducible, avoiding inconsistent results when the algorithm was repeated. Loop 2 finished under a similar condition, namely when no significant decrease could be found whichever indicator was removed in the next two steps (Determine step 2 in Fig. 2).
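The forward pass described above (weights times input plus bias, a tan-sigmoid hidden layer, a pure linear summation layer) can be written compactly in NumPy. This is a sketch, not the authors' MATLAB code: the uniform initialization, the output bias b2 and the function names are assumptions, and training (fitting the weights) is omitted. A fixed seed plays the role of the constant random number stream, so repeated runs start from identical weights.

```python
import numpy as np


def tansig(n):
    # MATLAB's tansig, 2/(1+exp(-2n)) - 1, is mathematically equal to tanh(n)
    return np.tanh(n)


def init_ffnn(n_inputs, n_hidden=25, seed=0):
    """Randomly initialize weights and biases from a fixed, reproducible stream."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.uniform(-1.0, 1.0, (n_hidden, n_inputs)),  # w1: input -> hidden
        "b1": rng.uniform(-1.0, 1.0, (n_hidden, 1)),         # b1: hidden-layer bias
        "W2": rng.uniform(-1.0, 1.0, (1, n_hidden)),         # hidden -> summation
        "b2": rng.uniform(-1.0, 1.0, (1, 1)),                # output bias (assumed)
    }


def ffnn_forward(net, p):
    """p has shape (n_inputs, n_samples); returns one pCO2 estimate per sample."""
    a1 = tansig(net["W1"] @ p + net["b1"])  # weights times input, plus bias, then tansig
    a2 = net["W2"] @ a1 + net["b2"]         # weighted sum with a pure linear transfer
    return a2.ravel()
```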

pCO2 product
All datasets except CHL-a start in 1992 or earlier, whereas CHL-a data cover August 2002 to the present. In each province, the stepwise FFNN algorithm was first run on all samples covered by CHL-a data, and then run a second time, using all indicators except CHL-a and CHL-a anom, for the years in which gridded CHL-a data were not available. The pCO2 mapping for the years without gridded CHL-a data was based on the predictors selected in the second run. The final product was thus built from two FFNNs: one trained for August 2002 to August 2019 using the predictor set that includes CHL-a or CHL-a anom, and one for January 1992 to July 2002 using the second predictor set without CHL-a and CHL-a anom.
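The choice between the two predictor sets can be sketched as a small dispatch function. The cut-off month follows the text (August 2002); the `chl_available` flag and the function name are hypothetical, standing in for whether gridded CHL-a exists for a given cell and month (polar grids in winter, for example, lack it).

```python
from datetime import date

CHL_START = date(2002, 8, 1)  # first month with gridded CHL-a data in this study


def predictor_period(year, month, chl_available):
    """Return 'a' (predictor set with CHL-a) or 'b' (set without CHL-a)."""
    if date(year, month, 1) >= CHL_START and chl_available:
        return "a"
    return "b"
```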
Although performance may improve as the number of neurons increases, the influence of the number of neurons on the performance of FFNN pCO2 prediction remains unclear. To further decrease the prediction error between FFNN outputs and SOCAT measurements, the number of neurons was optimized by an error test in each province.
The number of neurons was increased from 5 to 300, the corresponding MAE for each size was recorded, and the number of neurons with the lowest MAE was applied.
This test was designed to avoid both insufficient learning capacity for complex nonlinear relationships due to too few neurons and overfitting due to too many neurons.
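The neuron-number test amounts to a one-dimensional grid search. In the sketch below, the step of 5 neurons is an assumption (the text gives only the range of 5 to 300), and `mae_for_size` is a hypothetical callback that would train and validate an FFNN of the given size in one province.

```python
def optimal_hidden_size(mae_for_size, sizes=range(5, 301, 5)):
    """Return (best_size, best_mae) for the candidate hidden-layer size with the lowest MAE."""
    results = {n: mae_for_size(n) for n in sizes}  # MAE recorded for every size tried
    best = min(results, key=results.get)           # first (smallest) size among ties
    return best, results[best]
```

Because the dictionary keeps the sizes in ascending order, ties are resolved in favour of the smaller network, which is one reasonable choice given the overfitting concern.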
Finally, based on the indicators selected by the stepwise FFNN algorithm and the optimized FFNN size, a monthly global 1° × 1° surface ocean pCO2 product from January 1992 to August 2019 was constructed.

Validation
To better estimate the prediction error of the FFNN, the MAE and, additionally, the RMSE, which has been widely used in previous studies, were calculated using a K-fold cross-validation method. To avoid overfitting caused by a lack of independence between training and testing samples, the SOCAT samples were sorted chronologically and then divided into groups by year (Table 1) (Gregor et al., 2019). In this paper, K was set to 4. Thus, within every four neighboring years, samples from three years were used for training the FFNN model and samples from the remaining year were used for testing.
Table 1. The procedure of K-fold validation. (K was set to 4, so the iterations were repeated four times until all samples had been used as testing samples once. In each iteration, samples from 7 years were used as testing samples (green cells) and samples from the remaining 21 years as training samples (white cells) to increase independence.)

Results and discussion

Biogeochemical provinces and corresponding predictors of pCO2
Eleven biogeochemical provinces were generated by the SOM method after small isolated 'islands' were removed and provinces separated by land were divided manually (Fig. 3). The results of the stepwise FFNN algorithm in each province are shown in Table 3. The indicators are listed in the order in which the stepwise FFNN algorithm selected them as recommended predictors; an indicator selected earlier was more strongly recommended and played a more important role in the FFNN prediction of pCO2.
Using these indicators as predictors of surface ocean pCO2 effectively decreased the prediction error between the FFNN outputs and the pCO2 values of the validation samples, so it is reasonable to consider these indicators highly related to the drivers of pCO2 and its variability. Indicators representing sampling position and time, including latitude, longitude and sampling time, were also listed as recommended predictors in some provinces, suggesting that relatively stable spatial or temporal patterns of surface ocean pCO2 variability exist in these biogeochemical provinces. For example, month was a recommended predictor in most provinces, and it ranked relatively high in provinces P4 subpolar Atlantic and P5 north subtropical Atlantic, where pCO2 regularly peaks and bottoms out in summer and winter (Takahashi et al., 2009; Landschützer et al., 2016; Landschützer et al., 2020). Similarly, latitude and the sine and cosine of longitude were listed as recommended predictors in most provinces, suggesting a distinct spatial pattern of pCO2 that the FFNN model did not learn sufficiently from the other indicators, so that position-related indicators served as supplementary predictors.
Figure 3. Map of the biogeochemical provinces.
As basic parameters closely related to the ocean environment, temperature and salinity have been considered among the most important predictors of surface ocean pCO2, and were applied in pCO2 prediction in almost all previous related studies based on various methods (Jo et al., 2012; Signorini et al., 2013; Landschützer et al., 2014; Marrec et al., 2015; Chen et al., 2016; Moussa et al., 2016; Chen et al., 2017; Laruelle et al., 2017; Zeng et al., 2017; Chen et al., 2019; Denvil-Sommer et al., 2019). The results of the stepwise FFNN algorithm also support this. Temperature was listed as a recommended predictor in all biogeochemical provinces, suggesting that temperature is one of the most important drivers of pCO2 and its variability in these provinces.
Similarly, salinity was listed as a predictor in most provinces, which supports its importance in the prediction of pCO2. The dry air mixing ratio of atmospheric CO2 (xCO2) and its monthly anomaly were also recommended predictors in most biogeochemical provinces, suggesting that the exchange of CO2 across the air-sea interface is also an important driver of surface ocean pCO2. As a widely used predictor of pCO2, chlorophyll-a concentration (CHL-a) played an important role in representing the influence of biological activity on pCO2 in previous studies (Landschützer et al., 2014; Zeng et al., 2017; Laruelle et al., 2017; Denvil-Sommer et al., 2019).
In particular, in provinces P10 subpolar Southern Ocean and P11 Southern Ocean ice, CHL-a was the most recommended predictor in the results of the stepwise FFNN algorithm. In some other provinces (P1 Arctic Ocean and P5 north subtropical Atlantic), however, CHL-a was considered redundant, because using CHL-a data produced no effective decrease in the MAE between FFNN outputs and pCO2 measurements. In these provinces, as in the periods when CHL-a was not available (denoted by the subscript 'b'), phosphate, nitrate, silicate or dissolved oxygen were recommended instead. In province P1 Arctic Ocean, silicate concentration and temperature were considered the most crucial predictors of pCO2. In provinces where CHL-a or CHL-a anom was selected as a predictor, the pCO2 data were divided into two periods. The period with CHL-a data available is denoted by the subscript 'a' (e.g. P2a) and includes global grids from 2002 to 2019 except polar grids in winter. The period without CHL-a data is denoted by the subscript 'b' (e.g. P2b) and includes global grids from 1992 to 2001 as well as some polar grids in winter from 1992 to 2019.

pCO2 product
Based on the predictors given by the stepwise FFNN algorithm in each biogeochemical province, a validation test of the FFNN size (the number of neurons in the hidden layer) was applied to further decrease the prediction error.
The MAE values were calculated using the same samples and FFNN models with different numbers of neurons, and the number of neurons corresponding to the lowest MAE was applied (Fig. 4a). In most provinces, the MAE tended to decrease first and then increase as the number of neurons in the hidden layer increased from 5 to 300. The optimal FFNN size in each province was therefore taken as the number of neurons with the lowest MAE. The results and corresponding MAE are shown in Fig. 4b. The RMSE and mean residuals in each grid cell were then calculated based on the K-fold cross-validation method. In most grid cells, the RMSE was lower than 10 μatm and the mean residual was close to zero (Fig. 5). However, the prediction error in the north subpolar Pacific, the eastern equatorial Pacific and the Southern Ocean near the Antarctic continent was clearly higher than in other areas. The distribution of mean residuals also suggests that the FFNN models tend to overestimate surface ocean pCO2 in the Indian Ocean, while in other regions the mean residuals were more scattered and no obvious pattern was found. The FFNN outputs were quite close to the pCO2 values from SOCAT v2020 samples (Fig. 6).
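Per-grid RMSE and mean-residual maps like those in Fig. 5 can be computed by grouping the out-of-fold residuals by grid cell. This sketch assumes residual = predicted − observed (so a positive mean residual indicates overestimation) and integer grid-cell identifiers; the function name is hypothetical.

```python
import numpy as np


def per_cell_stats(cell_ids, y_pred, y_obs):
    """Return {cell_id: (rmse, mean_residual)}, with residual = predicted - observed."""
    resid = np.asarray(y_pred, dtype=float) - np.asarray(y_obs, dtype=float)
    uniq, inv = np.unique(np.asarray(cell_ids), return_inverse=True)
    inv = inv.ravel()                  # index of each sample's grid cell
    counts = np.bincount(inv)          # number of samples per grid cell
    mean_resid = np.bincount(inv, weights=resid) / counts
    rmse = np.sqrt(np.bincount(inv, weights=resid ** 2) / counts)
    return {u.item(): (float(r), float(m)) for u, r, m in zip(uniq, rmse, mean_resid)}
```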
Comparing the results based on different combinations of predictors, the results of … in some provinces were relatively small. The higher MAE and RMSE of FFNN2 may be due to the use of latitude and longitude as predictors in FFNN1 and FFNN3 but not in FFNN2. In province P10 subpolar Southern Ocean, where the stepwise FFNN algorithm did not consider latitude and longitude to be good predictors, the results of the three validation groups were extremely close.

Validation based on independent observations
The FFNN outputs based on different combinations of predictors were compared with independent observations from the Hawaii Ocean Time-series (HOT) (Dore et al., 2009), the Bermuda Atlantic Time-series Study (BATS) (Bates, 2007) and the European Station for Time Series in the Ocean Canary Islands (ESTOC) (González-Dávila and Santana-Casiano, 2009) (Fig. 7). Compared with the HOT observations, all three validation groups show close results, which are also similar to each other in the seasonal and interannual variability of pCO2. From 1992 to 2019, the RMSE between FFNN1 outputs and HOT observations was only 9.29 μatm, lower than the 10.85 μatm of FFNN2 and the 10.70 μatm of FFNN3. The monthly mean pCO2 of FFNN2 during winter was lower than the HOT observations and the pCO2 values of the other validation groups, whereas the FFNN1 and FFNN3 outputs were closer to the HOT observations. The MAE between predicted pCO2 and HOT observations was also lowest for FFNN1 (7.17 μatm), compared with 8.61 μatm for FFNN2 and 8.44 μatm for FFNN3. Larger biases occurred at the winter minimum and summer maximum, which is more evident in the monthly averages of pCO2 (Fig. 7b). Compared with the other validation groups, the FFNN1 results were closer to the monthly average values of the HOT observations. The same conclusion holds for the ESTOC and BATS stations, located in province P5 north subtropical Atlantic. The RMSE between FFNN1 outputs and independent observations was 13.03 μatm at BATS and 11.35 μatm at ESTOC, lower than those of the other validation groups. The RMSE between FFNN2 outputs and independent observations was 16.15 μatm at BATS and 14.51 μatm at ESTOC; for FFNN3, the RMSE was 13.09 μatm at BATS and 13.01 μatm at ESTOC.
All results were close to the independent observations, but the RMSE and MAE of FFNN1 were lower. As at the HOT station, FFNN1 was closest to the observations and FFNN3 second. Given the better performance of FFNN1, which used the predictors selected by the stepwise FFNN algorithm, we may conclude that the stepwise FFNN algorithm can effectively find a better combination of predictors to represent the drivers of surface ocean pCO2 and obtain a lower error.
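The station comparison can be sketched as follows: RMSE and MAE over matched monthly values, plus a monthly climatology of the kind plotted in Fig. 7b. The inputs are assumed to be FFNN values already extracted at the station's grid cell and matched month by month to the station's monthly means, with months lacking an observation marked as NaN and skipped; the function name is hypothetical.

```python
import numpy as np


def station_metrics(months, pred, obs):
    """RMSE, MAE and {month: (mean predicted, mean observed)} over matched monthly pairs."""
    months = np.asarray(months)
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    ok = ~np.isnan(obs)  # keep only months with a station observation
    resid = pred[ok] - obs[ok]
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    mae = float(np.mean(np.abs(resid)))
    clim = {}
    for m in np.unique(months[ok]):  # calendar months (1-12) present in the record
        sel = ok & (months == m)
        clim[int(m)] = (float(pred[sel].mean()), float(obs[sel].mean()))
    return rmse, mae, clim
```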

Climatological spatial distribution
The climatological average distribution of pCO2 shows significant spatial variability (Fig. 8), consistent with the average distribution of SOCAT observations. In the Pacific Ocean, the high-pCO2 areas in the stepwise-FFNN product (Fig. 8b), including the equatorial areas, eastern temperate areas and northern subpolar areas, were highly consistent with the SOCAT dataset (Fig. 8a). Similarly, the distribution of pCO2 in the Atlantic Ocean was also close. However, the stepwise-FFNN
the inputs pool, and the remaining 32 indicators stayed in the indicators pool to be tested. Then 32 MAE values were calculated, each obtained by predicting pCO2 with all indicators in the inputs pool plus one of the remaining indicators. If the lowest of these MAE values, denoted MAEi, was lower than E0, the ith indicator was considered a good indicator and was moved from the indicators pool to the inputs pool, and E0 was replaced by MAEi (Selection step in Fig. 2). Part 1, comprising Loop 1, the Selection step and the Determine step in Fig. 2, was repeated so that good indicators were selected one by one and moved to the inputs pool in the order that decreased E0 fastest, until no indicator was left in the indicators pool or no decrease could be found whichever indicator was added in the next two steps (Determine step 1 in Fig. 2). At this point Part 1 of the stepwise FFNN algorithm finished, and all indicators left in the indicators pool were considered redundant. The loop in the second part ran in the opposite way: the MAE was calculated as indicators were removed from the inputs pool one by one in the order that decreased E0 fastest (Loop 2 in Fig. 2). The second part aimed to remove indicators that could be represented by other indicators in the inputs pool (Removal step in Fig. 2). In the K-fold validation, samples from three of every four neighboring years were used for training the FFNN model and the remaining year was used for testing. A total of four iterations were carried out, with the testing year changing in each iteration. After the four iterations, all samples had been used for testing exactly once, and the MAE and RMSE between the FFNN output and the testing samples were calculated. The performance of the predictor selection algorithm was estimated by comparing the MAE and RMSE of the FFNN based on the stepwise-selected indicators with those based on the indicators used in previous studies, in each biogeochemical province (Table 2). All validation groups used the same FFNN and the same SOCAT samples, differing only in their predictors, and the same K-fold validation procedure was applied to all three validation groups. Thus, three results were generated to assess whether the stepwise FFNN algorithm can effectively find a better combination of pCO2 predictors. Finally, the pCO2 data generated by all validation groups were further compared with completely independent observations from the Hawaii Ocean Time-series (HOT, 22° 45'N, 158° 00'W, since October 1988) (Dore et al., 2009), the Bermuda Atlantic Time-series Study (BATS, 31°50'N, 64°10'W, since October 1988) (Bates, 2007) and the European Station for Time Series in the Ocean Canary Islands (ESTOC, 29°10′N, 15°30′W, from 1995 to 2009) (González-Dávila and Santana-Casiano, 2009) time-series stations. These observations were not included in the SOCAT dataset.
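The two-part stepwise selection, scored by the year-blocked K-fold MAE described above, can be sketched as below. This is an interpretation, not the authors' implementation: the fold layout (year index modulo 4), the reading of "no decrease in the next two steps" as a patience of two greedy steps, and all function names are assumptions, and `fit_predict` is a hypothetical callback standing in for FFNN training on one fold.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

import numpy as np

Score = Callable[[FrozenSet[str]], float]


def year_folds(years, k: int = 4, first_year: int = 1992) -> np.ndarray:
    """Within every k consecutive years, each year falls in a different fold (assumed layout)."""
    return (np.asarray(years) - first_year) % k


def blocked_kfold_errors(years, y_true, fit_predict, k: int = 4) -> Tuple[float, float]:
    """Out-of-fold MAE and RMSE; fit_predict(train_mask, test_mask) returns test predictions."""
    y_true = np.asarray(y_true, dtype=float)
    folds = year_folds(years, k)
    y_pred = np.empty_like(y_true)
    for f in range(k):
        test = folds == f  # every sample is tested exactly once
        y_pred[test] = fit_predict(~test, test)
    resid = y_pred - y_true
    return float(np.mean(np.abs(resid))), float(np.sqrt(np.mean(resid ** 2)))


def forward_select(pool: Sequence[str], score: Score, patience: int = 2) -> Tuple[List[str], float]:
    """Part 1: repeatedly add the indicator that lowers the MAE most.

    Up to `patience` non-improving additions are tried before stopping; the best
    subset found so far (and its MAE, E0) is returned.
    """
    inputs: List[str] = []
    remaining = list(pool)
    best_inputs: List[str] = []
    e0 = float("inf")
    stale = 0
    while remaining and stale < patience:
        errs = {c: score(frozenset(inputs + [c])) for c in remaining}
        best = min(errs, key=errs.get)
        inputs.append(best)
        remaining.remove(best)
        if errs[best] < e0:
            e0, best_inputs, stale = errs[best], list(inputs), 0
        else:
            stale += 1
    return best_inputs, e0


def backward_remove(inputs: Sequence[str], score: Score, e0: float,
                    patience: int = 2) -> Tuple[List[str], float]:
    """Part 2: repeatedly drop the indicator whose removal lowers the MAE most."""
    current = list(inputs)
    best_inputs, best_err = list(current), e0
    stale = 0
    while len(current) > 1 and stale < patience:
        errs = {c: score(frozenset(x for x in current if x != c)) for c in current}
        drop = min(errs, key=errs.get)
        current.remove(drop)
        if errs[drop] < best_err:
            best_err, best_inputs, stale = errs[drop], list(current), 0
        else:
            stale += 1
    return best_inputs, best_err
```

In use, the score of a candidate subset would be the first value returned by `blocked_kfold_errors` with a `fit_predict` that trains the FFNN on that subset; for a quick check, any function of the subset can serve as a toy score.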

*Due to insufficient coverage of CHL-a data in the polar areas and during the period before 2002.

The MAE and RMSE of the global estimates between predicted pCO2 and measurements from SOCAT v2020 further decreased to 11.32 and 17.99 μatm, respectively, after the optimal FFNN size was applied in each province (Fig. 4b).

Figure 4. MAE for different FFNN sizes in each biogeochemical province. a) MAE between predicted pCO2 and SOCAT observations, calculated using the same samples and FFNNs with different numbers of neurons. b) The optimal FFNN size, i.e. the number of neurons at which the MAE is lowest.

Figure 5. Global maps of (a) RMSE and (b) mean residuals between predicted pCO2 and SOCAT observations

FFNN1 (based on the stepwise FFNN algorithm, this paper) and FFNN3 (based on 15 predictors from Denvil-Sommer et al., 2019) were more precise than FFNN2 (based on 10 predictors from Landschützer et al., 2014). The points of FFNN1 were the most concentrated along the y = x line, indicating FFNN outputs very close to the measured pCO2 values from SOCAT, with an RMSE of 17.99 μatm in the global open ocean. The RMSE of FFNN1 was lower than those of FFNN2 (22.95 μatm) and FFNN3 (19.17 μatm).

Figure 7. Validation based on independent observations from time-series stations. a) and b): the Hawaii Ocean Time-series (HOT) (Dore et al., 2009); c) and d): the Bermuda Atlantic Time-series Study (BATS) (Bates, 2007); e) and f): the European Station for Time Series in the Ocean Canary Islands (ESTOC) (González-Dávila and Santana-Casiano, 2009) time-series station. FFNN1 was based on predictors selected by the stepwise-FFNN algorithm. FFNN2 and FFNN3 were based on predictors from Landschützer et al. (2014) and Denvil-Sommer et al. (2019), respectively. SOCATv2020 represents the monthly mean pCO2 of SOCAT observations in the grid cells corresponding to each time-series station.

Table 4. Performance of pCO2 prediction based on different predictors