Global high-resolution monthly pCO2 climatology for the coastal ocean derived from neural network interpolation

Abstract. In spite of the recent strong increase in the number of measurements of the partial pressure of CO2 in the surface ocean (pCO2), the air–sea CO2 balance of the continental shelf seas remains poorly quantified. This is a consequence of these regions remaining strongly under-sampled in both time and space and of surface pCO2 exhibiting much higher temporal and spatial variability in these regions than in the open ocean. Here, we use a modified version of a two-step artificial neural network method (SOM-FFN; Landschützer et al., 2013) to interpolate the pCO2 data along the continental margins with a spatial resolution of 0.25° and with monthly resolution from 1998 to 2015. The most important modifications compared to the original SOM-FFN method are (i) the much higher spatial resolution and (ii) the inclusion of sea ice and wind speed as predictors of pCO2. The SOM-FFN is first trained with pCO2 measurements extracted from the SOCATv4 database. Then, the validity of our interpolation, in both space and time, is assessed by comparing the generated pCO2 field with independent data extracted from the LDEOv2015 database. The new coastal pCO2 product confirms a previously suggested general meridional trend of the annual mean pCO2 in all the continental shelves, with high values in the tropics that drop to values beneath those of the atmosphere at higher latitudes. The monthly resolution of our data product reveals significant differences in the seasonality of pCO2 across the ocean basins. The shelves of the western and northern Pacific, as well as the shelves in the temperate northern Atlantic, display particularly pronounced seasonal variations in pCO2, while the shelves in the southeastern Atlantic and in the southern Pacific show a much smaller seasonality. The calculation of temperature-normalized pCO2 for several latitudes in different oceanic basins confirms that the seasonality in shelf pCO2 cannot solely be explained by temperature-induced changes in solubility but is also the result of seasonal changes in circulation, mixing and biological productivity. Our results also reveal that the amplitudes of both thermal and nonthermal seasonal variations in pCO2 are significantly larger at high latitudes. Finally, because this product's spatial extent includes parts of the open ocean as well, it can be readily merged with existing global open-ocean products to produce a true global perspective of the spatial and temporal variability of surface ocean pCO2.

Introduction
The quantitative contribution of the coastal ocean to the global oceanic uptake of atmospheric CO2 is still being debated (Borges et al., 2005; Chen and Borges, 2009; Cai, 2011; Wanninkhof et al., 2013; Gruber, 2015), but several recent studies have suggested that the flux density, or uptake per unit area, is greater over continental shelf seas than over the open ocean (Laruelle et al., 2014). Laruelle et al. (2014) used more than 3 × 10⁶ pCO2 measurements from the SOCATv2 database (Bakker et al., 2016) to demonstrate very strong disparities in air-seawater CO2 exchange at the regional scale and pronounced seasonal variations, especially at temperate latitudes. Furthermore, it was suggested that, despite the presence of a seasonally varying sea-ice cover, Arctic continental shelves are a regional hotspot of CO2 uptake (Bates et al., 2006; Laruelle et al., 2014; Yasunaka et al., 2016). However, even with this much larger dataset compared to previous studies, large regions of the global coastal ocean remained either void of data or very poorly monitored in space and time, including the seasonal cycle. These data gaps not only limit our ability to reduce uncertainties in flux estimates and to unravel whether they differ from the adjacent open ocean but also hamper the identification and quantification of the many processes controlling the source-sink nature of the coastal ocean (Bauer et al., 2013). Laruelle et al. (2014) attempted to overcome this limitation by combining various upscaling methods depending on data density in different regions, for example by resorting to annual means wherever the seasonal coverage was deemed insufficient. However, they could not overcome the limitation that the data alone are insufficient to assess whether there are any trends in coastal fluxes. This is a serious gap considering that the influence of human activity on coastal systems is increasing rapidly (Doney, 2010; Cai, 2011; Regnier et al., 2013; Gruber, 2015).
In the open ocean, novel statistical methods relying on artificial neural networks (ANNs) have permitted the generation of a series of high-resolution continuous monthly maps for ocean surface CO2 partial pressures (pCO2) (e.g., Landschützer et al., 2013; Sasse et al., 2013; Nakaoka et al., 2013; Zeng et al., 2014). Although differing in their details (see, e.g., Rödenbeck et al., 2015, for an overview), these products typically have a nominal spatial resolution of 1° and monthly temporal resolution. By filling in the spatial and temporal gaps, these products greatly facilitate the calculation of the air-sea CO2 exchange, as they do not require separate assumptions about the surface ocean pCO2 in areas lacking data. Such methods are well suited to resolve spatial gradients, and they also permit us to determine seasonal and interannual variations and trends in pCO2 (e.g., Landschützer et al., 2014, 2015, 2016; Zeng et al., 2014). Because of the small relative contribution of the coastal ocean to the total oceanic surface area and the relatively coarse spatial resolution of the ANN-based surface ocean pCO2 products so far, they are not well suited to resolve the high spatiotemporal variations of the surface ocean pCO2 fields along the shelves.
Reproducing the complex seasonal dynamics of the CO2 exchange at the air-water interface in the coastal ocean is of particular importance considering that this exchange often has large intra-annual variability (Signorini et al., 2013). For instance, in temperate climates, it is common for continental shelf waters to turn from CO2 sinks for the atmosphere during spring to CO2 sources during summer (Shadwick et al., 2010; Cai, 2011; Laruelle et al., 2014, 2015). Shelf waters are also typically characterized by small-scale physical features such as coastal currents, river plumes and eddies inducing sharp biogeochemical fronts (Liu et al., 2010) that markedly influence the spatial patterns of the pCO2 fields (e.g., Turi et al., 2014).
To resolve the high spatial and temporal variability in air-sea CO2 exchange over the global shelf region, the two-step ANN method developed by Landschützer et al. (2013) is modified here for the specific conditions that prevail in these environments. Our calculations are performed at a much finer resolution of 0.25° and new environmental drivers such as sea-ice cover are used to account for the potentially significant role of sea ice in the CO2 exchange (Bates et al., 2006; Vancoppenolle et al., 2013; Parmentier et al., 2013; Moreau et al., 2016; Grimm et al., 2016). The definition of the coastal/open-oceanic boundary varies strongly from one study to the other (Walsh, 1988; Laruelle et al., 2013), with a potentially large impact on the shelf CO2 budget (Laruelle et al., 2010). Here, we use a very wide definition for this boundary (i.e., 300 km width or 1000 m depth) to secure spatial continuity between our new shelf pCO2 product and those already existing for the open ocean (Landschützer et al., 2016; Rödenbeck et al., 2015). Our approach leads to the first continuous and monthly resolved pCO2 maps over the 1998–2015 period across the global shelf region, permitting us to study the seasonal dynamics of these regions in relationship to that of the adjacent open ocean.

Methods
The method used in this study is a modified version of the SOM-FFN method developed by Landschützer et al. (2013) to calculate monthly resolved pCO2 maps of the Atlantic Ocean at a 1° resolution and later applied to the entire global open ocean. The reconstruction of a continuous pCO2 field involves establishing numerical relationships between pCO2 and a number of independent environmental predictors that are known to control its variability in both time and space. The first step of the method relies on the use of a neural network clustering algorithm (self-organizing map, SOM) to define a discrete set of biogeochemical provinces characterized by similar relationships between the independent environmental variables and a monthly resolved pCO2 field. The second step consists in deriving nonlinear and continuous relationships between pCO2 and some or all of the aforementioned independent variables using a feed-forward network (FFN) method within each biogeochemical province created by the SOM. The method is extensively documented in Landschützer et al. (2013, 2014), but the specific modifications introduced in this study to better simulate the characteristics of the shelves, the choice of environmental drivers, their data sources and the definition of the geographic extent of this analysis are described in the following sections. Figure 1 summarizes the different steps involved in the calculations of the SOM-FFN.

Data sources and processing
All the datasets used in our calculations were converted from their original spatial resolutions to a regular 0.25° resolution grid. The temporal resolution of all datasets is monthly (i.e., 216 months over the entire period) except for the bathymetry, which is assumed constant over the course of the simulations, and wind speed, whose original resolution is 6 h. For the latter, monthly averages are calculated for each grid cell to generate monthly values. Sea-surface temperature (SST) and sea-surface salinity (SSS) maps were taken from the Met Office's EN4, which consists of quality-controlled subsurface ocean temperature and salinity profiles and their objective analyses (Good et al., 2013). The bathymetry was extracted from the global ETOPO2 database (US Department of Commerce, 2006). The sea-ice concentrations were taken from the global 25 km resolution monthly data product compiled by the NSIDC (National Snow and Ice Data Center; Cavalieri et al., 1996). Wind speed data were extracted from the ERA-Interim reanalysis (Dee et al., 2011). The chlorophyll surface concentrations were extracted from the monthly 9 km resolution SeaWiFS data product prior to 2010 and from MODIS for later years (NASA, 2016). The list of all data products used in the calculations and the transformations applied to produce monthly 0.25° resolution forcing files are summarized in Table 1.
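The monthly averaging of the 6-hourly wind speed can be sketched for a single grid cell as follows. This is a minimal illustration only: the function name `monthly_means` and the synthetic series are assumptions made for the example, not the processing code used for the ERA-Interim fields.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def monthly_means(samples):
    """Average (timestamp, value) samples into {(year, month): mean}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for when, value in samples:
        key = (when.year, when.month)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Synthetic 6-hourly wind speed (m/s) for one grid cell:
# 5 m/s throughout January 1998 and 7 m/s throughout February 1998.
start = datetime(1998, 1, 1)
series = []
for step in range(4 * 59):  # 31 + 28 days, four samples per day
    when = start + timedelta(hours=6 * step)
    series.append((when, 5.0 if when.month == 1 else 7.0))

wind_monthly = monthly_means(series)

# The full 1998-2015 period spans 18 years of 12 months each.
n_months = (2015 - 1998 + 1) * 12
```

Applied to every grid cell, the same grouping would yield one wind speed value per cell and per month, matching the temporal resolution of the other predictors.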
Finally, the surface ocean pCO2 data were taken from the gridded SOCATv4 product (Sabine et al., 2013; Bakker et al., 2016) while those used for the validation stem from the LDEOv2015 database. With our definition of the coastal zone, SOCATv4 contains ∼ 8 × 10⁶ data points and LDEO ∼ 5.6 × 10⁶, with over 70 % of the data shared with SOCATv4. Because of this significant overlap between both data products, we created two entirely independent datasets by randomly assigning each of those common data points to either database to ensure that each data point only belongs to one dataset. The resulting datasets are named SOCAT* and LDEO*, with the former being used for training and the latter for validation. Prior to the creation of both datasets, all data from SOCAT were converted from fCO2 (fugacity of CO2 in water) to pCO2 using the formulation reported in Takahashi et al. (2012). The data densities of SOCAT* and LDEO* are shown in Fig. 2 and reveal a heterogeneous spatial coverage. Northern temperate shelves are generally well covered, especially in the northern Atlantic. In this region, the data density is better in SOCAT* than LDEO* thanks to the addition of many European cruises in the SOCAT database. In contrast, equatorial regions remain undersampled, especially in the Indian Ocean. Because of the difficulty of sampling in waters seasonally covered in ice, polar regions are very unevenly represented in SOCAT* and LDEO*. Fortunately, some areas, such as parts of Antarctica and the Bering Sea, do contain enough data to train and validate the SOM-FFN. Overall SOCAT* contains roughly 40 % more data than LDEO*.
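The construction of two non-overlapping datasets can be illustrated with a short sketch. The integer identifiers, the function name `split_shared` and the equal assignment probability are assumptions made for this example; the text only states that shared points are assigned at random.

```python
import random

def split_shared(socat_ids, ldeo_ids, seed=0):
    """Assign every shared point to exactly one of the two datasets.

    Points found in only one source stay there. Shared points are sent to
    either side with probability 0.5 (an assumed value).
    """
    rng = random.Random(seed)
    shared = socat_ids & ldeo_ids
    socat_star = socat_ids - shared
    ldeo_star = ldeo_ids - shared
    for point_id in sorted(shared):  # sorted for reproducibility
        if rng.random() < 0.5:
            socat_star.add(point_id)
        else:
            ldeo_star.add(point_id)
    return socat_star, ldeo_star

# Hypothetical identifiers: points 30..79 are present in both sources.
socat = set(range(0, 80))
ldeo = set(range(30, 100))
socat_star, ldeo_star = split_shared(socat, ldeo)
```

After the split no point belongs to both sets, so a validation against the second set is not contaminated by training data.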

Modifications of the SOM-FFN method
The specific characteristics of the continental shelves motivated a number of modifications of the global ocean SOM-FFN method, including a 16-fold increase in spatial resolution from 1 to 0.25°, the addition of new environmental variables as biogeochemical predictors and a shortening of the simulation period to the period of 1998 through 2015. All these modifications are detailed below.
The higher resolution of 0.25° × 0.25° results in over 2 million grid cells that help to better track the global coastline and its complex geomorphological features (Crossland et al., 2005; Liu, 2010). It is also common along eastern and western boundary currents to find continental shelves as narrow as 10–20 km, i.e., a width that is significantly smaller than a single cell at 1° resolution. Additionally, biogeochemical fronts associated with river plumes, coastal currents and upwelling are characterized by spatial scales of the order of tens of kilometers or even smaller (Wijesekera et al., 2003). The chosen resolution is also identical to that of the gridded coastal pCO2 product from the SOCAT initiative (Sabine et al., 2013; Bakker et al., 2014).
The definition of the geographic extent of the shelf region excludes estuaries and other inland water bodies but uses a wide limit for the outer continental shelf that encapsulates all current definitions of the coastal ocean. This approach facilitates future integration with existing global ocean data products (e.g., Landschützer et al., 2016; Rödenbeck et al., 2015) and model outputs, which typically struggle to represent the shallowest parts of the ocean (Bourgeois et al., 2016). The outer limit used here is given by whichever point is the furthest from the coast: either 300 km distance from the coastline (which roughly corresponds to the outer edge of territorial waters; Crossland et al., 2005) or the 1000 m isobath. The resulting domain (Fig. B in the Supplement) covers 77 million km², more than twice the surface area generally attributed to the coastal ocean (Walsh et al., 1998; Liu et al., 2010; Laruelle et al., 2013).
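Because the outer limit is whichever of the two criteria lies further offshore, the domain is the union of both conditions. The sketch below expresses this for a single grid cell. The function name and arguments are hypothetical, depth is taken as positive downward, and testing depth cell by cell is a simplification: it ignores whether a shallow cell is actually connected to the shelf.

```python
def in_coastal_domain(distance_km, depth_m, is_ocean=True):
    """Return True if an ocean cell lies inside the wide coastal domain.

    A cell qualifies if it is within 300 km of the coastline OR shallower
    than 1000 m, which places the outer limit at whichever boundary is
    further from the coast.
    """
    return is_ocean and (distance_km <= 300.0 or depth_m <= 1000.0)

# Narrow, steep margin: deep water but close to the coast.
near_and_deep = in_coastal_domain(distance_km=150.0, depth_m=4000.0)
# Broad shelf: shallow water far from the coast.
far_and_shallow = in_coastal_domain(distance_km=500.0, depth_m=200.0)
# Open ocean: far from the coast and deep.
far_and_deep = in_coastal_domain(distance_km=500.0, depth_m=3000.0)
```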
The predictor variables for the SOM-FFN networks were chosen based on a set of trial-and-error experiments with the selection criterion being the quality of fit, i.e., the best reconstruction of the available observations. The first step of the SOM-FFN calculations, i.e., the SOM-based clustering, relies on the assignment of the surface ocean data to biogeochemical provinces sharing common spatiotemporal patterns of SST, SSS, bathymetry, rate of change in sea-ice coverage, wind speed and observed pCO2. Chlorophyll a is not included in the list of environmental factors used to generate the biogeochemical provinces because of the incomplete data coverage at high latitude in winter due to cloud coverage. Both the use of wind speed and the rate of change in monthly sea-ice concentration are novelties compared to the setup of Landschützer et al. (2013). The latter is calculated from the gridded monthly sea-ice concentration field of Cavalieri et al. (1996). It allows us to account for the complex processes occurring in melting and forming sea ice that are known to strongly influence carbon dynamics within sea-ice-covered areas (Parmentier et al., 2013). This first step is performed without any data normalization of the datasets, as this permits us to give more weight to the pCO2 data. Based on a series of simulations using different numbers of biogeochemical provinces, we found that a clustering of the data into 10 biogeochemical provinces minimized the average deviation between simulated and observed pCO2 (see below) while ensuring that at least 1000 different grid cells can be used for validation against LDEO* in each province.
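The text does not specify how the rate of change in sea-ice concentration is computed. One simple possibility, shown here purely as an assumption, is the difference between consecutive monthly concentrations in a grid cell; the function name and the sample values are hypothetical.

```python
def sea_ice_rate_of_change(concentration):
    """Month-to-month change in sea-ice concentration for one grid cell.

    Assumed scheme: value of month i+1 minus value of month i, giving one
    fewer element than the input. Positive values indicate ice formation,
    negative values indicate melt.
    """
    return [later - earlier
            for earlier, later in zip(concentration, concentration[1:])]

# Hypothetical monthly concentrations (percent cover) for a seasonal ice zone.
ice_percent = [0, 20, 60, 60, 10]
rates = sea_ice_rate_of_change(ice_percent)
```

Unlike the concentration itself, this quantity distinguishes a cell that is freezing from one that is melting at the same ice cover.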
In the second step of the estimation procedure, i.e., the application of the FFN, SST, SSS, bathymetry, sea-ice concentration and chlorophyll a are used as predictors to establish the nonlinear relationships between these predictors and the target pCO2 (for data sources, see below). Similar to the SOM in step one, the selected variables not only comprise proxies representing the solubility and biological pumps of the coastal ocean but also yield the best fit to the data. These calculations are done iteratively using a sigmoid activation function on an incomplete dataset in order to perform an assessment on the remaining data after each iteration, until an optimal relationship is found. Additionally, as performed in Landschützer et al. (2015), the output pCO2 data were smoothed using the spatial and temporal mean of each point's neighboring pixels in both time and space within the 3-pixel neighborhood domain. This operation is performed iteratively and does not significantly alter the results, but it ensures smoother transitions in the pCO2 field at the boundaries between the provinces. This smoothing method yielded good results for the open Southern Ocean, where marked pCO2 fronts are also observed, and was deemed relevant here due to the potentially strong pCO2 gradients characterizing the shelves.
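A single pass of the smoothing step can be sketched as a moving average over time and space. The interpretation of the "3-pixel neighborhood" as a centred window of width 3 along each axis, the handling of missing cells with `None` and the absence of longitude wrap-around are all assumptions of this example.

```python
def smooth(field):
    """One smoothing pass over field[t][y][x].

    Each valid cell is replaced by the mean of the valid values in its
    3 x 3 x 3 neighbourhood (time, latitude, longitude). Cells holding
    None (land or outside the domain) are skipped and stay None.
    """
    nt, ny, nx = len(field), len(field[0]), len(field[0][0])
    out = [[[None] * nx for _ in range(ny)] for _ in range(nt)]
    for t in range(nt):
        for y in range(ny):
            for x in range(nx):
                if field[t][y][x] is None:
                    continue
                values = [field[tt][yy][xx]
                          for tt in range(max(t - 1, 0), min(t + 2, nt))
                          for yy in range(max(y - 1, 0), min(y + 2, ny))
                          for xx in range(max(x - 1, 0), min(x + 2, nx))
                          if field[tt][yy][xx] is not None]
                out[t][y][x] = sum(values) / len(values)
    return out

# Uniform 400 µatm field with one 427 µatm spike in the centre.
field = [[[400.0] * 3 for _ in range(3)] for _ in range(3)]
field[1][1][1] = 427.0
smoothed = smooth(field)
```

In this toy field the spike is damped from 427 to 401 µatm, illustrating how the operation softens sharp steps such as those at province boundaries.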
Another change from the most recent global ocean SOM-FFN application (Landschützer et al., 2016) is the different temporal extension of the simulation period, which covers the period from 1998 to 2015 instead of 1982 to 2011. This overall shortening was necessary because one of the environmental drivers, i.e., chlorophyll data derived from SeaWiFS, only starts in September 1997 (NASA, 2016). Monthly chlorophyll data throughout the entire simulation period were preferred here over the use of a monthly climatology as done in Landschützer et al. (2016) to better capture interannual variability. At the same time, we have been able to extend the coastal product by 4 years to the end of 2015.

Model training and evaluation
We evaluated the coastal SOM-FFN product using the root mean square error (RMSE) metric, calculated from the differences between estimated and observed pCO2. During the early development stage, preliminary simulations were performed using only data from SOCAT v2.0 (Pfeil et al., 2013; Sabine et al., 2013) to train the FFN algorithm. Each simulation was carried out using different subsets of environmental predictors extracted from the complete set (SST, SSS, bathymetry, sea-ice concentration and chlorophyll a). The results obtained were then compared to the more complete dataset of SOCAT*, which contains 40 % more shelf pCO2 measurements from 1998 to 2015. This process allowed us, for each province, to calculate the RMSE for several combinations of independent predictor variables for the pCO2. Next, the combinations of predictors displaying the lowest RMSE were kept for the final simulations, which then used all data from SOCAT*. Thus, the pCO2 calculations in each province potentially rely on a different set of predictors (Table 1). The coastal SOM-FFN results are validated through a comparison with the LDEO* dataset through the calculation of residuals and RMSE. Additionally, a model-to-model comparison is also performed with the global ocean results of Landschützer et al. (2016) in the regions where the domains overlap. To perform this latter analysis, the high-resolution coastal pCO2 product generated here was aggregated to a regular monthly 1° resolution to match the grid used by Landschützer et al. (2016).
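The two evaluation statistics can be written compactly as follows. This is a generic sketch of mean bias and RMSE with hypothetical values; residuals are taken as simulated minus observed, so positive values indicate overestimation.

```python
import math

def bias_and_rmse(simulated, observed):
    """Return (mean bias, RMSE) for paired simulated and observed values."""
    residuals = [s - o for s, o in zip(simulated, observed)]
    bias = sum(residuals) / len(residuals)
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return bias, rmse

# Hypothetical pCO2 values (µatm) at four co-located grid cells.
simulated = [380.0, 400.0, 350.0, 410.0]
observed = [370.0, 410.0, 360.0, 400.0]
bias, rmse = bias_and_rmse(simulated, observed)
```

In this example the residuals of +10 and -10 µatm cancel, giving a bias of 0 µatm alongside an RMSE of 10 µatm. The two metrics are therefore complementary: a near-zero bias does not imply a small spread.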
Finally, the ability of the coastal SOM-FFN to capture seasonal variations is assessed by comparing the cell-average simulated monthly pCO2 to monthly means for cells extracted from the LDEO* database. The cells retained for this analysis are all those for which the average for each month could be calculated from measurements performed in at least 3 different years.
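The cell selection rule can be sketched as below. The function name is hypothetical, and the reading that all 12 calendar months must each be covered by at least 3 distinct years is an assumption about the criterion stated above.

```python
from collections import defaultdict

def monthly_climatology(records, min_years=3):
    """Build a 12-month climatology for one cell from (year, month, pCO2).

    Measurements are first averaged per (year, month), then across years.
    Returns None if any calendar month is covered by fewer than min_years
    distinct years.
    """
    by_year_month = defaultdict(list)
    for year, month, pco2 in records:
        by_year_month[(year, month)].append(pco2)
    yearly_means = defaultdict(dict)  # month -> {year: mean pCO2}
    for (year, month), values in by_year_month.items():
        yearly_means[month][year] = sum(values) / len(values)
    if any(len(yearly_means.get(m, {})) < min_years for m in range(1, 13)):
        return None
    return {m: sum(yearly_means[m].values()) / len(yearly_means[m])
            for m in range(1, 13)}

# Hypothetical cell sampled in every month of three different years.
records = [(year, month, 300.0 + month + offset)
           for year, offset in [(2001, -3.0), (2005, 0.0), (2010, 3.0)]
           for month in range(1, 13)]
climatology = monthly_climatology(records)
```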
Results and discussion

Biogeochemical provinces
Despite the fact that the SOM is not given any prior knowledge regarding space and time, the spatial distribution of the 10 biogeochemical provinces is mostly controlled by latitudinal gradients and distance from the coast (Fig. 3; high-resolution monthly maps are also available in the Supplement). Although the exact spatial extent of each province varies from one month to the next following the seasonal variations of the environmental forcing parameters, each province roughly corresponds to one type of climatological setting. Nevertheless, because of these spatial migrations, most cells belong to different provinces depending on the month (see Fig. B). These seasonal migrations are mostly driven by changes in temperature, sea-ice cover, pCO2 and, to a lesser degree, salinity. P1, P2 and P4 (Province 1, 2, etc.) are three of the largest provinces, covering a total of 35.7 × 10⁶ km² and representing warm tropical regions with bottoms at shallow to intermediate depths. During summer, the spatial coverage of P4 expands north- and southward as a consequence of warming. P2 represents tropical regions with deeper bottom depths. P1 and P2 display less seasonal change in their spatial distribution than P4 due to weaker seasonal temperature changes. P3 and P6, which cover a combined 9.2 × 10⁶ km², are found in the Southern Hemisphere and correspond to sub-polar and temperate regions, respectively. Their spatial distributions are subject to marked latitudinal migrations throughout the year as a result of the large amplitude changes in seasonal temperature observed in midlatitude coastal waters (Laruelle et al., 2014). Similarly, P7 corresponds to temperate northern hemispheric waters and displays marked seasonal changes including the shelves of the Norwegian Basin in summer and most of the Mediterranean Sea in winter. P5, P8, P9 and P10 together cover 22.7 × 10⁶ km². These provinces are partly (seasonally) covered by sea ice with an average spatial ice cover over the study period of 57, 39, 54 and 46 % for P5, P8, P9 and P10, respectively. P5 represents the shelves of Antarctica all year round. P8 includes large fractions of the enclosed seas at higher northern latitudes such as the Baltic Sea and Hudson Bay while P9 (only 2.9 × 10⁶ km²) represents permanently deep and cold polar regions. P5 and P10 represent most of the polar shelves (P5 for the Antarctic and P10 for the Arctic). The regions experiencing the most notable shifts in province allocation during the year include the northern polar regions and the temperate narrow shelves of the Atlantic and Pacific, particularly western Europe, eastern North America and East Asia (see Fig. B).
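Province surface areas on a regular 0.25° grid depend on latitude, since cells shrink toward the poles. A minimal sketch of the area of one cell is given below, assuming a spherical Earth with a mean radius of 6371 km; the function name is hypothetical and the study's own area calculation is not described in the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical-Earth assumption

def cell_area_km2(lat_south_deg, dlat_deg=0.25, dlon_deg=0.25):
    """Area of a latitude-longitude cell whose southern edge is lat_south_deg.

    Uses A = R^2 * dlon * (sin(lat_north) - sin(lat_south)) with angles in
    radians, which is exact on a sphere.
    """
    phi_south = math.radians(lat_south_deg)
    phi_north = math.radians(lat_south_deg + dlat_deg)
    return (EARTH_RADIUS_KM ** 2 * math.radians(dlon_deg)
            * (math.sin(phi_north) - math.sin(phi_south)))

equator_cell = cell_area_km2(0.0)
arctic_cell = cell_area_km2(75.0)

# Sanity check: summing all 720 x 1440 cells recovers the sphere's surface.
total = 1440 * sum(cell_area_km2(-90.0 + 0.25 * i) for i in range(720))
```

Summing these areas over the cells assigned to a province in a given month would give that province's surface area under the stated assumptions.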

Performance of the coastal SOM-FFN
The mean climatological pCO2 estimated by the coastal SOM-FFN for annually and seasonally averaged conditions is reported in Fig. 4. Before briefly analyzing the main spatial and temporal variability of the pCO2 fields (Sect. 3.3), we evaluate here the overall performance of our interpolation method globally and at the level of each province, including its ability to capture the seasonal cycle.

Comparison with training data (SOCAT*)
Within each province, the pCO2 values simulated by the coastal SOM-FFN are compared to the measurements extracted from SOCATv4 (Table 2). Globally, the average difference between observed and simulated pCO2 is almost null (overall bias = 0.0 µatm) and the absolute bias is lower than 4 µatm in all 10 provinces. The average RMSE over all provinces of 32.9 µatm is comparable with those reported for other statistical reconstructions of coastal pCO2 fields summarized by Chen et al. (2016), although none of these studies were performed at global scale and many rely on different statistical approaches often using remote sensing data. This RMSE is about twice that achieved for the open ocean, reflecting the larger spatiotemporal variability in the coastal ocean and the more complex processes governing that variability. Considering these complexities, achieving an RMSE at the global scale in the same range as those reported for regional coastal studies is encouraging.
Significant variations in both bias and RMSE can be observed between provinces (Table 2). P1 and P3 have the best fit between simulated and observed pCO2 with RMSE lower than 20 µatm. In five provinces that cover a cumulative surface area of 31.2 × 10⁶ km² (P1, P2, P3, P6 and P9), RMSEs do not exceed 25 µatm. The maximum RMSE, 46.8 µatm, is found in P8. Overall, the performance of the SOM-FFN deteriorates for provinces regularly covered by sea ice (P5, P8-10) in which data coverage is relatively low (RMSE > 34 µatm). This trend is consistent with the spatial distribution of the average residual errors between the pCO2 field generated by the model and pCO2 data extracted from SOCAT* (Fig. 5a). The residuals are obtained by subtracting the observed values from model output in each grid cell for every month where observations are available. Thus, positive values correspond to cells where the simulated pCO2 overestimates the field data, while negative values represent cells where the simulated pCO2 underestimates the field data. The bulk of the residuals fall in the −20 to 20 µatm range in temperate and tropical regions, except for very shallow regions that are under the influence of a large river such as the Mississippi. There, the SOM-FFN often underestimates the observed pCO2. There are also other coastal areas where the SOM-FFN underestimates the observed pCO2, such as Nova Scotia, the southwestern coast of England and the shelves of California and Morocco. The complex hydrodynamics of those regions (some of them being characterized as upwelling regions) may explain the weaker performance of the SOM-FFN. At high latitudes, the performance of the model deteriorates somewhat. For example, the Bering Sea contains cells with very high (> 50 µatm) and very low average residuals (< −50 µatm).

Table 2. List of the biogeochemical provinces, their geographic distribution and the environmental predictors used to calculate surface ocean pCO2. SSS stands for sea-surface salinity, SST for sea-surface temperature, Bathy for bathymetry, Ice for sea-ice cover, Chl for chlorophyll concentration and Wind for wind speed.

Evaluation with LDEO* data
The comparison of our results with the data from LDEO* yields a global bias of 0.0 µatm (calculated as the average difference between observed and SOM-FFN estimated pCO2) for the entire shelf domain. However, the spread is relatively large with an average RMSE of 39.2 µatm. This average RMSE is 19 % larger than the one obtained when comparing the SOM-FFN results with the SOCAT* dataset, which has been used to train the model. A province-based analysis reveals strong differences in the calculated RMSEs, ranging from 20 to 53 µatm (Table 2, LDEO*). A review of various statistical models used to generate continuous global ocean pCO2 maps, including some using remote sensing data and algorithms, reports RMSE or uncertainties typically varying within the 10–35 µatm range (Chen et al., 2016) with outliers as high as 50 µatm in the Mississippi Delta (Lohrenz and Cai, 2006). This review also shows that open-ocean estimates generally yield RMSEs lower than 17 µatm, in agreement with Landschützer et al. (2014), whereas coastal estimates are associated with much higher uncertainties. This is likely because these coastal regions have complex biogeochemical dynamics and high-frequency variability that cannot be fully captured with the current generation of data interpolation techniques using the limited available predictor data.
In our simulations, the province-averaged biases are larger than those calculated with SOCAT* but their absolute value remains small and never exceeds 3.9 µatm (P8). Provinces P1, P2, P3 and P6 have RMSE < 30 µatm, which compares with the most robust regional coastal pCO2 estimates from the literature (Chen et al., 2016). Together, these four provinces account for 37 % of our domain. P4, P5 and P9 display RMSEs between 33 µatm (P4) and 38 µatm (P9). Overall, these seven provinces, covering the entire tropical and temperate latitudinal bands and some subpolar regions, account for > 72 % of the shelf surface area and yield RMSEs of less than 38 µatm and absolute biases of less than 2.3 µatm. Provinces in the polar regions (P5, P7, P8 and P10) overall display larger deviations with respect to the LDEO* dataset, but the absolute value of their biases never exceeds 3.9 µatm. Their RMSEs all fall in the 51–53 µatm range. This suggests a significantly lower performance of the SOM-FFN in regions partly covered in sea ice. This can be attributed to the limited number of available data points and their very heterogeneous distribution in time and space, as well as to the very limited range of variation of some of the controlling variables such as temperature and salinity. The relatively good performance of the model in tropical regions might be partly attributed to the relatively small seasonal variations in pCO2 within these areas. The residuals calculated by subtracting the SOM-FFN results from LDEO* are very similar to those obtained by subtracting the SOM-FFN results from SOCAT* (Fig. 5b). The residual errors have a nearly Gaussian distribution for every biogeochemical province with the exception of province P8 (Fig. 6). In this case, the distribution not only has the highest spread but also is skewed toward high values.
In order to evaluate the contribution of the newly added predictors compared to the oceanic setup of the SOM-FFN, the model was also trained without wind speed and sea-ice cover. The RMSEs obtained with those simulations (Table 4) are significantly higher than those obtained using all predictors (Table 3). However, the overall bias remains small. The results of those simulations are presented in the table below and allow us to quantify the effect of the addition of new predictors on the performance of the model. For instance, the global RMSE in the comparison with LDEO* increases significantly, from 39.2 to 48 µatm when chlorophyll, sea ice and wind speed are not taken into account and from 39.2 to 45 µatm when only sea ice and wind speed are not taken into account. This deterioration of the performance of the model, however, does not evenly affect all provinces. Provinces located at high latitudes (i.e., P8, P9 and P10) perform significantly worse without the inclusion of wind speed and sea ice.
Finally, while the use of residuals and RMSE provides a valid quantitative assessment of the model performance, it does not provide insights regarding its ability to reproduce the seasonal pCO2 cycle. To address this issue, Fig. 7 displays observed mean monthly pCO2 extracted from LDEO* and calculated by the coastal SOM-FFN for the 40 locations where the LDEO* database has the most data (> 40 months). The error bars associated with the observations reflect the interannual variability. Overall, the coastal SOM-FFN captures the timing of the seasonal pCO2 cycle well in most locations, with pCO2 minima and maxima occurring at the same time in our results and in the uninterpolated LDEO* data. The pCO2 maximum generally taking place in early summer is accurately captured by the coastal SOM-FFN. In terms of amplitudes in the pCO2 signal, the coastal SOM-FFN and the LDEO* data reveal primarily how different the seasonal pCO2 cycle is from one region to the other, with very low amplitude (< 40 µatm) in some sub-tropical areas, amplitudes > 100 µatm at high northern and southern latitudes, and sometimes very sharp increases during summer, like off the coast of Japan. In most regions, the SOM-FFN-based reconstructions are able to capture these variations and predict seasonal amplitudes comparable to those observed in the data. However, in cells for which the difference between observed and simulated seasonal pCO2 amplitude is larger than 20 %, the coastal SOM-FFN tends to systematically underestimate the amplitude of the seasonal pCO2 cycle. This limitation of our model might result from the often short timescales associated with the continental influences in nearshore locations, which are not captured by the environmental predictors used in our calculation. It may also be the result of very short-term events that are aliased in our monthly average calculations.
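The seasonal comparison described above can be sketched as follows for a single grid cell. The sketch assumes that the monthly climatology is the mean of all values recorded in a given calendar month, that the interannual range is given by the lowest and highest values for that month, and that the seasonal amplitude is the difference between the largest and smallest monthly means. Expressing the amplitude mismatch relative to the observed amplitude is also an assumption, since the text does not specify how the 20 % difference is computed. All names and values are hypothetical.

```python
from collections import defaultdict


def monthly_climatology(samples):
    """Build {month: (mean, lowest, highest)} from (year, month, pco2) samples.

    The lowest/highest values describe the interannual range for each
    calendar month at one location.
    """
    by_month = defaultdict(list)
    for _year, month, value in samples:
        by_month[month].append(value)
    return {
        month: (sum(values) / len(values), min(values), max(values))
        for month, values in sorted(by_month.items())
    }


def seasonal_amplitude(monthly_means):
    """Peak-to-trough amplitude of a set of monthly mean values (µatm)."""
    values = list(monthly_means)
    return max(values) - min(values)


def relative_amplitude_difference(observed_amp, modelled_amp):
    """Signed mismatch relative to the observed amplitude (assumed definition).

    Negative values mean that the model underestimates the seasonal cycle.
    """
    return (modelled_amp - observed_amp) / observed_amp


# Hypothetical observations for one cell: (year, month, pCO2 in µatm).
observed = [
    (2005, 2, 320.0), (2006, 2, 340.0),
    (2005, 8, 400.0), (2006, 8, 420.0),
]
climatology = monthly_climatology(observed)
obs_amp = seasonal_amplitude(mean for mean, _lo, _hi in climatology.values())

# Hypothetical modelled monthly means for the same cell.
modelled_means = {2: 345.0, 8: 405.0}
mod_amp = seasonal_amplitude(modelled_means.values())

mismatch = relative_amplitude_difference(obs_amp, mod_amp)
print(f"observed {obs_amp:.0f} µatm, modelled {mod_amp:.0f} µatm, "
      f"relative difference {mismatch:+.0%}")
```

In this hypothetical cell the modelled cycle is damped relative to the observations, which corresponds to the kind of underestimation reported for cells with a mismatch above 20 %.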

Comparison with global SOM-FFN
The comparison of our coastal SOM-FFN results with those of Landschützer et al. (2016) for the overlapping grid cells (Table 3) reveals significant differences between both interpolated data products with a RMSE between 24 and 32 µatm for most provinces except P7, P9 and P10 (53, 55 and 37 µatm, respectively). These RMSE values are comparable but slightly lower than those obtained for the comparison with the LDEO* database, in line with those observed with the SOCAT* database. The differences (coastal SOM-FFN minus global SOM-FFN), however, are much larger than those observed between our results and the LDEO* database and highlight the current knowledge gap regarding the mean state and variability of the transition zone. They range from −17.9 to 11.7 µatm from one province to the next but only amount to −0.6 µatm when considering the cells from all provinces at once.
Figure 7. Climatological monthly mean pCO2 extracted from the LDEO* database (points) and generated by the artificial neural network (lines) for grid cells with more than 40 months of data. The error bars associated with the data represent the interannual variability, reported as the highest and lowest recorded values for a given month at a given location.
The overlapping cells used for the comparison with Landschützer et al. (2016) are mostly located over 100 km away from the coastline and therefore both the open ocean and our new shelf ocean dataset are constrained by fairly different [...]
Table 3. Root mean square error between observed and calculated pCO2 in the different biogeochemical provinces. The SOM-FFN results are compared to data extracted from the LDEO database (Takahashi et al., 2014).
Overall, the occurrence of large residuals in the shallowest cells of our calculation domain (Fig. 5) suggests that the very near-shore processes controlling the CO2 dynamics are likely the most difficult to reproduce at the global scale. However, the added value of performing our simulations at the spatial resolution of 0.25° is exemplified by the ability of our model to capture the plumes of larger rivers such as the Amazon, where pCO2 is significantly lower than that of the surrounding waters (Cooley et al., 2007; Ibanez et al., 2015). High annual mean values of pCO2, close to or above atmospheric levels, are estimated around the Equator up to the tropics. This is consistent with previous studies that identified tropical and equatorial coastal regions as weak CO2 sources for the atmosphere (Borges et al., 2005; Cai, 2011; Laruelle et al., 2010, 2014). A hotspot of very high pCO2 emerges from our analysis around the Arabian Peninsula, extending into the eastern Mediterranean Sea, as well as into the Red Sea and the Persian Gulf. These regions are poorly monitored and it remains difficult to assess if pCO2 values in excess of 450 µatm are realistic or not, but the limited body of available literature suggests that very high pCO2 values are indeed observed in these regions (Ali, 2008; Omer, 2010). The very high temperature and salinity conditions observed in the Red Sea, in particular, reduce the CO2 solubility and induce very high pCO2 conditions.
However, these predicted pCO2 values lie outside of the range used for the training of the SOM-FFN (typically 200-450 µatm) and should thus be considered with caution. Along the oceanic coast of the Arabian Peninsula, the SOM-FFN predicts pCO2 ranging from 365 to 390 µatm all year round and thus does not capture the well-known increase in pCO2 resulting from the monsoon-driven summer upwelling in the region (Sarma, 2003; Takahashi et al., 2009). In both hemispheres, pCO2 values in the 325 to 370 µatm range are generally reconstructed at temperate latitudes, i.e., up to 50° N and 50° S, respectively. The northern high latitudes generally have very low pCO2 values, down to 300 µatm and below, a result that is consistent with the Arctic shelves contributing a large proportion (up to 60 %) of the global coastal carbon sink (Bates and Mathis, 2009; Cai, 2011; Laruelle et al., 2014). Several hotspots of pCO2 with values as high as 450 µatm can nevertheless be observed north of 70° N, most notably along the eastern coast of Siberia in winter (see Fig. P in the Supplement), which displays a large zone characterized by pCO2 > 400 µatm centered at the mouth of the Kolyma River. Such high pCO2 values have occasionally been observed in Arctic coastal waters (Anderson et al., 2009) and could result from the discharge of highly oversaturated riverine waters. Overall, however, pCO2 measurements over Siberian shelves are rare. Thus, our results should be considered with caution in this region because of the scarcity of data to train and validate the coastal SOM-FFN. It should also be noted that the vast majority of this high pCO2 region is covered by sea ice (Fig. 4b, c) and, although the model estimates pCO2 values over the entire domain, only ice-free (or partially ice-free) cells will contribute to the CO2 exchange across the air-sea interface (Bates and Mathis, 2009; Laruelle et al., 2014).

Temporal variability
The reconstructed pCO2 field is also subject to large seasonal variations (see Fig. P and A in the Supplement). To explore these variations further, Fig. 8 reports seasonal-mean latitudinal profiles of pCO2 for continental shelves neighboring the eastern Pacific, Atlantic, Indian and western Pacific. The analysis excludes continental shelves at latitudes higher than 65° because a large fraction of these shelves is seasonally covered by sea ice. The latitudinal pCO2 profiles reveal that, in most regions, the highest and lowest pCO2 values are observed during the warmest and coldest months, respectively. This trend is particularly pronounced at temperate latitudes where the seasonal pCO2 amplitude can reach 60 µatm and is exemplified by regions such as the western Mediterranean Sea or the eastern coast of America, which become supersaturated in CO2 compared to the atmosphere during the summer months. However, there are a few other regions where the lowest pCO2 is found in the summer, such as the Baltic Sea (Thomas and Schneider, 1999). Around the Equator, the magnitude of the seasonal variations in pCO2 is limited and does not exceed 30 µatm.
Although the general latitudinal trend of the annual mean pCO2 is similar across all continental shelves, significant differences in the seasonality can be observed across the largest ocean basins. In particular, most of the eastern Pacific shelves, except for latitudes north of 55° N, display limited seasonal change in pCO2 (typically below 30 µatm), while the western Pacific shelves have seasonal pCO2 amplitudes that can exceed 50 µatm in temperate regions and 100 µatm at high latitudes (above 55° N). Along the Atlantic shelves, the seasonal signal is more pronounced in the north compared to the south, in agreement with Laruelle et al. (2014). Overall, the northern Pacific (north of 55° N) displays the most pronounced seasonal change in pCO2 with a difference of 80 µatm between summer and winter. In the Indian Ocean, the seasonal dynamics of pCO2 is partly regulated by seasonal upwelling induced by the monsoon (Liu et al., 2010). In this basin north of the Equator, April, May and June are the months with the highest pCO2 and the seasonal variations do not exceed 30 µatm. In contrast, the seasonal cycle is quite pronounced in the Indian Ocean south of the Equator (∼ 50 µatm).
Latitudinal profiles of SST (Fig. 8b) are similar in all coastal oceans with minimal seasonal variations around the Equator and amplitudes as large as 20 °C at temperate latitudes. The comparison between the seasonal pCO2 and SST profiles allows us to assess the contribution of temperature-induced changes in CO2 solubility to the seasonal pCO2 variations in the continental shelf waters. However, other factors such as seasonal upwelling and biological activity also strongly influence coastal pCO2 and contribute to the complexity of the seasonal pCO2 profiles. To quantify the effect of temperature on seasonal variations of pCO2, the latter is normalized to the mean temperature at different latitudes in each oceanic basin (Fig. 8) using the formula proposed by Takahashi et al. (1993):
$$p\mathrm{CO}_{2(\mathrm{SSTmean})} = p\mathrm{CO}_{2,\mathrm{obs}} \times \exp\left[0.0423\,(T_{\mathrm{mean}} - T_{\mathrm{obs}})\right]$$
where pCO2(SSTmean) is the temperature normalized pCO2, pCO2,obs is the observed pCO2 at the observed temperature Tobs and Tmean is the yearly mean temperature at the considered location. In seawater, an increase in water temperature induces a decrease in gas solubility which leads to a higher water pCO2. Thus, comparing pCO2(SSTmean) with observed pCO2 monthly values provides a quantitative estimate of the influence of seasonal temperature change on the seasonality of pCO2. For most latitudes and oceanic basins, pCO2 is minimum in late winter or early spring, i.e., at the time when pCO2(SSTmean) has its maximum. pCO2 also generally displays a maximum in summer, while pCO2(SSTmean) reaches its minimum then (Fig. 9). The amplitude of the changes in pCO2(SSTmean) is quite consistent across oceans and about 2 to 3 times larger than that of pCO2. Between 45 and 60° N, the variations in pCO2(SSTmean) largely exceed 100 µatm (up to 220 µatm at 60° N in the western Pacific). In these regions, the magnitude of the seasonal temperature changes is also maximum and reaches 20 °C between winter and summer (Fig. 5).
A seasonal signal in pCO2 with a minimum in late winter or spring when pCO2(SSTmean) is maximal can also be identified. However, the magnitude of the seasonal variations in pCO2 is significantly smaller than that of pCO2(SSTmean), suggesting that other processes such as biological uptake or transport/mixing partly offset the temperature effect on solubility. In the subpolar western Pacific shelves (60° N), a second pronounced dip in pCO2 following a weaker one in spring is observed in summer, which suggests the occurrence of a pronounced summer biological activity taking up large amounts of CO2. This would also explain the sharp increase in pCO2 in the following month, as a result of the degradation of organic matter synthesized during the summer bloom. Although this region is also the one subjected to the strongest seasonal temperature changes, the amplitude of the seasonal pCO2(SSTmean) signal, which reaches 220 µatm, suggests that nonthermal processes drive most of the seasonal pCO2 variations in the region. At 20° N, the amplitudes of the changes in both pCO2 and pCO2(SSTmean) are lower than at higher latitudes. pCO2 varies by ∼ 30 µatm between summer and winter in all oceanic basins while the seasonal variations in pCO2(SSTmean) are more pronounced in the Pacific (∼ 60 µatm) than in the Atlantic or the Indian oceans. In the Southern Hemisphere, the seasonal variations in pCO2 are not as pronounced as in the Northern Hemisphere, suggesting that the changes induced by the solubility pump are compensated by biological activities. At 10 and 30° S, the seasonal variations in pCO2 rarely exceed 30 µatm in any basin with a minimum observed around August.
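The temperature normalization used in this section can be illustrated with a short sketch. It assumes the commonly cited form of the Takahashi et al. (1993) relation, pCO2(SSTmean) = pCO2,obs × exp[0.0423 (Tmean − Tobs)], with Tmean taken as the mean of the supplied temperatures. The function name and the four seasonal values are hypothetical and do not correspond to any location in Fig. 9.

```python
import math

# Temperature sensitivity of pCO2 (per °C), assumed here to be the
# commonly cited Takahashi et al. (1993) coefficient.
TEMPERATURE_COEFFICIENT = 0.0423


def normalize_to_mean_sst(pco2_obs, sst_obs):
    """Normalize observed pCO2 (µatm) to the mean of the supplied SSTs (°C).

    Each value is multiplied by exp[0.0423 * (T_mean - T_obs)], which
    removes the thermal (solubility) component of the variability.
    """
    if len(pco2_obs) != len(sst_obs) or not pco2_obs:
        raise ValueError("inputs must be non-empty and of equal length")
    t_mean = sum(sst_obs) / len(sst_obs)
    return [
        p * math.exp(TEMPERATURE_COEFFICIENT * (t_mean - t))
        for p, t in zip(pco2_obs, sst_obs)
    ]


# Hypothetical seasonal values (winter, spring, summer, autumn).
pco2 = [340.0, 350.0, 390.0, 360.0]   # µatm
sst = [5.0, 10.0, 20.0, 13.0]         # °C

normalized = normalize_to_mean_sst(pco2, sst)
observed_amplitude = max(pco2) - min(pco2)
normalized_amplitude = max(normalized) - min(normalized)
print(f"observed amplitude:   {observed_amplitude:.0f} µatm")
print(f"normalized amplitude: {normalized_amplitude:.0f} µatm")
```

With these hypothetical values the normalized series peaks in winter and reaches its minimum in summer, the opposite phase of the observed series, and its amplitude is larger than the observed one. This is the pattern described above, in which nonthermal processes partly offset the thermal signal.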

Summary
This study presents the first global high-resolution monthly pCO2 maps for continental shelf waters at an unprecedented 0.25° spatial resolution. We show that when tailored for the specific conditions of shelf systems, the SOM-FFN method previously employed in the open ocean is capable of reproducing well-known and well-observed features of the pCO2 field in the coastal ocean. Our continuous shelf product allows us, for the first time, to analyze the dominant spatial patterns of pCO2 across all ocean basins and their seasonality. The data product associated with this paper consists of a NetCDF file containing the pCO2 for ice-free cells at a 0.25° spatial resolution for each of the 216 months of the simulation period (from January 1998 to December 2015). Twelve maps representing mean pCO2 fields calculated for each month over the simulation period are also provided. This data product can be combined with wind field products such as ERA-interim (Dee, 2010; Dee et al., 2011) or CCMP (Atlas et al., 2011) to compute spatially and temporally resolved air-sea CO2 fluxes across the global shelf region, including the Arctic. Maps including pCO2 for ice-covered cells are also available but should be treated with care because the dynamics of CO2 fluxes through sea ice are still poorly understood and air-sea gas transfer velocities in partially sea-ice-covered areas cannot be predicted from classical wind speed relationships (Lovely et al., 2015).
Figure 9. Seasonal cycle of observed (continuous lines) and temperature normalized pCO2 (pCO2(SSTmean), dashed lines) at five different latitudes in four oceanic basins.
Data availability. Version 4 of the SOCAT database can be downloaded from www.socat.info/upload/SOCAT_v4.zip. The observation-based global monthly gridded sea-surface pCO2 product is provided by Landschützer et al. (2015; https://doi.org/10.3334/CDIAC/OTG.SPCO2_1982_2011_ETH_SOM-FFN), was downloaded from http://cdiac.ornl.gov/ftp/oceans/SPCO2_1982_2011_ETH_SOM_FFN and is now available at https://www.nodc.noaa.gov/ocads/oceans/SPCO2_1982_2015_ETH_SOM_FFN.html. The LDEOv2015 database (https://doi.org/10.3334/CDIAC/OTG.NDP088(V2015)) was downloaded from http://cdiac.ornl.gov/oceans/LDEO_Underway_Database/. The global atmospheric reanalysis ERA-interim datasets (Dee et al., 2011; http://doi.wiley.com/10.1002/qj.828) are accessible on the European Centre for Medium-Range Weather Forecasts (ECMWF) website. SST and SSS were extracted from the Met Office's EN4 dataset (Good et al., 2013; doi:10.1002/2013JC009067). The bathymetry used is the global ETOPO2 database (US Department of Commerce, 2006), which can be downloaded from http://www.ngdc.noaa.gov/mgg/fliers/06mgg01.html. The sea-ice concentrations are derived from the global 25 km resolution monthly data product compiled by the NSIDC (National Snow and Ice Data Center; Cavalieri et al., 1996).