Estimating Marine Carbon Uptake in the Northeast Pacific Using a Neural Network Approach

. The global ocean takes up nearly a quarter of anthropogenic CO 2 emissions annually, but the variability of this uptake at regional scales remains poorly understood. Here we use a neural network approach to interpolate sparse observations, creating a monthly gridded seawater partial pressure of CO 2 ( p CO 2 ) data product from January 1998 to December 2019, at 1/12  x1/12  spatial resolution, in the Northeast Pacific open ocean, a net sink region. The data product (ANN-NEP; NCEI Accession 0277836) was created from p CO 2 observations within the 2021 version of the Surface Ocean CO 2 Atlas (SOCAT), 15 and a range of predictor variables acting as proxies for processes affecting p CO 2 to create non-linear relationships to interpolate observations at a spatial resolution four times greater than leading global products and with better overall performance. In moving to a higher resolution, we show that the internal division of training data is the most important parameter for reducing overfitting. Using our p CO 2 product, wind speed, and atmospheric CO 2 , we evaluate air-sea CO 2 flux variability. On sub-decadal to decadal timescales, we find that the upwelling strength of the subpolar Alaskan Gyre, driven by large-scale 20 atmospheric forcing, acts as the primary control on air-sea CO 2 flux variability ( r 2 = 0.93, p <0.01). In the northern part of our study region, divergence with atmospheric CO 2 is enhanced by increased local wind stress curl, enhancing upwelling and entrainment of naturally CO 2 -rich subsurface waters, leading to decade-long intervals of strong winter outgassing. During recent Pacific marine heatwaves from 2013 on, we find enhanced atmospheric CO 2 uptake (by as much as 45%) due to limited wintertime entrainment. Our product estimates long-term surface ocean p CO 2 increase at a rate below the atmospheric trend 25 (1.4  0.1  atm yr -1 ) with the slowest increase in the center of the subpolar gyre where there is strong interaction with subsurface waters. This mismatch suggests the Northeast Pacific Ocean sink for atmospheric CO 2 may be increasing.

Abstract.The global ocean takes up nearly a quarter of anthropogenic CO 2 emissions annually, but the variability in this uptake at regional scales remains poorly understood.
Here we use a neural network approach to interpolate sparse observations, creating a monthly gridded seawater partial pressure of CO 2 (pCO 2 ) data product from January 1998 to December 2019, at 1/12 • × 1/12 • spatial resolution, in the northeast Pacific open ocean, a net sink region.The data product (ANN-NEP; NCEI Accession 0277836) was created from pCO 2 observations within the 2021 version of the Surface Ocean CO 2 Atlas (SOCAT) and a range of predictor variables acting as proxies for processes affecting pCO 2 to create nonlinear relationships to interpolate observations at a spatial resolution 4 times greater than leading global products and with better overall performance.In moving to a higher resolution, we show that the internal division of training data is the most important parameter for reducing overfitting.Using our pCO 2 product, wind speed, and atmospheric CO 2 , we evaluate air-sea CO 2 flux variability.On sub-decadal to decadal timescales, we find that the upwelling strength of the subpolar Alaskan Gyre, driven by large-scale atmospheric forcing, acts as the primary control on air-sea CO 2 flux variability (r 2 = 0.93, p < 0.01).In the northern part of our study region, divergence from atmospheric CO 2 is enhanced by increased local wind stress curl, enhancing upwelling and entrainment of naturally CO 2 -rich subsurface waters, leading to decade-long intervals of strong winter outgassing.During recent Pacific marine heat waves from 2013 on, we find en-hanced atmospheric CO 2 uptake (by as much as 45 %) due to limited wintertime entrainment.Our product estimates longterm surface ocean pCO 2 increase at a rate below the atmospheric trend (1.4 ± 0.1 µatm yr −1 ) with the slowest increase in the center of the subpolar gyre where there is strong interaction with subsurface waters.This mismatch suggests the northeast Pacific Ocean sink for atmospheric CO 2 may be increasing.

Introduction
As countries around the world consider updating their carbon emission reduction commitments (United Nations Environment Programme, 2022), we require a better understanding of global carbon sinks and how they may be shifting under climate change.The global ocean takes up nearly a quarter of anthropogenic carbon dioxide (CO 2 ) emissions annually (Friedlingstein et al., 2022), but the temporal and spatial variability in the marine sink remains unclear on decadal or longer timescales (McKinley et al., 2011;Fay and McKinley, 2013;Wanninkhof et al., 2013;Gruber et al., 2023).Potential future changes in the marine sink associated with climate change are also unclear (O'Neill et al., 2016).Extending the spatial and temporal coverage of the partial pressure of CO 2 in seawater (pCO 2 ) observations can help address this knowledge gap (Aricò et al., 2021).Benefitting from the increasing abundance of CO 2 measurements at sea Published by Copernicus Publications on behalf of the European Geosciences Union.and community synthesis efforts (e.g., through the Surface Ocean CO 2 Atlas (SOCAT); Bakker et al., 2016), a variety of interpolation approaches have evolved capable of creating continuous observation-based estimates of pCO 2 (Denvil-Sommer et al., 2019;Zhong et al., 2022;Laruelle et al., 2017;Nakaoka et al., 2013;Chen et al., 2019;Ritter et al., 2017;Landschützer et al., 2013).However, their global focus and coarse resolution limits their interpretation at regional scales (Olivier et al., 2022).Only recently, higher-resolution regional pCO 2 maps have been developed for the California Current System (Sharp et al., 2022) to overcome the limitations of coarse global-scale pCO 2 products.These seawater pCO 2 products, combined with wind speed and atmospheric pCO 2 , have informed regional to global air-sea CO 2 flux estimates of multiyear variability (Landschützer et al., 2019(Landschützer et al., , 2016(Landschützer et al., , 2015;;Wang et al., 2021;Hauck et al., 2020).
No high-resolution observation-based air-sea CO 2 flux estimate currently exists for the North Pacific Ocean.The northeast Pacific Ocean has been characterized as a net annual sink for atmospheric CO 2 (Wong et al., 2010;Franco et al., 2021;Sutton et al., 2017;Duke et al., 2023b).The region is divided by two dominant oceanographic features: the Alaskan Gyre system to the north and the North Pacific Current to the south (Franco et al., 2021).With respect to surface ocean carbon measurements, the Alaskan Gyre system remains extremely sparsely sampled.The seasonal air-sea CO 2 flux of the gyre has been described as being strongly influenced by gyre upwelling with outgassing in the winter and uptake in the summer (Brady et al., 2019;Palevsky et al., 2013;Chierici et al., 2006).Along the easternmost part of the North Pacific Current, most of our understanding comes from a limited region: the Ocean Station Papa mooring at 50 • N, 145 • W (Sutton et al., 2017) and the Line P Program (Freeland, 2007).This region has well-documented seasonal cycles (Sutton et al., 2017), interannual variability (Wong and Chan, 1991;Wong et al., 2010), and long-term trends (Franco et al., 2021;Sutton et al., 2019).CO 2 uptake is mainly driven by direct ventilation of the shallow upper water column, with a small seasonal change in surface ocean pCO 2 (Wong et al., 2010;Sutton et al., 2017).The estimated long-term trend in surface ocean pCO 2 appears to be increasing at less than the atmospheric rate of increase (Franco et al., 2021).
Understanding what drives air-sea CO 2 fluxes on seasonal, interannual, and decadal timescales in the northeast Pacific Ocean will provide information on how the regional sink may change in the future.This region is already experiencing persistent marine heat waves with dramatic temperature anomalies observed during 2014 to 2016 and 2018 to 2020 events (Freeland and Ross, 2019;Bond et al., 2015), with future events predicted to become longer-lasting, more frequent, more extensive, and more intense (Frölicher et al., 2018).The impact of large-scale climate-driven decadal oscillations on the marine carbon system is just beginning to be explored in models (Hauri et al., 2021).Furthermore, this region has been targeted as a potential site of marine carbon dioxide removal, as a negative emissions technology aimed at meeting emission reduction goals continues to grow in interest and investment (Cooley et al., 2022).Some proposed approaches look to artificially stimulate biological carbon drawdown (GESAMP, 2019;NASEM, 2021).The northeast Pacific Ocean, as an iron-limited high-nutrient lowchlorophyll region (Dugdale and Wilkerson, 1991;Aumont et al., 2003;Martin et al., 1994;Freeland et al., 1984), has already been the location of geoengineered biological carbon drawdown experiments (Boyd et al., 2007(Boyd et al., , 2005;;Wong and Johnson, 2002;Ianson et al., 2012).Thus, a firm understanding of processes driving carbon fluxes and the establishment of environmental baselines in the region is critical.
Our aim is to investigate drivers of air-sea CO 2 flux variability in the northeast Pacific (NEP) Ocean, building a novel regional high-resolution artificial neural network (ANN) approach adopted from an existing global setup (Landschützer et al., 2013).In Sect.2, we describe the creation of a gridded pCO 2 data product (herein referred to as ANN-NEP; NCEI Accession 0277836; Duke et al., 2023a) monthly from January 1998 to December 2019 at 1/12 • × 1/12 • spatial resolution in the northeast Pacific open ocean (approximately 9 km by 5 km, latitude by longitude).In Sect.3, we show that the high-resolution regional pCO 2 product is robust enough to recreate training observation data while generalizing well compared to independent withheld observation data.We also show that stepping to a higher resolution regionally with appropriate tuning of the internal training and evaluation data ratio does not hinder product performance.In Sect.4, our results show that the upwelling strength of the subpolar Alaskan Gyre and surface ocean connectivity to subsurface waters act as the primary controls on air-sea CO 2 flux variability in our study area.We conclude by calculating longterm trends in surface ocean pCO 2 and carbon uptake, examining trends relative to connectivity to subsurface waters.

Data and methods
Our study area comprises the region between latitudes 45 and 62 • N and longitudes 120 and 155 • W (Fig. 1), with the open-oceanic and coastal boundary defined as 300 km offshore following Laruelle et al. (2017).We limit our study region to the open-ocean regions with reduced variability and related drivers compared to the continental shelf regions.Creating a product on the continental shelf and in the nearshore areas requires different neural network considerations and is associated with high uncertainties (Laruelle et al., 2017).This work represents a 4-fold increase in spatial resolution over previous multiyear global open-ocean products, usually coarser than 1/4 • (Landschützer et al., 2020b).The increased resolution derives from high-resolution predictor data used to create the product (Table 1).To interpolate the existing CO 2 observations in this domain, we adapt the artificial neural network (ANN) self-organizing-map feed-forward-network (SOM-FFN) approach developed by Landschützer et al. (2013Landschützer et al. ( , 2014)).In a first step, the method divides the region of interest into dynamic zones with similar biogeochemical features (i.e., SOM biogeochemical provinces), using a self-organizing-map approach.In a second step, a feed-forward neural network is used for interpolating pCO 2 observations in each of the pre-determined provinces of step one.Specifically, nonlinear functional relationships are created between pCO 2 observations (or neural network target data), where they exist in our study domain, and independent predictor variables (or neural network input data) that are known to drive the marine carbon cycle (see Sect. 2.1 below).Once the relationships are established, they can be applied where no observations exist to fill space and time gaps and create continuous sea surface pCO 2 maps from 1998-2019.

Predictor data
The chosen predictor variables for this study (Table 1) had all been used previously in observation-based pCO 2 interpolated products (Denvil-Sommer et al., 2019;Zhong et al., 2022;Landschützer et al., 2014;Gregor et al., 2018;Telszewski et al., 2009).Sea surface temperature (SST) comes from the satellite-based European Space Agency Climate Change Initiative (Merchant et al., 2019; ESA Sea Surface Temperature Climate Change Initiative (SST_cci): Level 4 Analysis Climate Data Record, version 2.1), as does chlorophyll-a concentration which served as a proxy for biological processes (ESA Ocean Colour Climate Change Initiative, 2022).The remaining physical process predictor data (e.g., sea surface salinity (SSS), sea surface height (SSH), and mixed layer depth (MLD)) are obtained from the Copernicus Marine Environment Monitoring Service global ocean eddy-resolving reanalysis (Global Ocean Physical Reanalysis Product, EU Copernicus Marine Service Information GLOBAL_REANALYSIS_PHY_001_030).Jointly assimilated observations include satellite altimeter data and in situ vertical profiles of temperature and salinity informing the MLD reanalysis product (Table 1).The ocean general circulation model is based on the Nucleus for European Modelling of the Ocean (NEMO) platform, driven at the surface by the European Centre for Medium-Range Weather Forecasts ERA-Interim winds (Jean-Michel et al., 2021).Both chlorophyll a and mixed layer depth were log 10 -transformed to produce a distribution of values closer to normal before being used in either SOM-FFN step.Atmospheric pCO 2 in microatmospheres was downloaded from Landschützer et al. (2020b), derived from the National Oceanic and Atmospheric Administration Earth System Research Global Monitoring Laboratory (https://gml.noaa.gov/ccgg/globalview/,last access: August 2022) atmospheric mole fraction of CO 2 (χCO 2 ) and SST (Reynolds et al., 2002) as well as sea level pressure (Kalnay et al., 1996) following Dickson et al. (2007).Finally, the monthly pCO 2 climatology of Land-schützer et al. (2020a) was used as an additional input parameter solely for defining the SOM biogeochemical provinces.

pCO 2 observations
ANN target pCO 2 data come from the Surface Ocean CO 2 Atlas (SOCAT) v2021 (Bakker et al., 2016), and there are additional data from the Fisheries and Oceans Canada February 2019 Line P cruise (https://www.waterproperties.ca/linep/, last access: August 2022; Fig. 1c).Sea surface CO 2 fugacity (f CO 2 ) was converted to sea surface pCO 2 (Text S1 in the Supplement; Körtzinger, 1999).pCO 2 observations were bin-averaged into monthly 1/12 • latitude by 1/12 • longitude grid cells computing the mean and standard deviation within each grid cell.Of the 8 712 264 grid cells that represent the surface ocean gridded in three dimensions over 264 months  at a 1/12 • × 1/12 • resolution in the study area, just 0.39 % have an associated gridded pCO 2 value (Fig. 1).

Evaluation
In constructing the optimal ANN architecture, a series of SOM-FFN tuning tests were conducted comparing ANN output to training and independent withheld data.The ANN performance for each tuning test was evaluated using five statistical metrics: root mean squared error (RMSE), the coefficient of determination (r 2 ), mean absolute error (MAE), mean bias (calculated as the mean residual), and the slope of the linear regression (c 1 ) between the ANN and the corresponding gridded SOCAT pCO 2 observations.Independent withheld data came from randomly selected SOCAT data using associated expocodes corresponding to unique complete underway cruise tracks or mooring deployments.We tested 100 random independent withheld data splits and selected one representative of basin-wide observational coverage (summer and southern sampling bias), with winter, spring, and fall data present (Figs. 1 and S1 in the Supplement).These independent withheld data represented approximately 5 % of the total study area gridded pCO 2 data, with coverage during all seasons over a range of latitudes (Fig. S1).Ensuring that selected independent withheld data are random yet also representative of the full domain, without withholding critical end range training data, is difficult.Community-based best practices are likely needed going forward to ensure continuity in reported observation-based pCO 2 product uncertainty based on independent withheld data (Sect.3.2).

Neural network construction
SOM-FFN tuning tests occurred in series using the MAT-LAB Neural Network Toolbox, with sequential improvements impacting future tests.The optimization of the SOMderived biogeochemical provinces involved trial-and-error testing of various parameters including SOM biogeochemhttps://doi.org/10.5194/bg-20-3919-2023 Biogeosciences, 20, 3919-3941, 2023  1) shows relative location of the subpolar Alaskan Gyre (negative SSH values) and the North Pacific Current (SSH approximately equal to 0).Ocean Station Papa is labeled and marked with a black circle, while Line P is labeled and marked with a red line.ical province count, predictor variable choice, and static or varying province shape with each time step (Landschützer et al., 2013).The choice of four SOM biogeochemical provinces represented the lowest number of SOM biogeochemical provinces for a typical clustering structure to emerge (Fig. S2), while keeping the ratio of gridded pCO 2 observations to the total grid cells within each province similar (0.38 ± 0.06 %).The best SOM predictor variables were SST, SSS, MLD (Table 1), and the Landschützer et al. (2020a) pCO 2 climatology.We did not normalize predictor data (e.g., force a mean of 0 and standard deviation of 1), implicitly weighting SOM predictors toward the pCO 2 climatology as its range is at least 1 order of magnitude greater than that for SST, SSS, and log(MLD) (Landschützer et al., 2013).As a result, our dynamic provinces follow the seasonal variations in the pCO 2 climatology (Landschützer et al., 2020a).Thus, non-static provinces, which changed shape from 1 month to the next over a climatology, proved the most useful in clustering seasonal cycle variability.This clustering does lead to clearly unphysical fronts as an artifact of the approach.
In reaching an optimal FFN architecture (i.e., number of inputs, number of hidden layers and neurons in each hidden layer), trial-and-error testing of tuning parameters explored predictor variable choice, FFN training algorithm and activation functions, pre-training to determine the number of neurons in the first hidden layer, introducing a second hidden layer with a static number of neurons, and changing the internal data division ratio (optimized at 94 : 6; see Sect.3.4 below).
To emphasize interannual and longer-term trends within the six predictor variables (Table 1), each predictor variable is used in two different forms: first in its raw form and second after de-seasonalizing, bringing the total number of FFN predictors used to 12. To de-seasonalize a variable, within each grid cell, the monthly anomaly was calculated by subtracting the climatological monthly mean, removing the seasonal cycle from the data (the same approach is used when looking at anomaly values in our results; Sect.4).Where no chlorophyll-a satellite data were available, the ANN was run again with the remaining predictors and output was merged to fill empty grid cells (Landschützer et al., 2014).The Levenberg-Marquardt backpropagation training algorithm and hyperbolic tangent sigmoid activation function (i.e., trainlm and tansig respectively in MATLAB) were found to deliver the best fit.The number of neurons within the first hidden layer varied by province and the optimal number of neurons was determined in a pre-training run, where we increased the number of neurons parabolically from two up to a number where the ratio between the training sample size and the number of weights did not exceed 30 (i.e., a number that was determined by trial and error).The best output performance of the pre-training determines the best neuron setup which was then further used for the actual ANN training.
To avoid overfitting, we split all the internal training data into two subsets (i.e., one actual training dataset and one internal evaluation dataset).While most studies use a fixed ratio (usually 80 : 20) between these sets, we used the optimal ratio determined by a criterion suggested in Amari et al. (1997) that is dependent on the number of degrees of freedom and hence varies with the optimal number of neurons determined in the pre-training (see Sect. 3.4 below).While the training dataset is used to reconstruct the nonlinear relationship between input data (Table 1) and pCO 2 observations, the internal evaluation data are used to stop the training before the network starts overfitting the training data.Specifically, we stopped the training when six consecutive iterations did not reduce the network's error compared to internal evaluation data (Hsieh, 2009).The addition of a second hidden layer with a static neuron number of five was found to slightly improve performance within the evaluation metrics.

Cross-evaluation and ensemble
In order to further decrease the risk of overfitting, we used a 10-fold cross-evaluation approach (Li et al., 2020a, b) and a bootstrapping method (Landschützer et al., 2013).Here, all SOCAT cruises (apart from the independent withheld data; Sect.2.3) were randomly divided into 10 equal subsamples using SOCAT expocodes prior to gridding.One subsample was used as 10-fold evaluation data (10 % of all data) and was excluded from training, while the remaining nine subsamples were used together as training data (90 % of all data).The cross-evaluation process was repeated 10 times, with each of the 10 subsamples used exactly once as the 10-fold evaluation dataset.We performed 10 rounds of training with each 10-fold training data subsample where we randomly split the ANN internal training and evaluation data based on the optimal ratio determined through testing (Sect.3.4).The robustness and reliability of an ANN have been shown to be significantly improved by combining several ANNs into an ANN ensemble model (Sharkey, 1999;Linares-Rodriguez et al., 2013;Fourrier et al., 2020).The 10 different ANN outputs trained on 10 different 10-fold training data subsamples were used as an ANN ensemble, where the 10 outputs were https://doi.org/10.5194/bg-20-3919-2023  Biogeosciences, 20, 3919-3941, 2023 averaged to obtain the final ANN-NEP pCO 2 product (Fourrier et al., 2020).

Computation of air-sea fluxes
Using the ANN-NEP pCO 2 product, the air-sea CO 2 flux (F CO 2 ), was calculated using Eq. ( 1): based on solubility (∝) as a function of temperature and salinity using the data presented in Table 1 (Weiss, 1974), gas transfer velocity (k), and the gradient between pCO 2 in the surface ocean and the atmosphere ( pCO 2 ).Here, the gas transfer velocity is a function of wind speed retrieved from monthly 1/4  Wanninkhof, 2014) and the overall product uncertainty in estimated pCO 2 (θ pCO 2 ; Eq. 2; see Sect.3.2 below).As the uncertainty in pCO 2 is dominated by the uncertainty in estimated surface ocean pCO 2 , we neglect the small contribution from atmospheric CO 2 (< 1 µatm; Landschützer et al., 2014).
3 Network performance

Evaluation comparing to SOCAT data
Overall, the final high-resolution regional artificial neural network northeast Pacific pCO 2 product (ANN-NEP) obtains good fits, with an overall r 2 of better than 0.8 and an RMSE of around 11 µatm between the estimated pCO 2 and the gridded SOCAT pCO 2 data across both the training data (Fig. 2a) and independent withheld data (Fig. 2b).The mean bias is negligible (< 0.8 µatm; smaller than observational uncertainty).These results also apply within individual calendar years and within monthly groupings across all years, indicating that the temporally inhomogeneous data distribution over the time range and between seasons does not have a measurable effect on the estimates (Table S1 in the Supplement).There is no clear spatial structure to the residuals, with no specific region displaying persistently positive or negative residuals (Fig. S3).When compared to local pCO 2 mooring data from Ocean Station Papa (which is included in SOCAT; Fig. 1; Sutton et al., 2017), the ANN-NEP product also performs well (r 2 = 0.86; 133 months; not shown).The ANN ensemble model mean approach demonstrated improved performance metrics when compared to each individual ensemble member.The ensemble median was nearly equivalent to the ensemble mean (r 2 = 0.99; not shown).Overall, individual ensemble members showed little deviation (RMSE < 8 µatm) from the ensemble mean (Fig. 2c), with the ensemble mean still improving estimate robustness and reducing overtraining as evident in comparing the final ANN product to independent withheld data (Fig. 2b) and the mean RMSE of individual ensemble members to independent withheld data (13 ± 1 µatm; Fig. S4a).Each individual ensemble member also performed relatively well compared to the 10 % subsample of corresponding 10-fold evaluation data (mean RMSE = 17 ± 2 µatm; Fig. S4b).The mean standard deviation across all grid cells within the 10-fold ensemble is 2.2 ± 1.3 µatm (mapped in Fig. S5).
Observational uncertainty (θ obs = 3.1 µatm) is the measurement uncertainty in pCO 2 in the field, evaluated as the average of the uncertainty assigned to each data point according to its SOCAT quality control (QC) flag (between 2-5 µatm).
Gridding uncertainty (θ grid = 1.5 µatm) is associated with gridding SOCAT observations into monthly 1/12 • × 1/12 • bins, evaluated as the average standard deviation among pCO 2 values within each grid cell with at least three data points.ANN interpolation uncertainty (θ map = 11.1 µatm) is uncertainty introduced by interpolating the pCO 2 observations using the SOM-FFN approach, evaluated as the RMSE from the ANN ensemble output compared to the independent withheld SOCAT data (Fig. 2b).One limitation of our approach in assessing the uncertainty in the ANN interpolation method is that it is only applicable to grid cells where observations are available.Consequently, location-specific seasonal biases, especially at high latitudes with limited wintertime observations (Fig. 1a, b), may not be fully captured or accounted for.The standard deviation of the ensemble (ensemble spread) gives an indication of how robust our estimate is from one run to the next using different 10-fold training data (Sect.2.5; Keppler et al., 2020).ANN run randomness uncertainty (θ run = 2.2 µatm) comes from the mean standard deviation between 10-fold ensemble members (Sects.2.5 and 3.1), which is less than the comparison of each member of the ensemble with the ensemble mean (Figs.S4; 2c).
Overall product uncertainty combining all four components according to Eq. ( 2) is 12 µatm, with the contribution of ANN interpolation uncertainty being the largest.Our product uncertainty is comparable to reported open-ocean uncertainty values from global products (Landschützer et al., 2014) as well as a regional product in the California Current System (Sharp et al., 2022).Combining the reported uncertainty in the gas transfer velocity (Sect.2.6) and the overall pCO 2 product uncertainty yields an average uncertainty of ±0.24 mol m −2 yr −1 in the air-sea gas flux, with the largest fraction of the error stemming from the uncertainty in the gas transfer velocity.The total uncertainty in the flux corresponds to roughly 20 % of individual grid cell calculated flux values.

Improvement relative to a global product
The ANN-NEP pCO 2 product created here shows improved performance over the Landschützer et al. (2020b) global product at each time step within the study area when compared to SOCAT data gridded at 1/12 • × 1/12 • (Fig. 3), illustrating the importance of regional high-resolution estimates in resolving fine-scale variations.Across all evaluation metrics the global product does not perform as well in the region compared to SOCAT training data (RMSE = 14; r 2 = 0.74; mean bias = −2; c 0 = 0.68; MAE = 10; compared to Fig. 2a).This improvement suggests that a regional highresolution product can narrow the range of variability in predictor data within the SOM clustering step and present pCO 2 observation data with a greater correlation to the FFN.In the Landschützer et al. (2020b) global product, there is often only one SOM biogeochemical province covering the whole region, forcing nonlinear relationships in the FFN to be built around greater variability in pCO 2 observation data from a wider range of geographic areas.The ANN-NEP regionally specific four SOM biogeochemical province grouping could alleviate this shortcoming in the FFN step.The improvement in our high-resolution product is particularly evident in the seasonal amplitude, where differences between ANN-NEP and Landschützer et al. (2020b) exceed the product uncertainty in 25 % of grid cells (Fig. S6a).The largest seasonal amplitude differences occur in the north Alaskan Gyre region and south of the North Pacific Current (Fig. S6a).The additional spatial resolution and temporal details in the regional high-resolution product provide key information to inform future observation programs including potential mooring lohttps://doi.org/10.5194/bg-20-3919-2023 Biogeosciences, 20, 3919-3941, 2023 cations.The value added in stepping to a high-resolution regional product proves particularly useful in resolving biogeochemical gradients within the subpolar Alaskan Gyre system in our study area (Sect.4).
Using the same ANN configuration between the different resolutions (i.e., optimal SOM biogeochemical provinces, appropriate predictors, neuron number in the first hidden layer, etc.; see Sect.2.5), the most important parameter for reducing overfitting at each resolution becomes the internal data division ratio between the pCO 2 training data used by the ANN to train and internally evaluate (Fig. 4).We tested a suite of data division ratios between 99 % of data used to train and 1 % used to internally evaluate to a 50/50 split at 1 % intervals for each resolution (Fig. 4).These tests were run without the 10-fold cross-evaluation ensemble approach.To quantify the optimal ratio at each resolution, we used an overfitting metric (Eq.3) equal to the larger of the training or independent withheld data RMSE plus the absolute value of the difference between the two: (3) Using an internal data division ratio optimized based on the overfitting metric, an ANN interpolated pCO 2 product with an uncertainty value of 12.5 ± 0.4 µatm (see Sect. 3.2; Table S2) is possible at each of the coarser resolutions (Fig. 4ae; Table S2).For comparison, the reported uncertainty in a global product (Landschützer et al., 2014) ranges from 9 to 18 µatm.In regions with sufficient observational coverage (Fig. 4f; Bakker et al., 2016), this finding creates a prece-dent for stepping to a higher-resolution product with nearly no loss in performance, overcoming the overfitting concern with increased resolution (Rosenthal, 2016).
4 Air-sea CO 2 fluxes With the estimated ANN pCO 2 product displaying a strong ability to accurately represent regional pCO 2 variability in the northeast Pacific (Sect.3), we calculate air-sea CO 2 fluxes in the region (Eq.1).Long-term (1998Long-term ( -2019) ) mean pCO 2 and air-sea CO 2 fluxes display similar patterns (Fig. 5).In the northwest of our study area, high pCO 2 and net CO 2 outgassing to the atmosphere correspond to the influence of the upwelling subpolar Alaskan Gyre system (Figs.5; 1c).Lower pCO 2 values and stronger atmospheric CO 2 uptake occur in the North Pacific Current region (Fig. 1c) to the south and along the eastern study area margin (Fig. 5).The gradient of the gyre captured in the high-resolution estimate improves regional understanding, with the largest differences between the Landschützer et al. ( 2020b) global product occurring in the north (basinwide absolute difference 2 %-5 %; Fig. S6a).ANN-NEP calculated fluxes compare well to air-sea CO 2 fluxes averaged across six unique, coarser-resolution, global-observationbased pCO 2 products, each using five different wind speed products (r 2 = 0.81; Fay et al., 2021).However, our work suggests that the global product ensemble may underestimate the outgassing signal from the subpolar Alaskan Gyre (Figs. 5b; S7).A higher resolution in the gyre gradient also provides regional context to carbon measurements made at the Ocean Station Papa mooring, often used to represent the Alaskan Gyre (e.g., Jackson et al., 2009), which is actually situated approximately between the two regions and along the Line P monitoring program.

Seasonal variability
To determine seasonal cycle drivers, we decompose the climatological pCO 2 into a thermal and non-thermal component (Takahashi et al., 2002(Takahashi et al., , 1993)): Here the subscripts T and NT represent thermal and nonthermal effects, respectively, while subscripts am and mm represent annual mean and monthly mean values, respectively.Equation (4) imposes the empirical temperature dependency on the annual mean pCO 2 value, providing an estimate of seasonal temperature control (Sarmiento and Gruber, 2006;Takahashi et al., 2002).Equation ( 5) removes the temperature dependency from the monthly mean pCO 2 values,  providing an estimate of the residual, non-thermal controls on pCO 2 including circulation, mixing, gas exchange, and biology.The ratio of the seasonal amplitudes of the two components (Eq.6; R (T NT −1 ) ) can distinguish the dominant pro-cess, where a value greater (less) than 1 indicates that thermal (non-thermal) processes dominate.
In the south part of our study area, the North Pacific Current region (latitudes south of 52 • N; Fig. 6a, c) acts as a strong CO 2 sink through the winter, transitioning to a weak sink through the summer.Whereas in the Alaskan Gyre region the seasonal cycle of pCO 2 is dominantly controlled by non-thermal drivers (Fig. 6b), the North Pacific Current region experiences a near-balance between opposing drivers (Fig. 6c; R (T NT −1 ) = 1.02).In the North Pacific Current region, we see a much smaller seasonal amplitude in pCO 2 (15 µatm; Fig. 6c), peaking in July with warming and falling to a minimum in October.The seasonal amplitude is dampened by the competing effect of temperature changes in solubility and changes in dissolved inorganic carbon concentration through biological drawdown and changing mixed layer depth (Wong et al., 2010;Sutton et al., 2017).With minimal seasonal variation in seawater pCO 2 , the seasonal change in atmospheric CO 2 uptake south of 52 • N (Fig. 6a) is dominantly driven by higher wind speed through the winter months (mean increase of 55 % over summer climatological values).

Alaskan Gyre upwelling strength
On sub-decadal to decadal timescales, there is a strong correlation between air-sea CO 2 flux anomalies and SSH anomalies in the Alaskan Gyre region of our study area (r 2 = 0.93, p < 0.01; Figs.7b, c; S8).In this subpolar gyre, prevailing winds cause upwelling driven by Ekman pumping (Gargett, 1991), but the strength varies.During 1998-2002 as well as 2006-2013, we observe strong winter and spring outgassing in the Alaskan Gyre, with flux densities as high as 3.6 mol m −2 yr −1 in January 2000.In these same periods, anomalously low sea level pressure over the Alaskan Gyre led to anomalously strong wind stress curl which enhanced Ekman pumping and depressed SSH (Fig. 7b; Mann and Lazier, 2006;Hristova et al., 2019).The stronger upwelling brought CO 2 -rich subsurface water to the surface (Lagerloef et al., 1998).Conversely, during the periods of anomalously high sea level pressures and positive SSH anomalies (2003-2005;2014-2020;Fig. 7c), there is less upwelling of CO 2 -rich subsurface water to the surface, allowing primary productivity to draw down surface ocean CO 2 (McKinley et al., 2006) and enhancing CO 2 uptake from the atmosphere (Fig. 7b).
Our observation-based findings show strong carbon relationships with SSH in the Alaskan Gyre, with correlations between other climate indices being weaker.Over longer timescales, climate-driven regional ocean fluctuations have been shown to modulate the Alaskan Gyre surface water inorganic carbon system (Hauri et al., 2021;Di Lorenzo et al., 2008).The North Pacific Gyre Oscillation and the Pacific Decadal Oscillation indices have both been shown to strongly influence the physics, chemistry, and biology of the Gulf of Alaska ecosystem (Di Lorenzo et al., 2008;Newman et al., 2016).Hauri et al. (2021) showed that the rate of ocean acidification in a hindcast model of the Gulf of Alaska was strongly related to the first empirical orthogonal function of SSH.We report the same relationship with SSH described in Hauri et al. (2021) as the dominant control of sub-decadal patterns on air-sea CO 2 fluxes from our observation-based pCO 2 product (Fig. 7).Our estimates of the 12-month running mean air-sea CO 2 flux anomaly in the Alaskan Gyre region (Fig. 7b) are more weakly correlated to the North Pacific Gyre Oscillation, Pacific Decadal Oscillation, and the El Niño-Southern Oscillation indices (r 2 = 0.63, 0.38, 0.22 respectively; p < 0.01).This regional variation in SSH correlating with both observations and models is strong evidence for variations in Alaskan Gyre upwelling strength explaining regional biogeochemistry on sub-decadal to decadal timescales.This relationship supports work showing that the SSH anomaly is an important climate index for the region (Cummins et al., 2005;Di Lorenzo et al., 2008).This finding also highlights the challenges of representing the regional seasonal cycle of the northeast Pacific in a climatology within a reference period dominated by one mode of Alaskan Gyre upwelling strength.

Impact of interannual events
On shorter, interannual timescales, basin-wide variability in air-sea CO 2 flux is significantly influenced by the impact of extreme events, with the underlying sub-decadal and decadal signal amplifying or dampening these impacts.During persistent marine heat waves in the northeast Pacific since 2013, we see strong atmospheric CO 2 uptake anomalies fueled by reduced winter mixing and increased surface density stratification (Fig. 8; Bond et al., 2015).The strongest marine heat wave, known as "the Blob", with sea surface temperature anomalies greater than 3 • C or 4 standard deviations above normal (Freeland and Ross, 2019), persisted in the northeast https://doi.org/10.5194/bg-20-3919-2023 Biogeosciences, 20, 3919-3941, 2023 Pacific from late 2013 to the end of 2015 driven by an anomalous high-pressure atmospheric ridge (Bond et al., 2015;Di Lorenzo and Mantua, 2016).The ridge was associated with a significant decline in local wind speed, decreasing the mixing of deep, colder waters to the surface and raising sea surface temperatures (Bond et al., 2015;Scannell et al., 2020).The reduced winter mixed layer deepening and associated limiting of upwelled and entrained nutrient and CO 2 -rich subsurface waters to the surface have been linked to a relief of ocean acidification (i.e., anomalously high aragonite saturation states; Mogen et al., 2022).There has also been a reported increase in net primary production during the Blob in both in situ and satellite records (Long et al., 2021;Yu et al., 2019;Peña et al., 2019).During the Blob, we see strong negative air-sea CO 2 flux anomalies, particularly in the winter months (October to December 2014 and 2015), indicative of a 30 % increase in uptake relative to climatological monthly means.The increased atmospheric CO 2 uptake is driven by reduced winter wind speeds (by approximately 7 %), leading to limited winter mixed layer deepening and increased surface density stratification, while possibly being enhanced by the increase in net primary production (Fig. 8b).Through a second marine heat wave from mid-2018 to 2020 (Chen et al., 2021;Amaya et al., 2020;Scannell et al., 2020), we see a similar magnitude increase in atmo-spheric CO 2 uptake compared to the Blob event (Fig. 8b).Through some of the largest SST anomalies (October to December 2018 and 2019), we observed large negative air-sea CO 2 flux anomalies indicating enhanced atmospheric uptake of 45 % beyond corresponding climatological monthly means (Fig. 8b), particularly in the Alaskan Gyre (Fig. 7a, b).During this marine heat wave, a similar reduction in upperocean mixing and limited wintertime entrainment due to reduced wind speed were observed (by approximately 9 %; Amaya et al., 2020) along with resultant reduced surface pCO 2 (Franco et al., 2021).Increased net primary production has also been reported (Long et al., 2021).An unusual near-surface freshwater anomaly in the Gulf of Alaska during 2019 contributed to the intensification of the marine heat wave by increasing the near-surface buoyancy and density stratification (Scannell et al., 2020).
Our result that marine heat waves cause enhanced CO 2 uptake in the northeast Pacific may not be applicable to a wider region.Mignot et al. (2022) described how the impact of marine heat waves on air-sea CO 2 fluxes is the net result of two competing mechanisms: (1) increased sea surface temperatures reducing the solubility of CO 2 , increasing pCO 2 , and reducing CO 2 uptake and (2) increased density stratification reducing vertical mixing and entrainment, decreasing surface dissolved inorganic carbon, and increasing CO 2 uptake.Their analysis finds that the temperature effect outweighs the advection effect during persistent marine heat waves in the North Pacific subtropical gyre, reducing CO 2 uptake by 29 ± 11 %, with the opposite true in the tropical Pacific (Mignot et al., 2022).However, when looking at our more localized study area in the northeast Pacific subpolar gyre, we find instead that the impact of reduced winter mixing (because of decreased winds and increased density stratification) tipped the balance toward enhanced atmospheric CO 2 uptake during these marine heat waves, again advocating for the need for high-resolution local studies to better understand local climate change effects.
Through both the Blob and the 2019 marine heat wave, the Alaskan Gyre was in a period of weak upwelling (Fig. 7c), leading to a decade-long negative pCO 2 anomaly (Fig. 8a), in addition to the maximum observed pCO 2 due to the diverging long-term trend from the atmosphere (Sect.4.4).Unraveling the individual influence of these interconnected drivers (i.e., marine heat waves, sub-decadal variability, and long-term trend) is not possible with this product but does prompt future inquiry in combination with regional models and emerging climate analysis tools (e.g., Chapman et al., 2022).
We do not observe a large change in atmospheric CO 2 uptake associated with the 2008 basin-wide ocean iron fertilization event.In August 2008, the eruption of the Kasatochi volcano in the Aleutian Islands, Alaska, USA, dispersed volcanic ash over an unusually large area of the subarctic northeast Pacific, fueling a massive phytoplankton bloom in the iron-limited region (Langmann et al., 2010;Hamme et al., 2010).Hamme et al. (2010) reported that enhanced biological uptake drew down pCO 2 by approximately 25 µatm at Ocean Station Papa.Basin-wide, we see a decrease of 20 µatm from July to August 2008 in the detrended, deseasonalized ANN pCO 2 following the eruption (Fig. 8a) with a drawdown of 30 µatm at Ocean Station Papa.The neural network approach does display a tendency to slightly overestimate relatively low pCO 2 values (Fig. 2a).Because this basin-wide enhanced primary production and surface ocean pCO 2 decrease lasted only 2 months, its impact on the air-sea CO 2 flux was limited (Fig. 8b).The limited impact could be tied to weaker summer wind speeds and longer equilibration times (Jones et al., 2014).The eruption occurred during a period of enhanced Alaskan Gyre upwelling (Fig. 7c), meaning the event was overlaid on top of an already sub-decadal-length positive pCO 2 anomaly (Fig. 8a) perhaps https://doi.org/10.5194/bg-20-3919-2023 Biogeosciences, 20, 3919-3941, 2023 dampening the event's impact.Unfortunately, the lack of direct pCO 2 measurements in SOCAT v2021 during this time prevents us from further investigating the underlying causes.

Air-sea CO 2 flux trend
Overall, the northeast Pacific Ocean CO 2 sink became more negative (i.e., become a larger sink; Fig. 9b) from 1998 to 2020 at a rate of −0.043 ± 0.004 mol m −2 yr −2 .Looking at the start and end of the time series, the average flux from 1998 to 2002 appeared to be a small atmospheric CO 2 sink at −0.7 ± 0.6 mol m −2 yr −1 , compared to the sink from 2016-2020 at −1.6±0.8 mol m −2 yr −1 .Regionally, we do not see a statistically significant trend in the satellite-based ocean surface wind speed data over this time (p > 0.1; Mears et al., 2019).However, the time series endpoints are representative of different Alaskan Gyre upwelling modes (Fig. 7c), with the time series starting in a sub-decadal-length positive pCO 2 anomaly and ending during a decade-long negative pCO 2 anomaly.Decadal trends will be sensitive to the start and endpoint of the time series (e.g., Fay and McKinley, 2013).We caution that our trend results may not be representative of longer time periods (i.e., from industrial onset).
Taking the full study area de-seasonalized (Sect.2.4), averaged pCO 2 , we calculated trends based on shorter time series within our data using different monthly time series start and end dates (Fig. 10).Based on pCO 2 data time series ranges greater than 10 years (between 1998 and 2020), 87 % of trends are less than the atmospheric trend with a mean of 1.59 ± 0.27 µatm yr −1 (N = 9222 at a monthly time step; Fig. 10).In the remaining 13 % of total time series start and end date combinations, there is a pronounced very steep trend exceeding the atmospheric rate of increase.Date combinations resulting in trends exceeding the atmospheric increase could be partly attributed to start and end dates coinciding with periods of weak and strong Alaskan Gyre upwelling, respectively.These upwelling modes induce negative and positive pCO 2 anomalies, which further amplify the observed trend.However, the Alaskan Gyre region makes up only about 25 % of the total study area (region north of 52 • N; Sect.4.2), and trends in Fig. 10 represent the ANN-NEP full spatial domain.
The rate of change in the air-sea CO 2 flux over the study period is largely due to the increasing gradient with the atmosphere (Fig. 9a).Over the full study area from 1998-2020, the ANN-NEP pCO 2 trend is 1.4 ± 0.1 µatm yr −1 .The Landschützer et al. (2020b) global product trend in the region is similar at 1.5 ± 0.1 µatm yr −1 .At Ocean Station Papa, the ANN-NEP pCO 2 trend is 1.5 ± 0.1 µatm yr −1 , in agreement with the observed trend based on discrete samples collected one to three times per year (1.6 ± 0.8 µatm yr −1 between 1990-2019; Franco et al., 2021).The oceanpCO 2 trend is not as rapid as the atmospheric increase of 2.12 ± 0.03 µatm yr −1 over the same period (Fig. 9a).Sutton et al. (2017) also reported a lag with the atmosphere at Ocean Station Papa with a pCO 2 trend of −1.5±0.9 µatm yr −1 from the 2007-2014 mooring pCO 2 data.The ANN-NEP pCO 2 trend at Ocean Station Papa is −0.67 ± 0.05 µatm yr −1 .
The observed lag in the increase in surface ocean pCO 2 with respect to atmospheric pCO 2 , causing an increasing air-sea gradient ( pCO 2 ), may be attributed to interaction with subsurface water.We find a strong spatial correlation between the trend in pCO 2 and the calculated average vertical velocity associated with Ekman pumping (r 2 = 0.64, p < 0.01; Fig. 11b).Ekman pumping was calculated using the MATLAB Climate Toolbox Ekman function (Greene et al., 2019(Greene et al., , 2017;;Kessler, 2002) from monthly 1/4 • spatialresolution Cross-Calibrated Multiplatform zonal and meridional ocean surface wind speeds (Mears et al., 2019) interpolated to 1/12 • .Fay and McKinley (2013) describe regions impacted by upwelling from depth having shallower pCO 2 trends and greater divergence from the atmosphere based on models and observations.Dissolved inorganic carbon increases with depth, causing enhanced vertical mixing to increase surface ocean pCO 2 over the seasonal cycle (Sect.4.1 to 4.3).However, in the long term, dissolved inorganic carbon is increasing most in surface waters, due to direct uptake of atmospheric CO 2 , and least at depth.The supply to the surface of subsurface waters with low anthropogenic carbon causes a lag in the rate of increase in surface ocean pCO 2 .The anthropogenic carbon signals in the intermediate to deep waters in this region are some of the smallest in the global ocean due to circulation patterns (Sabine et al., 2004;Gruber et al., 2019;Carter et al., 2019;Clement and Gruber, 2018).Regions within our study area with a greater connection between surface and deep waters, such as the center of the Alaskan Gyre in the north (Van Scoy et al., 1991), are experiencing the largest divergence from the atmosphere.With a joint increase in projected future wind speeds (Zheng et al., 2016;Young and Ribal, 2019;Wanninkhof and Triñanes, 2017) and a growing pCO 2 , the region is likely to become a stronger net annual sink for atmospheric CO 2 .

Conclusions
Using a high-resolution regional neural network approach, we represent pCO 2 measurement variability well in the northeast Pacific Ocean.We interpolated sparse observations using nonlinear relationships developed with a neural network approach based on predictor data from satellite and reanalysis products to create a continuous monthly pCO 2 estimate at a 1/12 • × 1/12 • spatial resolution.Using a crossevaluation ensemble approach we were able to produce a robust pCO 2 product that represents regional variability with an uncertainty of 12 µatm.We found that stepping to a significantly higher spatial resolution, compared to typical openocean observation-based pCO 2 products (1/4 or 1 • spatial resolution), led to nearly no loss in performance despite a much lower ratio of gridded pCO 2 observations compared  to the total number of grid cells.The most important parameter for reducing overfitting across regional pCO 2 products with different spatial resolutions was the internal division of training data.Higher-resolution products require more direct training data and fewer data to internally evaluate, while still comparing them to independent withheld data.This work shows that high-resolution, high-performance, observation-based neural-network-derived pCO 2 products can be developed when reducing the complexity of controlling processes by focusing on specific regions.However, chosen predictor variables need to be regionally specific considering "processfocused" influences on the local carbon system.Our reported optimization of the internal data division ratio between network training and evaluation data indicates the importance of this choice when moving to a higher spatial resolution.Increased spatial resolution will be necessary to capture variability in regions strongly influenced by mesoscale processes, enabling the resolution of oceanographic features such as eddies, upwelling regimes, and gyre system gradients.
We report pronounced variability in marine CO 2 uptake in the northeast Pacific Ocean dominantly driven by the control of Alaskan Gyre upwelling and connectivity to subsurface waters.Overall, the open-ocean northeast Pacific acted as a net sink for atmospheric CO 2 from 1998 to 2020 with an average basin-wide air-sea CO 2 flux of −1.2 ± 1.4 mol m −2 yr −1 but with pronounced seasonality.In the northern Alaskan Gyre region, wintertime upwelling and entrainment lead to significant outgassing.In the southern North Pacific Current region, the seasonal flux cycle is largely driven by wind speed where the seasonal change in surface ocean pCO 2 remains small.Based on our product, the upwelling strength of the Alaskan Gyre dominates airsea CO 2 flux variability in that region on sub-decadal to decadal timescales.During prolonged periods of enhanced gyre upwelling, we see strong winter outgassing driven by upwelled and entrained CO 2 -rich subsurface waters.During periods of weak gyre upwelling, the northern part of our study area acts as a sink for atmospheric CO 2 yearround.During two recent marine heat waves we see enhanced CO 2 uptake due to limited wintertime entrainment of subsurface waters resulting from weaker winds.However, we observed minimal impact on atmospheric CO 2 uptake following a 2008 volcanic eruption, with air-sea CO 2 flux anomalies linked to enhanced biological uptake via iron fertilization lasting only 2 months.The gradient between the northeast Pacific surface ocean pCO 2 and atmospheric CO 2 is increasing, pushing the region towards becoming an enhanced sink for atmospheric CO 2 .We see the largest increase in the gradient (and thus potential for greater future uptake) at the center of the Alaskan Gyre where, through upwelling, there is a strong connection with subsurface waters low in anthropogenic CO 2 .
The regional, high-resolution pCO 2 product created here could serve as a valuable baseline for regional models (e.g., Pilcher et al., 2018;Hauri et al., 2020).The pCO 2 product and associated air-sea CO 2 flux estimates offer continuous coverage in sparsely sampled regions informed by patterns in well-sampled neighboring waters.The product could be used to aid in model evaluation, for data assimilation, to constrain initial conditions, to enhance carbon flux process understanding, and to improve regional climate change projections.
Our analysis illustrates the complex interplay between factors driving air-sea CO 2 flux variability at varying temporal scales across the study domain and within broad subregions (Alaskan Gyre and North Pacific Current regions) allowing us to suggest what resources will be needed to make further advances.Improvement in estimated pCO 2 would benefit from an increase in the number of pCO 2 observations used for training.We recommend prioritizing additional measurements in the northern Alaskan Gyre region in future observational programs.Our estimated fluxes in the gyre are large (both uptake and outgassing), but observations are sparse, leading to the largest standard deviations between our cross-evaluation ensemble members (Fig. S4).The impact of subdecadal to decadal variability on the trend in surface ocean pCO 2 and in regional atmospheric CO 2 uptake emphasizes the importance of long-duration time series sites and programs to capture the natural cycles of variability and accurately estimate change.Our findings and estimated pCO 2 product serve as environmental baselines, which could be used to inform future marine carbon dioxide removal in the northeast Pacific at the basin and regional scale.However, use of our product at the individual grid cell level is not encouraged as errors likely remain high, whereas over broader regions these errors average away.Our study serves as an important initial step in creating a complete carbon budget for the northeast Pacific, with coastal, pelagic, and benthic carbon stocks and fluxes still to be resolved.
Code and data availability.All data used are publicly available.ANN-NEP pCO 2 and air-sea CO 2 flux fields are available through the National Center for Environmental Information (NCEI Accession 0277836; https://doi.org/10.25921/c1w8-6v02,Duke et al., 2023a).pCO 2 data are from the Surface Ocean CO 2 Atlas (SO-CAT) v2021 (available at https://www.socat.info/,Bakker et al., 2016), and additional data are from the Fisheries and Oceans Canada February 2019 Line P cruise (available at https://www.waterproperties.ca/linep/,Water Properties Group, 2022).Sea surface temperature and chlorophyll a are from the European Space Agency Climate Change Initiative (available at https://climate.esa.int/en/odp/#/dashboard,European Space Agency, 2019).Sea surface salinity, sea surface height, and mixed layer depth are from the Copernicus Marine Environment Monitoring Service (available at https://data.marine.copernicus.eu/product/GLOBAL_MULTIYEAR_PHY_001_030/description, E.U.Copernicus Marine Service, 2022).Ocean surface wind data are from the Cross-Calibrated Multiplatform version 2 Wind Vector Analysis Product (available at https://www.remss.com/measurements/ccmp/,Mears et al., 2019).
Author contributions.PJD and PL developed the neural network code and created the product with help from RCH, DI, NCS, and MMMA.PJD, RCH, DI, and PL contributed to the interpretation and analysis of the results.All co-authors contributed to editing the paper.RCH and DI supervised the project work.PAC provided data and consultation.PJD prepared the paper with contributions from all co-authors.
Competing interests.At least one of the (co-)authors is a member of the editorial board of Biogeosciences.The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.
Disclaimer.This article reflects only the authors' view -the funding agencies as well as their executive agencies are not responsible for any use that may be made of the information that the article contains.Review statement.This paper was edited by Julia Uitz and reviewed by Marine Fourrier and one anonymous referee.

Figure 1 .
Figure 1.(a) Total number of months of observational coverage from Surface Ocean CO 2 Atlas (SOCAT) v2021 (Bakker et al., 2016) and additional data from the Fisheries and Oceans Canada February 2019 Line P cruise (https://www.waterproperties.ca/linep/,last access: August 2022) per 1/12 • × 1/12 • grid cell.(b) Number of unique annual months of observational coverage per 1/12 • × 1/12 • grid cell.(c) Mean sea surface height (SSH; Table1) shows relative location of the subpolar Alaskan Gyre (negative SSH values) and the North Pacific Current (SSH approximately equal to 0).Ocean Station Papa is labeled and marked with a black circle, while Line P is labeled and marked with a red line.

Figure 2 .
Figure 2. Regional high-resolution artificial neural network northeast Pacific (ANN-NEP) ensemble mean pCO 2 against (a) training pCO 2 observation data and (b) independent withheld pCO 2 observation data.Number of observations (N ), root mean squared error (RMSE), coefficient of determination (r 2 ), mean absolute error (MAE), mean bias (calculated as the mean residual), and the slope of the linear regression (c 1 ).The observed linear relationship is represented by the dotted blue line.(c) ANN-NEP pCO 2 (ensemble mean) against individual ensemble member estimates.Total number of observations (N ) across all 10-fold ensemble members (see Sect. 2.5).Across all panels, data are binned into 2 µatm by 2 µatm bins.The dashed black line represents a perfect fit of slope (c 1 ) = 1 and intercept = 0. Color bar shows data density on a log scale.Note the order-of-magnitude difference in the color bar scale between panels.

Figure 3 .
Figure 3. (a) Mean residuals over the full study area at each time step of the ANN-NEP pCO 2 estimate in this study and the Landschützer et al. (2020b) product interpolated to the 1/12 • × 1/12 • grid of this study, compared to the gridded SOCAT data displaying the mean absolute error (MAE).(b) Total number of gridded SOCAT observations across the study area at each time step.

Figure 4 .
Figure 4. Varying spatial resolution: (a) 1 • , (b) 1/2 • , (c) 1/4 • , (d) 1/8 • , and (e) 1/12 • ANN pCO 2 product performance evaluated by the mean RMSE (Sect.2.3) of training data (blue line), independent withheld data (orange line), and an overfitting metric (green line; Sect.3.4) against internal data division ratios of the pCO 2 training data used by the ANN to train and internally evaluate.The ratios in gray show the range of the lower 10th percentile (5 of 50 runs) of overfitting metric values for each resolution.(f) At each spatial resolution, the left-hand y axis shows the optimal internal data division ratio with error bars representing the lower 10th percentile of overfitting metric values (same as gray ranges in a to e with all resolutions converging around RMSE = 12.8 ± 0.4 µatm).The right-hand y axis shows the percent of gridded pCO 2 observations (targets) compared to the total number of grid cells.

Figure 5 .
Figure 5. (a) Long-term (1998-2019) mean ANN-NEP pCO 2 and (b) air-sea CO 2 flux density in mol m −2 yr −1 for the open-ocean northeast Pacific.Negative (positive) flux values indicate CO 2 uptake (outgassing) by the ocean.Ocean Station Papa is shown for reference.

Figure 6 .
Figure 6.(a) Zonally averaged air-sea CO 2 flux from the ANN-estimated pCO 2 product climatology along each 1/12 • latitude band in the study area plotted against the climatological month along the x axis (Hovmöller diagram).Negative (positive) flux values indicate CO 2 uptake (outgassing) by the ocean.The dashed gray line subdivides the Alaskan Gyre and North Pacific Current regions in the north and south with different seasonal drivers summarized in panels below.(b) Alaskan Gyre region (latitudes north of 52 • N) and (c) North Pacific Current region (latitudes south of 52• N) area-averaged monthly climatological pCO 2 (solid blue line), thermal component (i.e., changes due to temperature; Eq. 4; dotted red line), non-thermal component (i.e., changes due to circulation, mixing, gas exchange, and biology; Eq. 5; dot-dash green line), and atmospheric pCO 2 (dashed black line).The climatology is plotted over 19 months to emphasize the seasonal cycle.

Figure 7 .
Figure 7. Alaskan Gyre region of our study area (latitudes north of 52 • N).(a) Air-sea CO 2 fluxes grouped by seasonal 3-month bins along with the annual average (black line).(b) Air-sea CO 2 flux anomalies removing the seasonal cycle (Sect.2.4) and applying a 12-month running mean.(c) Sea surface height (SSH) anomalies in the same region removing the seasonal cycle and applying a 12-month running mean.Gray boxes highlight periods of anomalously high Alaskan Gyre upwelling strength corresponding to negative SSH anomalies.Horizontal dashed lines mark 0 in each panel.Seasonal groupings in (a) are winter (December, January, February), spring (March, April, May), summer (June, July, August), and fall (September, October, November).

Figure 8 .
Figure 8. Full study-area-averaged interannual variability in (a) pCO 2 anomaly removing the seasonal cycle (Sect.2.4) and long-term trend (Sect.4.4).In addition, (b) air-sea CO 2 flux anomaly, (c) sea surface temperature anomaly, and (d) chlorophyll-a anomaly all removing the seasonal cycle.Gray boxes highlight large interannual events including the Blob marine heat wave in 2014-2016, a second marine heat wave in 2018-2020 (labeled "'18 MHW" in the figure), and a 2008 ocean iron fertilization event following the Kasatochi volcanic eruption (Kasatochi).Horizontal dashed lines mark 0 in each panel.

Figure 9 .
Figure 9. Full study-area-averaged long-term trends in (a) ANN-NEP surface ocean pCO 2 (solid line) and atmospheric pCO 2 (dashed line) and (b) air-sea CO 2 flux density.

Figure 10 .
Figure 10.Full study-area-averaged pCO 2 anomaly (removing the seasonal cycle; Sect.2.4) linear trend calculated using different monthly time series start and end dates.Time series start from dates on the left and end on a date along the top.The dashed black line indicates equal start and end dates.Trend values are only shown for time series of at least a 5-year duration.Red areas represent negative pCO 2 trends; gold areas represent trends greater than the atmospheric rate of increase (2.12 ± 0.01 µatm yr −1 ).The purple dotted line indicates a 10-year time series duration.

Figure 11 .
Figure 11.(a) Trend in pCO 2 where more negative (darker) values indicate an increasing gradient with the atmosphere and a lag in the pCO 2 increase in the surface ocean.Black crosshatches show grid cells with an insignificant calculated trend (outside the 95 % confidence level; p ≥ 0.05).(b) Calculated average vertical velocity associated with Ekman pumping (calculated from zonal and meridional wind speed) where negative (blue) values indicate downwelling and positive (green) values indicate upwelling.Ocean Station Papa is shown for reference.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.Acknowledgements.Ocean Station Papa mooring time series site and the Line P Program are operated by the National Oceanic and Atmospheric Administration (NOAA) and Fisheries and Oceans Canada (DFO).Thanks to Marine Fourrier and one anonymous reviewer for their helpful comments improving the paper.Financial support.Funding for this project to Roberta C. Hamme was provided by the Natural Sciences and Engineering Research Council of Canada (NSERC) through the Advancing Climate Change Science in Canada program (grant no.ACCPJ 536173-18).Funding from Fisheries and Oceans Canada's Aquatic Climate Change Adaptation Service Program to Paul Covert supported the analysis of recent underway pCO 2 measurements made by the Line P Program (grant no.96036).Patrick J. Duke's financial support was also provided by a Natural Sciences and Engineering Research Council of Canada (NSERC) Doctoral Postgraduate Scholarship.

Table 1 .
Northeast Pacific open-ocean artificial neural network predictor variables and their corresponding source, original temporal and spatial resolutions, and processing steps used for this study.
(Mears et al., 2019) Cross-Calibrated Multiplatform ocean surface wind data(Mears et al., 2019)interpolated to 1/12 • , the temperature-dependent Schmidt number specific to CO 2 , and the gas transfer coefficient fromWanninkhof (2014).Negative (positive) flux values indicate CO 2 uptake (outgassing) by the ocean.Uncertainty in the air-sea CO 2 flux comes from a 20 % uncertainty in k (