<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" dtd-version="3.0">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">BG</journal-id><journal-title-group>
    <journal-title>Biogeosciences</journal-title>
    <abbrev-journal-title abbrev-type="publisher">BG</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Biogeosciences</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1726-4189</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/bg-14-5551-2017</article-id><title-group><article-title>Empirical methods for the estimation of Southern Ocean CO<inline-formula><mml:math id="M1" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>: support vector and random forest regression</article-title>
      </title-group><?xmltex \runningtitle{Southern Ocean surface CO${}_{2}$ estimates}?><?xmltex \runningauthor{L.~Gregor et al.}?>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1 aff2">
          <name><surname>Gregor</surname><given-names>Luke</given-names></name>
          <email>luke.gregor@uct.ac.za</email>
        <ext-link>https://orcid.org/0000-0001-6071-1857</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff3">
          <name><surname>Kok</surname><given-names>Schalk</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Monteiro</surname><given-names>Pedro M. S.</given-names></name>
          
        </contrib>
        <aff id="aff1"><label>1</label><institution>Southern Ocean Carbon-Climate Observatory (SOCCO), CSIR, Cape Town, South Africa</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>University of Cape Town, Department of Oceanography, Cape Town, South Africa</institution>
        </aff>
        <aff id="aff3"><label>3</label><institution>University of Pretoria, Department of Mechanical and Aeronautical Engineering, Pretoria, South Africa</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Luke Gregor (luke.gregor@uct.ac.za)</corresp></author-notes><pub-date><day>8</day><month>December</month><year>2017</year></pub-date>
      
      <volume>14</volume>
      <issue>23</issue>
      <fpage>5551</fpage><lpage>5569</lpage>
      <history>
        <date date-type="received"><day>29</day><month>May</month><year>2017</year></date>
           <date date-type="rev-request"><day>12</day><month>June</month><year>2017</year></date>
           <date date-type="rev-recd"><day>3</day><month>October</month><year>2017</year></date>
           <date date-type="accepted"><day>5</day><month>November</month><year>2017</year></date>
      </history>
      <permissions>
        
        
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/3.0/">https://creativecommons.org/licenses/by/3.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://bg.copernicus.org/articles/14/5551/2017/bg-14-5551-2017.html">This article is available from https://bg.copernicus.org/articles/14/5551/2017/bg-14-5551-2017.html</self-uri><self-uri xlink:href="https://bg.copernicus.org/articles/14/5551/2017/bg-14-5551-2017.pdf">The full text article is available as a PDF file from https://bg.copernicus.org/articles/14/5551/2017/bg-14-5551-2017.pdf</self-uri>
      <abstract>
    <p id="d1e118">The Southern Ocean accounts for 40 % of oceanic CO<inline-formula><mml:math id="M2" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> uptake, but the
estimates are bound by large uncertainties due to a paucity of observations.
Gap-filling empirical methods have been used to good effect to approximate
<inline-formula><mml:math id="M3" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M4" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> from satellite observable variables in other parts of the ocean,
but many of these methods are not in agreement in the Southern Ocean. In this
study we propose two additional methods that perform well in the Southern
Ocean: support vector regression (SVR) and random forest regression (RFR).
The methods are used to estimate <inline-formula><mml:math id="M5" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M6" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> in the Southern Ocean based
on SOCAT v3, achieving similar trends to the SOM-FFN method by
<xref ref-type="bibr" rid="bib1.bibx22" id="text.1"/>. Results show that the SOM-FFN and RFR approaches
have RMSEs of similar magnitude (14.84 and 16.45 <inline-formula><mml:math id="M7" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm, where
1 atm <inline-formula><mml:math id="M8" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 101 325 Pa), whereas the SVR method has a larger RMSE
(24.40 <inline-formula><mml:math id="M9" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm). However, the larger errors for SVR and RFR are, in
part, due to an increase in coastal observations from SOCAT v2 to v3, whereas
the SOM-FFN method used v2 data. The success of both SOM-FFN and RFR depends
on the ability to adapt to different modes of variability. The SOM-FFN
achieves this by having independent regression models for each cluster, while
this flexibility is intrinsic to the RFR method. Analyses of the estimates
show that the SVR and RFR's respective sensitivity and robustness to
outliers define the outcome significantly. Further analyses on the methods
were performed by using a synthetic dataset to assess the following: which
method (RFR or SVR) has the best performance? What is the effect of using
time, latitude and longitude as proxy variables on <inline-formula><mml:math id="M10" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M11" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>? What is
the impact of the sampling bias in the SOCAT v3 dataset on the estimates? We
find that while RFR is indeed better than SVR, the ensemble of the two
methods outperforms either one, due to complementary strengths and weaknesses
of the methods. Results also show that for the RFR and SVR implementations,
it is better to include coordinates as proxy variables as RMSE scores are
lowered and the phasing of the seasonal cycle is more accurate. Lastly, we
show that there is only a weak bias due to undersampling. The synthetic data
provide a useful framework to test methods in regions of sparse data coverage
and show potential as a useful tool to evaluate methods in future studies.</p>
  </abstract>
    </article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <title>Introduction</title>
      <p id="d1e216">The global oceans have played an important role in mitigating the effects of
climate change by taking up 25 % of anthropogenic CO<inline-formula><mml:math id="M12" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> emissions annually
<xref ref-type="bibr" rid="bib1.bibx21 bib1.bibx28" id="paren.2"/>. The Southern Ocean has played a
disproportionate role in this uptake, accounting for 40 % of the oceanic
anthropogenic CO<inline-formula><mml:math id="M13" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> uptake <xref ref-type="bibr" rid="bib1.bibx21 bib1.bibx13" id="paren.3"/>. Yet, despite
the region's importance, first-order CO<inline-formula><mml:math id="M14" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> flux estimates are bound by large
uncertainties due to sparse observations in the Southern Ocean
<xref ref-type="bibr" rid="bib1.bibx24 bib1.bibx37 bib1.bibx25 bib1.bibx48 bib1.bibx3" id="paren.4"/>. These
uncertainties limit our capacity to resolve variability and trends of CO<inline-formula><mml:math id="M15" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>.</p>
      <p id="d1e265">Viable alternative methods to estimate net CO<inline-formula><mml:math id="M16" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> flux are atmospheric CO<inline-formula><mml:math id="M17" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>
inversions, ocean biogeochemical process models and empirical models
<xref ref-type="bibr" rid="bib1.bibx42" id="paren.5"/>. As shown by <xref ref-type="bibr" rid="bib1.bibx27" id="text.6"/>, atmospheric CO<inline-formula><mml:math id="M18" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>
inversions are useful tools to estimate the net CO<inline-formula><mml:math id="M19" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> fluxes, but fail to
offer further understanding with spatially integrated air–sea flux estimates
<xref ref-type="bibr" rid="bib1.bibx11" id="paren.7"/>. Conversely, ocean biogeochemical process models are good
tools for mechanistic understanding, but fail to represent the seasonality of
CO<inline-formula><mml:math id="M20" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> fluxes in the Southern Ocean <xref ref-type="bibr" rid="bib1.bibx26 bib1.bibx36" id="paren.8"/>. Empirical
modelling offers an opportunity to bridge the gap between sparse data in the
Southern Ocean and correct parameterization of future earth system models.</p>
      <p id="d1e326">Empirical models maximize the utility of existing surface ocean CO<inline-formula><mml:math id="M21" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>
observations (<inline-formula><mml:math id="M22" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M23" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>) by interpolating these with satellite proxy data.
Access to in situ <inline-formula><mml:math id="M24" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M25" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> data, via platforms such as SOCAT (Surface Ocean
CO<inline-formula><mml:math id="M26" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> Atlas), has been crucial to the success of empirical methods
<xref ref-type="bibr" rid="bib1.bibx42 bib1.bibx3" id="paren.9"/>. This, in conjunction with the increasing
use of machine learning, has seen a proliferation in the number and diversity
of methods in the literature. <xref ref-type="bibr" rid="bib1.bibx42" id="text.10"/> compared a suite of
14 methods using a regional framework provided by <xref ref-type="bibr" rid="bib1.bibx11" id="text.11"/>. The
majority of these methods are variants of multiple linear regression (MLR) or
artificial neural networks (ANNs), with regression being applied in regional
windows or clusters based on climatologies of satellite measurable variables.
The authors found that methods agreed in regions where data coverage was
adequate, but for data-sparse regions, such as the Southern Ocean,
the interannual CO<inline-formula><mml:math id="M27" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> variability of various empirical methods was not
coherent.</p>
      <p id="d1e398">Only two of the methods in <xref ref-type="bibr" rid="bib1.bibx42" id="text.12"/> were able to adequately
represent interannual variability of <inline-formula><mml:math id="M28" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M29" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>, namely: the SOM-FFN
(self-organizing map–feed-forward neural network) from
<xref ref-type="bibr" rid="bib1.bibx22" id="text.13"/>, and the mixed layer scheme (MLS) from
<xref ref-type="bibr" rid="bib1.bibx41" id="text.14"/>. These two methods were used by
<xref ref-type="bibr" rid="bib1.bibx23" id="text.15"/> to show that Southern Ocean CO<inline-formula><mml:math id="M30" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> uptake
strengthened after 2000. However, these methods often showed large
interannual differences in flux estimates despite agreeing on the overall
decadal trend. This shows that there is a lack of coherence even amongst the
methods that perform well, meaning that different methods may lead to
different interpretations of the drivers of <inline-formula><mml:math id="M31" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M32" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>. The primary
reason for the varied results is thought to be the way in which the
algorithms deal with sparse data in the Southern Ocean <xref ref-type="bibr" rid="bib1.bibx42" id="paren.16"/>.
This points to the importance of testing multiple approaches, as different
methods may be able to better represent the CO<inline-formula><mml:math id="M33" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> estimates in the data-sparse Southern Ocean.</p>
      <p id="d1e474">In this study we introduce two methods new to this application, namely:
support vector regression (SVR) and random forest regression (RFR). SVR is a
method based on the theory of statistical learning, making the method robust
to over-fitting by statistically determining the complexity of a problem
rather than relying on a heuristic approach, as is required when setting up an ANN's hidden
layer structure <xref ref-type="bibr" rid="bib1.bibx50 bib1.bibx45" id="paren.17"/>. In a review on the use of
support vector machines (the broad category for regression and classification
variants) in remote sensing, <xref ref-type="bibr" rid="bib1.bibx38" id="text.18"/> found that the method
had the “ability to generalize well even with limited training samples”.
This makes SVR an appealing consideration for the sparsely sampled Southern
Ocean. RFR uses an ensemble of decision trees to create robust estimates,
often without requiring data pre-processing, making it an effective “off the
shelf” method <xref ref-type="bibr" rid="bib1.bibx30" id="paren.19"/>. As with SVM, random forests (both
classification and regression variants) have also been used in remote sensing
applications, though the method does not seem to be as widely used in earth system
sciences despite proving to be a powerful, yet easy to implement, learning
algorithm <xref ref-type="bibr" rid="bib1.bibx8 bib1.bibx16" id="paren.20"/>. We use SVR and RFR to estimate
CO<inline-formula><mml:math id="M34" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> fluxes in the Southern Ocean to try to better resolve the seasonal
cycle from 1998 to 2014. These methods are trained with SOCAT v3 data
collocated with satellite proxies. We compare these results with those of
<xref ref-type="bibr" rid="bib1.bibx22" id="text.21"/>. However, the lack of data in the Southern Ocean,
particularly in winter, makes it difficult to understand the limitations of
these methods within the context of SOCAT data.</p>
      <p id="d1e502">To gain a better understanding of these methods' strengths and weaknesses we
implement SVR and RFR in a synthetic data environment. A similar approach was
taken by <xref ref-type="bibr" rid="bib1.bibx12" id="text.22"/> in the North Atlantic, which experienced a
similar data paucity to the Southern Ocean in the early 2000s. This idealized
environment was also used to estimate the effect of including or
excluding certain proxy
variables as well as the optimal coverage of cruise tracks to constrain the
North Atlantic <inline-formula><mml:math id="M35" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M36" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> adequately. Similarly, we assess the efficacy
of including coordinate variables as proxies of <inline-formula><mml:math id="M37" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M38" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> in the
empirical methods. In the intercomparison study by <xref ref-type="bibr" rid="bib1.bibx42" id="text.23"/>
proxies typically include, but are not limited to, sea surface temperature
(SST), chlorophyll <inline-formula><mml:math id="M39" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula> (Chl <inline-formula><mml:math id="M40" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula>), mixed layer depth (MLD) and sea surface
salinity (SSS); however, several methods in the study also include latitude
and longitude. While coordinates do not mechanistically impact <inline-formula><mml:math id="M41" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M42" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>, they do help to constrain estimates where the available remote
sensing proxies cannot adequately do so. The synthetic data are also used to
test the ability of the SVR and RFR to approximate <inline-formula><mml:math id="M43" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M44" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> in the
seasonally sparse Southern Ocean.</p>
</sec>
<sec id="Ch1.S2">
  <title>Data and methods</title>
      <p id="d1e609">This study is presented in two parts. The first applies SVR and RFR to the
SOCAT v3 dataset and compares these outputs with those of the SOM-FFN by
<xref ref-type="bibr" rid="bib1.bibx22" id="text.24"/>. These estimates will be referred to as the
observational estimates. Here the domain is limited to the three Southern
Ocean biomes of <xref ref-type="bibr" rid="bib1.bibx11" id="text.25"/> that are shown in
Fig. <xref ref-type="fig" rid="Ch1.F1"/>. These biomes are used to assess the
performance of each of the methods, as done in <xref ref-type="bibr" rid="bib1.bibx42" id="text.26"/>.
<xref ref-type="bibr" rid="bib1.bibx11" id="text.27"/> use a different nomenclature, which roughly corresponds to
frontal zones. We rename the sub-tropical seasonally stratified biome (STSS)
as the Sub-Antarctic Zone (SAZ); the sub-polar seasonally stratified biome
(SPSS) becomes the Polar Frontal Zone (PFZ) and the ice biome (ICE) is the
Antarctic Zone (AZ) <xref ref-type="bibr" rid="bib1.bibx36" id="paren.28"/>.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F1"><caption><p id="d1e632">The three Southern Ocean biomes as defined by <xref ref-type="bibr" rid="bib1.bibx11" id="text.29"/>.
The common names for the biomes are shown in the key, with the abbreviations shown in the round brackets.
The square brackets show the abbreviations as given by <xref ref-type="bibr" rid="bib1.bibx11" id="text.30"/>.</p></caption>
        <?xmltex \igopts{width=213.395669pt}?><graphic xlink:href="https://bg.copernicus.org/articles/14/5551/2017/bg-14-5551-2017-f01.pdf"/>

      </fig>

      <p id="d1e647">The second part aims to better understand the limitations of these methods
with the given dataset by implementing the methods with ocean biogeochemical
model output. The domain of this synthetic data experiment is defined by the
three southern biomes of <xref ref-type="bibr" rid="bib1.bibx11" id="text.31"/>. These are defined by observed
oceanographic and biological parameters, but are used for the sake of
consistency despite potential differences between observations and the model.</p>
<sec id="Ch1.S2.SS1">
  <title>Gridded data</title>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T1" specific-use="star"><caption><p id="d1e661">Information on data products used in this study. The temporal and
spatial resolutions are for the raw data (before gridding). Dashes show that
times are either not applicable or that the dataset is continually updated.
Note that the start and end year show full years only. Links to download the
data are given in the additional materials. The asterisk (*) indicates that
variables are the output of a data assimilative model.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="7">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="left" colsep="1"/>
     <oasis:colspec colnum="5" colname="col5" align="left"/>
     <oasis:colspec colnum="6" colname="col6" align="left"/>
     <oasis:colspec colnum="7" colname="col7" align="left"/>
     <oasis:thead>
       <oasis:row>

         <oasis:entry colname="col1">Group or product</oasis:entry>

         <oasis:entry colname="col2">Variables</oasis:entry>

         <oasis:entry rowsep="1" namest="col3" nameend="col4" align="center" colsep="1">Date range </oasis:entry>

         <oasis:entry rowsep="1" namest="col5" nameend="col6" align="center">Resolution </oasis:entry>

         <oasis:entry colname="col7">Reference</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col1"/>

         <oasis:entry colname="col2"/>

         <oasis:entry colname="col3">Start</oasis:entry>

         <oasis:entry colname="col4">End</oasis:entry>

         <oasis:entry colname="col5">Time</oasis:entry>

         <oasis:entry colname="col6">Space</oasis:entry>

         <oasis:entry colname="col7"/>

       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>

         <oasis:entry colname="col1">SOCAT v3</oasis:entry>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M45" display="inline"><mml:mi>f</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M46" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="normal">sea</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">1970</oasis:entry>

         <oasis:entry colname="col4">2014</oasis:entry>

         <oasis:entry colname="col5">1 month</oasis:entry>

         <oasis:entry colname="col6">1<inline-formula><mml:math id="M47" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col7">
                    <xref ref-type="bibr" rid="bib1.bibx3" id="text.32"/>
                  </oasis:entry>

       <?xmltex \interline{[2.845276pt]}?></oasis:row>
       <oasis:row>

         <oasis:entry colname="col1">CDIAC</oasis:entry>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M48" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M49" display="inline"><mml:mrow><mml:msubsup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="normal">atm</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">1970</oasis:entry>

         <oasis:entry colname="col4">2014</oasis:entry>

         <oasis:entry colname="col5">–</oasis:entry>

         <oasis:entry colname="col6">–</oasis:entry>

         <oasis:entry colname="col7">
                    <xref ref-type="bibr" rid="bib1.bibx32" id="text.33"/>
                  </oasis:entry>

       <?xmltex \interline{[2.845276pt]}?></oasis:row>
       <oasis:row>

         <oasis:entry colname="col1">GlobColour</oasis:entry>

         <oasis:entry colname="col2">Chlorophyll</oasis:entry>

         <oasis:entry colname="col3">1998</oasis:entry>

         <oasis:entry colname="col4">–</oasis:entry>

         <oasis:entry colname="col5">1 day</oasis:entry>

         <oasis:entry colname="col6">0.25<inline-formula><mml:math id="M50" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col7">
                    <xref ref-type="bibr" rid="bib1.bibx31" id="text.34"/>
                  </oasis:entry>

       <?xmltex \interline{[2.845276pt]}?></oasis:row>
       <oasis:row>

         <oasis:entry colname="col1">GHRSST</oasis:entry>

         <oasis:entry colname="col2">Sea surface temperature</oasis:entry>

         <oasis:entry colname="col3">1981</oasis:entry>

         <oasis:entry colname="col4">–</oasis:entry>

         <oasis:entry colname="col5">1 day</oasis:entry>

         <oasis:entry colname="col6">0.25<inline-formula><mml:math id="M51" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col7">
                    <xref ref-type="bibr" rid="bib1.bibx40" id="text.35"/>
                  </oasis:entry>

       <?xmltex \interline{[2.845276pt]}?></oasis:row>
       <oasis:row>

         <oasis:entry colname="col1" morerows="1">ECCO2 (cube92)</oasis:entry>

         <oasis:entry colname="col2">Mixed layer depth*</oasis:entry>

         <oasis:entry colname="col3">1992</oasis:entry>

         <oasis:entry colname="col4">2015</oasis:entry>

         <oasis:entry colname="col5">1 day</oasis:entry>

         <oasis:entry colname="col6">0.25<inline-formula><mml:math id="M52" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col7">
                    <xref ref-type="bibr" rid="bib1.bibx33" id="text.36"/>
                  </oasis:entry>

       <?xmltex \interline{[2.845276pt]}?></oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">Salinity*</oasis:entry>

         <oasis:entry colname="col3"/>

         <oasis:entry colname="col4"/>

         <oasis:entry colname="col5"/>

         <oasis:entry colname="col6"/>

         <oasis:entry colname="col7"/>

       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d1e954">The data sources are shown in Table <xref ref-type="table" rid="Ch1.T1"/>. These
gridded data refer primarily to remotely sensed data, with the exception of
MLD and SSS. The latter variables are output from ECCO2, an assimilative
model. The temporal range of the data (1998 through 2014) is limited by the
availability of GlobColour (Chl <inline-formula><mml:math id="M54" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula> starting in 1998) and SOCAT v3 (<inline-formula><mml:math id="M55" display="inline"><mml:mi>f</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M56" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>
ending in 2014).</p>
      <p id="d1e991">All data are gridded to 1 month <inline-formula><mml:math id="M57" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 1<inline-formula><mml:math id="M58" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> using the <italic>iris</italic> and
<italic>xarray</italic> packages in Python <xref ref-type="bibr" rid="bib1.bibx17 bib1.bibx34" id="paren.37"/>. Gridded <inline-formula><mml:math id="M59" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M60" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>
(SOCAT v3) is used to train the algorithms <xref ref-type="bibr" rid="bib1.bibx3" id="paren.38"/>. Surface
station measurements (flask and tower) of atmospheric <inline-formula><mml:math id="M61" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M62" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> are
interpolated to a regular grid using support vector regression
<xref ref-type="bibr" rid="bib1.bibx32" id="paren.39"/>. Mean sea level pressure (NCEP2) is used in the
conversion from <inline-formula><mml:math id="M63" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M64" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> to <inline-formula><mml:math id="M65" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M66" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx20" id="paren.40"/>.</p>
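      <p>The conversion from <italic>x</italic>CO<sub>2</sub> to <italic>p</italic>CO<sub>2</sub> described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: it assumes the dry-air approximation, in which the partial pressure is the product of the mole fraction and the total pressure, and it omits any water vapour correction. The function names, the argument names and the sea-minus-air sign convention for the difference are assumptions made for the example.</p>
<preformat>
```python
# Illustrative sketch only: the names and the dry-air simplification are
# assumptions, not the study's code.
PA_PER_ATM = 101325.0  # 1 atm = 101 325 Pa, as stated in the abstract


def xco2_to_pco2(xco2_ppm, mslp_pa):
    """Convert a CO2 mole fraction (ppm) to a partial pressure (micro-atm).

    Assumes pCO2 = xCO2 * P, with P in atm and no water vapour correction.
    """
    pressure_atm = mslp_pa / PA_PER_ATM
    return xco2_ppm * pressure_atm


def delta_pco2(pco2_sea_uatm, pco2_atm_uatm):
    """Sea-minus-air pCO2 difference (assumed sign convention)."""
    return pco2_sea_uatm - pco2_atm_uatm
```
</preformat>
      <p>Because a mole fraction in ppm multiplied by a pressure in atm gives a partial pressure in µatm, no further unit factor is needed once the mean sea level pressure has been converted from Pa to atm.</p>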
      <p id="d1e1094">Cloud coverage and low light at high latitudes during winter result in
missing Chl <inline-formula><mml:math id="M67" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula> data. Cloud gaps are filled with the climatology of Chl <inline-formula><mml:math id="M68" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula>
(from 1998 to 2014) and missing low-light data are filled with a value
of 0.1 <inline-formula><mml:math id="M69" display="inline"><mml:mo>±</mml:mo></mml:math></inline-formula> 0.03 mg m<inline-formula><mml:math id="M70" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> (uniformly distributed random noise).</p>
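      <p>The gap-filling rule above can be sketched for the monthly series of a single grid cell. The sketch uses only the Python standard library and hypothetical function names. It also makes one assumption that the text does not state: a calendar month with no valid Chl <italic>a</italic> value in any year is treated as a low-light gap, and every other gap is treated as a cloud gap.</p>
<preformat>
```python
import random

LOW_LIGHT_MEAN = 0.10        # mg m-3, value given in the text
LOW_LIGHT_HALF_RANGE = 0.03  # mg m-3, half-width of the uniform noise


def monthly_climatology(months, chl):
    """Mean Chl a per calendar month (1-12), ignoring missing (None) values."""
    sums, counts = {}, {}
    for month, value in zip(months, chl):
        if value is not None:
            sums[month] = sums.get(month, 0.0) + value
            counts[month] = counts.get(month, 0) + 1
    return {month: sums[month] / counts[month] for month in sums}


def fill_chl_gaps(months, chl, rng=None):
    """Fill the gaps in one grid cell's monthly Chl a series.

    Assumption: a calendar month that is never observed is a low-light gap;
    all other gaps are cloud gaps.
    """
    if rng is None:
        rng = random.Random()
    clim = monthly_climatology(months, chl)
    low = LOW_LIGHT_MEAN - LOW_LIGHT_HALF_RANGE
    high = LOW_LIGHT_MEAN + LOW_LIGHT_HALF_RANGE
    filled = []
    for month, value in zip(months, chl):
        if value is not None:
            filled.append(value)                   # observed: keep as is
        elif month in clim:
            filled.append(clim[month])             # cloud gap: climatology
        else:
            filled.append(rng.uniform(low, high))  # low-light gap: noise
    return filled
```
</preformat>
      <p>Passing a seeded generator, for example <monospace>random.Random(0)</monospace>, makes the low-light fill values reproducible between runs.</p>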
</sec>
<sec id="Ch1.S2.SS2">
  <title>Model data</title>
      <p id="d1e1137">The output from a regional NEMO–PISCES configuration (BIOPERIANT05-GAA95b) is
used as the synthetic dataset. The configuration is an updated version of
PERIANT05 used by <xref ref-type="bibr" rid="bib1.bibx10" id="text.41"/>, where BIOPERIANT05-GAA95b includes
biogeochemistry with PISCES-v2. The model has a peri-Antarctic domain with an
open northern boundary at 30<inline-formula><mml:math id="M71" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> S. The horizontal resolution of the
configuration is 0.5<inline-formula><mml:math id="M72" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> cos(latitude) with 46 vertical levels. The
northern boundary is forced by a global 0.5<inline-formula><mml:math id="M73" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> model, ORCA05 as
presented in <xref ref-type="bibr" rid="bib1.bibx4" id="text.42"/>. Output is saved as 5-day averages. The
simulation was run from 1998 to 2009. The synthetic observations are sampled
at the model resolution (5 days <inline-formula><mml:math id="M74" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.5<inline-formula><mml:math id="M75" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>) to resemble the SOCAT
dataset. Hereafter, all data are resampled to 1.0<inline-formula><mml:math id="M76" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> spatial resolution
and monthly temporal resolution to match the observations. Finally, for the
simulation experiment we define the Southern Ocean using the three
southernmost biomes defined in <xref ref-type="bibr" rid="bib1.bibx11" id="text.43"/>, as was done for the observational
estimates.</p>
</sec>
<sec id="Ch1.S2.SS3">
  <title>Data transformation and derived variables</title>
      <p id="d1e1208">Both gridded data and synthetic input data are transformed in preparation for
the empirical algorithms. The log<inline-formula><mml:math id="M77" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">10</mml:mn></mml:msub></mml:math></inline-formula> transformations of MLD and filled
chlorophyll (Chl <inline-formula><mml:math id="M78" display="inline"><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mi mathvariant="normal">clim</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) are taken so that their distributions
more closely resemble a normal distribution.</p>
      <p id="d1e1231">Several of the studies in <xref ref-type="bibr" rid="bib1.bibx42" id="text.44"/> included latitude, longitude
and/or time as proxies of <inline-formula><mml:math id="M79" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M80" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>. It is important to note that
coordinates do not drive mechanistic changes in <inline-formula><mml:math id="M81" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M82" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>. Rather, the
inclusion of coordinates in the empirical methods accounts for unknown or
regionally varying proxies that cannot be measured remotely. Many methods in
the intercomparison by <xref ref-type="bibr" rid="bib1.bibx42" id="text.45"/> did not include coordinates, but
accounted for otherwise unexplained spatial variability by clustering or subsetting
data regionally. In this study, we use a single large domain with no
clustering or regional subsets. Two scenarios for each method in the
simulation experiment are run: no coordinate variables, and including
coordinate variables (time, latitude and longitude).</p>
      <p id="d1e1279">The coordinates are transformed to preserve the continuity of the data, as is
shown below. Seasonality of the data is preserved by transforming the day of
the year (<inline-formula><mml:math id="M83" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula>) and is included in both SVR and RFR analyses:
            <disp-formula id="Ch1.E1" content-type="numbered"><mml:math id="M84" display="block"><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mfenced open="(" close=")"><mml:mtable class="array" columnalign="center"><mml:mtr><mml:mtd><mml:mrow><mml:mi>cos⁡</mml:mi><mml:mo>(</mml:mo><mml:mi>j</mml:mi><mml:mo>⋅</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">π</mml:mi></mml:mrow><mml:mn mathvariant="normal">365</mml:mn></mml:mfrac></mml:mstyle><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mi>sin⁡</mml:mi><mml:mo>(</mml:mo><mml:mi>j</mml:mi><mml:mo>⋅</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">π</mml:mi></mml:mrow><mml:mn mathvariant="normal">365</mml:mn></mml:mfrac></mml:mstyle><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mfenced><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
          Transformed coordinate vectors were passed to both SVR and RFR using <inline-formula><mml:math id="M85" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>-vector
transformations of latitude (<inline-formula><mml:math id="M86" display="inline"><mml:mi mathvariant="italic">λ</mml:mi></mml:math></inline-formula>) and longitude (<inline-formula><mml:math id="M87" display="inline"><mml:mi mathvariant="italic">μ</mml:mi></mml:math></inline-formula>)
<xref ref-type="bibr" rid="bib1.bibx14 bib1.bibx44" id="paren.46"/>, with <inline-formula><mml:math id="M88" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> containing the following:
            <disp-formula id="Ch1.E2" content-type="numbered"><mml:math id="M89" display="block"><mml:mrow><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>B</mml:mi><mml:mo>,</mml:mo><mml:mi>C</mml:mi><mml:mo>=</mml:mo><mml:mfenced open="(" close=")"><mml:mtable class="array" columnalign="center"><mml:mtr><mml:mtd><mml:mrow><mml:mi>sin⁡</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="italic">λ</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mi>sin⁡</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="italic">μ</mml:mi><mml:mo>)</mml:mo><mml:mo>⋅</mml:mo><mml:mi>cos⁡</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="italic">λ</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mo>-</mml:mo><mml:mi>cos⁡</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="italic">μ</mml:mi><mml:mo>)</mml:mo><mml:mo>⋅</mml:mo><mml:mi>cos⁡</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="italic">λ</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mfenced><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
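      <p>As an illustration of Eqs. (1) and (2), the two transformations could be written as follows. This is a sketch under stated assumptions: the function names are hypothetical, latitude and longitude are assumed to be supplied in degrees, and the variable names <italic>lam</italic> and <italic>mu</italic> follow the symbols used for latitude and longitude in Eq. (2).</p>
<preformat>
```python
import math


def day_of_year_transform(j):
    """Map day of year j onto the unit circle, as in Eq. (1)."""
    angle = j * 2.0 * math.pi / 365.0
    return math.cos(angle), math.sin(angle)


def n_vector(lat_deg, lon_deg):
    """Return the n-vector components (A, B, C) of Eq. (2).

    Inputs are assumed to be in degrees and are converted to radians.
    """
    lam = math.radians(lat_deg)  # latitude
    mu = math.radians(lon_deg)   # longitude
    a = math.sin(lam)
    b = math.sin(mu) * math.cos(lam)
    c = -math.cos(mu) * math.cos(lam)
    return a, b, c
```
</preformat>
      <p>Both transformations remove artificial jumps: day 365 maps onto (almost) the same point as day 0, and longitudes of 180 and -180 degrees give the same n-vector, so neighbouring samples on either side of the year boundary or the date line remain neighbours in the proxy space.</p>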
      <p id="d1e1455">Co-located <inline-formula><mml:math id="M90" display="inline"><mml:mi>f</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M91" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> (<inline-formula><mml:math id="M92" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula>) and proxy data (<inline-formula><mml:math id="M93" display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula>) are used to create training
arrays (<inline-formula><mml:math id="M94" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula>). The final input for SVR and RFR has 12 columns:
log<inline-formula><mml:math id="M95" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">10</mml:mn></mml:msub></mml:math></inline-formula>(Chl <inline-formula><mml:math id="M96" display="inline"><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mi mathvariant="normal">clim</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>), SST, <inline-formula><mml:math id="M97" display="inline"><mml:mi>f</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M98" display="inline"><mml:msub><mml:mi/><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>(</mml:mo><mml:mi mathvariant="normal">atm</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub></mml:math></inline-formula>, ADT,
log<inline-formula><mml:math id="M99" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">10</mml:mn></mml:msub></mml:math></inline-formula>(MLD), ICE, SSS, cos(<inline-formula><mml:math id="M100" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula>), sin(<inline-formula><mml:math id="M101" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula>) and <inline-formula><mml:math id="M102" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> vectors [<inline-formula><mml:math id="M103" display="inline"><mml:mrow><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>B</mml:mi><mml:mo>,</mml:mo><mml:mi>C</mml:mi></mml:mrow></mml:math></inline-formula>].
SVR requires each column of the proxies to be <inline-formula><mml:math id="M104" display="inline"><mml:mi>z</mml:mi></mml:math></inline-formula>-scored, i.e. normalized
to the mean (<inline-formula><mml:math id="M105" display="inline"><mml:mi mathvariant="italic">μ</mml:mi></mml:math></inline-formula>) and standard deviation (<inline-formula><mml:math id="M106" display="inline"><mml:mi mathvariant="italic">σ</mml:mi></mml:math></inline-formula>) of each column
<inline-formula><mml:math id="M107" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:mi>x</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="italic">μ</mml:mi></mml:mrow><mml:mi mathvariant="italic">σ</mml:mi></mml:mfrac></mml:mstyle><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>.</p>
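      <p>A short sketch of the log transformation followed by <italic>z</italic>-scoring of a single proxy column is given here. The mixed layer depth values are invented for illustration, and the use of the population standard deviation is an assumption of this example; the text does not specify which estimator was used.</p>
<preformat>
```python
import math
import statistics


def z_score(column):
    """Normalize a column to zero mean and unit standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population estimate (assumed)
    return [(x - mean) / std for x in column]


# Hypothetical mixed layer depths in metres (not data from the study)
mld = [20.0, 50.0, 100.0, 400.0]
log_mld = [math.log10(v) for v in mld]
scaled = z_score(log_mld)
```
</preformat>
      <p>In practice the mean and standard deviation would normally be computed on the training data and reused unchanged when the trained model is applied to new data.</p>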
</sec>
<sec id="Ch1.S2.SS4">
  <title>Empirical methods and implementation</title>
      <p id="d1e1638">Data are split randomly into a training
and independent test dataset with a ratio of <inline-formula><mml:math id="M108" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.7</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>:</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mn mathvariant="normal">0.3</mml:mn></mml:mrow></mml:math></inline-formula>. The independent
dataset is used to give a test error of the trained algorithm. The
statistical learning package, scikit-learn, in Python is used for all
regression and cross-validation methods <xref ref-type="bibr" rid="bib1.bibx39" id="paren.47"/>. The details on
each cross-validation method are outlined in the subsections below.</p>
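      <p>The random 0.7 : 0.3 split can be illustrated with the Python standard library. The study itself used scikit-learn; the helper below is a hypothetical stand-in that only shows the principle, with the function name, the seed and the rounding of the test size chosen for this example.</p>
<preformat>
```python
import random


def train_test_split_indices(n_samples, test_fraction=0.3, seed=0):
    """Randomly split sample indices into training and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    # The first n_test shuffled indices form the independent test set
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(100)
```
</preformat>
      <p>The test indices are set aside before any hyper-parameter selection so that the test error remains independent of the training procedure.</p>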
<sec id="Ch1.S2.SS4.SSS1">
  <title>Support vector regression</title>
      <p id="d1e1663">The basic formulation of SVR is similar to that of linear regression as described by <xref ref-type="bibr" rid="bib1.bibx45" id="text.48"/>:

                  <disp-formula id="Ch1.E3" content-type="numbered"><mml:math id="M109" display="block"><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mi>f</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mo>〈</mml:mo><mml:mi>w</mml:mi><mml:mo>,</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>〉</mml:mo><mml:mo>+</mml:mo><mml:mi>b</mml:mi><mml:mspace width="1em" linebreak="nobreak"/><mml:mtext> with </mml:mtext><mml:mspace linebreak="nobreak" width="0.33em"/><mml:mi>b</mml:mi><mml:mo>∈</mml:mo><mml:mi mathvariant="double-struck">R</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

            where <inline-formula><mml:math id="M110" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula> is an intercept and <inline-formula><mml:math id="M111" display="inline"><mml:mrow><mml:mo>〈</mml:mo><mml:mo>⋅</mml:mo><mml:mo>,</mml:mo><mml:mo>⋅</mml:mo><mml:mo>〉</mml:mo></mml:mrow></mml:math></inline-formula> denotes the dot
product of the weights (<inline-formula><mml:math id="M112" display="inline"><mml:mi>w</mml:mi></mml:math></inline-formula>) and <inline-formula><mml:math id="M113" display="inline"><mml:mi mathvariant="bold-italic">x</mml:mi></mml:math></inline-formula>, the training data. The weights
and intercept are found by solving the following constrained minimization:

                  <disp-formula id="Ch1.E4" content-type="numbered"><mml:math id="M114" display="block"><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mtext>minimize</mml:mtext><mml:mspace width="1em" linebreak="nobreak"/><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mstyle><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:mi>w</mml:mi><mml:mo>|</mml:mo><mml:msup><mml:mo>|</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mtext>  subject to</mml:mtext><mml:mfenced open="{" close=""><mml:mtable columnspacing="1em" class="cases" rowspacing="0.2ex" columnalign="left left" framespacing="0em"><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mo>〈</mml:mo><mml:mi>w</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>〉</mml:mo><mml:mo>-</mml:mo><mml:mi>b</mml:mi></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>≤</mml:mo><mml:mi mathvariant="italic">ϵ</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mo>〈</mml:mo><mml:mi>w</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>〉</mml:mo><mml:mo>+</mml:mo><mml:mi>b</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>≤</mml:mo><mml:mi mathvariant="italic">ϵ</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mfenced><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
      <p id="d1e1854">In this form, <inline-formula><mml:math id="M115" display="inline"><mml:mi>w</mml:mi></mml:math></inline-formula> is minimized according to the target values (<inline-formula><mml:math id="M116" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) to a
precision of <inline-formula><mml:math id="M117" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula> – i.e. there is no room for error greater than
<inline-formula><mml:math id="M118" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula>. However, with the majority of problems, meeting these constraints
is not possible if data are noisy or <inline-formula><mml:math id="M119" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula> is set to be small. The inclusion
of <italic>slack variables</italic> (<inline-formula><mml:math id="M120" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ξ</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi mathvariant="italic">ξ</mml:mi><mml:mi>i</mml:mi><mml:mo>*</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>) relaxes the constraints and
the problem is now formulated as follows:

                  <disp-formula specific-use="align" content-type="numbered"><mml:math id="M121" display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mtext>minimize</mml:mtext><mml:mspace width="1em" linebreak="nobreak"/></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mstyle><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:mi>w</mml:mi><mml:mo>|</mml:mo><mml:msup><mml:mo>|</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:mi>C</mml:mi><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="italic">ξ</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msubsup><mml:mi mathvariant="italic">ξ</mml:mi><mml:mi>i</mml:mi><mml:mo>*</mml:mo></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mlabeledtr id="Ch1.E5"><mml:mtd/><mml:mtd><mml:mstyle class="stylechange" displaystyle="true"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mtext>subject to</mml:mtext><mml:mfenced open="{" close=""><mml:mtable rowspacing="0.2ex" class="cases" columnspacing="1em" columnalign="left left" framespacing="0em"><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mo>〈</mml:mo><mml:mi>w</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>〉</mml:mo><mml:mo>-</mml:mo><mml:mi>b</mml:mi></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>≤</mml:mo><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi 
mathvariant="italic">ξ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mo>〈</mml:mo><mml:mi>w</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>〉</mml:mo><mml:mo>+</mml:mo><mml:mi>b</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>≤</mml:mo><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mo>+</mml:mo><mml:msubsup><mml:mi mathvariant="italic">ξ</mml:mi><mml:mi>i</mml:mi><mml:mo>*</mml:mo></mml:msubsup></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi mathvariant="italic">ξ</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi mathvariant="italic">ξ</mml:mi><mml:mi>i</mml:mi><mml:mo>*</mml:mo></mml:msubsup></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>≥</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mfenced><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

              Here <inline-formula><mml:math id="M122" display="inline"><mml:mi>C</mml:mi></mml:math></inline-formula> is a parameter that adjusts for the amount of error that the
minimization allows. The slack variable <inline-formula><mml:math id="M123" display="inline"><mml:mrow><mml:mo>|</mml:mo><mml:mi mathvariant="italic">ξ</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:math></inline-formula> is only counted towards the
cost if the point lies outside the margin (<inline-formula><mml:math id="M124" display="inline"><mml:mrow><mml:mo>|</mml:mo><mml:mi mathvariant="italic">ξ</mml:mi><mml:mo>|</mml:mo><mml:mo>≥</mml:mo><mml:mi mathvariant="italic">ϵ</mml:mi></mml:mrow></mml:math></inline-formula>). The points
on or outside the margins are called support vectors and are used to
construct the hypothesis function, <inline-formula><mml:math id="M125" display="inline"><mml:mrow><mml:mi>h</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. This is shown in
Fig. <xref ref-type="fig" rid="Ch1.F2"/>a where a linear SVR is fitted to noisy data
produced from a cubic spline. The optimization problem shown in
Eq. (<xref ref-type="disp-formula" rid="Ch1.E5"/>) is solved in its dual formulation (see
<xref ref-type="bibr" rid="bib1.bibx16" id="altparen.49"/>, for the full description). Importantly, solving the dual
formulation allows for efficient kernelization of SVR.</p>
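      <p>To make the role of the slack variables concrete, the objective of Eq. (5) can be evaluated for a given linear model. The sketch below relies on the standard result that, at the optimum, the sum of the two slack variables for a sample equals the amount by which its absolute residual exceeds the margin (zero inside the margin). The function names are hypothetical and the code evaluates the cost only; it does not solve the optimization.</p>
<preformat>
```python
def epsilon_insensitive(residual, epsilon):
    """Slack needed for one sample: zero inside the margin, linear outside."""
    return max(0.0, abs(residual) - epsilon)


def svr_primal_cost(w, b, xs, ys, epsilon, c):
    """Evaluate 0.5 * ||w||^2 + C * sum(slack) for a linear model."""
    regularization = 0.5 * sum(wi * wi for wi in w)
    slack = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(wi * xi for wi, xi in zip(w, x)) + b
        slack += epsilon_insensitive(y - prediction, epsilon)
    return regularization + c * slack
```
</preformat>
      <p>A larger value of <italic>C</italic> penalizes points outside the margin more heavily relative to the norm of the weights, whereas a larger margin leaves more points without any penalty.</p>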
      <p id="d1e2164">Kernelization describes the process that maps the proxy variables (<inline-formula><mml:math id="M126" display="inline"><mml:mi mathvariant="bold-italic">x</mml:mi></mml:math></inline-formula>)
onto a higher dimensional feature space. In this study we used a Gaussian
kernel (or radial basis function – RBF), which allows for potentially
infinite complexity determined by the number of support vectors
<xref ref-type="bibr" rid="bib1.bibx50" id="paren.50"/>. The RBF kernel introduces an additional hyper-parameter
(<inline-formula><mml:math id="M127" display="inline"><mml:mi mathvariant="italic">γ</mml:mi></mml:math></inline-formula>) that defines the width of the Gaussian. Selection of the SVR
hyper-parameters (<inline-formula><mml:math id="M128" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M129" display="inline"><mml:mi>C</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M130" display="inline"><mml:mi mathvariant="italic">γ</mml:mi></mml:math></inline-formula>) is done using a two-stage
exhaustive grid search approach with cross-validation. We use K-fold cross-validation, where the data are divided into eight equal “folds” (<inline-formula><mml:math id="M131" display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">8</mml:mn></mml:mrow></mml:math></inline-formula>).
Seven of the folds are used to train the model, while the remaining fold is
used for validation. This is repeated until each fold has served once as the
validation set.</p>
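      <p>The two ingredients of this paragraph, the Gaussian kernel and the eight-fold split, can be sketched as follows. The kernel is written in the form exp(-gamma * squared distance), which is the convention used by scikit-learn and is assumed here; the fold construction by interleaving indices is a hypothetical simplification that presumes the samples have already been shuffled.</p>
<preformat>
```python
import math


def rbf_kernel(x1, x2, gamma):
    """Gaussian (RBF) kernel between two proxy vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * sq_dist)


def k_fold_indices(n_samples, k=8):
    """Yield (train, validation) index lists for k near-equal folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation
```
</preformat>
      <p>In a grid search, each candidate combination of the three hyper-parameters would be scored by its mean validation error over the eight folds, and the combination with the lowest score retained.</p>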

      <?xmltex \floatpos{t}?><fig id="Ch1.F2"><caption><p id="d1e2220">A simple example demonstrating the principle of <bold>(a)</bold> support vector regression and <bold>(b)</bold> random forest regression.
The dashed grey line is the true function <inline-formula><mml:math id="M132" display="inline"><mml:mrow><mml:mi>f</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.4</mml:mn><mml:msup><mml:mi>x</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> with the blue dots
representing a random sample taken from this function <inline-formula><mml:math id="M133" display="inline"><mml:mrow><mml:mi>f</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:mi mathvariant="italic">σ</mml:mi></mml:mrow></mml:math></inline-formula>, where
<inline-formula><mml:math id="M134" display="inline"><mml:mi mathvariant="italic">σ</mml:mi></mml:math></inline-formula> is normally distributed noise. The black line in each panel,
<inline-formula><mml:math id="M135" display="inline"><mml:mrow><mml:mi>h</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, shows the estimate of the true function. The orange dots in
<bold>(a)</bold> show the samples from the random subset chosen as support
vectors from which <inline-formula><mml:math id="M136" display="inline"><mml:mrow><mml:mi>h</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is estimated. The orange lines in <bold>(b)</bold> show
200 decision tree estimates, <inline-formula><mml:math id="M137" display="inline"><mml:mrow><mml:msub><mml:mi>g</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, which are averaged to create the
ensemble, <inline-formula><mml:math id="M138" display="inline"><mml:mrow><mml:mi>h</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>.</p></caption>
            <?xmltex \igopts{width=227.622047pt}?><graphic xlink:href="https://bg.copernicus.org/articles/14/5551/2017/bg-14-5551-2017-f02.png"/>

          </fig>

</sec>
<sec id="Ch1.S2.SS4.SSS2">
  <title>Random forest regression</title>
      <p id="d1e2355">Decision trees form the basic building block of a random forest (RF), with
the average of <inline-formula><mml:math id="M139" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> decision trees taken as the ensemble estimate
<xref ref-type="bibr" rid="bib1.bibx6" id="paren.51"/> (Fig. <xref ref-type="fig" rid="Ch1.F2"/>b). The basic idea of a
decision tree is to iteratively partition data into boxes using simple rules
that minimize the error at each split (referred to as a node) – these boxes
would become hypercubes in higher dimensional problems. The basic
formulation is given by <xref ref-type="bibr" rid="bib1.bibx29" id="text.52"/>:
<list list-type="order"><list-item>
      <p id="d1e2375"><italic>Start at the root node.</italic></p></list-item><list-item>
      <p id="d1e2380"><italic>For each X, find the set S that minimizes the sum of the node impurities in the two child nodes and choose the split X</italic> <inline-formula><mml:math id="M140" display="inline"><mml:mo>∈</mml:mo></mml:math></inline-formula> <italic>S that gives the minimum overall X and S.</italic></p></list-item><list-item>
      <p id="d1e2395"><italic>If a stopping criterion is reached, exit. Otherwise, apply step 2 to each child node in turn.</italic></p></list-item></list>
Decision trees have high variance due to their discrete nature. Random
forests reduce this high variance by bootstrapping with aggregation (called
<italic>bagging</italic>): a subset of the available training dataset is sampled with
replacement for each decision tree in the RF. The sampling with replacement
means that each training observation has a <inline-formula><mml:math id="M141" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 63 % chance of being
chosen at least once for a particular tree <xref ref-type="bibr" rid="bib1.bibx30" id="paren.53"/>. This
subsampling provides estimates that are robust to outliers as these have a
chance of being omitted in training. A random forest
typically performs better when the number of decision trees (<inline-formula><mml:math id="M142" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>) is large, but
increasing the number of trees has diminishing returns in terms of
performance vs. computation. Additional robustness is given to RFs by
randomizing and/or limiting the number of proxy variables (<inline-formula><mml:math id="M143" display="inline"><mml:mi>m</mml:mi></mml:math></inline-formula>) given to the
nodes in each tree when splitting the data (hence random) <xref ref-type="bibr" rid="bib1.bibx30" id="paren.54"/>.
In this study, the maximum number of proxy variables (<inline-formula><mml:math id="M144" display="inline"><mml:mrow><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">11</mml:mn></mml:mrow></mml:math></inline-formula>) was given to
the RFR. The complexity of an RF can be adjusted by setting the minimum
number of samples at a terminal node, or leaf (<inline-formula><mml:math id="M145" display="inline"><mml:mi>l</mml:mi></mml:math></inline-formula>), where a fully grown tree would
allow <inline-formula><mml:math id="M146" display="inline"><mml:mi>l</mml:mi></mml:math></inline-formula> to be one.</p>
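      <p>The figure of roughly 63 % follows from sampling with replacement: when a bootstrap sample of size <italic>n</italic> is drawn from <italic>n</italic> observations, the probability that a given observation is drawn at least once is 1 - (1 - 1/<italic>n</italic>)^<italic>n</italic>, which approaches 1 - 1/e (about 0.632) for large <italic>n</italic>. The sketch below checks this numerically; the function names are hypothetical and a bootstrap sample equal in size to the training set is assumed.</p>
<preformat>
```python
import random


def in_bag_probability(n):
    """Probability that one observation is drawn at least once in n draws."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def bootstrap_sample(n, rng):
    """Draw n indices with replacement; also return the indices never drawn."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in chosen]
    return in_bag, out_of_bag


in_bag, out_of_bag = bootstrap_sample(1000, random.Random(1))
```
</preformat>
      <p>The observations that are never drawn for a given tree, a little over a third of the data, are the ones that tree has not seen during training.</p>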
      <p id="d1e2458">A useful feature of bagging is that it intrinsically provides a
cross-validation dataset (a.k.a. out-of-bag samples) that is not part of the
training procedure (for a specific set of trees). The out-of-bag samples are
those that are not selected during bagging. The advantage of this approach
over K-fold cross-validation is that the full dataset can be used in the
training procedure, as opposed to splitting the dataset for cross-validation.
The out-of-bag error is used to cross-validate the model and select the
hyper-parameters (<inline-formula><mml:math id="M147" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M148" display="inline"><mml:mi>m</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M149" display="inline"><mml:mi>l</mml:mi></mml:math></inline-formula>) for the RF.</p>
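      <p>The out-of-bag error can be illustrated with a deliberately simplified ensemble. In the sketch below each "tree" is replaced by a hypothetical stand-in that predicts the mean of its in-bag targets, so the code shows only the bookkeeping: every sample is predicted using solely the trees for which it was out of bag. The function name and default arguments are assumptions of this example.</p>
<preformat>
```python
import random


def oob_rmse(y, n_trees=200, seed=0):
    """Out-of-bag RMSE for an ensemble of mean-only stand-in 'trees'."""
    rng = random.Random(seed)
    n = len(y)
    sums = [0.0] * n   # running sum of out-of-bag predictions per sample
    counts = [0] * n   # number of trees for which the sample was out of bag
    for _ in range(n_trees):
        in_bag = [rng.randrange(n) for _ in range(n)]
        chosen = set(in_bag)
        # Stand-in for a fitted tree: the mean of the in-bag targets
        prediction = sum(y[i] for i in in_bag) / n
        for i in range(n):
            if i not in chosen:
                sums[i] += prediction
                counts[i] += 1
    sq_errors = [
        (sums[i] / counts[i] - y[i]) ** 2 for i in range(n) if counts[i] > 0
    ]
    return (sum(sq_errors) / len(sq_errors)) ** 0.5
```
</preformat>
      <p>With real decision trees the same bookkeeping applies; only the prediction step changes, and the resulting error can be compared across candidate hyper-parameter values without holding back a separate validation fold.</p>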
</sec>
</sec>
<sec id="Ch1.S2.SS5">
  <?xmltex \opttitle{CO${}_{2}$ fluxes}?><title>CO<inline-formula><mml:math id="M150" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> fluxes</title>
      <p id="d1e2499">Air–sea CO<inline-formula><mml:math id="M151" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> fluxes are quantified with the following:
            <disp-formula id="Ch1.E6" content-type="numbered"><mml:math id="M152" display="block"><mml:mrow><mml:mi>F</mml:mi><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mi>K</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>⋅</mml:mo><mml:msub><mml:mi>k</mml:mi><mml:mi mathvariant="normal">w</mml:mi></mml:msub><mml:mo>⋅</mml:mo><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow><mml:mo>⋅</mml:mo><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mo>[</mml:mo><mml:mi mathvariant="normal">ice</mml:mi><mml:mo>]</mml:mo><mml:mo>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
          The gas transfer velocity (<inline-formula><mml:math id="M153" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi mathvariant="normal">w</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) is calculated using a quadratic
dependence on wind speed with the coefficients of <xref ref-type="bibr" rid="bib1.bibx51" id="text.55"/>.
The <inline-formula><mml:math id="M154" display="inline"><mml:mi mathvariant="bold-italic">u</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M155" display="inline"><mml:mi mathvariant="bold-italic">v</mml:mi></mml:math></inline-formula> vectors of CCMP v2 are used to compute the wind
speed <xref ref-type="bibr" rid="bib1.bibx1" id="paren.56"/>. Coefficients from <xref ref-type="bibr" rid="bib1.bibx52" id="text.57"/> are used for
<inline-formula><mml:math id="M156" display="inline"><mml:mrow><mml:msub><mml:mi>K</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M157" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M158" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> is estimated by the empirical models. The effect
of sea-ice cover on CO<inline-formula><mml:math id="M159" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> fluxes is treated linearly; the fraction of sea
ice cover ([ice]) is converted to the fraction of open water by subtracting it from 1,
as shown in Eq. (<xref ref-type="disp-formula" rid="Ch1.E6"/>).</p>
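      <p>Equation (6) and the wind speed calculation can be sketched as below. The function names are hypothetical, the quadratic coefficient is left as a placeholder argument because its value is not given in this section, and the Schmidt number normalization that such parameterizations commonly include is omitted for brevity. Units are left to the caller.</p>
<preformat>
```python
import math


def wind_speed(u, v):
    """Wind speed from the zonal (u) and meridional (v) components."""
    return math.hypot(u, v)


def gas_transfer_velocity(speed, coeff):
    """Quadratic wind speed dependence; coeff is a placeholder value."""
    return coeff * speed ** 2


def co2_flux(k0, kw, delta_pco2, ice_fraction):
    """Air-sea CO2 flux following Eq. (6).

    The flux scales linearly with the open water fraction (1 - [ice]).
    """
    return k0 * kw * delta_pco2 * (1.0 - ice_fraction)
```
</preformat>
      <p>With this linear treatment, a grid cell that is fully covered by sea ice contributes no flux, and a cell that is half covered contributes half of the ice-free flux.</p>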
      <p id="d1e2647">These results are analysed regionally with the three Southern Ocean biomes
defined by <xref ref-type="bibr" rid="bib1.bibx11" id="text.58"/> (Fig. <xref ref-type="fig" rid="Ch1.F1"/>). We compare our
estimates of CO<inline-formula><mml:math id="M160" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> fluxes with those of <xref ref-type="bibr" rid="bib1.bibx22" id="text.59"/>, who used a
two-step neural network method abbreviated to SOM-FFN. Note that the SOM-FFN method was trained using
SOCAT v2, whereas the methods in this study used SOCAT v3.</p>
</sec>
<sec id="Ch1.S2.SS6">
  <title>Synthetic data experiments</title>
      <p id="d1e2673">Two experiments are run with the synthetic data. The first experiment
assesses how including or omitting coordinates as proxy variables affects
each method's ability to estimate <inline-formula><mml:math id="M161" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M162" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> using SOCAT v3
locations. This is done by training each model first with and then without the
transformed coordinate variables as proxies. Note that the training
procedure for the models remains the same as for the observational estimates
of <inline-formula><mml:math id="M163" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M164" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F3" specific-use="star"><caption><p id="d1e2716">The spatial distribution of sampling locations in the synthetic dataset (BIOPERIANT05).
The top panel <bold>(a)</bold> shows the sampling strategy using SOCAT v3 locations, and the bottom panel <bold>(b)</bold> shows the uniform random sampling distribution used in the second experiment. </p></caption>
          <?xmltex \igopts{width=355.659449pt}?><graphic xlink:href="https://bg.copernicus.org/articles/14/5551/2017/bg-14-5551-2017-f03.png"/>

        </fig>

      <p id="d1e2731">The second experiment assesses the impact that the seasonally sparse SOCAT v3
dataset has on the ability of the methods to estimate <inline-formula><mml:math id="M165" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M166" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>. This is done
by comparing the <inline-formula><mml:math id="M167" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M168" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> estimates of models trained with
synthetic data sampled at (1) SOCAT v3 locations
(Fig. <xref ref-type="fig" rid="Ch1.F3"/>a) and (2) uniformly random sampling locations
(random in space and time) with the same sample size as SOCAT v3
(Fig. <xref ref-type="fig" rid="Ch1.F3"/>b). Once again, the training procedure
remains the same (as stated above).</p>
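      <p>The uniformly random sampling strategy can be illustrated with a short sketch. It assumes that samples are drawn without replacement from a flattened (time, latitude, longitude) model grid; the grid dimensions, sample size and function names are hypothetical, and no area weighting of grid cells is applied, which may differ from the procedure actually used.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random


def random_training_indices(n_grid_points, n_samples, seed=0):
    """Draw n_samples distinct flat indices uniformly from the model grid."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_grid_points), n_samples))


def unravel(index, n_lat, n_lon):
    """Convert a flat index back to (time, lat, lon) grid indices."""
    t, rest = divmod(index, n_lat * n_lon)
    j, i = divmod(rest, n_lon)
    return t, j, i


# Hypothetical grid and sample size, for illustration only.
n_time, n_lat, n_lon = 12, 20, 72
socat_like_sample_size = 500

picked = random_training_indices(n_time * n_lat * n_lon, socat_like_sample_size)
locations = [unravel(k, n_lat, n_lon) for k in picked]
```
]]></preformat>
      <p>Holding the sample size equal to that of the SOCAT-like experiment means that any difference in error can be attributed to where and when the samples fall rather than to how many there are.</p>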
</sec>
</sec>
<sec id="Ch1.S3">
  <title>Results</title>
<sec id="Ch1.S3.SS1">
  <?xmltex \opttitle{Observational CO${}_{2}$ data results}?><title>Observational CO<inline-formula><mml:math id="M169" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> data results</title>

      <?xmltex \floatpos{t}?><fig id="Ch1.F4" specific-use="star"><caption><p id="d1e2801">The RMSE (top row, <bold>a–c</bold>) for each of the three Southern Ocean biomes for RFR (blue), SVR (green) and SOM-FFN (red).
The grey fill in the top row <bold>(a–c)</bold> shows the number of observations for each of the biomes for each year.
The maps in the bottom row <bold>(d–f)</bold> show the spatial distribution of residuals in the Southern Ocean for SVR <bold>(d)</bold>, SOM-FFN <bold>(e)</bold> and RFR (out-of-bag errors) <bold>(f)</bold>.
The thin black lines delineate the three Southern Ocean biomes as defined by <xref ref-type="bibr" rid="bib1.bibx11" id="text.60"/>.
Note that RFR and SVR are trained and tested with SOCAT v3 while SOM-FFN is trained and tested with SOCAT v2.
</p></caption>
          <?xmltex \igopts{width=384.112205pt}?><graphic xlink:href="https://bg.copernicus.org/articles/14/5551/2017/bg-14-5551-2017-f04.png"/>

        </fig>

      <p id="d1e2832">We use the RMSE as the primary metric of the methods' performance as shown in
Fig. <xref ref-type="fig" rid="Ch1.F4"/>a–c. Note that the RFR RMSE is calculated
from the out-of-bag error (effectively an independent error). SOM-FFN has the
best RMSE score of 14.84 <inline-formula><mml:math id="M170" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm (using SOCAT v2), which is better
than the RMSE of RFR (16.45 <inline-formula><mml:math id="M171" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm) and SVR (24.40 <inline-formula><mml:math id="M172" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm),
which are trained with SOCAT v3. The biases of the different methods are
similar in magnitude for each of the biomes (<inline-formula><mml:math id="M173" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.40, <inline-formula><mml:math id="M174" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.03 and
<inline-formula><mml:math id="M175" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.75 <inline-formula><mml:math id="M176" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm for the SOM-FFN, RFR and SVR, respectively). The mean
absolute errors (MAEs) for the respective methods are 9.78, 9.85 and
15.27 <inline-formula><mml:math id="M177" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm.</p>
      <p id="d1e2894">The difference between the MAE and the RMSE highlights the ability of methods
to fit outliers or extreme points, as the RMSE metric penalizes outliers more severely than the MAE. The SOM-FFN approach
has the smallest difference between these two metrics (5.06, 6.60 and
9.13 <inline-formula><mml:math id="M178" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm for the SOM-FFN, RFR and SVR, respectively). This
superior performance may be due to two factors. First, the SOM-FFN method
may be better at fitting the extreme points (those that are in the outer
percentiles of the distribution). Second, the
SOCAT v2 dataset may be less variable. Testing the SVR and RFR implementations
against SOCAT v2 yields similar results, except in the SAZ,
where both RMSE and MAE improve (results shown in
Table <xref ref-type="table" rid="Ch1.T2"/>).</p>
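      <p>The three metrics can be written out explicitly, along with a toy case showing why the gap between RMSE and MAE widens when errors are concentrated in a few extreme points. The sketch assumes the usual definitions and a bias computed as estimate minus observation; the sign convention and the example values are assumptions for illustration.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def bias(est, obs):
    """Mean signed error (estimate minus observation)."""
    return sum(e - o for e, o in zip(est, obs)) / len(obs)


def mae(est, obs):
    """Mean absolute error."""
    return sum(abs(e - o) for e, o in zip(est, obs)) / len(obs)


def rmse(est, obs):
    """Root mean squared error."""
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(est, obs)) / len(obs))


# Two hypothetical error patterns with the same MAE.
obs = [0.0, 0.0, 0.0, 0.0]
even_errors = [1.0, 1.0, 1.0, 1.0]   # errors spread evenly
one_outlier = [0.0, 0.0, 0.0, 4.0]   # same total error in a single point

gap_even = rmse(even_errors, obs) - mae(even_errors, obs)
gap_outlier = rmse(one_outlier, obs) - mae(one_outlier, obs)
```
]]></preformat>
      <p>Both patterns have the same MAE, but only the pattern with a single large error shows a gap between RMSE and MAE, which is why a small gap suggests that extreme points are fitted well.</p>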

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T2"><caption><p id="d1e2910">Various performance metrics for empirical estimates of <inline-formula><mml:math id="M179" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M180" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> in the Sub-Antarctic Zone (SAZ), Polar Frontal Zone (PFZ) and
Antarctic Zone (AZ) (as defined by <xref ref-type="bibr" rid="bib1.bibx11" id="altparen.61"/>). Results tested
according to SOCAT v2 and SOCAT v3 are shown for the SVR and RFR methods. </p></caption><oasis:table frame="topbot"><oasis:tgroup cols="6">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">

         <oasis:entry colname="col1"/>

         <oasis:entry colname="col2">Method</oasis:entry>

         <oasis:entry colname="col3">RMSE</oasis:entry>

         <oasis:entry colname="col4">MAE</oasis:entry>

         <oasis:entry colname="col5"><inline-formula><mml:math id="M181" display="inline"><mml:mrow><mml:msup><mml:mi>r</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col6">Bias</oasis:entry>

       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="4">SAZ</oasis:entry>

         <oasis:entry colname="col2">SVR (v3)</oasis:entry>

         <oasis:entry colname="col3">18.14</oasis:entry>

         <oasis:entry colname="col4">11.28</oasis:entry>

         <oasis:entry colname="col5">0.48</oasis:entry>

         <oasis:entry colname="col6">0.61</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">SVR (v2)</oasis:entry>

         <oasis:entry colname="col3">15.99</oasis:entry>

         <oasis:entry colname="col4">10.36</oasis:entry>

         <oasis:entry colname="col5">0.49</oasis:entry>

         <oasis:entry colname="col6">0.20</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">RFR (v3)</oasis:entry>

         <oasis:entry colname="col3">13.67</oasis:entry>

         <oasis:entry colname="col4">8.16</oasis:entry>

         <oasis:entry colname="col5">0.70</oasis:entry>

         <oasis:entry colname="col6"><inline-formula><mml:math id="M182" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.14</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">RFR (v2)</oasis:entry>

         <oasis:entry colname="col3">12.66</oasis:entry>

         <oasis:entry colname="col4">7.65</oasis:entry>

         <oasis:entry colname="col5">0.68</oasis:entry>

         <oasis:entry colname="col6"><inline-formula><mml:math id="M183" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.29</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2">SOM-FFN</oasis:entry>

         <oasis:entry colname="col3">10.07</oasis:entry>

         <oasis:entry colname="col4">7.04</oasis:entry>

         <oasis:entry colname="col5">0.76</oasis:entry>

         <oasis:entry colname="col6"><inline-formula><mml:math id="M184" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>1.15</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="4">PFZ</oasis:entry>

         <oasis:entry colname="col2">SVR (v3)</oasis:entry>

         <oasis:entry colname="col3">14.45</oasis:entry>

         <oasis:entry colname="col4">10.06</oasis:entry>

         <oasis:entry colname="col5">0.48</oasis:entry>

         <oasis:entry colname="col6">0.31</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">SVR (v2)</oasis:entry>

         <oasis:entry colname="col3">14.29</oasis:entry>

         <oasis:entry colname="col4">10.01</oasis:entry>

         <oasis:entry colname="col5">0.44</oasis:entry>

         <oasis:entry colname="col6"><inline-formula><mml:math id="M185" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.01</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">RFR (v3)</oasis:entry>

         <oasis:entry colname="col3">10.71</oasis:entry>

         <oasis:entry colname="col4">6.70</oasis:entry>

         <oasis:entry colname="col5">0.71</oasis:entry>

         <oasis:entry colname="col6">0.21</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">RFR (v2)</oasis:entry>

         <oasis:entry colname="col3">10.56</oasis:entry>

         <oasis:entry colname="col4">6.77</oasis:entry>

         <oasis:entry colname="col5">0.69</oasis:entry>

         <oasis:entry colname="col6"><inline-formula><mml:math id="M186" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.34</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2">SOM-FFN</oasis:entry>

         <oasis:entry colname="col3">11.01</oasis:entry>

         <oasis:entry colname="col4">7.68</oasis:entry>

         <oasis:entry colname="col5">0.60</oasis:entry>

         <oasis:entry colname="col6">0.26</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col1" morerows="4">AZ</oasis:entry>

         <oasis:entry colname="col2">SVR (v3)</oasis:entry>

         <oasis:entry colname="col3">36.14</oasis:entry>

         <oasis:entry colname="col4">25.19</oasis:entry>

         <oasis:entry colname="col5">0.56</oasis:entry>

         <oasis:entry colname="col6"><inline-formula><mml:math id="M187" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>3.22</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">SVR (v2)</oasis:entry>

         <oasis:entry colname="col3">35.69</oasis:entry>

         <oasis:entry colname="col4">25.01</oasis:entry>

         <oasis:entry colname="col5">0.59</oasis:entry>

         <oasis:entry colname="col6"><inline-formula><mml:math id="M188" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>2.88</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">RFR (v3)</oasis:entry>

         <oasis:entry colname="col3">23.80</oasis:entry>

         <oasis:entry colname="col4">15.81</oasis:entry>

         <oasis:entry colname="col5">0.80</oasis:entry>

         <oasis:entry colname="col6"><inline-formula><mml:math id="M189" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.27</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">RFR (v2)</oasis:entry>

         <oasis:entry colname="col3">23.49</oasis:entry>

         <oasis:entry colname="col4">15.63</oasis:entry>

         <oasis:entry colname="col5">0.81</oasis:entry>

         <oasis:entry colname="col6"><inline-formula><mml:math id="M190" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.62</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">SOM-FFN</oasis:entry>

         <oasis:entry colname="col3">21.32</oasis:entry>

         <oasis:entry colname="col4">14.91</oasis:entry>

         <oasis:entry colname="col5">0.82</oasis:entry>

         <oasis:entry colname="col6"><inline-formula><mml:math id="M191" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.77</oasis:entry>

       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d1e3333">The RMSEs and biases in the PFZ are least variable between methods. While
there is a substantial increase in the number of observations from 2004, there
is no appreciable change in the RMSE. The Antarctic Zone (AZ) is the primary
contributor to these errors with much larger average RMSE values than for the
SAZ and PFZ (36.14, 23.80 and 21.32 <inline-formula><mml:math id="M192" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm for SVR, RFR and SOM-FFN, respectively). This increase in the RMSE is likely driven by the larger
variability of <inline-formula><mml:math id="M193" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M194" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> observations in the AZ, where standard
deviations of observations are 25.05, 20.01 and 54.65 <inline-formula><mml:math id="M195" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm for the
SAZ, PFZ and AZ, respectively. This is reflected in the highest <inline-formula><mml:math id="M196" display="inline"><mml:mrow><mml:msup><mml:mi>r</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> scores
in the AZ for the respective methods (Table <xref ref-type="table" rid="Ch1.T2"/>).</p>
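      <p>The link between observational variability, RMSE and the goodness-of-fit score can be checked with a short calculation. The sketch assumes that the score is the coefficient of determination, that the bias is small, and that the standard deviations quoted above describe the same observations used for the RMSE; under those assumptions the score is approximately one minus the squared ratio of RMSE to standard deviation. The values are the RFR (v3) entries of Table <xref ref-type="table" rid="Ch1.T2"/>.</p>
<preformat preformat-type="code"><![CDATA[
```python
def r2_from_rmse(rmse, obs_std):
    """Approximate coefficient of determination: 1 - MSE / variance."""
    return 1.0 - (rmse / obs_std) ** 2


# RMSE and r2 for RFR (v3) from Table 2; standard deviations from the text.
reported = {
    "SAZ": {"rmse": 13.67, "std": 25.05, "r2": 0.70},
    "PFZ": {"rmse": 10.71, "std": 20.01, "r2": 0.71},
    "AZ": {"rmse": 23.80, "std": 54.65, "r2": 0.80},
}

approx = {
    zone: r2_from_rmse(vals["rmse"], vals["std"])
    for zone, vals in reported.items()
}
```
]]></preformat>
      <p>The approximation lands close to the tabulated scores in all three biomes, which illustrates how the AZ can have both the largest RMSE and the highest score: its errors are large in absolute terms but small relative to the variability of the observations.</p>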

      <?xmltex \floatpos{t}?><fig id="Ch1.F5" specific-use="star"><caption><p id="d1e3385">Seasonal averages for <inline-formula><mml:math id="M197" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M198" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> from 1998 to 2014 for SVR, SOM-FFN and RFR.
The annual mean is shown in the top row <bold>(a, b, c)</bold>, the mean summer
(DJF) <inline-formula><mml:math id="M199" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M200" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> is shown in the middle row <bold>(d, e, f)</bold> and the
mean winter (JJA) <inline-formula><mml:math id="M201" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M202" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> is shown in the bottom row <bold>(g, h, i)</bold>. The thin black lines denote the SAZ, PFZ and AZ from outside inward.
Note that the <inline-formula><mml:math id="M203" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M204" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> has been normalized to sea ice cover where
<inline-formula><mml:math id="M205" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M206" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> is multiplied by <inline-formula><mml:math id="M207" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mo>[</mml:mo><mml:mi mathvariant="normal">ice</mml:mi><mml:mo>]</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. The red oval in
<bold>(f)</bold> highlights the difference in SOM-FFN estimates of <inline-formula><mml:math id="M208" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M209" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> during summer in the Atlantic compared to SVR and RFR. </p></caption>
          <?xmltex \igopts{width=355.659449pt}?><graphic xlink:href="https://bg.copernicus.org/articles/14/5551/2017/bg-14-5551-2017-f05.png"/>

        </fig>

      <p id="d1e3542">The annual and seasonal averages (winter <inline-formula><mml:math id="M210" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> JJA, summer <inline-formula><mml:math id="M211" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> DJF) for
<inline-formula><mml:math id="M212" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M213" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> estimated by RFR, SVR and SOM-FFN for the Southern Ocean are
shown in Fig. <xref ref-type="fig" rid="Ch1.F5"/>. Note that the estimates have been
scaled to sea ice concentration (<inline-formula><mml:math id="M214" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow><mml:mo>×</mml:mo><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mo>[</mml:mo><mml:mi mathvariant="normal">ice</mml:mi><mml:mo>]</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>) as done for fluxes in Eq. (<xref ref-type="disp-formula" rid="Ch1.E6"/>) – this mutes
winter estimates in the AZ. There is, in general, good agreement in the
spatial distribution between the methods, with the SAZ being a net sink of
CO<inline-formula><mml:math id="M215" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> and the region south of the polar front (PFZ and AZ) being a source of
CO<inline-formula><mml:math id="M216" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> to the atmosphere, as found by <xref ref-type="bibr" rid="bib1.bibx35" id="text.62"/>.</p>
      <p id="d1e3636">More specifically, there is stronger zonal asymmetry in summer compared to
winter. This is driven, in part, by a strong reduction of <inline-formula><mml:math id="M217" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M218" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> by
biological production <xref ref-type="bibr" rid="bib1.bibx35 bib1.bibx25" id="paren.63"/>. There are three regions
in the SAZ where <inline-formula><mml:math id="M219" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M220" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> reduction is strongest and consistent
between methods (Fig. <xref ref-type="fig" rid="Ch1.F5"/>): east of South America
(Brazil–Malvinas
Confluence), south-east of Africa
(Agulhas retroflection) and between Australia and New Zealand (Tasman Sea).
The reduction of <inline-formula><mml:math id="M221" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M222" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> in the PFZ is strongest in the Atlantic
sector downstream of the South Georgia and South Sandwich Islands and in the
Indian sector downstream of the Kerguelen Plateau
(Fig. <xref ref-type="fig" rid="Ch1.F5"/>d–f). In both cases, SAZ and PFZ, these
regions are consistent with regions of high biomass
<xref ref-type="bibr" rid="bib1.bibx49 bib1.bibx7" id="paren.64"/>.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F6" specific-use="star"><caption><p id="d1e3707">Time series of <inline-formula><mml:math id="M223" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M224" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> estimates for the three Southern Ocean biomes as defined by <xref ref-type="bibr" rid="bib1.bibx11" id="text.65"/>: SAZ, PFZ and AZ.
The <inline-formula><mml:math id="M225" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula>-axis grid lines represent the same scale for panels <bold>(a)</bold>
through <bold>(c)</bold>. The SOM-FFN estimates are only available until 2011 because
the method is trained with SOCAT v2, while the SVR and RFR are trained with SOCAT v3.
The <inline-formula><mml:math id="M226" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M227" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> normalized to sea ice cover is shown by dashed lines in the
AZ. </p></caption>
          <?xmltex \igopts{width=355.659449pt}?><graphic xlink:href="https://bg.copernicus.org/articles/14/5551/2017/bg-14-5551-2017-f06.pdf"/>

        </fig>

      <p id="d1e3771">However, there are more subtle differences in the magnitudes and
distributions of these patterns. The RFR underestimates winter outgassing
south of the polar front (Fig. <xref ref-type="fig" rid="Ch1.F5"/>g) compared to the
other methods, resulting in a weaker annual source. Conversely, the SVR has
strong winter outgassing (Fig. <xref ref-type="fig" rid="Ch1.F5"/>h) in the PFZ
compared to other methods. In summer, the largest difference occurs in the
eastern Atlantic sector of the SAZ where the SOM-FFN <inline-formula><mml:math id="M228" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M229" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>
estimates (highlighted in Fig. <xref ref-type="fig" rid="Ch1.F5"/>f) are larger
compared to SVR and RFR. Other differences in the spatial output are more
subtle.</p>
      <p id="d1e3799">The agreements and differences between methods are also observed in the time
series for each of the biomes (Fig. <xref ref-type="fig" rid="Ch1.F6"/>).
Importantly, there is coherence in the strengthening sink (2002 to 2012) and
timing of the seasonal cycle between the three methods
<xref ref-type="bibr" rid="bib1.bibx23" id="paren.66"/>. The differences in the magnitude of the winter
outgassing in the PFZ and AZ (Fig. <xref ref-type="fig" rid="Ch1.F6"/>b, c) are
also apparent, with the SVR overestimating <inline-formula><mml:math id="M230" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M231" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> relative to the other
methods and the RFR giving conservative outgassing estimates.</p>
      <p id="d1e3828">There is also a large difference between the SOM-FFN and the two other
methods in summer, particularly from 1998 through 2006.
Figure <xref ref-type="fig" rid="Ch1.F5"/>f shows that this could be driven by the
difference in the eastern sector of the Atlantic (circled in red).
Estimates of winter <inline-formula><mml:math id="M232" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M233" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> are in agreement, with the exception of
the last 4 years when SVR winter estimates increased relative to RFR. The
SAZ and PFZ also show variability in the magnitude of a seasonal shoulder in
late summer, where increasing <inline-formula><mml:math id="M234" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M235" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> is briefly delayed by a short
sharp decrease resulting in a saw-tooth pattern. This effect is the strongest
for the SVR and weakest for the RFR.</p>
      <p id="d1e3871">The seasonal amplitude of <inline-formula><mml:math id="M236" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M237" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> in the AZ is far larger than in
the SAZ and PFZ (Fig. <xref ref-type="fig" rid="Ch1.F6"/>c), resulting in large
differences between methods. However, this large difference is not realized
in calculated air–sea CO<inline-formula><mml:math id="M238" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> fluxes due to ice cover, as shown by the dashed
lines (Fig. <xref ref-type="fig" rid="Ch1.F6"/>c). Summer estimates are also
influenced by sea ice cover, but are reduced less than the winter
fluxes.</p>
</sec>
<sec id="Ch1.S3.SS2">
  <title>Simulation experiment results</title>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T3"><caption><p id="d1e3915">Root mean squared error (RMSE, <inline-formula><mml:math id="M239" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm) for the three
synthetic data experiments for RFR (left), SVR (middle) and the ensemble mean
(ENS) of the two methods (right).
Both in- and out-of-sample errors are reported (E[in] and E[out], respectively).
SOCAT experiments are those where the locations of the synthetic training data are the same as in SOCAT v3.
These were run with and without the coordinate variables (time, latitude and longitude) as proxies, labelled "with coordinates" and "without coordinates".
A third experiment was run with random samples – coordinate variables are included as proxies.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="5">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">

         <oasis:entry colname="col1"/>

         <oasis:entry colname="col2">Experiment</oasis:entry>

         <oasis:entry colname="col3">RFR</oasis:entry>

         <oasis:entry colname="col4">SVR</oasis:entry>

         <oasis:entry colname="col5">ENS</oasis:entry>

       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="2">E[in]</oasis:entry>

         <oasis:entry colname="col2">SOCAT (without coordinates)</oasis:entry>

         <oasis:entry colname="col3">6.65</oasis:entry>

         <oasis:entry colname="col4">7.47</oasis:entry>

         <oasis:entry colname="col5">–</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">SOCAT (with coordinates)</oasis:entry>

         <oasis:entry colname="col3">5.12</oasis:entry>

         <oasis:entry colname="col4">5.10</oasis:entry>

         <oasis:entry colname="col5">–</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2">Random sampling</oasis:entry>

         <oasis:entry colname="col3">7.23</oasis:entry>

         <oasis:entry colname="col4">7.83</oasis:entry>

         <oasis:entry colname="col5">–</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col1" morerows="2">E[out]</oasis:entry>

         <oasis:entry colname="col2">SOCAT (without coordinates)</oasis:entry>

         <oasis:entry colname="col3">7.46</oasis:entry>

         <oasis:entry colname="col4">7.46</oasis:entry>

         <oasis:entry colname="col5">7.08</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">SOCAT (with coordinates)</oasis:entry>

         <oasis:entry colname="col3">5.76</oasis:entry>

         <oasis:entry colname="col4">6.19</oasis:entry>

         <oasis:entry colname="col5">5.36</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">Random sampling</oasis:entry>

         <oasis:entry colname="col3">4.88</oasis:entry>

         <oasis:entry colname="col4">4.94</oasis:entry>

         <oasis:entry colname="col5">–</oasis:entry>

       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d1e4061">The advantage of using synthetic data is that both in- and out-of-sample
errors can be estimated, where the in-sample error is calculated from the
training points and the out-of-sample error is from the entire predicted domain.
The latter gives a representation of the <italic>true</italic> error of the method.
The results from these experiments are shown in
Table <xref ref-type="table" rid="Ch1.T3"/>. The detailed out-of-sample histograms are
shown in Fig. <xref ref-type="fig" rid="App1.Ch1.F4"/>.</p>
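      <p>The distinction between the two error estimates can be illustrated with a minimal sketch. The one-dimensional domain, the training indices and the nearest-neighbour estimator below are hypothetical stand-ins chosen for brevity; they are not the BIOPERIANT05 output, the SOCAT locations, or the RFR and SVR implementations used in the experiments.</p>
      <preformat>
```python
import math


def rmse(estimates, targets):
    """Root mean squared error between two equal-length sequences."""
    squared = [(e - t) ** 2 for e, t in zip(estimates, targets)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical synthetic "truth" on a full 1-D domain (illustrative only).
domain_x = [i / 10 for i in range(101)]
domain_y = [10.0 * math.sin(x) for x in domain_x]

# Sparse, uneven training subset standing in for clustered sampling locations.
train_idx = [0, 3, 5, 8, 40, 42, 45, 90]
train_x = [domain_x[i] for i in train_idx]
train_y = [domain_y[i] for i in train_idx]


def predict(x):
    """Nearest-neighbour estimate: target value of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[nearest]


# In-sample error: evaluated only at the training points.
e_in = rmse([predict(x) for x in train_x], train_y)
# Out-of-sample error: evaluated over the entire predicted domain.
e_out = rmse([predict(x) for x in domain_x], domain_y)
print(e_in, e_out)
```
</preformat>
      <p>In this constructed case the in-sample error is zero by design, whereas the out-of-sample error is not, which is why only the latter is informative about how an estimator behaves away from the training locations.</p>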

      <?xmltex \floatpos{t}?><fig id="Ch1.F7" specific-use="star"><caption><p id="d1e4073">Distributions of root mean squared error (RMSE) for the three
synthetic data experiments for RFR in the top row <bold>(a–c)</bold>, SVR in the
middle row <bold>(d–f)</bold> and the ensemble mean for RFR and SVR in the
bottom row <bold>(g–i)</bold>. The first column <bold>(a, d, g)</bold> shows the
RMSE for synthetic SOCAT training locations without coordinates (w/o coords) as proxies,
while the second column <bold>(b, e, h)</bold> includes coordinates as proxies.
The last column <bold>(c, f, i)</bold> shows the RMSE of randomly sampled training locations where coordinates are included as proxies.
</p></caption>
          <?xmltex \igopts{width=355.659449pt}?><graphic xlink:href="https://bg.copernicus.org/articles/14/5551/2017/bg-14-5551-2017-f07.png"/>

        </fig>

      <?xmltex \floatpos{t}?><fig id="Ch1.F8" specific-use="star"><caption><p id="d1e4104">Time series of BIOPERIANT05 <inline-formula><mml:math id="M240" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M241" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> (target) and empirical estimates of <inline-formula><mml:math id="M242" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M243" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> for each of the experiments for RFR <bold>(a)</bold>, SVR <bold>(b)</bold> and the ensemble mean estimates <bold>(c)</bold>.
The SOCAT v3 estimates are trained using the locations of SOCAT v3 data.
The variant <italic>with coordinates</italic> includes coordinates as proxies of <inline-formula><mml:math id="M244" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M245" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> while these are not included for the variant labelled as <italic>without coordinates</italic>.
The <italic>Random</italic> estimates are trained with uniformly distributed random sample locations.
The number of samples per time step for SOCAT <bold>(a)</bold> and random sampling locations <bold>(b)</bold> is shown by the grey fill.
</p></caption>
          <?xmltex \igopts{width=355.659449pt}?><graphic xlink:href="https://bg.copernicus.org/articles/14/5551/2017/bg-14-5551-2017-f08.png"/>

        </fig>

<?xmltex \hack{\newpage}?>
<sec id="Ch1.S3.SS2.SSS1">
  <title>Coordinates as proxy variables</title>
      <p id="d1e4203">This experiment used the synthetic dataset to test the influence of including
or excluding transformed coordinates (time, latitude and longitude) as
proxies of <inline-formula><mml:math id="M246" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M247" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>. There are four major results from this
experiment. Firstly, the RMSE estimates are smaller when coordinates are
included as proxies for both in- and out-of-sample subsets
(Table <xref ref-type="table" rid="Ch1.T3"/>). Secondly, RFR achieves marginally
better out-of-sample RMSE than SVR (5.76 and 6.19 <inline-formula><mml:math id="M248" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm, respectively) when trained with coordinates. Thirdly, both RFR and SVR have
comparable out-of-sample RMSE estimates (7.46 <inline-formula><mml:math id="M249" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm) for <inline-formula><mml:math id="M250" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M251" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> estimates trained without coordinate proxies. Lastly, the
ensemble mean of SVR and RFR has lower out-of-sample RMSE estimates than the
individual estimates for implementations with and without coordinate proxies,
though these gains are marginal (Table <xref ref-type="table" rid="Ch1.T3"/>).</p>
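      <p>The compensation of errors behind an ensemble mean can be illustrated with a minimal sketch. The target and the two sets of estimates below are hypothetical values chosen for illustration (they are not BIOPERIANT05 output or results from the experiments), and <italic>method_a</italic> and <italic>method_b</italic> are placeholder names for any two estimators.</p>
      <preformat>
```python
import math


def rmse(estimates, targets):
    """Root mean squared error between two equal-length sequences."""
    squared = [(e - t) ** 2 for e, t in zip(estimates, targets)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical target and two method estimates (illustrative values only).
target = [-10.0, -5.0, 0.0, 5.0, 10.0]
method_a = [-8.0, -6.0, 1.0, 7.0, 9.0]
method_b = [-13.0, -3.0, -2.0, 4.0, 12.0]

# Ensemble mean: point-wise average of the two estimates.
ensemble = [(a + b) / 2 for a, b in zip(method_a, method_b)]

rmse_a = rmse(method_a, target)
rmse_b = rmse(method_b, target)
rmse_ens = rmse(ensemble, target)
print(rmse_a, rmse_b, rmse_ens)
```
</preformat>
      <p>The RMSE of a point-wise mean cannot exceed the average of the two individual RMSEs, and it can fall below both when the errors of the members partly cancel, as in this constructed example.</p>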
      <p id="d1e4263">These points can also be gleaned from RMSE maps
(Fig. <xref ref-type="fig" rid="Ch1.F7"/>). Both RFR and SVR errors are low; however, the
RFR outperforms the SVR marginally for the open-ocean regions. Errors in
coastal regions remain high for each of the experiments and methods
(Fig. <xref ref-type="fig" rid="Ch1.F7"/>a, b, d, e), such as in the Argentine Sea, the
Agulhas retroflection, and the marginal ice zone. The ensemble mean of the
estimates achieves a balance between the two methods with low and moderate
RMSE scores in the open-ocean and coastal regions, respectively. Lastly, the distributions
of the errors for RFR and SVR without coordinate proxies (and thus the
ensemble mean) are similar.</p>
      <p id="d1e4270">The time series (Fig. <xref ref-type="fig" rid="Ch1.F8"/>) show that including
coordinate variables plays an important role in achieving accurate phasing of
the seasonal cycle. When coordinates are not included as proxies, the phasing
shifts earlier for both methods. There is also an improvement of estimates
over time, where the first 2 years (1998 and 1999) have worse estimates for
both SVR and RFR (Fig. <xref ref-type="fig" rid="Ch1.F8"/>). This does not seem to
be linked to the number of observations, but could be due to the
distribution. The ensemble <inline-formula><mml:math id="M252" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M253" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> in the 1998 to 1999 period is
closer to BIOPERIANT05 output as the over- and underestimates of
RFR and SVR, respectively, compensate for each other.</p>
</sec>
<sec id="Ch1.S3.SS2.SSS2">
  <title>Random sampling regime</title>
      <p id="d1e4302">A second experiment is performed to
assess the inaccuracies that arise due to the spatial and temporal sampling
biases in the SOCAT v3 dataset. Training locations are
chosen at random and uniformly in time and space. This eliminates any
summer–winter biases as well as clustering of cruise tracks in certain
regions (such as the Argentine Sea). Note that coordinates are included as
proxies of <inline-formula><mml:math id="M254" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M255" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> with the random sampling regime.</p>
      <p id="d1e4324">Firstly, the results show that the biases in SOCAT v3 do contribute to
out-of-sample errors, as the random sampling regime achieved lower RMSE
scores than any of the other experiments (4.88 and 4.94 <inline-formula><mml:math id="M256" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm for
RFR and SVR, respectively, as in Table <xref ref-type="table" rid="Ch1.T3"/>). However,
RFR is marginally less susceptible to sampling biases than SVR, as the
improvement for the latter is larger (with out-of-sample RMSE reductions of 0.88 and
1.25 <inline-formula><mml:math id="M257" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm for RFR and SVR, respectively). The spatial distributions of RMSE for the
random sampling implementations (Fig. <xref ref-type="fig" rid="Ch1.F7"/>c, f) show that
errors in coastal regions remain high (<inline-formula><mml:math id="M258" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 12 <inline-formula><mml:math id="M259" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm) with uniform
sampling. Lastly, there is an improvement in the estimates from 1998 to 2000,
particularly with the random sampling for the SVR
(Fig. <xref ref-type="fig" rid="Ch1.F8"/>), suggesting that the method is more
susceptible to the temporal bias than RFR (if coordinates are included as
proxies).</p>
</sec>
</sec>
</sec>
<sec id="Ch1.S4">
  <title>Discussion</title>
<sec id="Ch1.S4.SS1">
  <title>Observational estimates</title>
      <p id="d1e4375">In this section we address the methods' ability to fit the training data, in
other words an assessment of in-sample errors
(Fig. <xref ref-type="fig" rid="Ch1.F4"/> and Table <xref ref-type="table" rid="Ch1.T2"/>).
Thereafter we investigate the differences in the estimates of <inline-formula><mml:math id="M260" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M261" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> (Figs. <xref ref-type="fig" rid="Ch1.F5"/> and
<xref ref-type="fig" rid="Ch1.F6"/>).</p><?xmltex \hack{\newpage}?>
<sec id="Ch1.S4.SS1.SSS1">
  <title>Assessment of in-sample errors</title>
      <p id="d1e4412">Based on the results, the SOM-FFN method (by <xref ref-type="bibr" rid="bib1.bibx22" id="altparen.67"/>)
proves to be an elegant implementation of neural network methods that is able
to estimate SOCAT <inline-formula><mml:math id="M262" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M263" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> (in-sample estimate) better than the RFR
and SVR methods (with respective RMSE estimates of 14.84, 16.45 and
24.40 <inline-formula><mml:math id="M264" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm). Here we assess these differences and try to identify
the possible reasons for them.</p>
      <p id="d1e4444">One of the largest differences in the methods' ability to fit the training
data is in the SAZ where the RFR and SVR score poorly in comparison to
SOM-FFN, particularly from 2000 to 2006 (Fig. <xref ref-type="fig" rid="Ch1.F4"/>a).
This is during a period where the number of observations is still relatively
low in the SOCAT v3 database (Fig. <xref ref-type="fig" rid="Ch1.F4"/>a). This may
then be due to an increase in the complexity of <inline-formula><mml:math id="M265" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M266" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> measurements
in the SAZ from SOCAT v2 to v3 from 1998 through 2006, which makes the data
more challenging to fit accurately. This is exemplified in the maps of RMSE
(Fig. <xref ref-type="fig" rid="Ch1.F4"/>g–i), where coastal regions typically have
larger error estimates. A comparison of SOCAT v2 and v3 for this period shows
that the increase in the number of observations occurs primarily in the
Argentine Sea, thus confirming this hypothesis
(Fig. <xref ref-type="fig" rid="App1.Ch1.F1"/>a). The comparison of SOCAT v2 and v3 RMSE
results for RFR and SVR confirms this (Table <xref ref-type="table" rid="Ch1.T2"/>), where
there is a marked improvement when using the older dataset. Importantly, this
shows that increasing the number of measurements does not necessarily improve
the in-sample error estimates, but may yield a more accurate out-of-sample
estimate; however, this is difficult to test with limited data.</p>
      <p id="d1e4477">Despite the improvement in performance when testing against SOCAT v2, SVR and
RFR still have poorer performance than the SOM-FFN approach. We attribute
this in part to the SOM-FFN's ability to reduce the large RMSE contributions
observed in the other two methods. This notion is supported by the smaller
difference between RMSE and MAE, especially in the SAZ
(Table <xref ref-type="table" rid="Ch1.T2"/>). The SOM-FFN achieves this by increasing the
flexibility of the algorithm by having multiple regression models that can
each be optimized for data with a particular length scale of variability.
This allows the SOM-FFN approach to adapt to short scales of variability in
dynamic regions such as the Argentine Sea and the coastal Antarctic
(Fig. <xref ref-type="fig" rid="Ch1.F4"/>i).</p>
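      <p>The use of the difference between RMSE and MAE as an indicator of a few large errors can be illustrated with a minimal sketch. The two error sets below are hypothetical and were constructed to share the same MAE; they are not taken from the results in this study.</p>
      <preformat>
```python
import math


def rmse(errors):
    """Root mean squared error of a sequence of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    """Mean absolute error of a sequence of residuals."""
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical residuals with identical MAE (4.0) but different spread.
even_errors = [4.0, -4.0, 4.0, -4.0, 4.0, -4.0]    # errors of equal size
spiky_errors = [1.0, -1.0, 1.0, -1.0, 1.0, -19.0]  # one large error dominates

gap_even = rmse(even_errors) - mae(even_errors)
gap_spiky = rmse(spiky_errors) - mae(spiky_errors)
print(gap_even, gap_spiky)
```
</preformat>
      <p>Because squaring weights large residuals more heavily, the gap between RMSE and MAE widens when a small number of points contribute most of the error, and it vanishes when all residuals have the same magnitude.</p>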
      <p id="d1e4484">In comparison, this implementation of SVR, which is theoretically similar to
an artificial neural network, only has one length scale for the entire domain
<xref ref-type="bibr" rid="bib1.bibx50 bib1.bibx45" id="paren.68"/>. This becomes apparent in the AZ, where many of
the observations are in the more biogeochemically dynamic coastal Antarctic,
where melting sea ice results in short decorrelation length scales
<xref ref-type="bibr" rid="bib1.bibx2 bib1.bibx9 bib1.bibx19" id="paren.69"/>. The SVR has much larger RMSE
scores in the AZ than the RFR or SOM-FFN (35.69, 23.49 and
21.32 <inline-formula><mml:math id="M267" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm, respectively). This suggests that implementing the SVR
approach without an initial clustering or regionalization step will not
yield good results.</p>
      <p id="d1e4501">By comparison, the RFR approach is more adept at fitting various length
scales of variability, accounting for both the higher variability in the AZ
and the lower variability in the PFZ (with SOCAT v3 standard deviations of 54.65
and 20.01, respectively). The high <inline-formula><mml:math id="M268" display="inline"><mml:mrow><mml:msup><mml:mi>r</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> scores achieved by RFR in the AZ and
PFZ (0.81 and 0.71, respectively) highlight the flexibility in the method
(Table <xref ref-type="table" rid="Ch1.T2"/>). This is due to the differences in the
underlying mathematics of the methods. Decision trees, which are the building
blocks of RFR, separate data at each decision node with a discrete boundary
<xref ref-type="bibr" rid="bib1.bibx6" id="paren.70"/>. Conversely, ANNs and SVRs often use Gaussian functions
in the cost function, resulting in smoother approximations
<xref ref-type="bibr" rid="bib1.bibx50" id="paren.71"/>. The discrete boundaries make decision trees prone to overfitting, but the
ensemble implementation of random forests mitigates this to a large extent.</p>
</sec>
<sec id="Ch1.S4.SS1.SSS2">
  <?xmltex \opttitle{Differences in $\Delta p$CO${}_{2}$ estimates}?><title>Differences in <inline-formula><mml:math id="M269" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M270" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> estimates</title>
      <p id="d1e4549">One of the largest differences in <inline-formula><mml:math id="M271" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M272" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> is the weaker sink
estimated by the SOM-FFN method in the SAZ
(Fig. <xref ref-type="fig" rid="Ch1.F6"/>). This difference can be traced to the
eastern Atlantic SAZ, where the SOM-FFN has higher estimates of <inline-formula><mml:math id="M273" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M274" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> (Fig. <xref ref-type="fig" rid="Ch1.F5"/>e shown by the red oval and the
differences between the methods in Fig. <xref ref-type="fig" rid="App1.Ch1.F3"/>a,
b). A comparison between the SVR and RFR trained with SOCAT v2 and v3 further
eliminates the use of different training datasets as primary sources of
difference, where methodology is a higher-order driver of difference
(Fig. <xref ref-type="fig" rid="App1.Ch1.F2"/>). The lack of this feature in the
eastern Atlantic sector of the Southern Ocean in SVR and RFR estimates
suggests that this is a function of the initial clustering step in the
SOM-FFN. The clustering process separates <inline-formula><mml:math id="M275" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M276" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> observations into
clusters that are not restricted in time and space <xref ref-type="bibr" rid="bib1.bibx22" id="paren.72"/>.
This allows the SOM-FFN to “transfer knowledge” from a remote location
(even outside the Southern Hemisphere) if proxies are similar to the Southern
Ocean. This knowledge transfer assumes that the relationship between
<inline-formula><mml:math id="M277" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M278" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> and the measured proxies is globally consistent. Moreover, there is
the assumption that all <inline-formula><mml:math id="M279" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M280" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> variability (within a cluster) can be captured
by the measured proxies. This assumption is not made when using coordinates
or regional subsets as locations are isolated, but there is then the
potential loss of knowledge from remote locations. This question will be
addressed further in the discussion on the use of coordinate variables as
proxies of <inline-formula><mml:math id="M281" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M282" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>.</p>
      <p id="d1e4670">Another difference between <inline-formula><mml:math id="M283" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M284" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> estimates is the tendency for SVR
to overestimate <inline-formula><mml:math id="M285" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M286" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> relative to the RFR and SOM-FFN approaches,
particularly in the PFZ and AZ where winter data are sparse
(Fig. <xref ref-type="fig" rid="Ch1.F6"/>b, c). We attribute this to the SVR's
sensitivity to outliers, determined by the fact that the cost function
penalizes outliers heavily (Eq. <xref ref-type="disp-formula" rid="Ch1.E5"/>). In the context of
the SOCAT v3 dataset, the algorithm may treat the sparse winter data as
outliers. This is due to the fact that sparse winter measurements of <inline-formula><mml:math id="M287" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M288" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> are positive, while the abundant summer measurements are negative
(<xref ref-type="bibr" rid="bib1.bibx35 bib1.bibx26" id="altparen.73"/>). This may then be a positive realization of a
methodological attribute that is typically considered a weakness.</p>
      <p id="d1e4738">Conversely, RFR winter estimates of <inline-formula><mml:math id="M289" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M290" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> are often lower than the
SOM-FFN and SVR estimates, again in the AZ and PFZ
(Figs. <xref ref-type="fig" rid="Ch1.F5"/>g–i and <xref ref-type="fig" rid="Ch1.F6"/>b,
c). This may be due to the method's resilience against outliers, which could
be due to two attributes <xref ref-type="bibr" rid="bib1.bibx30" id="paren.74"/>. Firstly, outliers are less
likely to dominate the feature space with the use of bootstrap aggregation as
these points will be sampled less frequently. Secondly, individual decision
trees regress values by using the average of samples in a terminal node (or
leaf), where the minimum number of samples per terminal node is set by the
user. This second attribute means that estimates will never be outside the
bounds of the minimum and maximum of the training dataset, thus leading to
conservative estimates (as shown in Fig. <xref ref-type="fig" rid="Ch1.F2"/>b). The
differences between the methods shown in
Fig. <xref ref-type="fig" rid="Ch1.F6"/> could be a good case for an ensemble
approach, where the strengths of one model compensate for the weakness of
another. This is assessed in the synthetic data and will be discussed
further.</p>
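      <p>The bounded nature of leaf-averaged estimates can be illustrated with a minimal sketch. The code below is a deliberately crude, one-dimensional stand-in for bootstrap-aggregated regression trees (leaves are simply consecutive groups of sorted samples); it is not the random forest implementation or configuration used in this study, and the data, function names and parameter values are hypothetical.</p>
      <preformat>
```python
import random


def fit_leaves(xs, ys, min_leaf):
    """Group x-sorted samples into leaves of up to min_leaf points.

    Each leaf is stored as (largest x in the leaf, mean y of the leaf).
    """
    pairs = sorted(zip(xs, ys))
    leaves = []
    for start in range(0, len(pairs), min_leaf):
        chunk = pairs[start:start + min_leaf]
        leaf_mean = sum(y for _, y in chunk) / len(chunk)
        leaves.append((chunk[-1][0], leaf_mean))
    return leaves


def predict_leaves(leaves, x):
    """Return the mean of the first leaf that covers x, else the last leaf."""
    for upper, leaf_mean in leaves:
        if upper >= x:
            return leaf_mean
    return leaves[-1][1]


def fit_forest(xs, ys, n_trees, min_leaf, seed):
    """Fit each 'tree' on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_leaves([xs[i] for i in idx], [ys[i] for i in idx], min_leaf))
    return forest


def predict_forest(forest, x):
    """Average the leaf means of all trees."""
    return sum(predict_leaves(tree, x) for tree in forest) / len(forest)


# Hypothetical training data with a linear trend (y ranges from -15 to 23).
xs = [float(i) for i in range(20)]
ys = [2.0 * x - 15.0 for x in xs]
forest = fit_forest(xs, ys, n_trees=25, min_leaf=3, seed=0)

# Far outside the training range the trend would give 185, but the
# averaged leaf means cannot leave the range of the training targets.
far_above = predict_forest(forest, 100.0)
far_below = predict_forest(forest, -100.0)
print(min(ys), far_below, far_above, max(ys))
```
</preformat>
      <p>Since every leaf value is an average of training targets, and the forest averages leaf values, the estimate stays within the training minimum and maximum regardless of how far the query lies from the training data.</p>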
      <p id="d1e4772">These attributes may also be the reason for the differences in the magnitudes
of the autumn peak in <inline-formula><mml:math id="M291" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M292" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> in the SAZ and PFZ. Mechanistically
this peak could be attributed to a sharp increase in cooling leading into
winter, resulting in increased solubility of CO<inline-formula><mml:math id="M293" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> and thus a sharp
reduction of <inline-formula><mml:math id="M294" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M295" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx35 bib1.bibx47" id="paren.75"/>. Deeper mixing
of the water column shortly thereafter would entrain CO<inline-formula><mml:math id="M296" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>-rich waters, thus
increasing <inline-formula><mml:math id="M297" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M298" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx26" id="paren.76"/>. However, the trend for this
peak to shrink in the SAZ and PFZ for all methods suggests that this may be an
artefact that is specific to the SOCAT dataset.</p>
</sec>
</sec>
<sec id="Ch1.S4.SS2">
  <title>Synthetic data experiments</title>
      <p id="d1e4864">In this section we discuss the outcomes of the two experiments performed on
the synthetic dataset (BIOPERIANT05 model output). The first experiment
addresses the efficacy of including coordinate variables as proxies of
<inline-formula><mml:math id="M299" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M300" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>. This is done by running two implementations of RFR and SVR:
without coordinates as proxies, and with coordinates as proxies. The second
experiment addresses the impact that the SOCAT dataset, biased in both space
and time, has on <inline-formula><mml:math id="M301" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M302" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> estimates.</p>
<sec id="Ch1.S4.SS2.SSS1">
  <title>Coordinate variables improve estimates</title>
      <p id="d1e4910">This topic has to some extent already been mentioned in the discussion of the
observational data, in which the cases for and against the inclusion of
coordinate variables as proxies for <inline-formula><mml:math id="M303" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M304" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> were put forward. If coordinates are not included there is the benefit of
potential information transfer from remote parts of the domain, but this
assumes that the satellite observable proxies (and assimilative model output)
constrain <inline-formula><mml:math id="M305" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M306" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> in a globally consistent way. If coordinates are
included the information transfer is lost and the assumption is made that the
proxy variables are not able to constrain <inline-formula><mml:math id="M307" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M308" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> in a globally
consistent manner.</p>
      <p id="d1e4971">The results of this experiment show that coordinates improve estimates of
<inline-formula><mml:math id="M309" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M310" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> with better RMSE scores for both SVR and RFR
(Table <xref ref-type="table" rid="Ch1.T3"/>). We are thus in favour of the second
hypothesis that the available proxies cannot sufficiently constrain <inline-formula><mml:math id="M311" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M312" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> without coordinates. A two-step clustering approach, such as
SOM-FFN, may be able to achieve comparable results without coordinates, but
this would have to be tested with that specific method. However, this may
also lead to trends in the data that may be artefacts of remote knowledge
transfer, as potentially seen in the observational data
(Fig. <xref ref-type="fig" rid="Ch1.F5"/>f).</p>
      <p id="d1e5017">An important outcome of this experiment is that the inclusion of coordinates
improves the seasonal phasing of the methods
(Fig. <xref ref-type="fig" rid="Ch1.F8"/>a, b). It is critical for the empirical
methods to correctly estimate the phasing of <inline-formula><mml:math id="M313" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M314" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> as the seasonal
cycle phasing may be a useful indicator of anthropogenically driven changes to
the marine carbonate system.</p>
      <p id="d1e5041">One of the assumptions in these synthetic data experiments is that the models
are, to some extent, representative of the variability in the observed ocean.
However, the BIOPERIANT05 output does not achieve this, with a standard
deviation of 19.80 <inline-formula><mml:math id="M315" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm for synthetic SOCAT v3 data compared to
38.20 <inline-formula><mml:math id="M316" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm for the gridded SOCAT v3 observations (for the
Southern Ocean as defined by <xref ref-type="bibr" rid="bib1.bibx11" id="altparen.77"/>). This could be a cause for
concern. However, we believe that this creates an even stronger case for
the use of coordinates as proxy variables. The increased variability in the
observations could be due to processes that deterministic models cannot yet
constrain due to our lack of understanding of the marine carbonate system
<xref ref-type="bibr" rid="bib1.bibx26 bib1.bibx36" id="paren.78"/>.</p>
</sec>
<sec id="Ch1.S4.SS2.SSS2">
  <title>SOCAT biases</title>
      <p id="d1e5070">The lack of winter <inline-formula><mml:math id="M317" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M318" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> data is a problem throughout the mid-latitude and
high-latitude oceans, but is particularly severe in the Southern Ocean
<xref ref-type="bibr" rid="bib1.bibx3" id="paren.79"/>, and the impact of the lack of data in the Southern Ocean
is not known. Moreover, the efficacy of various methods to fill this large
temporal gap is unknown <xref ref-type="bibr" rid="bib1.bibx42" id="paren.80"/>. Here we show that there is a
considerable impact in this synthetic data environment, but the effect of the
sampling bias is perhaps smaller than we would have anticipated. Both methods
are able to estimate the spatial distribution and the seasonal cycle of
<inline-formula><mml:math id="M319" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M320" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> with relative accuracy (Figs. <xref ref-type="fig" rid="Ch1.F8"/>
and <xref ref-type="fig" rid="Ch1.F7"/> and Table <xref ref-type="table" rid="Ch1.T3"/>). This
could be due to two factors.</p>
      <p id="d1e5121">Firstly, winter data are less variable than summer data and require less
sampling. Mechanistically, this is a likely scenario. In summer <inline-formula><mml:math id="M321" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M322" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> is spatio-temporally heterogeneous in the Southern Ocean due to the
uptake of CO<inline-formula><mml:math id="M323" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> by phytoplankton
<xref ref-type="bibr" rid="bib1.bibx35 bib1.bibx2 bib1.bibx49 bib1.bibx9 bib1.bibx26" id="paren.81"/>. The
drivers of phytoplankton are complex due to the co-limitation of light and
iron (as a micronutrient) in the Southern Ocean
<xref ref-type="bibr" rid="bib1.bibx5 bib1.bibx49 bib1.bibx46" id="paren.82"/>. This complexity would require
more sampling: perhaps additional proxies or increased spatial resolution to
capture the variability of <inline-formula><mml:math id="M324" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M325" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>. Conversely, processes driving
winter <inline-formula><mml:math id="M326" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M327" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>, namely the interaction of mixing and buoyancy, act
on larger scales, potentially leading to less spatio-temporal heterogeneity.
However, the lack of observations means that we simply cannot know with
certainty. This makes a strong case for autonomous sampling platforms to fill the
Southern Ocean's winter sampling gap. The SOCCOM float project may soon yield
such measurements with pH-derived estimates of <inline-formula><mml:math id="M328" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M329" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>
<xref ref-type="bibr" rid="bib1.bibx43 bib1.bibx18 bib1.bibx53" id="paren.83"/>.</p>
      <p id="d1e5217">Secondly, the model used to generate the synthetic data may not be
representative of the Southern Ocean. This has been discussed in the previous
section, but here, rather than increasing our confidence, it diminishes our
confidence in the result. Studies have shown that process models are not able
to accurately represent the seasonal cycle of CO<inline-formula><mml:math id="M330" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> in the Southern Ocean
<xref ref-type="bibr" rid="bib1.bibx26 bib1.bibx36" id="paren.84"/>. Moreover, <inline-formula><mml:math id="M331" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M332" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> is often driven by
processes that are not representative of observations <xref ref-type="bibr" rid="bib1.bibx36" id="paren.85"/>.</p>
      <p id="d1e5254">The most likely scenario is a combination of these two factors, where
winter data are in fact less variable than summer data, but the error is
larger than the experiment shows due to incomplete knowledge of the processes
that describe <inline-formula><mml:math id="M333" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M334" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> in the process models.</p>
</sec>
<sec id="Ch1.S4.SS2.SSS3">
  <title>The best method: the ensemble average</title>
      <p id="d1e5279">The synthetic data also allow us to compare the two methods relative to each
other, in the context of the SOCAT v3 data. The data show that the RFR method
performs better than the SVR (trained with coordinates as proxies) with
respective out-of-sample RMSEs of 5.76 and 6.19 <inline-formula><mml:math id="M335" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm
(Table <xref ref-type="table" rid="Ch1.T3"/>). However, it is the average of these two
methods (ensemble mean) that achieves the lowest RMSE (5.36 <inline-formula><mml:math id="M336" display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>atm),
albeit by a small margin. The time series in Fig. <xref ref-type="fig" rid="Ch1.F8"/>c shows
that the improvement may come from the period 1998 to 2000, when RFR is
plagued by underestimation of the sink strength, while SVR overestimates the
sink strength. This supports the notion that the strengths and weaknesses of
these two methods complement each other. Moreover, it supports the merit of
multiple approaches and further development of empirical methods for the
estimation of <inline-formula><mml:math id="M337" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M338" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>.</p>
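      <p>The cancellation of opposing errors described above can be illustrated with a minimal sketch. The numbers are hypothetical and are not taken from this study; the names <monospace>rmse</monospace>, <monospace>ensemble_mean</monospace>, <monospace>rfr_like</monospace> and <monospace>svr_like</monospace> are assumptions introduced only for illustration. The sketch assumes two estimates whose errors have opposite signs and shows that their average can have a lower RMSE than either estimate alone.</p>
<preformat preformat-type="code">
```python
import math


def rmse(estimate, target):
    """Root mean squared error between two equal-length sequences."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    squared = [(e - t) ** 2 for e, t in zip(estimate, target)]
    return math.sqrt(sum(squared) / len(squared))


def ensemble_mean(*estimates):
    """Element-wise average of several equal-length estimate sequences."""
    return [sum(values) / len(values) for values in zip(*estimates)]


# Hypothetical target values (illustrative only, not data from the study).
target = [-20.0, -10.0, 0.0, 10.0, 20.0]

# Two hypothetical estimates with errors of opposite sign.
rfr_like = [t + 4.0 for t in target]  # errs in one direction
svr_like = [t - 6.0 for t in target]  # errs in the other direction

combined = ensemble_mean(rfr_like, svr_like)

rmse_rfr = rmse(rfr_like, target)
rmse_svr = rmse(svr_like, target)
rmse_combined = rmse(combined, target)

print(rmse_rfr, rmse_svr, rmse_combined)
```
</preformat>
      <p>In this idealised case the errors are constant and of opposite sign, so most of the error cancels in the average. With real estimates the errors would only partly offset each other, which is consistent with the small improvement reported for the ensemble mean.</p>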
</sec>
</sec>
</sec>
<sec id="Ch1.S5" sec-type="conclusions">
  <title>Conclusions</title>
      <p id="d1e5329">In this study two empirical methods (SVR and RFR) are presented as
alternative and complementary <inline-formula><mml:math id="M339" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M340" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> gap-filling methods. These algorithms
are established in other fields, but have not previously been applied to the estimation
of surface ocean <inline-formula><mml:math id="M341" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M342" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>. We apply the methods to the Southern Ocean
where the paucity of ship-based measurements during winter is one of the
major challenges. The SOCAT v3 dataset was co-located with assimilative model
output and satellite-measurable proxy variables to create a training dataset
<xref ref-type="bibr" rid="bib1.bibx3" id="paren.86"/>. The resulting estimates were compared with the SOM-FFN approach
by <xref ref-type="bibr" rid="bib1.bibx22" id="text.87"/>.</p>
      <p id="d1e5374">We found that the SOM-FFN and RFR methods had error estimates of similar
magnitudes, while SVR had a larger error estimate. The RFR performed
comparably to the SOM-FFN approach when compared with the SOCAT v2 dataset,
with which SOM-FFN was trained.<?xmltex \hack{\vadjust{\newpage}}?> The increase in the
number of measurements in the highly variable coastal ocean between SOCAT v2
and v3 leads to increased RMSE values, particularly in the Sub-Antarctic Zone
(SAZ). Despite accounting for the increase in coastal data, the SOM-FFN had
error estimates comparable to the RFR and SVR approaches in the SAZ. We
attribute this to the method's ability to cluster the training data into
regions of different modes of variability to which individual regressions are
then applied. The SVR method performed poorly due to its inability to adapt
to various modes of variability, while the RFR is intrinsically much more
flexible and thus performed well in fitting the training data.</p>
      <p id="d1e5379">There was good agreement amongst the three methods with respect to the
overall trend of <inline-formula><mml:math id="M343" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M344" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>, but there were also differences. The
primary difference was in the Atlantic sector of the SAZ, where the
SOM-FFN overestimated <inline-formula><mml:math id="M345" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M346" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> relative to the other methods. This is
likely due to remote knowledge transfer within a data-sparse cluster;
however, we cannot identify this as right or wrong due to the lack of data in
this region. Other differences were due to intrinsic attributes of the
methods: SVR was sensitive to outliers, resulting in relatively large winter
<inline-formula><mml:math id="M347" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M348" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> estimates – potentially a desirable feature for sparse
winter data; RFR underestimated <inline-formula><mml:math id="M349" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M350" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> relative to the other
methods due to its robustness to outliers.</p>
      <p id="d1e5459">To test the efficacy of these methods, they were applied to a synthetic
dataset (process model output). Two major questions were asked: (1) what is
the efficacy of including coordinate variables (time, latitude and longitude)
as proxy variables? (2) What is the impact of sampling biases in the SOCAT v3
dataset? The results showed that including coordinate variables improved the
estimates of <inline-formula><mml:math id="M351" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M352" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> for SVR and RFR. Moreover, the phasing of the
seasonal cycle was also improved with the inclusion of coordinates. The
second experiment showed that there is only a small bias in the estimates of
<inline-formula><mml:math id="M353" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M354" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>; however, the inability of process models to represent
Southern Ocean <inline-formula><mml:math id="M355" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M356" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> variability accurately places uncertainty on
this result.</p>
      <p id="d1e5521">Lastly, we show that while the RFR approach outperforms the SVR approach, the
ensemble mean of the two methods scores better than either individual
method. This provides a motivation for continued research on methods that complement
each other in strengths and weaknesses.</p>
</sec>

      
      </body>
    <back><notes notes-type="dataavailability">

      <p id="d1e5528">The data are available at <ext-link xlink:href="https://doi.org/10.6084/m9.figshare.5369038" ext-link-type="DOI">10.6084/m9.figshare.5369038</ext-link>
(<xref ref-type="bibr" rid="bib1.bibx15" id="altparen.88"/>).</p>
  </notes><?xmltex \hack{\clearpage}?><app-group>

<app id="App1.Ch1.S1">
  <title>Comparison of SOCAT v2 and v3</title>
      <p id="d1e5546">One of the shortcomings of this study is that the SOM-FFN method used SOCAT
v2 as a training dataset, while the SVR and RFR methods were trained with
SOCAT v3. Figure <xref ref-type="fig" rid="App1.Ch1.F3"/> shows that there is a marked
difference between the two datasets. Importantly, the increase in the number
of observations from SOCAT v2 to v3 for the period 1998 to 2006 is almost
exclusively in the Argentine Sea.</p>
      <p id="d1e5551">These differences may have an impact on the estimates of <inline-formula><mml:math id="M357" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M358" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>. To
test this, the methods were implemented as explained in
Sect. <xref ref-type="sec" rid="Ch1.S2.SS4"/> with the exception that RFR and SVR
methods were trained with both SOCAT v2 and v3.
Figure <xref ref-type="fig" rid="App1.Ch1.F2"/> shows that, on average, there is a
larger difference between the RFR and SVR methods than between the different training
datasets.</p>

      <?xmltex \floatpos{h!}?><fig id="App1.Ch1.F1"><caption><p id="d1e5579">The increase in the number of observations between SOCAT v2 and SOCAT v3 for two periods: <bold>(a)</bold> 1998 through 2006, and <bold>(b)</bold> 2007 through 2014.</p></caption>
        <?xmltex \hack{\hsize\textwidth}?>
        <?xmltex \igopts{width=355.659449pt}?><graphic xlink:href="https://bg.copernicus.org/articles/14/5551/2017/bg-14-5551-2017-f09.pdf"/>

      </fig>

      <?xmltex \floatpos{h!}?><fig id="App1.Ch1.F2"><caption><p id="d1e5596">Comparison of air–sea CO<inline-formula><mml:math id="M359" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> flux estimates from RFR and SVR when trained with SOCAT
v2 and v3. The SOM-FFN method, trained with SOCAT v2, is also shown.
The figure demonstrates that methodology plays a larger role in determining the outcome of the estimate than the availability of data (for these two methods).</p></caption>
        <?xmltex \hack{\hsize\textwidth}?>
        <?xmltex \igopts{width=355.659449pt}?><graphic xlink:href="https://bg.copernicus.org/articles/14/5551/2017/bg-14-5551-2017-f10.pdf"/>

      </fig>

      <p id="d1e5616"><?xmltex \hack{\newpage}?>The differences between the different methods are shown in
Fig. <xref ref-type="fig" rid="App1.Ch1.F3"/>. Panels (a) and (b) show that the SVR
and RFR methods estimate a stronger sink than SOM-FFN in the Atlantic sector of the SAZ.
Here (Fig. <xref ref-type="fig" rid="App1.Ch1.F3"/>b) the tendency of the SVR method
to estimate strong outgassing south of the polar front relative to SOM-FFN
and RFR is also seen. Conversely, the RFR, on average, underestimates <inline-formula><mml:math id="M360" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M361" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> south of the polar front.
<?xmltex \hack{\clearpage}?></p>

      <?xmltex \floatpos{h!}?><fig id="App1.Ch1.F3"><caption><p id="d1e5646">The differences between annual averages of each of the approaches for the period 1998 to 2006: <bold>(a)</bold> RFR–SOM-FFN; <bold>(b)</bold> SVR–SOM-FFN; <bold>(c)</bold> RFR–SVR.
</p></caption>
        <?xmltex \hack{\hsize\textwidth}?>
        <?xmltex \igopts{width=355.659449pt}?><graphic xlink:href="https://bg.copernicus.org/articles/14/5551/2017/bg-14-5551-2017-f11.png"/>

      </fig>

</app>

<app id="App1.Ch1.S2">
  <title>Synthetic data experiments</title>

      <?xmltex \floatpos{h!}?><fig id="App1.Ch1.F4"><caption><p id="d1e5674">Two-dimensional histograms for the distributions of out-of-sample estimates of <inline-formula><mml:math id="M362" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M363" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> relative to target <inline-formula><mml:math id="M364" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>CO<inline-formula><mml:math id="M365" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> (BIOPERIANT05).
The top row <bold>(a–c)</bold> shows estimates made by RFR and the bottom row
<bold>(d–f)</bold> shows estimates of SVR. The first column <bold>(a, d)</bold>
shows those estimates trained with SOCAT v3 locations without coordinate variables
(time, latitude and longitude) as proxy variables and the second column
<bold>(b, e)</bold> shows those with coordinate proxies. The last column
<bold>(c, f)</bold> shows estimates trained with random locations (uniform in
time and space) with coordinate proxies. The metrics are shown on each plot
where MAE and RMSE are mean absolute error and root mean squared error, respectively. The number of observations in the estimate is denoted by <inline-formula><mml:math id="M366" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>.
</p></caption>
        <?xmltex \hack{\hsize\textwidth}?>
        <?xmltex \igopts{width=441.017717pt}?><graphic xlink:href="https://bg.copernicus.org/articles/14/5551/2017/bg-14-5551-2017-f12.pdf"/>

      </fig>

<?xmltex \hack{\clearpage}?>
</app>
  </app-group><notes notes-type="competinginterests">

      <p id="d1e5752">The authors declare that they have no conflict of
interest.</p>
  </notes><ack><title>Acknowledgements</title><p id="d1e5758">This work is part of a PhD funded by the ACCESS program. The authors would
like to thank Marina Lévy for use of the BIOPERIANT05-GAA95b model data.
This work was partly enabled by the Centre for High Performance Computing
(CSIR).</p><p id="d1e5760">The Surface Ocean CO<inline-formula><mml:math id="M367" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> Atlas (SOCAT) is an international effort, endorsed
by the International Ocean Carbon Coordination Project (IOCCP), the Surface
Ocean Lower Atmosphere Study (SOLAS) and the Integrated Marine
Biogeochemistry and Ecosystem Research program (IMBER), to deliver a
uniformly quality-controlled surface ocean CO<inline-formula><mml:math id="M368" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> database. The many
researchers and funding agencies responsible for the collection of data and
quality control are thanked for their contributions to SOCAT.
<?xmltex \hack{\newline}?><?xmltex \hack{\newline}?> Edited by: Laurent Bopp
<?xmltex \hack{\newline}?>Reviewed by: Nicolas Gruber and one anonymous referee</p></ack><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Atlas et al.(2011)Atlas, Hoffman, Ardizzone, Leidner, Jusem, Smith,
and Gombos</label><mixed-citation>Atlas, R., Hoffman, R. N., Ardizzone, J., Leidner, S. M., Jusem, J. C.,
Smith,
D. K., and Gombos, D.: A Cross-calibrated, Multiplatform Ocean Surface Wind
Velocity Product for Meteorological and Oceanographic Applications, B.
Am. Meteorol. Soc., 92, 157–174,
<ext-link xlink:href="https://doi.org/10.1175/2010BAMS2946.1" ext-link-type="DOI">10.1175/2010BAMS2946.1</ext-link>,
2011.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Bakker et al.(2008)Bakker, Hoppema, Schröder, Geibert, and
de Baar</label><mixed-citation>Bakker, D. C. E., Hoppema, M., Schröder, M., Geibert, W., and de Baar, H.
J. W.: A rapid transition from ice covered CO<inline-formula><mml:math id="M369" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>–rich waters to a
biologically mediated CO<inline-formula><mml:math id="M370" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> sink in the eastern Weddell Gyre,
Biogeosciences, 5, 1373–1386, <ext-link xlink:href="https://doi.org/10.5194/bg-5-1373-2008" ext-link-type="DOI">10.5194/bg-5-1373-2008</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Bakker et al.(2016)</label><mixed-citation>Bakker, D. C. E., Pfeil, B., Landa, C. S., Metzl, N., O'Brien, K. M., Olsen,
A., Smith, K., Cosca, C., Harasawa, S., Jones, S. D., Nakaoka, S.-I., Nojiri,
Y., Schuster, U., Steinhoff, T., Sweeney, C., Takahashi, T., Tilbrook, B.,
Wada, C., Wanninkhof, R., Alin, S. R., Balestrini, C. F., Barbero, L., Bates,
N. R., Bianchi, A. A., Bonou, F., Boutin, J., Bozec, Y., Burger, E. F., Cai,
W.-J., Castle, R. D., Chen, L., Chierici, M., Currie, K., Evans, W.,
Featherstone, C., Feely, R. A., Fransson, A., Goyet, C., Greenwood, N.,
Gregor, L., Hankin, S., Hardman-Mountford, N. J., Harlay, J., Hauck, J.,
Hoppema, M., Humphreys, M. P., Hunt, C. W., Huss, B., Ibánhez, J. S. P.,
Johannessen, T., Keeling, R., Kitidis, V., Körtzinger, A., Kozyr, A.,
Krasakopoulou, E., Kuwata, A., Landschützer, P., Lauvset, S. K., Lefèvre,
N., Lo Monaco, C., Manke, A., Mathis, J. T., Merlivat, L., Millero, F. J.,
Monteiro, P. M. S., Munro, D. R., Murata, A., Newberger, T., Omar, A. M.,
Ono, T., Paterson, K., Pearce, D., Pierrot, D., Robbins, L. L., Saito, S.,
Salisbury, J., Schlitzer, R., Schneider, B., Schweitzer, R., Sieger, R.,
Skjelvan, I., Sullivan, K. F., Sutherland, S. C., Sutton, A. J., Tadokoro,
K., Telszewski, M., Tuma, M., van Heuven, S. M. A. C., Vandemark, D., Ward,
B., Watson, A. J., and Xu, S.: A multi-decade record of high-quality
<inline-formula><mml:math id="M371" display="inline"><mml:mi>f</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M372" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> data in version 3 of the Surface Ocean CO<inline-formula><mml:math id="M373" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> Atlas (SOCAT), Earth
Syst. Sci. Data, 8, 383–413, <ext-link xlink:href="https://doi.org/10.5194/essd-8-383-2016" ext-link-type="DOI">10.5194/essd-8-383-2016</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Biastoch et al.(2008)Biastoch, Böning, Getzlaff, Molines, and
Madec</label><mixed-citation>Biastoch, A., Böning, C. W., Getzlaff, J., Molines, J.-M., and Madec,
G.:
Causes of Interannual–Decadal Variability in the Meridional Overturning
Circulation of the Midlatitude North Atlantic Ocean, J. Climate, 21,
6599–6615, <ext-link xlink:href="https://doi.org/10.1175/2008JCLI2404.1" ext-link-type="DOI">10.1175/2008JCLI2404.1</ext-link>,
2008.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Boyd and Ellwood(2010)</label><mixed-citation>Boyd, P. W. and Ellwood, M. J.: The biogeochemical cycle of iron in the
ocean, Nat. Geosci., 3, 675–682, <ext-link xlink:href="https://doi.org/10.1038/ngeo964" ext-link-type="DOI">10.1038/ngeo964</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>Breiman(2001)</label><mixed-citation>Breiman, L.: Random forests, Mach. Learn., 45, 5–32,
<ext-link xlink:href="https://doi.org/10.1023/A:1010933404324" ext-link-type="DOI">10.1023/A:1010933404324</ext-link>, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Carranza and Gille(2015)</label><mixed-citation>Carranza, M. M. and Gille, S. T.: Southern Ocean wind-driven entrainment
enhances satellite chlorophyll-a through the summer, J. Geophys.
Res.-Oceans, 120, 304–323, <ext-link xlink:href="https://doi.org/10.1002/2014JC010203" ext-link-type="DOI">10.1002/2014JC010203</ext-link>,
2015.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Caruana and Niculescu-Mizil(2006)</label><mixed-citation>Caruana, R. and Niculescu-Mizil, A.: An empirical comparison of supervised
learning algorithms, Proceedings of the 23rd International Conference on
Machine Learning, pp. 161–168, <ext-link xlink:href="https://doi.org/10.1145/1143844.1143865" ext-link-type="DOI">10.1145/1143844.1143865</ext-link>,
2006.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Chierici et al.(2012)Chierici, Signorini, Mattsdotter-Björk,
Fransson, and Olsen</label><mixed-citation>Chierici, M., Signorini, S. R., Mattsdotter-Björk, M., Fransson, A.,
and
Olsen, A.: Surface water fCO<inline-formula><mml:math id="M374" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> algorithms for the high-latitude Pacific
sector of the Southern Ocean, Remote Sens. Environ., 119, 184–196,
<ext-link xlink:href="https://doi.org/10.1016/j.rse.2011.12.020" ext-link-type="DOI">10.1016/j.rse.2011.12.020</ext-link>,
2012.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Dufour et al.(2012)Dufour, Sommer, Zika, Gehlen, Orr, Mathiot, and
Barnier</label><mixed-citation>Dufour, C. O., Sommer, L. L., Zika, J. D., Gehlen, M., Orr, J. C., Mathiot,
P.,
and Barnier, B.: Standing and transient eddies in the response of the
Southern Ocean meridional overturning to the Southern annular mode, J.
Climate, 25, 6958–6974, <ext-link xlink:href="https://doi.org/10.1175/JCLI-D-11-00309.1" ext-link-type="DOI">10.1175/JCLI-D-11-00309.1</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Fay and McKinley(2014)</label><mixed-citation>Fay, A. R. and McKinley, G. A.: Global open-ocean biomes: mean and temporal
variability, Earth Syst. Sci. Data, 6, 273–284,
<ext-link xlink:href="https://doi.org/10.5194/essd-6-273-2014" ext-link-type="DOI">10.5194/essd-6-273-2014</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Friedrich and Oschlies(2009)</label><mixed-citation>Friedrich, T. and Oschlies, A.: Neural network-based estimates of North
Atlantic surface pCO<inline-formula><mml:math id="M375" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> from satellite data: A methodological study, J.
Geophys. Res., 114, 1–12, <ext-link xlink:href="https://doi.org/10.1029/2007JC004646" ext-link-type="DOI">10.1029/2007JC004646</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>Frölicher et al.(2015)Frölicher, Sarmiento, Paynter,
Dunne, Krasting, and Winton</label><mixed-citation>Frölicher, T. L., Sarmiento, J. L., Paynter, D. J., Dunne, J. P.,
Krasting, J. P., and Winton, M.: Dominance of the Southern Ocean in
anthropogenic carbon and heat uptake in CMIP5 models, J. Climate,
28, 862–886, <ext-link xlink:href="https://doi.org/10.1175/JCLI-D-14-00117.1" ext-link-type="DOI">10.1175/JCLI-D-14-00117.1</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Gade(2010)</label><mixed-citation>Gade, K.: A Non-singular Horizontal Position Representation, J.
Navigation, 63, 395–417, <ext-link xlink:href="https://doi.org/10.1017/S0373463309990415" ext-link-type="DOI">10.1017/S0373463309990415</ext-link>,
2010.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Gregor et al.(2017)</label><mixed-citation>Gregor, L., Kok, S., and Monteiro, P. M. S.: Dataset for empirically
estimated <inline-formula><mml:math id="M376" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M377" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> for the Southern Ocean: RFR and SVR,
<ext-link xlink:href="https://doi.org/10.6084/m9.figshare.5369038" ext-link-type="DOI">10.6084/m9.figshare.5369038</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Hastie et al.(2009)Hastie, Tibshirani, and Friedman</label><mixed-citation>
Hastie, T., Tibshirani, R., and Friedman, J.: The Elements of Statistical
Learning: Data mining, Inference, and Prediction, Springer, 2nd Edn.,
2009.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Hoyer and Hamman(2017)</label><mixed-citation>Hoyer, S. and Hamman, J. J.: xarray: N-D labeled Arrays and Datasets in
Python, Journal of Open Research Software, 5, 1–6, <ext-link xlink:href="https://doi.org/10.5334/jors.148" ext-link-type="DOI">10.5334/jors.148</ext-link>,
2017.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Johnson et al.(2017)Johnson, Plant, Coletti, Jannasch, Sakamoto,
Riser, Swift, Williams, Boss, Haëntjens, Talley, and
Sarmiento</label><mixed-citation>Johnson, K. S., Plant, J. N., Coletti, L. J., Jannasch, H. W., Sakamoto,
C. M.,
Riser, S. C., Swift, D. D., Williams, N. L., Boss, E., Haëntjens, N.,
Talley, L. D., and Sarmiento, J. L.: Biogeochemical sensor performance in
the SOCCOM profiling float array, J. Geophys. Res.-Oceans, 122, 6416–6436,
<ext-link xlink:href="https://doi.org/10.1002/2017JC012838" ext-link-type="DOI">10.1002/2017JC012838</ext-link>,
2017.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Jones et al.(2012)</label><mixed-citation>Jones, S. D., Le Quéré, C., and Rödenbeck, C.: Autocorrelation
characteristics of surface ocean pCO<inline-formula><mml:math id="M378" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> and air-sea CO<inline-formula><mml:math id="M379" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> fluxes, Global
Biogeochem. Cy., 26, 1–12, <ext-link xlink:href="https://doi.org/10.1029/2010GB004017" ext-link-type="DOI">10.1029/2010GB004017</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Kanamitsu et al.(2002)Kanamitsu, Ebisuzaki, Woollen, Yang, Hnilo,
Fiorino, and Potter</label><mixed-citation>Kanamitsu, M., Ebisuzaki, W., Woollen, J., Yang, S. K., Hnilo, J. J.,
Fiorino,
M., and Potter, G. L.: NCEP-DOE AMIP-II reanalysis (R-2), B.
Am. Meteorol. Soc., 83, 1631–1643+1559,
<ext-link xlink:href="https://doi.org/10.1175/BAMS-83-11-1631" ext-link-type="DOI">10.1175/BAMS-83-11-1631</ext-link>, 2002.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Khatiwala et al.(2013)Khatiwala, Tanhua, Mikaloff Fletcher, Gerber,
Doney, Graven, Gruber, McKinley, Murata, Ríos, and
Sabine</label><mixed-citation>Khatiwala, S., Tanhua, T., Mikaloff Fletcher, S., Gerber, M., Doney, S. C.,
Graven, H. D., Gruber, N., McKinley, G. A., Murata, A., Ríos, A. F., and
Sabine, C. L.: Global ocean storage of anthropogenic carbon, Biogeosciences,
10, 2169–2191, <ext-link xlink:href="https://doi.org/10.5194/bg-10-2169-2013" ext-link-type="DOI">10.5194/bg-10-2169-2013</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Landschützer et al.(2014)Landschützer, Gruber, Bakker,
and Schuster</label><mixed-citation>Landschützer, P., Gruber, N., Bakker, D. C., and Schuster, U.: Recent
variability of the global ocean carbon sink, Global Biogeochem. Cy., 28,
927–949, <ext-link xlink:href="https://doi.org/10.1002/2014GB004853" ext-link-type="DOI">10.1002/2014GB004853</ext-link>,
2014.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Landschützer et al.(2015)Landschützer, Gruber, Haumann,
Rödenbeck, Bakker, Van Heuven, Hoppema, Metzl, Sweeney, Takahashi,
Tilbrook, and Wanninkhof</label><mixed-citation>Landschützer, P., Gruber, N., Haumann, F. A., Rödenbeck, C.,
Bakker, D. C., Van Heuven, S. M., Hoppema, M., Metzl, N., Sweeney, C.,
Takahashi, T. T., Tilbrook, B., and Wanninkhof, R. H.: The reinvigoration of
the Southern Ocean carbon sink, Science, 349, 1221–1224,
<ext-link xlink:href="https://doi.org/10.1126/science.aab2620" ext-link-type="DOI">10.1126/science.aab2620</ext-link>,
2015.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Lenton et al.(2006)Lenton, Matear, and Tilbrook</label><mixed-citation>Lenton, A., Matear, R. J., and Tilbrook, B.: Design of an observational
strategy for quantifying the Southern Ocean uptake of CO<inline-formula><mml:math id="M380" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>, Global
Biogeochem. Cy., 20, GB4010, <ext-link xlink:href="https://doi.org/10.1029/2005GB002620" ext-link-type="DOI">10.1029/2005GB002620</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Lenton et al.(2012)Lenton, Metzl, Takahashi, Kuchinke, Matear, Roy,
Sutherland, Sweeney, and Tilbrook</label><mixed-citation>Lenton, A., Metzl, N., Takahashi, T. T., Kuchinke, M., Matear, R. J., Roy,
T.,
Sutherland, S. C., Sweeney, C., and Tilbrook, B.: The observed evolution of
oceanic pCO<inline-formula><mml:math id="M381" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> and its drivers over the last two decades, Global
Biogeochem. Cy., 26, 1–14, <ext-link xlink:href="https://doi.org/10.1029/2011GB004095" ext-link-type="DOI">10.1029/2011GB004095</ext-link>,
2012.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Lenton et al.(2013)Lenton, Tilbrook, Law, Bakker, Doney, Gruber,
Ishii, Hoppema, Lovenduski, Matear, McNeil, Metzl, Fletcher, Monteiro,
Rödenbeck, Sweeney, and Takahashi</label><mixed-citation>Lenton, A., Tilbrook, B., Law, R. M., Bakker, D., Doney, S. C., Gruber, N.,
Ishii, M., Hoppema, M., Lovenduski, N. S., Matear, R. J., McNeil, B. I.,
Metzl, N., Mikaloff Fletcher, S. E., Monteiro, P. M. S., Rödenbeck, C.,
Sweeney, C., and Takahashi, T.: Sea–air CO<inline-formula><mml:math id="M382" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> fluxes in the Southern Ocean
for the period 1990–2009, Biogeosciences, 10, 4037–4054,
<ext-link xlink:href="https://doi.org/10.5194/bg-10-4037-2013" ext-link-type="DOI">10.5194/bg-10-4037-2013</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Le Quéré et al.(2007)Le Quéré,
Rödenbeck, Buitenhuis, Conway, Langenfelds, Gomez, Labuschagne,
Ramonet, Nakazawa, Metzl, Gillett, and Heimann</label><mixed-citation>Le Quéré, C., Rödenbeck, C., Buitenhuis, E. T., Conway,
T. J., Langenfelds, R., Gomez, A., Labuschagne, C., Ramonet, M., Nakazawa,
T., Metzl, N., Gillett, N. P., and Heimann, M.: Saturation of the Southern
Ocean CO<inline-formula><mml:math id="M383" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> Sink Due to Recent Climate Change, Science, 316, 1735–1738,
<ext-link xlink:href="https://doi.org/10.1126/science.1137004" ext-link-type="DOI">10.1126/science.1137004</ext-link>,
2007.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>Le Quéré et al.(2016)</label><mixed-citation>Le Quéré, C., Andrew, R. M., Canadell, J. G., Sitch, S., Korsbakken, J.
I., Peters, G. P., Manning, A. C., Boden, T. A., Tans, P. P., Houghton, R.
A., Keeling, R. F., Alin, S., Andrews, O. D., Anthoni, P., Barbero, L., Bopp,
L., Chevallier, F., Chini, L. P., Ciais, P., Currie, K., Delire, C., Doney,
S. C., Friedlingstein, P., Gkritzalis, T., Harris, I., Hauck, J., Haverd, V.,
Hoppema, M., Klein Goldewijk, K., Jain, A. K., Kato, E., Körtzinger, A.,
Landschützer, P., Lefèvre, N., Lenton, A., Lienert, S., Lombardozzi, D.,
Melton, J. R., Metzl, N., Millero, F., Monteiro, P. M. S., Munro, D. R.,
Nabel, J. E. M. S., Nakaoka, S.-I., O'Brien, K., Olsen, A., Omar, A. M., Ono,
T., Pierrot, D., Poulter, B., Rödenbeck, C., Salisbury, J., Schuster, U.,
Schwinger, J., Séférian, R., Skjelvan, I., Stocker, B. D., Sutton, A. J.,
Takahashi, T., Tian, H., Tilbrook, B., van der Laan-Luijkx, I. T., van der
Werf, G. R., Viovy, N., Walker, A. P., Wiltshire, A. J., and Zaehle, S.:
Global Carbon Budget 2016, Earth Syst. Sci. Data, 8, 605–649,
<ext-link xlink:href="https://doi.org/10.5194/essd-8-605-2016" ext-link-type="DOI">10.5194/essd-8-605-2016</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>Loh(2011)</label><mixed-citation>Loh, W.-Y.: Classification and regression trees, WIREs Data Min. Knowl., 1,
14–23,
<ext-link xlink:href="https://doi.org/10.1002/widm.8" ext-link-type="DOI">10.1002/widm.8</ext-link>,
2011.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Louppe(2014)</label><mixed-citation>
Louppe, G.: Understanding random forests: from theory to practice, PhD
thesis, University of Liège, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx31"><label>Maritorena and Siegel(2005)</label><mixed-citation>Maritorena, S. and Siegel, D. A.: Consistent merging of satellite ocean
color
data sets using a bio-optical model, Remote Sens. Environ., 94,
429–440, <ext-link xlink:href="https://doi.org/10.1016/j.rse.2004.08.014" ext-link-type="DOI">10.1016/j.rse.2004.08.014</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>Masarie et al.(2014)Masarie, Peters, Jacobson, and
Tans</label><mixed-citation>Masarie, K. A., Peters, W., Jacobson, A. R., and Tans, P. P.: ObsPack: a
framework for the preparation, delivery, and attribution of atmospheric
greenhouse gas measurements, Earth Syst. Sci. Data, 6, 375–384,
<ext-link xlink:href="https://doi.org/10.5194/essd-6-375-2014" ext-link-type="DOI">10.5194/essd-6-375-2014</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx33"><label>Menemenlis et al.(2008)Menemenlis, Campin, Heimbach, Hill, Lee,
Nguyen, Schodlok, and Zhang</label><mixed-citation>
Menemenlis, D., Campin, J.-M., Heimbach, P., Hill, C. N., Lee, T., Nguyen,
A.,
Schodlok, M., and Zhang, H.: ECCO2: High Resolution Global
Ocean and Sea Ice Data Synthesis, Mercator Ocean Quarterly Newsletter, 31,
13–21,
2008.</mixed-citation></ref>
      <ref id="bib1.bibx34"><label>Met Office(2017)</label><mixed-citation>Met Office: Iris: A Python library for analysing and visualising
meteorological and oceanographic data sets, Exeter, Devon, v1.2 Edn.,
available at: <uri>http://scitools.org.uk/</uri>, last access: 12 August 2017.</mixed-citation></ref>
      <ref id="bib1.bibx35"><label>Metzl et al.(2006)Metzl, Brunet, Jabaud-Jan, Poisson, and
Schauer</label><mixed-citation>Metzl, N., Brunet, C., Jabaud-Jan, A., Poisson, A., and Schauer, B.: Summer
and winter air-sea CO<inline-formula><mml:math id="M384" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> fluxes in the Southern Ocean, Deep-Sea Res. Pt. I, 53, 1548–1563,
<ext-link xlink:href="https://doi.org/10.1016/j.dsr.2006.07.006" ext-link-type="DOI">10.1016/j.dsr.2006.07.006</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx36"><label>Mongwe et al.(2016)Mongwe, Chang, and Monteiro</label><mixed-citation>Mongwe, N. P., Chang, N., and Monteiro, P. M. S.: The seasonal cycle as a
mode
to diagnose biases in modelled CO<inline-formula><mml:math id="M385" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> fluxes in the Southern Ocean, Ocean
Model., 106, 90–103, <ext-link xlink:href="https://doi.org/10.1016/j.ocemod.2016.09.006" ext-link-type="DOI">10.1016/j.ocemod.2016.09.006</ext-link>,
2016.</mixed-citation></ref>
      <ref id="bib1.bibx37"><label>Monteiro(2010)</label><mixed-citation>Monteiro, P. M. S.: A Global Sea Surface Carbon Observing System: Assessment
of Changing Sea Surface CO<inline-formula><mml:math id="M386" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> and Air-Sea CO<inline-formula><mml:math id="M387" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> Fluxes, Proceedings of
OceanObs'09: Sustained Ocean Observations and Information for Society,
702–714, <ext-link xlink:href="https://doi.org/10.5270/OceanObs09.cwp.64" ext-link-type="DOI">10.5270/OceanObs09.cwp.64</ext-link>,
2010.</mixed-citation></ref>
      <ref id="bib1.bibx38"><label>Mountrakis et al.(2011)Mountrakis, Im, and Ogole</label><mixed-citation>Mountrakis, G., Im, J., and Ogole, C.: Support vector machines in remote
sensing: A review, ISPRS J. Photogramm., 66,
247–259, <ext-link xlink:href="https://doi.org/10.1016/j.isprsjprs.2010.11.001" ext-link-type="DOI">10.1016/j.isprsjprs.2010.11.001</ext-link>,
2011.</mixed-citation></ref>
      <ref id="bib1.bibx39"><label>Pedregosa et al.(2011)Pedregosa, Varoquaux, Gramfort, Michel,
Thirion, Grisel, Blondel, Prettenhofer, Weiss, Dubourg, Vanderplas, Passos,
and Cournapeau</label><mixed-citation>
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel,
O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J.,
Passos, A., and Cournapeau, D.: Scikit-learn: Machine learning in Python,
J. Mach. Learn. Res., 12, 2825–2830,
2011.</mixed-citation></ref>
      <ref id="bib1.bibx40"><label>Reynolds et al.(2007)Reynolds, Smith, Liu, Chelton, Casey, and
Schlax</label><mixed-citation>Reynolds, R. W., Smith, T. M., Liu, C., Chelton, D. B., Casey, K. S., and
Schlax, M. G.: Daily high-resolution-blended analyses for sea surface
temperature, J. Climate, 20, 5473–5496,
<ext-link xlink:href="https://doi.org/10.1175/2007JCLI1824.1" ext-link-type="DOI">10.1175/2007JCLI1824.1</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx41"><label>Rödenbeck et al.(2014)Rödenbeck, Bakker, Metzl, Olsen,
Sabine, Cassar, Reum, Keeling, and Heimann</label><mixed-citation>Rödenbeck, C., Bakker, D. C. E., Metzl, N., Olsen, A., Sabine, C., Cassar,
N., Reum, F., Keeling, R. F., and Heimann, M.: Interannual sea–air CO<inline-formula><mml:math id="M388" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>
flux variability from an observation-driven ocean mixed-layer scheme,
Biogeosciences, 11, 4599–4613, <ext-link xlink:href="https://doi.org/10.5194/bg-11-4599-2014" ext-link-type="DOI">10.5194/bg-11-4599-2014</ext-link>,
2014.</mixed-citation></ref>
      <ref id="bib1.bibx42"><label>Rödenbeck et al.(2015)Rödenbeck, Bakker, Gruber, Iida,
Jacobson, Jones, Landschützer, Metzl, Nakaoka, Olsen, Park, Peylin,
Rodgers, Sasse, Schuster, Shutler, Valsala, Wanninkhof, and
Zeng</label><mixed-citation>Rödenbeck, C., Bakker, D. C. E., Gruber, N., Iida, Y., Jacobson, A. R.,
Jones, S., Landschützer, P., Metzl, N., Nakaoka, S., Olsen, A., Park,
G.-H., Peylin, P., Rodgers, K. B., Sasse, T. P., Schuster, U., Shutler, J.
D., Valsala, V., Wanninkhof, R., and Zeng, J.: Data-based estimates of the
ocean carbon sink variability – first results of the Surface Ocean <inline-formula><mml:math id="M389" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>CO<inline-formula><mml:math id="M390" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>
Mapping intercomparison (SOCOM), Biogeosciences, 12, 7251–7278,
<ext-link xlink:href="https://doi.org/10.5194/bg-12-7251-2015" ext-link-type="DOI">10.5194/bg-12-7251-2015</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx43"><label>Russell et al.(2014)Russell, Sarmiento, Cullen, Hotinski, Johnson,
Riser, and Talley</label><mixed-citation>
Russell, J. L., Sarmiento, J. L., Cullen, H., Hotinski, R., Johnson, K. S.,
Riser, S. C., and Talley, L. D.: The Southern Ocean Carbon and Climate
Observations and Modeling Program (SOCCOM), Ocean Carbon and Biogeochemistry
Article, 7, 1–28,
2014.</mixed-citation></ref>
      <ref id="bib1.bibx44"><label>Sasse et al.(2013)Sasse, McNeil, and Abramowitz</label><mixed-citation>Sasse, T. P., McNeil, B. I., and Abramowitz, G.: A novel method for
diagnosing seasonal to inter-annual surface ocean carbon dynamics from bottle
data using neural networks, Biogeosciences, 10, 4319–4340,
<ext-link xlink:href="https://doi.org/10.5194/bg-10-4319-2013" ext-link-type="DOI">10.5194/bg-10-4319-2013</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx45"><label>Smola and Schölkopf(2004)</label><mixed-citation>Smola, A. J. and Schölkopf, B.: A Tutorial on Support
Vector Regression, Stat. Comput., 14, 199–222, <ext-link xlink:href="https://doi.org/10.1023/B:Stco.0000035301.49549.88" ext-link-type="DOI">10.1023/B:Stco.0000035301.49549.88</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bibx46"><label>Tagliabue et al.(2014)Tagliabue, Sallée, Bowie, Lévy,
Swart, and Boyd</label><mixed-citation>Tagliabue, A., Sallée, J.-B., Bowie, A. R., Lévy, M., Swart, S.,
and Boyd, P. W.: Surface-water iron supplies in the Southern Ocean sustained
by deep winter mixing, Nat. Geosci., 7, 314–320,
<ext-link xlink:href="https://doi.org/10.1038/NGEO2101" ext-link-type="DOI">10.1038/NGEO2101</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx47"><label>Takahashi et al.(2002)Takahashi, Sutherland, Sweeney, Poisson, Metzl,
Tilbrook, Bates, Wanninkhof, Feely, Sabine, Olafsson, and
Nojiri</label><mixed-citation>Takahashi, T. T., Sutherland, S. C., Sweeney, C., Poisson, A., Metzl, N.,
Tilbrook, B., Bates, N. R., Wanninkhof, R. H., Feely, R. A., Sabine, C. L.,
Olafsson, J., and Nojiri, Y.: Global sea-air CO<inline-formula><mml:math id="M391" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> flux based on
climatological surface ocean pCO<inline-formula><mml:math id="M392" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula>, and seasonal biological and temperature
effects, Deep-Sea Res. Pt. II, 49,
1601–1622, <ext-link xlink:href="https://doi.org/10.1016/S0967-0645(02)00003-6" ext-link-type="DOI">10.1016/S0967-0645(02)00003-6</ext-link>, 2002.
</mixed-citation></ref><?xmltex \hack{\newpage}?>
      <ref id="bib1.bibx48"><label>Takahashi et al.(2012)Takahashi, Sweeney, Hales, Chipman, Newberger,
Goddard, Iannuzzi, and Sutherland</label><mixed-citation>Takahashi, T. T., Sweeney, C., Hales, B., Chipman, D. W., Newberger, T.,
Goddard, J. G., Iannuzzi, R., and Sutherland, S. C.: The Changing Carbon
Cycle in the Southern Ocean, Oceanography, 25, 26–37,
<ext-link xlink:href="https://doi.org/10.5670/oceanog.2012.71" ext-link-type="DOI">10.5670/oceanog.2012.71</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx49"><label>Thomalla et al.(2011)Thomalla, Fauchereau, Swart, and
Monteiro</label><mixed-citation>Thomalla, S. J., Fauchereau, N., Swart, S., and Monteiro, P. M. S.: Regional
scale characteristics of the seasonal cycle of chlorophyll in the Southern
Ocean, Biogeosciences, 8, 2849–2866, <ext-link xlink:href="https://doi.org/10.5194/bg-8-2849-2011" ext-link-type="DOI">10.5194/bg-8-2849-2011</ext-link>,
2011.</mixed-citation></ref>
      <ref id="bib1.bibx50"><label>Vapnik(1999)</label><mixed-citation>Vapnik, V.: An overview of statistical learning theory, IEEE T.
Neural Netw., 10, 988–999, <ext-link xlink:href="https://doi.org/10.1109/72.788640" ext-link-type="DOI">10.1109/72.788640</ext-link>,
1999.</mixed-citation></ref>
      <ref id="bib1.bibx51"><label>Wanninkhof(2014)</label><mixed-citation>Wanninkhof, R. H.: Relationship between wind speed and gas exchange over the
ocean revisited, Limnol. Oceanogr.-Meth., 12, 351–362,
<ext-link xlink:href="https://doi.org/10.4319/lom.2014.12.351" ext-link-type="DOI">10.4319/lom.2014.12.351</ext-link>,
2014.</mixed-citation></ref>
      <ref id="bib1.bibx52"><label>Weiss(1974)</label><mixed-citation>Weiss, R.: Carbon dioxide in water and seawater: the solubility of a
non-ideal
gas, Mar. Chem., 2, 203–215, <ext-link xlink:href="https://doi.org/10.1016/0304-4203(74)90015-2" ext-link-type="DOI">10.1016/0304-4203(74)90015-2</ext-link>,
1974.</mixed-citation></ref>
      <ref id="bib1.bibx53"><label>Williams et al.(2017)Williams, Juranek, Feely, Johnson, Sarmiento,
Talley, Dickson, Gray, Wanninkhof, Russell, Riser, and
Takeshita</label><mixed-citation>Williams, N. L., Juranek, L. W., Feely, R. A., Johnson, K. S., Sarmiento,
J. L., Talley, L. D., Dickson, A. G., Gray, A. R., Wanninkhof, R. H.,
Russell, J. L., Riser, S. C., and Takeshita, Y.: Calculating surface ocean
pCO<inline-formula><mml:math id="M393" display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> from biogeochemical Argo floats equipped with pH: An uncertainty
analysis, Global Biogeochem. Cy., 31, 591–604,
<ext-link xlink:href="https://doi.org/10.1002/2016GB005541" ext-link-type="DOI">10.1002/2016GB005541</ext-link>, 2017.</mixed-citation></ref>

  </ref-list><app-group content-type="float"><app><title/>

    </app></app-group></back>
    <!--<article-title-html>Empirical methods for the estimation of Southern Ocean CO<sub>2</sub>: support vector and random forest regression</article-title-html>
<abstract-html><p class="p">The Southern Ocean accounts for 40 % of oceanic CO<sub>2</sub> uptake, but the
estimates are bound by large uncertainties due to a paucity of observations.
Gap-filling empirical methods have been used to good effect to approximate
<i>p</i>CO<sub>2</sub> from satellite observable variables in other parts of the ocean,
but many of these methods are not in agreement in the Southern Ocean. In this
study we propose two additional methods that perform well in the Southern
Ocean: support vector regression (SVR) and random forest regression (RFR).
The methods are used to estimate Δ<i>p</i>CO<sub>2</sub> in the Southern Ocean based
on SOCAT v3, achieving similar trends to the SOM-FFN method by
Landschützer et al. (2014). Results show that the SOM-FFN and RFR approaches
have RMSEs of similar magnitude (14.84 and 16.45 µatm, where
1 atm = 101 325 Pa), whereas the SVR method has a larger RMSE
(24.40 µatm). However, the larger errors for SVR and RFR are, in
part, due to an increase in coastal observations from SOCAT v2 to v3, where
the SOM-FFN method used v2 data. The success of both SOM-FFN and RFR depends
on the ability to adapt to different modes of variability. The SOM-FFN
achieves this by having independent regression models for each cluster, while
this flexibility is intrinsic to the RFR method. Analysis of the estimates
shows that SVR's sensitivity and RFR's robustness to
outliers significantly shape the outcome. Further analyses of the methods
were performed by using a synthetic dataset to assess the following: which
method (RFR or SVR) has the best performance? What is the effect of using
time, latitude and longitude as proxy variables on Δ<i>p</i>CO<sub>2</sub>? What is
the impact of the sampling bias in the SOCAT v3 dataset on the estimates? We
find that while RFR is indeed better than SVR, the ensemble of the two
methods outperforms either one, due to complementary strengths and weaknesses
of the methods. Results also show that for the RFR and SVR implementations,
it is better to include coordinates as proxy variables as RMSE scores are
lowered and the phasing of the seasonal cycle is more accurate. Lastly, we
show that there is only a weak bias due to undersampling. The synthetic data
provide a useful framework to test methods in regions of sparse data coverage
and show potential as a useful tool to evaluate methods in future studies.</p></abstract-html>
<ref-html id="bib1.bib1"><label>Atlas et al.(2011)Atlas, Hoffman, Ardizzone, Leidner, Jusem, Smith,
and Gombos</label><mixed-citation>
Atlas, R., Hoffman, R. N., Ardizzone, J., Leidner, S. M., Jusem, J. C.,
Smith,
D. K., and Gombos, D.: A Cross-calibrated, Multiplatform Ocean Surface Wind
Velocity Product for Meteorological and Oceanographic Applications, B.
Am. Meteorol. Soc., 92, 157–174,
<a href="https://doi.org/10.1175/2010BAMS2946.1" target="_blank">https://doi.org/10.1175/2010BAMS2946.1</a>,
2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>Bakker et al.(2008)Bakker, Hoppema, Schröder, Geibert, and
de Baar</label><mixed-citation>
Bakker, D. C. E., Hoppema, M., Schröder, M., Geibert, W., and de Baar, H.
J. W.: A rapid transition from ice covered CO<sub>2</sub>–rich waters to a
biologically mediated CO<sub>2</sub> sink in the eastern Weddell Gyre,
Biogeosciences, 5, 1373–1386, <a href="https://doi.org/10.5194/bg-5-1373-2008" target="_blank">https://doi.org/10.5194/bg-5-1373-2008</a>, 2008.
</mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>Bakker et al.(2016)</label><mixed-citation>
Bakker, D. C. E., Pfeil, B., Landa, C. S., Metzl, N., O'Brien, K. M., Olsen,
A., Smith, K., Cosca, C., Harasawa, S., Jones, S. D., Nakaoka, S.-I., Nojiri,
Y., Schuster, U., Steinhoff, T., Sweeney, C., Takahashi, T., Tilbrook, B.,
Wada, C., Wanninkhof, R., Alin, S. R., Balestrini, C. F., Barbero, L., Bates,
N. R., Bianchi, A. A., Bonou, F., Boutin, J., Bozec, Y., Burger, E. F., Cai,
W.-J., Castle, R. D., Chen, L., Chierici, M., Currie, K., Evans, W.,
Featherstone, C., Feely, R. A., Fransson, A., Goyet, C., Greenwood, N.,
Gregor, L., Hankin, S., Hardman-Mountford, N. J., Harlay, J., Hauck, J.,
Hoppema, M., Humphreys, M. P., Hunt, C. W., Huss, B., Ibánhez, J. S. P.,
Johannessen, T., Keeling, R., Kitidis, V., Körtzinger, A., Kozyr, A.,
Krasakopoulou, E., Kuwata, A., Landschützer, P., Lauvset, S. K., Lefèvre,
N., Lo Monaco, C., Manke, A., Mathis, J. T., Merlivat, L., Millero, F. J.,
Monteiro, P. M. S., Munro, D. R., Murata, A., Newberger, T., Omar, A. M.,
Ono, T., Paterson, K., Pearce, D., Pierrot, D., Robbins, L. L., Saito, S.,
Salisbury, J., Schlitzer, R., Schneider, B., Schweitzer, R., Sieger, R.,
Skjelvan, I., Sullivan, K. F., Sutherland, S. C., Sutton, A. J., Tadokoro,
K., Telszewski, M., Tuma, M., van Heuven, S. M. A. C., Vandemark, D., Ward,
B., Watson, A. J., and Xu, S.: A multi-decade record of high-quality
<i>f</i>CO<sub>2</sub> data in version 3 of the Surface Ocean CO<sub>2</sub> Atlas (SOCAT), Earth
Syst. Sci. Data, 8, 383–413, <a href="https://doi.org/10.5194/essd-8-383-2016" target="_blank">https://doi.org/10.5194/essd-8-383-2016</a>, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>Biastoch et al.(2008)Biastoch, Böning, Getzlaff, Molines, and
Madec</label><mixed-citation>
Biastoch, A., Böning, C. W., Getzlaff, J., Molines, J.-M., and Madec,
G.:
Causes of Interannual–Decadal Variability in the Meridional Overturning
Circulation of the Midlatitude North Atlantic Ocean, J. Climate, 21,
6599–6615, <a href="https://doi.org/10.1175/2008JCLI2404.1" target="_blank">https://doi.org/10.1175/2008JCLI2404.1</a>,
2008.
</mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>Boyd and Ellwood(2010)</label><mixed-citation>
Boyd, P. W. and Ellwood, M. J.: The biogeochemical cycle of iron in the
ocean, Nat. Geosci., 3, 675–682, <a href="https://doi.org/10.1038/ngeo964" target="_blank">https://doi.org/10.1038/ngeo964</a>, 2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>Breiman(2001)</label><mixed-citation>
Breiman, L.: Random forests, Mach. Learn., 45, 5–32,
<a href="https://doi.org/10.1023/A:1010933404324" target="_blank">https://doi.org/10.1023/A:1010933404324</a>, 2001.
</mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>Carranza and Gille(2015)</label><mixed-citation>
Carranza, M. M. and Gille, S. T.: Southern Ocean wind-driven entrainment
enhances satellite chlorophyll-a through the summer, J. Geophys.
Res.-Oceans, 120, 304–323, <a href="https://doi.org/10.1002/2014JC010203" target="_blank">https://doi.org/10.1002/2014JC010203</a>,
2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>Caruana and Niculescu-Mizil(2006)</label><mixed-citation>
Caruana, R. and Niculescu-Mizil, A.: An empirical comparison of supervised
learning algorithms, Proceedings of the 23rd International Conference on
Machine Learning, pp. 161–168, <a href="https://doi.org/10.1145/1143844.1143865" target="_blank">https://doi.org/10.1145/1143844.1143865</a>,
2006.
</mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>Chierici et al.(2012)Chierici, Signorini, Mattsdotter-Björk,
Fransson, and Olsen</label><mixed-citation>
Chierici, M., Signorini, S. R., Mattsdotter-Björk, M., Fransson, A.,
and
Olsen, A.: Surface water fCO<sub>2</sub> algorithms for the high-latitude Pacific
sector of the Southern Ocean, Remote Sens. Environ., 119, 184–196,
<a href="https://doi.org/10.1016/j.rse.2011.12.020" target="_blank">https://doi.org/10.1016/j.rse.2011.12.020</a>,
2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>Dufour et al.(2012)Dufour, Sommer, Zika, Gehlen, Orr, Mathiot, and
Barnier</label><mixed-citation>
Dufour, C. O., Sommer, L. L., Zika, J. D., Gehlen, M., Orr, J. C., Mathiot,
P.,
and Barnier, B.: Standing and transient eddies in the response of the
Southern Ocean meridional overturning to the Southern annular mode, J.
Climate, 25, 6958–6974, <a href="https://doi.org/10.1175/JCLI-D-11-00309.1" target="_blank">https://doi.org/10.1175/JCLI-D-11-00309.1</a>, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>Fay and McKinley(2014)</label><mixed-citation>
Fay, A. R. and McKinley, G. A.: Global open-ocean biomes: mean and temporal
variability, Earth Syst. Sci. Data, 6, 273–284,
<a href="https://doi.org/10.5194/essd-6-273-2014" target="_blank">https://doi.org/10.5194/essd-6-273-2014</a>, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>Friedrich and Oschlies(2009)</label><mixed-citation>
Friedrich, T. and Oschlies, A.: Neural network-based estimates of North
Atlantic surface pCO<sub>2</sub> from satellite data: A methodological study, J.
Geophys. Res., 114, 1–12, <a href="https://doi.org/10.1029/2007JC004646" target="_blank">https://doi.org/10.1029/2007JC004646</a>, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>Frölicher et al.(2015)Frölicher, Sarmiento, Paynter,
Dunne, Krasting, and Winton</label><mixed-citation>
Frölicher, T. L., Sarmiento, J. L., Paynter, D. J., Dunne, J. P.,
Krasting, J. P., and Winton, M.: Dominance of the Southern Ocean in
anthropogenic carbon and heat uptake in CMIP5 models, J. Climate,
28, 862–886, <a href="https://doi.org/10.1175/JCLI-D-14-00117.1" target="_blank">https://doi.org/10.1175/JCLI-D-14-00117.1</a>, 2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>Gade(2010)</label><mixed-citation>
Gade, K.: A Non-singular Horizontal Position Representation, J.
Navigation, 63, 395–417, <a href="https://doi.org/10.1017/S0373463309990415" target="_blank">https://doi.org/10.1017/S0373463309990415</a>,
2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>Gregor et al.(2017)</label><mixed-citation>
Gregor, L., Kok, S., and Monteiro, P. M. S.: Dataset for empirically
estimated <i>p</i>CO<sub>2</sub> for the Southern Ocean: RFR and SVR,
<a href="https://doi.org/10.6084/m9.figshare.5369038" target="_blank">https://doi.org/10.6084/m9.figshare.5369038</a>, 2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>Hastie et al.(2009)Hastie, Tibshirani, and Friedman</label><mixed-citation>
Hastie, T., Tibshirani, R., and Friedman, J.: The Elements of Statistical
Learning: Data mining, Inference, and Prediction, Springer, 2nd Edn.,
2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>Hoyer and Hamman(2017)</label><mixed-citation>
Hoyer, S. and Hamman, J. J.: xarray: N-D labeled Arrays and Datasets in
Python, Journal of Open Research Software, 5, 1–6, <a href="https://doi.org/10.5334/jors.148" target="_blank">https://doi.org/10.5334/jors.148</a>,
2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>Johnson et al.(2017)Johnson, Plant, Coletti, Jannasch, Sakamoto,
Riser, Swift, Williams, Boss, Haëntjens, Talley, and
Sarmiento</label><mixed-citation>
Johnson, K. S., Plant, J. N., Coletti, L. J., Jannasch, H. W., Sakamoto,
C. M.,
Riser, S. C., Swift, D. D., Williams, N. L., Boss, E., Haëntjens, N.,
Talley, L. D., and Sarmiento, J. L.: Biogeochemical sensor performance in
the SOCCOM profiling float array, J. Geophys. Res.-Oceans, 122, 6416–6436,
<a href="https://doi.org/10.1002/2017JC012838" target="_blank">https://doi.org/10.1002/2017JC012838</a>,
2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>Jones et al.(2012)</label><mixed-citation>
Jones, S. D., Le Quéré, C., and Rödenbeck, C.: Autocorrelation
characteristics of surface ocean pCO<sub>2</sub> and air-sea CO<sub>2</sub> fluxes, Global
Biogeochem. Cy., 26, 1–12, <a href="https://doi.org/10.1029/2010GB004017" target="_blank">https://doi.org/10.1029/2010GB004017</a>, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>Kanamitsu et al.(2002)Kanamitsu, Ebisuzaki, Woollen, Yang, Hnilo,
Fiorino, and Potter</label><mixed-citation>
Kanamitsu, M., Ebisuzaki, W., Woollen, J., Yang, S. K., Hnilo, J. J.,
Fiorino,
M., and Potter, G. L.: NCEP-DOE AMIP-II reanalysis (R-2), B.
Am. Meteorol. Soc., 83, 1631–1643,
<a href="https://doi.org/10.1175/BAMS-83-11-1631" target="_blank">https://doi.org/10.1175/BAMS-83-11-1631</a>, 2002.
</mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>Khatiwala et al.(2013)Khatiwala, Tanhua, Mikaloff Fletcher, Gerber,
Doney, Graven, Gruber, McKinley, Murata, Ríos, and
Sabine</label><mixed-citation>
Khatiwala, S., Tanhua, T., Mikaloff Fletcher, S., Gerber, M., Doney, S. C.,
Graven, H. D., Gruber, N., McKinley, G. A., Murata, A., Ríos, A. F., and
Sabine, C. L.: Global ocean storage of anthropogenic carbon, Biogeosciences,
10, 2169–2191, <a href="https://doi.org/10.5194/bg-10-2169-2013" target="_blank">https://doi.org/10.5194/bg-10-2169-2013</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>Landschützer et al.(2014)Landschützer, Gruber, Bakker,
and Schuster</label><mixed-citation>
Landschützer, P., Gruber, N., Bakker, D. C., and Schuster, U.: Recent
variability of the global ocean carbon sink, Global Biogeochem. Cy., 28,
927–949, <a href="https://doi.org/10.1002/2014GB004853" target="_blank">https://doi.org/10.1002/2014GB004853</a>,
2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>Landschützer et al.(2015)Landschützer, Gruber, Haumann,
Rödenbeck, Bakker, Van Heuven, Hoppema, Metzl, Sweeney, Takahashi,
Tilbrook, and Wanninkhof</label><mixed-citation>
Landschützer, P., Gruber, N., Haumann, F. A., Rödenbeck, C.,
Bakker, D. C., Van Heuven, S. M., Hoppema, M., Metzl, N., Sweeney, C.,
Takahashi, T. T., Tilbrook, B., and Wanninkhof, R. H.: The reinvigoration of
the Southern Ocean carbon sink, Science, 349, 1221–1224,
<a href="https://doi.org/10.1126/science.aab2620" target="_blank">https://doi.org/10.1126/science.aab2620</a>,
2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>Lenton et al.(2006)Lenton, Matear, and Tilbrook</label><mixed-citation>
Lenton, A., Matear, R. J., and Tilbrook, B.: Design of an observational
strategy for quantifying the Southern Ocean uptake of CO<sub>2</sub>, Global
Biogeochem. Cy., 20, GB4010, <a href="https://doi.org/10.1029/2005GB002620" target="_blank">https://doi.org/10.1029/2005GB002620</a>, 2006.
</mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>Lenton et al.(2012)Lenton, Metzl, Takahashi, Kuchinke, Matear, Roy,
Sutherland, Sweeney, and Tilbrook</label><mixed-citation>
Lenton, A., Metzl, N., Takahashi, T. T., Kuchinke, M., Matear, R. J., Roy,
T.,
Sutherland, S. C., Sweeney, C., and Tilbrook, B.: The observed evolution of
oceanic pCO<sub>2</sub> and its drivers over the last two decades, Global
Biogeochem. Cy., 26, 1–14, <a href="https://doi.org/10.1029/2011GB004095" target="_blank">https://doi.org/10.1029/2011GB004095</a>,
2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>Lenton et al.(2013)Lenton, Tilbrook, Law, Bakker, Doney, Gruber,
Ishii, Hoppema, Lovenduski, Matear, McNeil, Metzl, Fletcher, Monteiro,
Rödenbeck, Sweeney, and Takahashi</label><mixed-citation>
Lenton, A., Tilbrook, B., Law, R. M., Bakker, D., Doney, S. C., Gruber, N.,
Ishii, M., Hoppema, M., Lovenduski, N. S., Matear, R. J., McNeil, B. I.,
Metzl, N., Mikaloff Fletcher, S. E., Monteiro, P. M. S., Rödenbeck, C.,
Sweeney, C., and Takahashi, T.: Sea–air CO<sub>2</sub> fluxes in the Southern Ocean
for the period 1990–2009, Biogeosciences, 10, 4037–4054,
<a href="https://doi.org/10.5194/bg-10-4037-2013" target="_blank">https://doi.org/10.5194/bg-10-4037-2013</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>Le Quéré et al.(2007)Le Quéré,
Rödenbeck, Buitenhuis, Conway, Langenfelds, Gomez, Labuschagne,
Ramonet, Nakazawa, Metzl, Gillett, and Heimann</label><mixed-citation>
Le Quéré, C., Rödenbeck, C., Buitenhuis, E. T., Conway,
T. J., Langenfelds, R., Gomez, A., Labuschagne, C., Ramonet, M., Nakazawa,
T., Metzl, N., Gillett, N. P., and Heimann, M.: Saturation of the Southern
Ocean CO<sub>2</sub> Sink Due to Recent Climate Change, Science, 316, 1735–1738,
<a href="https://doi.org/10.1126/science.1137004" target="_blank">https://doi.org/10.1126/science.1137004</a>,
2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>Le Quéré et al.(2016)</label><mixed-citation>
Le Quéré, C., Andrew, R. M., Canadell, J. G., Sitch, S., Korsbakken, J.
I., Peters, G. P., Manning, A. C., Boden, T. A., Tans, P. P., Houghton, R.
A., Keeling, R. F., Alin, S., Andrews, O. D., Anthoni, P., Barbero, L., Bopp,
L., Chevallier, F., Chini, L. P., Ciais, P., Currie, K., Delire, C., Doney,
S. C., Friedlingstein, P., Gkritzalis, T., Harris, I., Hauck, J., Haverd, V.,
Hoppema, M., Klein Goldewijk, K., Jain, A. K., Kato, E., Körtzinger, A.,
Landschützer, P., Lefèvre, N., Lenton, A., Lienert, S., Lombardozzi, D.,
Melton, J. R., Metzl, N., Millero, F., Monteiro, P. M. S., Munro, D. R.,
Nabel, J. E. M. S., Nakaoka, S.-I., O'Brien, K., Olsen, A., Omar, A. M., Ono,
T., Pierrot, D., Poulter, B., Rödenbeck, C., Salisbury, J., Schuster, U.,
Schwinger, J., Séférian, R., Skjelvan, I., Stocker, B. D., Sutton, A. J.,
Takahashi, T., Tian, H., Tilbrook, B., van der Laan-Luijkx, I. T., van der
Werf, G. R., Viovy, N., Walker, A. P., Wiltshire, A. J., and Zaehle, S.:
Global Carbon Budget 2016, Earth Syst. Sci. Data, 8, 605–649,
<a href="https://doi.org/10.5194/essd-8-605-2016" target="_blank">https://doi.org/10.5194/essd-8-605-2016</a>, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>Loh(2011)</label><mixed-citation>
Loh, W.-Y.: Classification and regression trees, Wires Data Min. Knowl., 1,
14–23,
<a href="https://doi.org/10.1002/widm.8" target="_blank">https://doi.org/10.1002/widm.8</a>,
2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>Louppe(2014)</label><mixed-citation>
Louppe, G.: Understanding random forests: from theory to practice, PhD
thesis, University of Liège, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>Maritorena and Siegel(2005)</label><mixed-citation>
Maritorena, S. and Siegel, D. A.: Consistent merging of satellite ocean
color
data sets using a bio-optical model, Remote Sens. Environ., 94,
429–440, <a href="https://doi.org/10.1016/j.rse.2004.08.014" target="_blank">https://doi.org/10.1016/j.rse.2004.08.014</a>, 2005.
</mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>Masarie et al.(2014)Masarie, Peters, Jacobson, and
Tans</label><mixed-citation>
Masarie, K. A., Peters, W., Jacobson, A. R., and Tans, P. P.: ObsPack: a
framework for the preparation, delivery, and attribution of atmospheric
greenhouse gas measurements, Earth Syst. Sci. Data, 6, 375–384,
<a href="https://doi.org/10.5194/essd-6-375-2014" target="_blank">https://doi.org/10.5194/essd-6-375-2014</a>, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>Menemenlis et al.(2008)Menemenlis, Campin, Heimbach, Hill, Lee,
Nguyen, Schodlok, and Zhang</label><mixed-citation>
Menemenlis, D., Campin, J.-M., Heimbach, P., Hill, C. N., Lee, T., Nguyen,
A., Schodlok, M., and Zhang, H.: ECCO2: High Resolution Global
Ocean and Sea Ice Data Synthesis, Mercator Ocean Quarterly Newsletter, 31,
13–21, 2008.
</mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>Met Office(2017)</label><mixed-citation>
Met Office: Iris: A Python library for analysing and visualising
meteorological and oceanographic data sets, Exeter, Devon, v1.2 Edn.,
available at: <a href="http://scitools.org.uk/" target="_blank">http://scitools.org.uk/</a>, last access: 12 August 2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>Metzl et al.(2006)Metzl, Brunet, Jabaud-Jan, Poisson, and
Schauer</label><mixed-citation>
Metzl, N., Brunet, C., Jabaud-Jan, A., Poisson, A., and Schauer, B.: Summer
and winter air-sea CO<sub>2</sub> fluxes in the Southern Ocean, Deep-Sea Res. Pt. I, 53, 1548–1563,
<a href="https://doi.org/10.1016/j.dsr.2006.07.006" target="_blank">https://doi.org/10.1016/j.dsr.2006.07.006</a>, 2006.
</mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>Mongwe et al.(2016)Mongwe, Chang, and Monteiro</label><mixed-citation>
Mongwe, N. P., Chang, N., and Monteiro, P. M. S.: The seasonal cycle as a
mode
to diagnose biases in modelled CO<sub>2</sub> fluxes in the Southern Ocean, Ocean
Model., 106, 90–103, <a href="https://doi.org/10.1016/j.ocemod.2016.09.006" target="_blank">https://doi.org/10.1016/j.ocemod.2016.09.006</a>,
2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>Monteiro(2010)</label><mixed-citation>
Monteiro, P. M. S.: A Global Sea Surface Carbon Observing System: Assessment
of Changing Sea Surface CO<sub>2</sub> and Air-Sea CO<sub>2</sub> Fluxes, Proceedings of
OceanObs'09: Sustained Ocean Observations and Information for Society,
702–714, <a href="https://doi.org/10.5270/OceanObs09.cwp.64" target="_blank">https://doi.org/10.5270/OceanObs09.cwp.64</a>,
2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>Mountrakis et al.(2011)Mountrakis, Im, and Ogole</label><mixed-citation>
Mountrakis, G., Im, J., and Ogole, C.: Support vector machines in remote
sensing: A review, ISPRS J. Photogramm., 66,
247–259, <a href="https://doi.org/10.1016/j.isprsjprs.2010.11.001" target="_blank">https://doi.org/10.1016/j.isprsjprs.2010.11.001</a>,
2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>Pedregosa et al.(2011)Pedregosa, Varoquaux, Gramfort, Michel,
Thirion, Grisel, Blondel, Prettenhofer, Weiss, Dubourg, Vanderplas, Passos,
and Cournapeau</label><mixed-citation>
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel,
O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J.,
Passos, A., and Cournapeau, D.: Scikit-learn: Machine learning in Python,
J. Mach. Learn. Res., 12, 2825–2830,
2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>Reynolds et al.(2007)Reynolds, Smith, Liu, Chelton, Casey, and
Schlax</label><mixed-citation>
Reynolds, R. W., Smith, T. M., Liu, C., Chelton, D. B., Casey, K. S., and
Schlax, M. G.: Daily high-resolution-blended analyses for sea surface
temperature, J. Climate, 20, 5473–5496,
<a href="https://doi.org/10.1175/2007JCLI1824.1" target="_blank">https://doi.org/10.1175/2007JCLI1824.1</a>, 2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>Rödenbeck et al.(2014)Rödenbeck, Bakker, Metzl, Olsen,
Sabine, Cassar, Reum, Keeling, and Heimann</label><mixed-citation>
Rödenbeck, C., Bakker, D. C. E., Metzl, N., Olsen, A., Sabine, C., Cassar,
N., Reum, F., Keeling, R. F., and Heimann, M.: Interannual sea–air CO<sub>2</sub>
flux variability from an observation-driven ocean mixed-layer scheme,
Biogeosciences, 11, 4599–4613, <a href="https://doi.org/10.5194/bg-11-4599-2014" target="_blank">https://doi.org/10.5194/bg-11-4599-2014</a>,
2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>Rödenbeck et al.(2015)Rödenbeck, Bakker, Gruber, Iida,
Jacobson, Jones, Landschützer, Metzl, Nakaoka, Olsen, Park, Peylin,
Rodgers, Sasse, Schuster, Shutler, Valsala, Wanninkhof, and
Zeng</label><mixed-citation>
Rödenbeck, C., Bakker, D. C. E., Gruber, N., Iida, Y., Jacobson, A. R.,
Jones, S., Landschützer, P., Metzl, N., Nakaoka, S., Olsen, A., Park,
G.-H., Peylin, P., Rodgers, K. B., Sasse, T. P., Schuster, U., Shutler, J.
D., Valsala, V., Wanninkhof, R., and Zeng, J.: Data-based estimates of the
ocean carbon sink variability – first results of the Surface Ocean <i>p</i>CO<sub>2</sub>
Mapping intercomparison (SOCOM), Biogeosciences, 12, 7251–7278,
<a href="https://doi.org/10.5194/bg-12-7251-2015" target="_blank">https://doi.org/10.5194/bg-12-7251-2015</a>, 2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>Russell et al.(2014)Russell, Sarmiento, Cullen, Hotinski, Johnson,
Riser, and Talley</label><mixed-citation>
Russell, J. L., Sarmiento, J. L., Cullen, H., Hotinski, R., Johnson, K. S.,
Riser, S. C., and Talley, L. D.: The Southern Ocean Carbon and Climate
Observations and Modeling Program (SOCCOM), Ocean Carbon and Biogeochemistry
Article, 7, 1–28,
2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>Sasse et al.(2013)Sasse, McNeil, and Abramowitz</label><mixed-citation>
Sasse, T. P., McNeil, B. I., and Abramowitz, G.: A novel method for
diagnosing seasonal to inter-annual surface ocean carbon dynamics from bottle
data using neural networks, Biogeosciences, 10, 4319–4340,
<a href="https://doi.org/10.5194/bg-10-4319-2013" target="_blank">https://doi.org/10.5194/bg-10-4319-2013</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>Smola et al.(2004)Smola and Schölkopf</label><mixed-citation>
Smola, A. J. and Schölkopf, B.: A Tutorial on Support
Vector Regression, Stat. Comput., 14, 199–222, <a href="https://doi.org/10.1023/B:Stco.0000035301.49549.88" target="_blank">https://doi.org/10.1023/B:Stco.0000035301.49549.88</a>, 2004.
</mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>Tagliabue et al.(2014)Tagliabue, Sallée, Bowie, Lévy,
Swart, and Boyd</label><mixed-citation>
Tagliabue, A., Sallée, J.-B., Bowie, A. R., Lévy, M., Swart, S.,
and Boyd, P. W.: Surface-water iron supplies in the Southern Ocean sustained
by deep winter mixing, Nat. Geosci., 7, 314–320,
<a href="https://doi.org/10.1038/NGEO2101" target="_blank">https://doi.org/10.1038/NGEO2101</a>, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>Takahashi et al.(2002)Takahashi, Sutherland, Sweeney, Poisson, Metzl,
Tilbrook, Bates, Wanninkhof, Feely, Sabine, Olafsson, and
Nojiri</label><mixed-citation>
Takahashi, T. T., Sutherland, S. C., Sweeney, C., Poisson, A., Metzl, N.,
Tilbrook, B., Bates, N. R., Wanninkhof, R. H., Feely, R. A., Sabine, C. L.,
Olafsson, J., and Nojiri, Y.: Global sea-air CO<sub>2</sub> flux based on
climatological surface ocean <i>p</i>CO<sub>2</sub>, and seasonal biological and temperature
effects, Deep-Sea Res. Pt. II, 49,
1601–1622, <a href="https://doi.org/10.1016/S0967-0645(02)00003-6" target="_blank">https://doi.org/10.1016/S0967-0645(02)00003-6</a>, 2002.
</mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>Takahashi et al.(2012)Takahashi, Sweeney, Hales, Chipman, Newberger,
Goddard, Iannuzzi, and Sutherland</label><mixed-citation>
Takahashi, T. T., Sweeney, C., Hales, B., Chipman, D. W., Newberger, T.,
Goddard, J. G., Iannuzzi, R., and Sutherland, S. C.: The Changing Carbon
Cycle in the Southern Ocean, Oceanography, 25, 26–37,
<a href="https://doi.org/10.5670/oceanog.2012.71" target="_blank">https://doi.org/10.5670/oceanog.2012.71</a>, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>Thomalla et al.(2011)Thomalla, Fauchereau, Swart, and
Monteiro</label><mixed-citation>
Thomalla, S. J., Fauchereau, N., Swart, S., and Monteiro, P. M. S.: Regional
scale characteristics of the seasonal cycle of chlorophyll in the Southern
Ocean, Biogeosciences, 8, 2849–2866, <a href="https://doi.org/10.5194/bg-8-2849-2011" target="_blank">https://doi.org/10.5194/bg-8-2849-2011</a>,
2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib50"><label>Vapnik(1999)</label><mixed-citation>
Vapnik, V.: An overview of statistical learning theory, IEEE T.
Neural Netw., 10, 988–999, <a href="https://doi.org/10.1109/72.788640" target="_blank">https://doi.org/10.1109/72.788640</a>,
1999.
</mixed-citation></ref-html>
<ref-html id="bib1.bib51"><label>Wanninkhof(2014)</label><mixed-citation>
Wanninkhof, R. H.: Relationship between wind speed and gas exchange over the
ocean revisited, Limnol. Oceanogr.-Meth., 12, 351–362,
<a href="https://doi.org/10.4319/lom.2014.12.351" target="_blank">https://doi.org/10.4319/lom.2014.12.351</a>,
2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib52"><label>Weiss(1974)</label><mixed-citation>
Weiss, R.: Carbon dioxide in water and seawater: the solubility of a
non-ideal
gas, Mar. Chem., 2, 203–215, <a href="https://doi.org/10.1016/0304-4203(74)90015-2" target="_blank">https://doi.org/10.1016/0304-4203(74)90015-2</a>,
1974.
</mixed-citation></ref-html>
<ref-html id="bib1.bib53"><label>Williams et al.(2017)Williams, Juranek, Feely, Johnson, Sarmiento,
Talley, Dickson, Gray, Wanninkhof, Russell, Riser, and
Takeshita</label><mixed-citation>
Williams, N. L., Juranek, L. W., Feely, R. A., Johnson, K. S., Sarmiento,
J. L., Talley, L. D., Dickson, A. G., Gray, A. R., Wanninkhof, R. H.,
Russell, J. L., Riser, S. C., and Takeshita, Y.: Calculating surface ocean
<i>p</i>CO<sub>2</sub> from biogeochemical Argo floats equipped with pH: An uncertainty
analysis, Global Biogeochem. Cy., 31, 591–604,
<a href="https://doi.org/10.1002/2016GB005541" target="_blank">https://doi.org/10.1002/2016GB005541</a>, 2017.
</mixed-citation></ref-html>--></article>
