Estimating mixed layer nitrate in the North Atlantic Ocean

Here we present an equation for the estimation of nitrate in surface waters of the North Atlantic Ocean (40° N to 52° N, 10° W to 60° W). The equation was derived by multiple linear regression (MLR) from observational nitrate and sea surface temperature (SST) data and model mixed layer depth (MLD) data. The observational data were taken from merchant vessels that crossed the North Atlantic on a regular basis in 2002/2003 and from 2005 to the present. It is important to find a robust and realistic estimate of MLD because the deepening of the mixed layer is crucial for nitrate supply to the surface. We compared MLD from two models (FOAM and Mercator) with MLD derived from float data (using various criteria). The Mercator model gives a MLD estimate that is close to the MLD derived from floats. The MLR was established using SST, MLD from Mercator, time and latitude as predictors. Additionally, a neural network was trained with the same dataset and the results were validated against both model data as a "ground truth" and an independent observational dataset. This validation produced RMS errors of the same order for the MLR and the neural network approach. We conclude that it is possible to estimate nitrate concentrations with an uncertainty of ±1.4 μmol L−1 in the North Atlantic.


Introduction
Estimating seasonal new production is fundamental for our understanding of the global carbon cycle. Especially in regions where nitrate is depleted during summer, the amount of nitrate that is available at the onset of the productive season is essential for further production estimates. Interannual changes in nitrate availability will directly influence new production and carbon drawdown (Koeve, 2001). But estimating nutrient fluxes into the upper ocean and their subsequent utilisation by marine primary production is still a big challenge in oceanography. Even though there are continuous sampling programs at the Bermuda Atlantic Time Series (BATS, Bates, 2007) station in the western North Atlantic and the European Station for Time Series in the Ocean, Canary Islands (ESTOC, González-Dávila et al., 2007) in the eastern part of the subtropical North Atlantic, it is impossible to map nutrient variability for the whole basin. The mechanism of nutrient supply is very different at the two stations: at BATS it is mainly driven by eddies and at ESTOC by winter convection (Cianca et al., 2007). Furthermore, these two stations are located in the subtropical gyre where seasonality is low. In the temperate North Atlantic, between 30° N and 60° N, the coverage of surface nutrient data is sparse, especially because there are very few wintertime observations. Körtzinger et al. (2008) and Hartman et al. (2010) reported the seasonal cycle of nutrient data for the years 2003/2004 with data from a single location, the Porcupine Abyssal Plain site (PAP), located in the temperate North East Atlantic Ocean (49° N, 16.5° W).
Correspondence to: T. Steinhoff (tsteinhoff@ifm-geomar.de)
Some work has been done to estimate winter nitrate concentrations from nitrate-density relationships (Garside and Garside, 1995), nitrate-temperature/density relationships (Kamykowski and Zentara, 1986; Sherlock et al., 2007) or to estimate nutrient fields from remotely sensed data (Goes et al., 2000; Kamykowski et al., 2002; Switzer et al., 2003). Several other attempts were made to estimate wintertime nitrate concentration (e.g. Takahashi et al., 1985; Glover and Brewer, 1988; Körtzinger et al., 2001; Koeve, 2001) as the values at the onset of the productive season are crucial to assess new production (Minas and Codispoti, 1993).
Published by Copernicus Publications on behalf of the European Geosciences Union.
T. Steinhoff et al.: Estimating nitrate in the North Atlantic
Another possible application of predicting seasonal nutrient cycles that are not based on climatology is the parameterization of CO2 partial pressure in seawater (pCO2). Studies have been performed to relate the pCO2 in the North Atlantic to remotely sensed data (Lefèvre et al., 2005; Jamet et al., 2007; Lüger et al., 2008; Chierici et al., 2009; Friedrich and Oschlies, 2009a,b; Telszewski et al., 2009) as it is driven by many factors: thermodynamics, biology, mixing and air-sea gas exchange. Chlorophyll a (chl-a) concentrations are often employed to estimate the biological driver of the pCO2, but the utility of chl-a for this purpose is rather limited (Ono et al., 2004; Lüger et al., 2008). Given that nitrate changes are directly related to new production, we believe that estimation of the entire seasonal cycle of nitrate could also improve pCO2 predictions.
Here we present (and compare) two methods using observational data to estimate mixed-layer nitrate in the North Atlantic between 40° N and 52° N and 10° W and 60° W. The first method is a multiple linear regression (MLR) and the second method uses the same data to train a neural network.
However, the quality of any prediction depends on the quality of the predictors. Therefore we chose the variables to be used in the prediction, sea surface temperature (SST) and MLD, very carefully. Reliable and well tested SST products are available (e.g. the Advanced Microwave Scanning Radiometer-EOS (AMSR-E) on the NASA EOS Aqua satellite, Emery et al., 2006). The situation is more complicated for MLD because there is no uniform criterion for its estimation. Numerous criteria for the estimation of MLD can be found in the literature (e.g. Kara et al., 2003; de Boyer Montégut et al., 2004) and often the criteria need to be adjusted regionally. The proposed criteria vary from simple temperature difference criteria to advanced methods such as the curvature criterion of Lorbacher et al. (2006) that uses the shape of vertical profiles (temperature or density). For all these criteria temperature/density profiles are required for the MLD estimation. Alternatively, MLD climatologies or MLD estimates from models can be used. In this study we compare MLD calculated from in-situ measured profiles (i.e. by ARGO floats), the MLD climatology of Monterey and Levitus (1997), and MLD estimates from two different models (FOAM and Mercator).

Discrete water samples
We used data from water samples taken on "Volunteer Observing Ships" (VOS) along a trans-Atlantic route between Europe and North America (Fig. 1a). The studies were part of two European research projects: CArbon VAriability Studies by Ships Of Opportunity (CAVASSOO) and CARBOOCEAN. During CAVASSOO (2002/2003) the M/V Falstaff ([...] Lines, Stockholm, Sweden) was used. The M/V Falstaff was also used at the onset of CARBOOCEAN in 2005 but was replaced by a new ship, the M/V Atlantic Companion (Atlantic Container Lines, New Jersey, USA), in 2006. Both ships were outfitted with autonomous instruments that measure pCO2 (Lüger et al., 2004). At the same time sea surface temperature (SST) and salinity (SSS) were measured using Seabird thermosalinographs (SBE21 or SBE45) with external SBE38 temperature sensors that were located near the seawater intake. The setup of the sampling line was different on the two ships. Onboard M/V Falstaff a 4 m insulated pipe was connected to the small starboard side sea chest, used for the evaporator of the ship, leading to the SBE21. The manifold of the SBE21 was used to divide the water and one line was used for discrete water samples (i.e. nutrients). During CAVASSOO the water flowed to the manifold just by hydrostatic pressure. During CARBOOCEAN a torque-flow pump was installed before the SBE21. Onboard M/V Atlantic Companion a 15 m insulated pipe was connected to a rear starboard side sea chest that was used only by us. A torque-flow pump was installed near the sea chest to pump the water to the sampling site. The water depth of both sea chests varied between 4 and 8 m depending on the draught of the ship. On both ships samples were taken by trained IFM-GEOMAR personnel and we employed the same sampling procedure for the nitrate samples: seawater was drawn into 60 mL plastic bottles that were immediately frozen at −18 °C.
Storage and transportation of the samples posed no problem because the freezer was next to the sampling point and samples were removed when the ships stopped in Germany. They were transported in cooling boxes to the laboratory and analyzed at the IFM-GEOMAR, Kiel, following the method of Hansen and Koroleff (1999). The overall accuracy of these samples is ±3% in the range of 0-10 µmol L−1. The data were manually inspected for each cruise separately and data were flagged (good, suspicious or bad). In this study we used only data that were flagged "good" and that were taken at water depths deeper than 1000 m, in order to exclude any influence by shelf waters. We used 413 samples (spread over 4 years) from 28 different cruises for our calculation. Figure 1a shows the positions of the samples. The earlier data taken on the M/V Falstaff (black triangles) are located closer to the southern end of the study region, covering a latitudinal band between 40° N and 50° N. The data from the M/V Atlantic Companion (grey squares) are located further to the north between 45° N and 55° N. Figure 1b shows the number of months for which data were available in 2° latitude × 2.5° longitude boxes. The maximum number of months with samples per pixel is 6. The cruises are equally distributed over the seasons, so that the linear interpolation approach is an adequate method to fill the gaps between the measured locations.

Mixed layer depth
To identify the most suitable MLD estimate, we compared MLD estimates from the climatology of Monterey and Levitus (1997), from the output of two ocean models, and from different criteria applied to vertical temperature profiles measured by the ARGO float network. The data from the ARGO floats were collected and made freely available by the Coriolis project and programmes that contribute to it (http://www.coriolis.eu.org).

MLD calculated from ARGO data
We downloaded all profile data available for the time period 2002-2007 in our study region from the ARGO website. All profiles were linearly interpolated onto 5 m depth intervals. MLD was calculated only from temperature profiles for this comparison, because fewer profiles include both temperature and salinity than include temperature alone. We note that calculations based on a temperature criterion represent the isothermal layer depth (ILD), which can be different from the MLD (Kara et al., 2003), but we assume this difference to be negligible for our comparison study. Thomson and Fine (2002) have shown that temperature-related MLD estimates are preferable for biological applications. We used only profiles with at least 10 data points, with the uppermost data point shallower than 15 m. For the specified time period we found more than 23 000 profiles. The MLD was calculated using the commonly used threshold difference method with various ΔT (ΔT = 0.2 °C, 0.5 °C and 1 °C). We used the uppermost data point of each profile (≤15 m) as the surface reference temperature. In addition to these simple difference criteria, we also applied the curvature criterion of Lorbacher et al. (2006) that defines the MLD by the curvature of the given profile (temperature or density). We used a Matlab routine that was provided by the authors for the calculation.
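The threshold difference method described above can be sketched as follows. This is a minimal illustration, not the routine used in the study: the function names are hypothetical, the absolute temperature difference is an assumption (the text does not state whether only cooling with depth is counted), and the returned MLD is simply the first 5 m grid depth at which the threshold is reached.

```python
import numpy as np

def interpolate_profile(depth, temp, step=5.0):
    """Linearly interpolate a profile onto a regular depth grid (step in m)."""
    depth = np.asarray(depth, dtype=float)
    temp = np.asarray(temp, dtype=float)
    order = np.argsort(depth)                 # np.interp needs increasing depths
    depth, temp = depth[order], temp[order]
    grid = np.arange(depth[0], depth[-1] + 1e-9, step)
    return grid, np.interp(grid, depth, temp)

def mld_threshold(depth, temp, delta_t=0.5, max_ref_depth=15.0, min_points=10):
    """Shallowest gridded depth where |T - T_ref| >= delta_t, or None.

    T_ref is the uppermost data point. Profiles with fewer than min_points
    values, or whose uppermost point lies deeper than max_ref_depth, are
    rejected (None), following the selection rules quoted in the text.
    """
    if len(depth) < min_points or min(depth) > max_ref_depth:
        return None
    grid, t = interpolate_profile(depth, temp)
    exceeded = np.abs(t - t[0]) >= delta_t    # assumption: absolute difference
    if not exceeded.any():
        return None                           # threshold never reached
    return float(grid[np.argmax(exceeded)])   # first grid depth over threshold
```

Calling `mld_threshold(depth, temp, delta_t=0.2)`, `delta_t=0.5` and `delta_t=1.0` on the same profile gives the three difference-criterion estimates that are compared in the text.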

Climatological MLD
The MLD climatology of Monterey and Levitus (1997) contains monthly MLD fields on a 1° × 1° grid for the global ocean. MLD is calculated based on three different criteria: a temperature difference, a density difference, and a variable density change. As previously stated, we used only the data calculated with the temperature difference criterion, which employs a surface-to-depth difference of 0.5 °C.

Modelled MLD
For the modelled MLD we chose the output from two models: (a) the Forecasting Ocean Assimilation Model (FOAM) from the Met Office, UK (http://www.ncof.co.uk/FOAM-System-Description.html) and (b) the Mercator Project, France (www.mercator-ocean.fr). The two models provide daily MLD from 2002 with a spatial resolution of 1/8° × 1/8° and 1/6° × 1/6° for FOAM and Mercator, respectively. A difference criterion of 1 °C for temperature and 0.05 kg m−3 for density is used in the FOAM model, while difference criteria of 0.2 °C and 0.01 kg m−3 are used for Mercator.

Comparison of MLD estimates
We randomly chose 31 temperature profiles from ARGO floats for the comparison, for which we calculated the MLD using the different criteria mentioned above. Figure 2 shows two typical examples of temperature profiles with the various MLD estimates. We also tried to manually identify a best MLD estimate as a reference point by the "eyeball" method, i.e. a phenomenological identification of the MLD. We determined the climatological value and the model data for this specific time and position and calculated the difference between the various MLD estimates and the "eyeball" reference MLD. The mean values for these 31 profiles are shown in Fig. 3. Although the criterion of Lorbacher et al. (2006) appears to yield the best MLD estimate, we decided to use the MLD output from a model for further calculations for the following reasons: (1) despite the huge number of floats in the ocean (3190 floats in October 2008) the coverage is spatially and temporally sparse compared to the model, and (2) the existence of a diel cycle in the MLD (Price et al., 1986) may bias MLD estimation when using real profiles. We calculated the difference between the MLD computed with the Lorbacher criterion and the MLD estimates from Mercator and FOAM, respectively (ΔMLD = MLD_Lorb. − MLD_Model). The mean difference was −24.2±104.2 m (RMSE) for Mercator and 117.5±405.2 m (RMSE) for FOAM, and therefore we opted for the MLD from Mercator.
www.biogeosciences.net/7/795/2010/ Biogeosciences, 7, 795-807, 2010
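The two statistics quoted for each model (mean difference and RMSE of ΔMLD) can be computed as in the sketch below. The function name is hypothetical, and taking "RMSE" to mean the root mean square of the differences rather than their standard deviation is an assumption.

```python
import numpy as np

def bias_and_rmse(reference_mld, model_mld):
    """Mean difference and RMS difference of (reference - model), in metres."""
    diff = np.asarray(reference_mld, dtype=float) - np.asarray(model_mld, dtype=float)
    mean_diff = float(diff.mean())                 # sign shows over/underestimation
    rmse = float(np.sqrt(np.mean(diff ** 2)))      # assumption: RMS of differences
    return mean_diff, rmse
```

With this sign convention a negative mean difference, as reported for Mercator, means the model MLD is on average deeper than the profile-derived MLD.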

Multiple Linear Regression (MLR)
Our ultimate goal was to develop a predictive equation for mixed layer nitrate on the basis of a minimum number of variables that are publicly available. Our initial list of predictors comprised the following parameters: SST, MLD, latitude (Lat), longitude (Lon) and time (t), where t is the day of the year. To take into account that the first and the last day of a year have nearly the same influence on our dataset, we applied a sinusoidal transformation to the actual day of the year analogous to Nojiri et al. (1999). We employed a common logarithmic expression of the SST-nitrate relationship because SST shows rather large variability when the nitrate concentration is low (i.e. during summer time). The logarithmic formulation decreases this non-linear character of the SST. We explicitly used in situ SST and not remotely sensed data to keep the errors associated with the establishment of the algorithm as small as possible. The use of different SST products, derived for example from satellites, is discussed below.
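The two predictor transformations can be illustrated as follows. The exact sinusoidal form used in the study is not given in the text, so the single sine term with a 365-day period and the optional `phase` parameter below are assumptions; the function names are hypothetical.

```python
import math

def transform_day(day_of_year, period=365.0, phase=0.0):
    """Map day of year onto a sine so that 31 December and 1 January are close.

    Assumed form: sin(2*pi*(t - phase)/period); the study's exact expression
    (analogous to Nojiri et al., 1999) is not reproduced here.
    """
    return math.sin(2.0 * math.pi * (day_of_year - phase) / period)

def log_sst(sst_celsius):
    """Common (base-10) logarithm of SST in degrees Celsius (requires SST > 0)."""
    return math.log10(sst_celsius)

# Day 1 and day 365 map to nearly the same value, unlike the raw day number.
print(transform_day(1), transform_day(365))
```

A single sine term also maps some other pairs of days to the same value, which is one reason why a phase shift or an additional cosine term is sometimes used; whether the study did so cannot be inferred from the text.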

[Figure axis label: Mean difference from "true" MLD [m]]
All MLRs were calculated using the STATISTICA software package (StatSoft, Tulsa, USA). In the first step we used all variables for the MLR (Lat, Lon, log(SST), MLD and t). The longitudinal information turned out to be statistically insignificant and we repeated the MLR without longitude. All remaining variables then turned out to be statistically significant. This procedure ensured that the resulting equation includes only the minimum set of available parameters that are necessary to estimate the nitrate cycle. Our set of predictive variables results in the following best-fit equation (Eq. 1): [...], where nitrate is in µmol L−1, SST in °C, MLD in m and t denotes the day of the year. The MLR used 413 datapoints and the adjusted R2 value is 0.82.
To study possible improvements by adding chl-a data from SeaWiFS (http://oceancolor.gsfc.nasa.gov) to the initial dataset, we performed a MLR with SST, MLD, Lat, Lon, time and chl-a as predictors. The chl-a data were 8 day composites with 9 km resolution (at the equator). The adjusted R2 value of the resulting equation is also 0.82. A major drawback of adding chl-a is the reduction of datapoints for the MLR. Due to the typical cloud cover over the North Atlantic, the cases where we had data for nitrate, MLD and chl-a were reduced to 230. Therefore we did not include chl-a in the algorithm.
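The fitting step can be sketched with ordinary least squares in NumPy. This is only an illustration of the technique: the study used STATISTICA, the data below are synthetic, and the coefficients in `true_coef` are arbitrary values chosen for the demonstration, not those of Eq. (1).

```python
import numpy as np

def fit_mlr(predictors, target):
    """Ordinary least squares with intercept; returns coefficients and adjusted R^2."""
    n, p = predictors.shape
    design = np.column_stack([np.ones(n), predictors])   # intercept column first
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)     # penalise extra predictors
    return coef, adj_r2

# Synthetic stand-in for the 413 samples (all values are made up).
rng = np.random.default_rng(0)
n = 413
lat = rng.uniform(40.0, 52.0, n)
log_sst = np.log10(rng.uniform(5.9, 25.6, n))
mld = rng.uniform(10.0, 350.0, n)
t_sin = np.sin(2.0 * np.pi * rng.uniform(1.0, 365.0, n) / 365.0)
predictors = np.column_stack([lat, log_sst, mld, t_sin])

true_coef = np.array([2.0, 0.1, -8.0, 0.01, 1.5])        # arbitrary, for illustration
nitrate = true_coef[0] + predictors @ true_coef[1:] + rng.normal(0.0, 0.5, n)

coef, adj_r2 = fit_mlr(predictors, nitrate)
print(coef, adj_r2)
```

Dropping a column from `predictors` and refitting mirrors the step in which longitude was removed; significance testing of individual coefficients is not shown here.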

Self-Organizing Map (SOM)
The regression coefficients provide information about physical relationships between nitrate and SST or MLD, respectively, if the predictors of a MLR are independent. The drawback of this method is the limitation to a linear relation and (even for a polynomial regression) the fitting to a predefined function. Therefore, a neural network approach was additionally employed using a self-organizing map. SOMs were introduced to science by Kohonen (1982) and successfully applied to oceanographic data by Lefèvre et al. (2005), Friedrich and Oschlies (2009a,b) and Telszewski et al. (2009). SOMs are able to estimate a target value (e.g. nitrate) from related parameters (e.g. MLD, SST) without fitting to a predefined function by recognizing relationships in the observational data during the training process. The same predictive parameters used in the MLR (Eq. 1) were employed in the SOM.

Validation against observational data
We performed several tests to evaluate the predictive power of the algorithm for mixed layer nitrate (Eq. 1) and of the SOM estimates. Figure 4 shows that the calculated data (MLR) are generally in good agreement with the measurements, although there are obvious differences in spring and autumn. Negative nitrate values are predicted during the summer when nitrate is depleted; for a simple linear approach that allows for random error, predicting negative values is the only way to produce a period of zero nitrate. All predicted negative values were set to zero for further calculations. The higher deviations in spring and autumn may arise from small scale variability (patchiness) and cannot be reproduced by our simple MLR approach. Comparing the measured training data with the estimated data yields a mean underestimation of nitrate of 0.1±1.4 µmol L−1 (RMSE). The same comparison with the algorithm that includes the chl-a term leads to an overestimation of 8.2±8.3 µmol L−1 (RMSE), which might be due to the sparse data coverage.
In addition, we randomly chose 100 data points to exclude from the entire dataset and performed a MLR with the remaining data. The coefficients of the resulting equation were of the same order as the ones in Eq. (1). We used this equation to estimate the nitrate concentration for the 100 data points that had been excluded from the MLR. We performed this test three times and calculated the mean deviation for the chosen data points each time. The resulting deviations were between −0.4 µmol L−1 and 0.0 µmol L−1 with an RMSE of 1.4 µmol L−1 for all runs.
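The hold-out test described above can be sketched as follows. The function name, the random seeds and the synthetic data are assumptions made for illustration; the deviation is defined here as predicted minus measured, and negative predictions are set to zero as stated in the text.

```python
import numpy as np

def holdout_validation(X, y, n_holdout=100, n_runs=3, seed=0):
    """Repeat: withhold n_holdout random points, refit the MLR on the rest,
    and return (mean deviation, RMSE) of the predictions for the withheld points."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_runs):
        idx = rng.permutation(len(y))
        test, train = idx[:n_holdout], idx[n_holdout:]
        design = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(design, y[train], rcond=None)
        pred = np.column_stack([np.ones(len(test)), X[test]]) @ coef
        pred = np.clip(pred, 0.0, None)          # negative nitrate set to zero
        diff = pred - y[test]                    # predicted minus measured
        results.append((float(diff.mean()), float(np.sqrt(np.mean(diff ** 2)))))
    return results

# Synthetic example with 413 points and a noise level of 1.4 (made-up data).
rng = np.random.default_rng(1)
X = rng.normal(size=(413, 2))
y = 10.0 + X @ np.array([1.0, -0.5]) + rng.normal(0.0, 1.4, 413)
runs = holdout_validation(X, y)
print(runs)
```

In this synthetic case the hold-out RMSE stays close to the noise level that was put into the data, which is the same reasoning the text later applies to the recurring value of 1.4 µmol L−1.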
We also employed a completely independent data set for comparison. During CAVASSOO the National Oceanography Center, Southampton, UK (NOC) took nitrate samples onboard the VOS M/V Santa Lucia and M/V Santa [...]
Figure 5 shows the intra- and interannual variability of SST, MLD and nitrate concentration for the time period between 2002 and 2007 for two example locations: eastern (49° N, 16.5° W, PAP) and western (40° N, 49° W) North Atlantic. SST and nitrate are also available for a whole annual cycle in 2002/2003 at the PAP site (Körtzinger et al., 2008), resulting in another independent dataset. The West Atlantic location was chosen to illustrate the limitations of a MLR approach because the Labrador current may introduce short term variability on a daily timescale. The corresponding SST and nitrate values measured onboard one of the VOS lines mentioned above were added to the plot if one of the VOS lines crossed an area of 1° latitude/longitude around the location within one day. The annual amplitude of SST is more pronounced in the western region and the short term variability is also higher. The VOS SST measurements are in good agreement with AMSR-E in the eastern region. Deviations can be seen in the western region due to the high short term variability there, especially if data are from a 2° × 2° grid cell. This short term variability at the western location also results in deviations of the VOS measured nitrate data from the predicted concentrations.
The MLD amplitude is slightly higher at the PAP station. The instruments at PAP were deployed at approximately 30 m depth and Körtzinger et al. (2008) excluded data that were measured below the thermocline. The SST data measured at PAP and from the VOS lines agree with the data from AMSR-E. This results in good agreement between measurements and predicted values of nitrate. In contrast to the measured and SOM estimated nitrate data, the MLR calculated nitrate data show a smooth seasonality. A comparison of the latter two results in a RMS error of 0.9 µmol L−1. A comparison of the SOM calculated data and the measured values at PAP results in a RMS error of 1.4 µmol L−1. However, the SOM estimates in the western part are in better agreement with the measured data. Figure 5 (second row) shows the difference between MLR estimated nitrate and the other nitrate products. The SOM estimates and the VOS measurements deviate mostly in the same order and direction.

Validation using a biogeochemical model
Predicted nitrate concentrations were also validated against nitrate concentrations predicted by a high-resolution nitrogen-based nitrate-phytoplankton-zooplankton-detritus model of the North Atlantic. All model details are described in Oschlies et al. (2000) and Eden and Oschlies (2006). The advantage of this model-based validation is that the model produces daily nitrate fields with a horizontal resolution of 1/12° × 1/12° latitude/longitude which can be used as a basin wide "ground-truth" to assess the accuracy of the nitrate estimates generated by the MLR and the SOM, respectively. The model output of SST, MLD and nitrate was sampled according to the time (day of the year) and position of the actual nitrate measurements during the period of June 2002 to May 2003; the error of the nitrate measurements was not considered. This model-generated data set was then used to calculate a MLR and to train a SOM. The input parameters for both approaches were the same as for the observational data: Lat, SST, MLD and time (day of the year). Monthly mean model outputs of SST and MLD were used to generate nitrate estimates from both methods. Figure 6i shows the annual cycle of nitrate simulated by the model in the domain covered by the nitrate sampling (40° N to 52° N, 10° W to 50° W) and the annual cycle of the nitrate estimates derived from the model-generated data set by the MLR and the SOM, respectively. The general pattern of the annual cycle can be reproduced by the estimates. High winter nitrate concentrations are underestimated. The SOM estimate has a higher accuracy in reproducing the late summer nitrate minimum. The spatial and temporal distribution of the nitrate mapping error (Fig. 6a-d (MLR) and 6e-h (SOM)) shows that the mapping fails in the north-western part of the basin. This applies to both the MLR and the SOM. High nitrate values occurring in the Labrador current cannot be reproduced by either estimate. This disparity may be due to the sparse observational coverage of the considered region or to the highly variable current system in this region. The basin wide RMS error for our model-based validation amounts to 2.1 µmol L−1 for the SOM estimate and 2.2 µmol L−1 for the MLR estimate, which is significantly higher than the error derived from the validation against independent observational data.

Error estimation
The uncertainty in nitrate estimation of 1.4 µmol L−1 appears in different validation approaches. Therefore we speculate that this is the uncertainty within the training dataset itself, which arises for instance from sampling/measurement errors or problems in the water supply (e.g. biofouling). Thus it is impossible to estimate nitrate with an uncertainty better than ±1.4 µmol L−1 with the presented algorithm. Another error source can be small scale variability that cannot be covered with a simple MLR approach, whereas mesoscale variability should only add random noise to the data. The effect could be seen when we tried to estimate nitrate in the area of the Labrador current (Fig. 6): the uncertainty of the estimates increases rapidly. A minor drawback of a MLR is the linear correlation among the predictors themselves. SST, for example, is clearly correlated with Lat and time. In oceanography it is a general problem that variables are correlated, but given that each variable influences the result in a different way it was acceptable to use variables that are somewhat correlated. Olsen et al. (2004) analysed the deviation between satellite derived SST and in situ measured SST. They found differences of up to 1 °C. The minimum and maximum SST within our dataset is 5.9 °C and 25.6 °C, respectively. The maximum relative error would therefore be between 4% and 20%. An error of 20% (4%) in SST would result in an error in nitrate of 0.4 µmol L−1 (0.1 µmol L−1). In contrast, Emery et al. (2006) showed that the Advanced Microwave Scanning Radiometer-EOS (AMSR-E) on the NASA EOS Aqua satellite produces SST data that are in good agreement with the in situ measured SST. We suggest that using the temperature from AMSR-E (http://www.ssmi.com/amsr) will introduce only a small error.
The choice of MLD product can lead to large over- or underestimations of MLD. To assess the influence of over/underestimation of MLD, we calculated the error in nitrate that will arise from an uncertainty of 50 m in the MLD. The resulting error is approximately 0.3 µmol L−1.
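The relative SST error bounds quoted above follow from dividing the maximum satellite/in situ difference of 1 °C by the largest and smallest SST in the dataset. As a worked check (the symbols ΔSST, SST_max and SST_min are introduced here only for illustration):

```latex
\frac{\Delta\mathrm{SST}}{\mathrm{SST}_{\max}} = \frac{1\,^{\circ}\mathrm{C}}{25.6\,^{\circ}\mathrm{C}} \approx 0.039 \approx 4\,\%,
\qquad
\frac{\Delta\mathrm{SST}}{\mathrm{SST}_{\min}} = \frac{1\,^{\circ}\mathrm{C}}{5.9\,^{\circ}\mathrm{C}} \approx 0.169 \approx 17\,\%
```

The upper bound of roughly 17% appears in the text rounded up to 20%, so the quoted nitrate error of 0.4 µmol L−1 is, if anything, a slightly conservative estimate.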

MLD estimations and variability
Our results indicate that it is possible to find a robust MLD estimate with a good temporal and spatial resolution. Although using in-situ profiles results in the best estimation of the MLD at a specific position and time, the ARGO network is too sparse to reflect an annual cycle of MLD on the scale of the entire North Atlantic Ocean. In contrast, climatological MLD estimates (e.g. Monterey and Levitus, 1997; de Boyer Montégut et al., 2004) have uniform resolution but do not reflect interannual changes and show considerable deviations from observations (e.g. tend to significantly overpredict MLD). Using MLD generated by models could be a compromise: data are produced on a daily basis, on a regular grid, and can be in good agreement with observations. Here we compare results from two models since the model dependent differences are large.
For in-situ profiles, such as those measured by floats, the curvature criterion of Lorbacher et al. (2006) results in MLD that are closest to those which are eyeball-defined (Fig. 3). The model output from FOAM yields MLD that are significantly deeper than in-situ observations. This finding is in good agreement with de Boyer Montégut et al. (2004) who, among others, showed that a temperature criterion of 1 °C (see Fig. 2.2.3) is too large for the subpolar North Atlantic. We chose the model output of the Mercator project, as this provides high resolution and MLD that are close to the eyeball-defined MLD.
We examined the variability in the reliably estimated MLD during the entire time period to understand the dynamics in the region. The mixed layer in the subpolar North Atlantic shows a clear seasonality (Fig. 5): during summer the MLD may be only a few tens of meters while, in wintertime, depths greater than 350 m can be reached. We carefully inspected the dynamics of the winter MLD since its deepening supplies nutrients to the sea surface (Oschlies, 2002). This makes the MLD one of the main forcing features in this region's biogeochemistry (Oschlies, 2002). It is well known that the maximum winter MLD increases with latitude and numerous studies have shown that a local maximum in the winter MLD exists between 45° N and 50° N in the North Atlantic (e.g. Koeve, 2001; de Coëtlogon and Frankignoul, 2003). The region where the maximum MLD occurs is known to be a region of most intense wintertime ocean heat loss (Marshall, 2005). Due to this rapid cooling at the surface the density rapidly increases and the surface waters along the North Atlantic Drift (NAD) are mixed much deeper than to the north and south of this region. Figure 7 shows the MLD as calculated by Mercator for 10 March 2006. The maximum MLD along the NAD is clearly visible.

Nitrate estimations
The nitrate data show a clear seasonality (Fig. 4a), with nitrate depletion during summer and the highest values in spring (8-12 µmol L −1 ).The data during summer show low variability due to depletion of nitrate in the whole study area.Higher scatter can be observed in the data during the rest of the year.This is probably due to small scale variability of the sampled surface water (patchiness) as well as to the large latitudinal range of the cruise tracks (Fig. 1a).
The validation of the presented algorithm shows that 82% of the nitrate variabilty is explained by Eq. ( 1).It turned out that it was a reasonable approach to use a MLR for the 413 datapoints that are spread over 4 years, because the data that were used for the algorithm are distributed over the whole range of the predicting variables (Lat, day of the year, SST and MLD).
As expected, the latitude and time in the MLR-algorithm explain most of the nitrate variability (Garcia et al., 2006).The latitudinal dependence was mentioned in various studies (e.g Koeve, 2001;Kamykowski et al., 2002) and, together with time, it represents nearly a climatological annual nitrate cycle that is very stable within our study area.
The algorithm can be adjusted to capture the interannually varying conditions by adding SST and MLD (provided they are not taken from climatologies).They reflect the actual conditions that can drive biological production and can change from year to year as well as from one place to another.The MLD appears to be a good indicator of the variable vertical supply of nitrate.Figure 8 shows the difference between measured nitrate and nitrate estimations that were calculated from an equation that uses only Lat and time and from Eq. (1).For this purpose we performed a MLR with the same dataset using only Lat and time as predictors.Then the difference between the measured nitrate concentrations of the independent dataset of the NOC and estimations from the climatological approach and from Eq. (1) were calculated.During most of the year the difference between the algorithm that uses only Lat and time and the one that uses also SST and MLD as predictors is small.But the most interesting time are the winter months (February-March) when MLD reaches its maximum and fresh nitrate is mixed into the surface.Figure 8 shows clearly that both algorithms overestimate the winter nitrate concentration, but the error with the climatological approach is bigger compared to Eq. ( 1).So we can state that inclusion of SST and MLD shows an improvement that is of the order of the interannual variability.This variability in the nitrate supply that is linked to variability in MLD will result in interannual variations in biological production that will therefore influence the carbon drawdown.1).The measured nitrate data are from the NOC dataset that was not used for the establishment of the algorithm.
The increase in winter time mixed layer between 2004 and 2007 (Fig. 5, third row) in both basins, east and west, results in an increase in nitrate concentrations.Figure 9a-c shows averaged values of the February/March data of MLD, SST and estimated MLR nitrate concentration at PAP site.
Adding chl-a to the algorithm did not lead to improvements of the result what might be caused by two factors: (i) the reduction of number of samples from 413 to 230; and (ii) The ratio of converted nitrate to chl-a is not a constant ratio (e.g.Hydes et al., 2001;Kähler and Koeve, 2001).At the onset of a bloom where nitrate is high and chl-a is low the nitrate consumption is high and particulate organic material has low C/N values (Körtzinger et al., 2001).During the bloom this conversion factor will change due to the reduced availability of nitrate and this change can not be covered by an easy algorithm like the presented one.
The small interannual variations of ca. 1.5 µmol L−1 (e.g. Fig. 5) can be explained by the fact that the data fall in nearly one biogeographic province as defined by Oliver and Irwin (2008). Following the classification of Longhurst (2007), our sampling area covers three provinces (Gulf Stream (GFST), North Atlantic drift (NADR) and North Atlantic subtropical gyre (NAST(E)) province). There are certain differences between these provinces, but the ecological processes are primarily driven by the same physical processes. Longhurst (2007) defined 6 cases of physical control, of which GFST and NADR are assigned to the same case: "nutrient-limited spring production peak". NAST(E) is assigned to another case ("winter-spring production with nutrient limitation") but these samples contribute only 5% of our dataset and are located at the northern border of the NAST(E) province. As the borders between the provinces are not very strict, we can state that our dataset also fits in one biogeographical province as defined by Longhurst (2007).
The overall uncertainty in predicting nitrate with the presented MLR algorithm and SOM is 1.4 µmol L−1. As this uncertainty emerges from different validation approaches, we believe that it is the internal uncertainty of the presented dataset. Initially this does not appear to be better than algorithms presented in former studies (e.g. Kamykowski and Zentara, 1986; Garside and Garside, 1995; Goes et al., 2000; Kamykowski et al., 2002; Switzer et al., 2003). But some of them present algorithms only for regions other than the North Atlantic (Goes et al., 2000; Kamykowski and Zentara, 1986) or present gridded estimates on a 10° × 10° grid (Switzer et al., 2003). The algorithm presented by Garside and Garside (1995) is based only on SST and they report a RMSE of 1.1 µmol L−1 for their North Atlantic dataset. Applying their equation to our dataset leads to an overestimation of 1.4±2.9 µmol L−1 (RMSE). The algorithm used in this study estimates nitrate with an uncertainty of 1.4 µmol L−1, both for the reproduction of the training dataset itself and for the independent dataset from NOC. This gives confidence in the presented equation. However, here we present one simple algorithm for a whole region that is easy to use, and the required input data (MLD, SST) can be accessed easily. As an example we calculated the annual new production for 2004 at PAP station and compared it with the calculations from Körtzinger et al. (2008). Using the same MLD and C/N ratio as Körtzinger et al. (2008) we calculated the new production to be 6.5±2.7 mol C m−2 yr−1 (their result from measurements was 6.4±1.1 mol C m−2 yr−1). Using the MLD estimates from Mercator (the same as used for nitrate estimation) the new production is estimated to be 2.6±1.3 mol C m−2 yr−1. Note that the difference between these two calculations is only due to the different MLD estimates. A predictive accuracy of 1.4 µmol L−1 is not better than the measurements themselves, but it offers the potential to estimate nitrate with sufficient accuracy within the whole study area.
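The sensitivity of the new production estimate to the chosen MLD can be illustrated with a simple drawdown budget. This is only a sketch under stated assumptions: it integrates a seasonal nitrate drawdown over a single mixed layer depth and converts it to carbon with a fixed C/N ratio, which is not necessarily the exact budget used here or by Körtzinger et al. (2008). The function name and all input values are hypothetical; the default C/N of 7.2 is used for illustration only.

```python
def new_production(nitrate_winter, nitrate_summer, mld_m, c_to_n=7.2):
    """Seasonal new production in mol C m-2 from a nitrate drawdown.

    nitrate_* in umol L-1 (numerically equal to mmol m-3), mld_m in metres.
    Assumption: the whole drawdown is integrated over one fixed MLD.
    """
    drawdown = nitrate_winter - nitrate_summer      # mmol N m-3
    return drawdown * mld_m * c_to_n / 1000.0       # mol C m-2


# Hypothetical inputs: same nitrate drawdown, two different MLD estimates.
np_deep = new_production(8.0, 0.5, mld_m=100.0)
np_shallow = new_production(8.0, 0.5, mld_m=40.0)
print(np_deep, np_shallow, np_deep / np_shallow)
```

Because the estimate is linear in MLD, the ratio of the two results equals the ratio of the two mixed layer depths, which is why a different MLD product alone can change the new production estimate substantially.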
In this study, the SOM-predicted nitrate data are not better than the MLR estimates, except in the regions that were influenced by the Labrador Current. This is a clear limitation of a simple MLR approach, as the MLR was trained with data from a different biogeographic province and it is not possible to extrapolate the presented equation to different regions. This is the advantage of the SOM, as it recognizes the relationship of the predictors at the specific position. We speculate that taking full advantage of the benefits of SOM would require better data coverage. The basin-scale validation of the MLR and the SOM against a biogeochemical model produced RMS errors of 2.1 µmol L−1 (SOM) and 2.2 µmol L−1 (MLR), respectively. These errors are considerably larger than those obtained from the validation against independent observational data. In particular, high nitrate values in winter and spring in the Labrador Current region could not be reproduced by our estimation techniques. This clearly shows the temporal and spatial limitations of the presented method. The predictive potential of both techniques is mostly restricted to interpolation between lines of observations. For the water masses that can be classified as nearly the same biogeochemical province the interpolation works well, because the training dataset contains data from all seasons in different years. This enabled us to find good estimation results with a simple MLR approach. An extrapolation to water masses not or barely covered by the variability range of the measurements suffers from larger estimation errors. Thus we can state that an extrapolation in time might work sufficiently well, but an extrapolation in space will not work, because of different dependencies (e.g. main nutrient supply in the study region by convection).
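One simple way to flag conditions that are not covered by the variability range of the training measurements is a per-predictor range check, sketched below. This is an illustrative safeguard rather than part of the presented method: the function names and the numbers are hypothetical, and a per-variable minimum/maximum envelope is only a crude proxy for the sampled predictor space.

```python
import numpy as np


def training_envelope(predictors):
    """Per-column minimum and maximum of the training predictors."""
    X = np.asarray(predictors, dtype=float)
    return X.min(axis=0), X.max(axis=0)


def outside_envelope(envelope, query):
    """True for each query row with at least one predictor outside the range."""
    lower, upper = envelope
    q = np.atleast_2d(np.asarray(query, dtype=float))
    return np.any((q < lower) | (q > upper), axis=1)


# Hypothetical training predictors: columns are SST (deg C) and MLD (m).
train = np.array([[10.0, 300.0], [12.0, 150.0], [18.0, 20.0], [15.0, 60.0]])
env = training_envelope(train)

# The first query lies inside the sampled range; the second is colder than
# any training sample and would therefore be an extrapolation.
queries = np.array([[13.0, 100.0], [2.0, 200.0]])
flags = outside_envelope(env, queries)
print(flags)
```

Estimates for flagged rows would be treated with caution, in line with the larger errors reported above for water masses outside the sampled range.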

Implications of nitrate estimation for pCO2
We calculated the increase in pCO2 that should result from the increased nitrate concentration mentioned above (Fig. 9b): we assume that the nitrate concentration until 2005 constitutes a baseline and that the associated dissolved inorganic carbon (DIC) supply will result in pCO2 values that are in equilibrium with the atmosphere. We found that the increased nitrate and the associated increased DIC (calculated from a C/N ratio of 7.2, Körtzinger et al., 2001) will result in pCO2 values that increase faster than in the atmosphere (Fig. 9d). Both rates of pCO2 increase are within the range of previous observations (Corbière et al., 2007; Takahashi et al., 2009, and references therein) and we speculate that the observed changes in rates of pCO2 increase may be due to the variable winter MLD and, thus, the nitrate supply.
As mentioned above, a lot of effort is being made to predict seawater pCO2 in the North Atlantic Ocean very precisely using remotely sensed data. One important driving force of pCO2 is SST, due to the well-known thermodynamic effect (e.g. Takahashi et al., 1993). But pCO2 is also affected by high biological activity (Watson et al., 1991; Lüger et al., 2004; Körtzinger et al., 2008), especially in the temperate North Atlantic, which is hard to assess by remote sensing (Ono et al., 2004; Lüger et al., 2008, and references therein) as satellite chlorophyll data proved to be rather useless as a predictor in their study. As the biological production in the world oceans follows a nearly constant stoichiometry (Redfield et al., 1963) it is easy to calculate the change in DIC from a known nitrate change, and subsequently the effect on pCO2 can be calculated. Following the MLR approach of other studies but substituting satellite chlorophyll with nitrate calculated using the algorithm presented above has the potential to yield better pCO2 estimates. A residual error of 1.4 µmol L−1 in nitrate would still translate, however, into a pCO2 error of ≥18 µatm, which is comparable to the RMSE of the parameterization used by Corbière et al. (2007) (17.4 µatm). One could argue in favour of using the MLD and SST twice (first for the nitrate estimation and second for the pCO2 estimation) as the SST dependence of pCO2 is different from the dependence of nitrate.
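The propagation of a nitrate error into DIC and pCO2 can be sketched with a fixed C/N ratio and a Revelle (buffer) factor. This is a rough first-order illustration, not the calculation behind the value quoted above: the C/N ratio of 7.2 is the one cited in the text, while the Revelle factor, the background DIC and the background pCO2 are assumed round numbers, and the small difference between per-litre and per-kilogram units is ignored. Function names are hypothetical.

```python
def dic_change_from_nitrate(delta_nitrate, c_to_n=7.2):
    """DIC change (umol L-1) associated with a nitrate change (umol L-1)."""
    return delta_nitrate * c_to_n


def pco2_change(delta_dic, dic=2100.0, pco2=370.0, revelle=10.0):
    """First-order pCO2 change (uatm) from a DIC change via the Revelle factor.

    Assumptions: background DIC of 2100, background pCO2 of 370 uatm and a
    Revelle factor of 10 are illustrative values, not taken from the paper.
    """
    return revelle * pco2 * delta_dic / dic


nitrate_error = 1.4                       # umol L-1, uncertainty of the estimate
dic_error = dic_change_from_nitrate(nitrate_error)
pco2_error = pco2_change(dic_error)
print(f"DIC error: {dic_error:.1f} umol L-1, pCO2 error: {pco2_error:.1f} uatm")
```

With these assumed background values the resulting pCO2 error is of the same order as the value stated in the text; the exact number depends on the chosen buffer factor and background state.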
This hypothesis has to be tested in future work. The ongoing research onboard VOS lines will also produce more nitrate data that could support and/or improve the presented algorithm, and the SOM approach in particular will become more important with a larger dataset.

Fig. 1. (a) Location of samples used in this study. Triangles denote samples taken in 2002 and 2003 and squares denote samples taken since 2005. For validation purposes we also used data from another VOS line (UK-Caribbean) that was sampled in 2002/2003 by the National Oceanography Centre (NOC), Southampton (circles). The diamonds denote the positions of the three time series stations BATS, ESTOC and PAP, as well as one location that is used for demonstration (refer to Fig. 5). (b) Number of months sampled within a 2° latitude × 2.5° longitude box.

Fig. 2. Two examples of vertical temperature profiles from ARGO floats with the MLD assigned using various techniques or sources.The orange Xs denote the eyeball reference MLD.Top: summer temperature profile.Bottom: winter temperature profile.

Fig. 3. Comparison of MLD estimates. The bars denote the mean difference between the eyeball defined MLD and the respective estimated MLD for 31 profiles. The exact mean differences are shown above each bar. Negative values indicate that the product defines a deeper MLD than the eyeball reference and vice versa. The error bars are the standard deviation (1σ). Abbreviations: Lor: criterion of Lorbacher et al. (2006), Merc: data from Mercator model, T0.2: temperature difference criterion with ΔT = 0.2 °C, T1: temperature difference criterion with ΔT = 1 °C, M+L: data from Monterey and Levitus (1997) climatology, FOAM: data from FOAM model.
Fig. 4. (a) Surface nitrate concentration versus day of the year for all data taken between 2002 and 2007. Black dots are the measured concentrations and grey triangles denote predicted concentrations. (b) Difference between measured and calculated nitrate data. Negative values show overestimation of the MLR with respect to the measured nitrate and vice versa.

Fig. 5. Seasonality of nitrate, MLD and SST for the time period between 2002 and 2007 for a single location in the western and eastern parts (PAP site) of the North Atlantic, respectively. The panels in the first row show the nitrate concentration calculated with the MLR and SOM, respectively. Nitrate measurements from PAP mooring and VOS lines that passed within a 1° × 1° and 2° × 2° grid cell, respectively, are shown. The panels in the second row show the difference between measured and estimated nitrate using the same data as in the upper panels. Negative values show overestimation of the MLR and vice versa. The panels in the third row show the MLD taken from Mercator (mean value of ±0.5° Lat/Lon around the location). The lower two panels show the SST from AMSR-E (mean value of ±0.5° Lat/Lon around the location) and measured SST from VOS lines. Also shown are the SST measurements from the PAP site in the eastern part.

Fig. 6. RMS errors for nitrate estimates in µmol L−1 using a MLR (a-d) or SOM (e-h) technique in summer, fall, winter and spring, respectively. (i) Annual cycle of simulated "true" nitrate (black), and nitrate estimates using the SOM (red) or MLR (blue) technique for the region shown in (a-h). RMS errors and annual cycles were calculated using a biogeochemical model.

Fig. 8. Monthly mean difference between measured and estimated nitrate.The nitrate estimations were made (1) only from Lat and time and (2) with Eq. (1).The measured nitrate data are from the NOC dataset that was not used for the establishment of the algorithm.

Fig. 9. (a-c) Average values of wintertime (February, March) MLD, SST and nitrate estimates from MLR at the location of PAP site. Error bars are standard deviations (1σ). (d) Seawater pCO2 that is in equilibrium with the atmosphere and corrected for the extra amount of DIC associated with the extra nitrate.
samples were collected from the merchant vessel M/V Falstaff (Wallenius [...] Maria, respectively. Both ships were sailing between the UK and the Caribbean (Fig. 1) and produced more than 600 nitrate samples between May 2002 and December 2003 in the area north of 40° N. We used the SST from their dataset and the matching MLD from Mercator to estimate corresponding nitrate data with both methods: SOM and MLR. Since we do not have the MLD for all 2002 dates we included only 344 data points. On average nitrate was underestimated by 0.6±1.2 µmol L−1 (RMSE) with the MLR and overestimated by 0.4±1.5 µmol L−1 (RMSE) with the SOM. Using only data between 10° W and 50° W (the SOM was trained only in this region) the MLR underestimates nitrate by 0.5±1.1 µmol L−1 (RMSE) and the SOM overestimates it by 0.3±1.3 µmol L−1 (RMSE).
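The "mean difference ± RMSE" figures used in this validation can be computed as in the following minimal sketch. The sign convention follows the figure captions (measured minus estimated, so negative values indicate overestimation); the function name and the example values are hypothetical and are not data from this study.

```python
import numpy as np


def mean_difference_and_rmse(measured, estimated):
    """Return (mean of measured - estimated, RMSE) for paired samples."""
    diff = np.asarray(measured, dtype=float) - np.asarray(estimated, dtype=float)
    return float(diff.mean()), float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical nitrate values in umol L-1.
measured = [5.0, 6.0, 7.0]
estimated = [6.0, 6.0, 8.0]
mean_diff, rmse = mean_difference_and_rmse(measured, estimated)
print(f"mean difference: {mean_diff:.2f}, RMSE: {rmse:.2f}")
```

A negative mean difference, as in this example, corresponds to an overestimation by the algorithm.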