Evaluation of ocean dimethylsulﬁde concentration and emission in CMIP6 models

. Characteristics and trends of surface ocean dimethylsulﬁde (DMS) concentrations and ﬂuxes into the atmosphere of four Earth System Models (ESMs: CNRM-ESM2-1, MIROC-ES2L, NorESM2-LM and UKESM1-0-LL) are analysed over the recent past (1980–2009) and into the future, using Coupled Model Intercomparison Project 6 (CMIP6) simulations. The DMS concentrations in historical simulations systematically underestimate the most widely-used observed climatology, but compare more favourably against two recent observation based datasets. The models better reproduce observations in mid to high 5 latitudes, as well as in polar and westerlies marine biomes. The resulting multi-model estimate of contemporary global ocean DMS emissions is of 16–24 Tg S year − 1 , which is narrower than the observational-derived range of 16 to 28


Introduction
Despite several decades of investigations, the quantification of interactions between aerosols and climate remains poorly constrained and understood (Bender, 2020). Recent work using the latest generation of ESMs suggests that aerosol-climate interactions constitute a key driver of the inter-model spread of the simulated response to rising CO 2 (Meehl et al., 2020). One of 20 the sources of uncertainties is related to natural aerosols whose relative abundance compared to that of anthropogenic aerosols directly influences the level of the anthropogenic aerosol forcing (Schmidt et al., 2012;Carslaw et al., 2013). Dimethylsulfide (DMS) is a by-product of microbial food webs and is considered the largest natural source of sulfur to the atmosphere (e.g., Liss et al., 1994;Simó, 2001;Stefels, 2000). Once in the atmosphere, DMS is mainly oxidised into SO 2 and then gas-phase sulfuric acid, which rapidly condenses onto pre-existing aerosol particles or nucleates to form new sulfate aerosol particles 25 (Carslaw et al., 2010;Liss et al., 2014). Among natural aerosols, sulfate aerosols formed from DMS represent a major part of the aerosol-climate interactions in large pristine regions such as the Southern Ocean (Mulcahy et al., 2018, and references therein) or the Arctic (Abbatt et al., 2019, and references therein).
Changes in climate variables, for instance surface wind, sea surface temperature (SST) or downwelling irradiance, can affect both the production of DMS and its surface concentration, as well as its transfer rate from the ocean to the atmosphere, 30 potentially driving a DMS-climate feedback Carslaw et al., 2010;Quinn and Bates, 2011). The importance of such a feedback is debated due to a lack of comprehensive observations operating across a wide range of Earth system realms, from marine biogeochemistry to cloud microphysics (Boucher et al., 2014, and references therein). Therefore, modelling estimates of the strength of this feedback are poorly constrained. The latest assessed value of this feedback is −0.02 W m −2 K −1 (Ciais et al., 2013) which has been estimated from a single model (HadGEM2-ES). A recent estimate 35 deduced from three CMIP6 models (GISS-E2-1-G-CC, NorESM2-LM, UKESM1-0-LL) suggests a slight amplification of global warming due to a positive feedback of 0.005±0.006 W m −2 K −1 (Thornhill et al., 2021). These global estimates hide large regional differences both in terms of radiative forcing and in terms of changes in DMS emissions under global warming (Thornhill et al., 2021). So far, studies have focused more closely on high latitudes regions eventhough a few recent ones demonstrate the dominant role of DMS on marine low cloud aldebo over most oceans (e.g., Quinn et al., 2017) or illustrate 40 regional impacts on low latitudes (Zavarsky et al., 2018). In polar regions, studies suggested an overall negative DMS feedback (e.g., Kim et al., 2018;Mahmood et al., 2019). However, because they are particularly sensitive to global warming, the quantification of feedback in these regions is complex (Goosse et al., 2018). Therefore, an improved understanding of how the pattern of marine DMS emissions may change with climate and environmental changes is required to constrain the magnitude of the DMS climate feedback.
further tuned over the northern latitudes, allowing an assessment of the recent evolution of DMS in this region . These advances coincide also with those of global models, from ocean biogeochemistry ones (Le Clainche et al., 2010;Séférian et al., 2020) to full ESM ones enabling investigations on either (i) the physical factors that impact DMS behaviour, for instance Xu et al. (2016) demonstrate that there seems to be a two-way interaction between DMS and ENSO in the tropical region, or (ii) the ecological factors, for instance representing in the model more explicitely diverse phytoplankton groups (e.g., 55 Phaeocystis: Wang et al., 2015).
In this paper, we use the most up-to-date generation of observational data products and long-term measurements to assess estimates of the surface ocean DMS concentrations and emissions to the atmosphere from the latest generation of ESMs, using their contributions to the 6 th phase of the Coupled Model Intercomparison Project (Eyring et al., 2016). The goal of this work is two-fold. First, we aim to pull together various lines of evidence (observational data products, long-term measurements, model 60 simulations) that will be used in further multi-model analysis of DMS-climate interactions. And, second, by combining these lines of evidence, we provide an assessment of both the direction and the magnitude of the change in marine DMS emissions.
In Section 2, we provide details of the key characteristics of the ESMs and the observational datasets used in this work. In Section 3, we focus on the analysis of the modern mean state. Building upon this analysis, we investigate the recent and future evolution of marine DMS concentration and emission in Section 4. We assess the reliability of the model predictions in the 65 light of key biogeochemical and physical drivers in Section 4.2. We further focus on the Arctic domain where model long-term behaviours are scrutinized against long-term measurements of DMS concentrations and emissions in Section 4.3.
2 Models and observational datasets

Models
The present work draws on the results of four state-of-the-art ESMs that have contributed to CMIP6 (CNRM-ESM2-1, MIROC-70 ES2L, NorESM2-LM and UKESM1-0-LL), whose key characteristics are provided in Table 1. Table 1. Key characteristics of the ocean and marine biogeochemical components of the ESMs. Column 2: horizontal grid points (tripolar grids) and number of vertical levels. Column 4 in brackets: prognostic variables involved in the DMS parameterisations.  Table 1 shows that the ocean component of the four ESMs studied here offer a relatively coarse resolution: they all use tripolar grids (such as eORCA1 in CNRM-ESM2-1), with a nominal grid size of 1°and grid refinements in the tropics (circa 0.3°). As a consequence, we can anticipate that these models will suffer from deficiencies in replicating observations in regions where small-scale features are important, such as in coastal areas. 75 Nonetheless, as documented in the reference papers of CNRM-ESM2-1, MIROC-ES2L, NorESM2-LM and UKESM1-0-LL, these ESMs are able to simulate the main large-scale features of the ocean circulation. Recent work has also suggested that these models have improved their performance in simulating the ocean mixed-layer depth (MLD), an important driver for marine biogeochemistry and marine DMS emissions (Séférian et al., 2020).

DMS concentration and flux parameterisations 80
As shown in Table 1, the four studied ESMs use DMS parameterisations of various complexity. Two models use empirical parameterisations to compute DMS concentrations from chlorophyll and other variables (MIROC-ES2L and UKESM1-0-LL), while the other two use prognostic models including marine biota (CNRM-ESM2-1, NorESM2-LM). Characteristics of these parameterisations are essential to understand model deficiencies at simulating observed features of DMS concentration.
Besides, a close look at these parameterisations is also necessary to infer the ability of the biogeochemistry models to simulate 85 the evolution of the DMS concentration and emission in the future, and the climate feedback that it may trigger. Here, we detail DMS parameterisations used in the four ESMs.
Prognostic DMS parameterisations In CNRM-ESM2-1, DMS concentration is computed by the biogeochemical model PISCES (Aumont and Bopp, 2006), embedded within the global general ocean circulation model NEMO. A prognostic DMS scheme was introduced in PISCES by Bopp et al. (2008), and updated by Belviso et al. (2012) based on the PlankTOM5 model 90 of Vogt et al. (2010). The version of PISCES used in CNRM-ESM2-1 for CMIP6 is PISCESv2-gas, based on PISCES-v2 (Aumont et al., 2015) with the addition of a specific module to compute the cycle of gases relevant to climate. In brief, the model simulates three processes releasing the precursor of DMS, dimethylsulfoniopropionate (DMSP) to seawater: grazing by zooplankton, exudation by phytoplankton, and cell lysis. Each of these processes is parameterised specifically for the two phytoplankton functional groups represented in PISCES: nanophytoplankton and diatoms. DMSP is then converted to DMS 95 with yields that increase with bacterial nutrient stress. Three more processes describe the sinks for DMS in seawater: bacterial and photochemical degradation, and ventilation to the atmosphere. A more thorough description of PISCES can be found in Belviso et al. (2012), with some adjustments further listed in Masotti et al. (2016). An additional stress factor accounting for the change in pH is also included in this version following the study of Six et al. (2013). The fluxes to the atmosphere are then computed using the parameterisation of gas exchange coefficients of Wanninkhof (2014). The air resistance is neglected. There 100 is currently no online coupling of the fluxes of DMS towards the atmospheric model in CNRM-ESM2-1. Instead, a prescribed DMS flux is applied as an input to the aerosol scheme (see Michou et al., 2020, for details).
NorESM2-LM includes a fully interactive description of the DMS cycle across the ocean-atmosphere interface following Kloster et al. (2006). As opposed to PISCES, the conversion of DMSP to DMS is not explicitly described in the model. Instead, DMS is directly released in the water, and is computed as a function of temperature and simulated detritus export production 105 . DMS production is further modified by the export rate of opal and CaCO 3 shell material -that is, calcite producing organisms are assumed to have a higher sulfur-to-carbon ratio than opal producing organisms. Although a tunable pH dependency, that was not present in the original parameterisation of Kloster et al. (2006), has been implemented in NorESM2, it has not been activated in CMIP6 runs . As for CNRM-ESM2-1, three sink processes for ocean DMS are accounted for: bacterial consumption, photolysis and ventilation to the atmosphere. As in CNRM-ESM2-1, 110 the latter is parameterised according to Wanninkhof (2014). DMS concentration in the air is modified via chemical reactions as described in Kirkevåg et al. (2018) and Seland et al. (2020). Compared to CNRM-ESM2-1 and NorESM2-LM, MIROC-ES2L and UKESM1-0-LL use a much simpler approach to simulate the marine DMS cycle. Indeed, DMS concentration is diagnosed using empirical parameterisations, that relate the DMS concentration to other marine biogeochemical or ocean hydrodynamical variables such 115 as chlorophyll (hereafter Chl) and MLD. Despite their relative simplicity, the two parameterisations remain quite different.

Diagnostic DMS parameterisations
In MIROC-ES2L, the seawater concentration of DMS is computed according to the parameterisation of Aranami and Tsunogai (2004), which relates the sea surface DMS concentration to the MLD, and to surface water Chl concentration. This DMS parameterisation is a modified version of that of Simó and Dachs (2002), calibrated with further measurements carried out in the northern North Pacific (Aranami and Tsunogai, 2004). These parameterisations distinguish between two regimes, depending 120 on the Chl/MLD ratio: Low Chl/MLD ratio is found in open ocean, and holds for over 80 % of the ocean global surface (Simó and Dachs, 2002 (Hajima et al., 2020). Here, the MLD is defined as the depth where the potential density becomes larger than that at the sea surface by 0.125 kg m −3 (Simó and Dachs, 2002).
The flux of DMS to the atmosphere is also computed according to Aranami and Tsunogai (2004). The gas transfer velocity is calculated following Nightingale et al. (2000), but the Schmidt number used for DMS adjustment is calculated according to 130 Wanninkhof (2014), as advised in the OMIP-BGC protocol (Orr et al., 2017). The DMS emission is then considered by the aerosol module (Hajima et al., 2020).
In UKESM1-0-LL, the seawater concentration of DMS is computed within the ocean biogeochemistry model MEDUSA (Yool et al., 2013) and is interactively coupled with the atmosphere. The parameterisation of DMS concentration is based on the work of Anderson et al. (2001), and linearly relates the DMS concentration to a composite variable formed by the 135 logarithm of the product of Chl concentration (C, mg m −3 ), light (J, mean daily shortwave, W m −2 ) and a nutrient term (Q, dimensionless) that depends on nitrate concentration. The parameterisation includes a minimum DMS concentration if the composite variable is lower than a threshold, s, as follows: In the original parameterisation, the fitted parameter values were: a = 2.29, b = 8.24 and s = 1.72 (Anderson et al., 2001).

140
During calibration of UKESM1-0-LL, the minimum DMS concentration (a) was changed to 1 nM and the threshold (s) was adjusted to 1.56 (Sellar et al., 2019). This tuning was required to reduce the excessively strong negative forcing induced by the higher DMS minimum concentration. Finally, the flux of DMS from the surface ocean to the atmosphere is parameterised according to the air-sea gas transfer scheme of Liss and Merlivat (1986). DMS concentration in the atmosphere is subsequently modified through a number of gas phase aerosol precursor reactions of the UKESM1-0-LL stratospheric/tropospheric chemistry 145 scheme (see Mulcahy et al., 2020, Table 2 for the list of reactions).

Simulations
In this paper, we use monthly outputs of the CMIP6 historical (1850-2014) and ssp585 scenario (2015-2100) experiments. All datasets were downloaded from ESGF nodes. The number of realisations of each model for both experiments is reported in  the resulting monthly climatology is considered to be representative as an average over this period. The dataset is processed as follows: no quality control is applied on the data, but the largest values above the 99.9 percentile are removed (i.e., values above 148 nM). Monthly data is first binned into a 1°×1°grid, and a monthly mean is then calculated in each of the 54 static biogeographical provinces defined by Longhurst (2007, see Appendix A2 for details). In each of these provinces, a minimum 180 of three data points is required to get a valid (monthly) value. For regions with at least four valid values over the year, a temporal interpolation is applied to fill the gaps. Conversely, in provinces with three or less months of data, neighboring or biogeographically equivalent provinces are used to construct an annual cycle, which is scaled with the few available data, if any.
This results in a "first-guess field" global dataset (Lana et al., 2011, section 2.2). Last, a distance-weighted interpolation process (radius of influence of 555 km) is applied twice to smooth the resulting global monthly climatology: a first time to smooth the 185 transitions across the province borders, and a second time after re-introducing individual in-situ data (Lana et al., 2011, section 2.3 and supplementary Fig. S2). On top of the monthly climatology, L11 provides an assessment of the uncertainty range, through two other monthly datasets representing the estimated minimum and maximum DMS concentrations. The raw data, binned into the 1°×1°grid, is also provided along with the climatology.
Despite the large number of measurements included in this DMS climatology, the spatial and temporal data coverage is presumably lower, may lead to a positive bias in L11. Another potential positive bias in L11 stems from the overrepresentation of biologically productive conditions in the in-situ DMS database from which L11 is built upon. This is supported by the study of Galí et al. (2018, Fig. 7 and Sect. 4.1) who pointed out that the distribution of DMS concentration in L11 is right-skewed as compared to DMS concentration derived from satellite chlorophyll measurements. Conversely, recent studies report on high DMS concentrations measured in the North Atlantic (Bell et al., 2021) and in a coastal station of the West Antarctic Peninsula 200 or in the Ross Sea (Webb et al., 2019;del Valle et al., 2009, respectively) which are not represented in L11. To conclude, although L11 presents some weaknesses that are inherent to the original data and the interpolation methodology, it has been considered so far as a reference (Tesdal et al., 2016) and is the only DMS climatology solely based on in-situ measurements.
It has also been widely used to calibrate or validate other DMS estimation techniques (see for instance Galí et al., 2019). We thus used L11 as the leading reference climatology in this study, along with the more recent products presented hereafter. The 205 annual mean of this climatology is shown in Fig. 1. Its global area-weighted mean is 2.35 nM (area-weighted median 2.25 nM, see Table 3).
Another methodology using an artificial neural network (ANN) has very recently been developed by Wang et al. (2020) (here referred to as W20) in order to improve the interpolation method used by Lana et al. (2011). This study relies on the same database of in-situ DMS measurements, which now contains twice as many measurements (over 93k after removing While DMS data measurements are scarce in vast areas of the oceans, and especially at high latitudes during the winter, these other variables are better constrained through observations or climatologies. After training the ANN on a subset of the 215 available data and validating the technique on another subset, it is thus possible to use the ancillary variables as predictors of DMS concentration in undersampled areas. Along with this ANN method, the authors also performed a conventional linear and multi-linear regression analyses, to compare the skills of the three methods. While the latter reproduces only 39 % of the observed DMS variance, the ANN approach captures 66 % of it. The yearly mean of the climatology derived from the ANN method is shown in Fig. 1. Its global area-weighted mean is 1.75 nM (area-weighted median 1.65 nM, see Table 3).

220
Another strategy to produce a reliable climatology of sea surface DMS concentration has been proposed recently by Galí et al. (2018) (here referred to as G18). These authors developed a remote sensing algorithm to derive total DMSP (DMSPt) from the Chl concentration and the ratio between euphotic layer depth and MLD, which are both satellite-retrieved . In a subsequent work, Galí et al. (2018) proposed a further parameterisation to derive DMS concentration from DMSPt and PAR. The resulting climatology of Galí et al. (2018) can be regarded as an intermediate between L11 and W20 225 on the one side, which both use DMS in-situ measurements to infer DMS concentration over the globe, and the models using empirical parameterisations for DMS on the other side, where the input variables (Chl, MLD, PAR) are calculated instead of being satellite-retrieved observation. This method also has its own limitations, and the developed algorithm used to relate the proxies together has to be tuned. While the resulting parameterisation can be better constrained in specific basins, it remains approximate when applied to the global oceans. Another limitation of this approach is the lack of satellite observations over 230 sea-ice and at low solar elevations, resulting in observational gaps in high latitudes (> 48°) in winter. In coastal regions, the remotely sensed Chl signal may be biased by the presence of riverine coloured dissolved organic matter (CDOM), and ultimately lead to an overestimation of the computed DMS concentration Hayashida et al., 2020). The annual mean of this climatology is shown in Fig. 1. Compared to L11, the global mean G18 DMS concentration is 33 % smaller: 1.69 nM (weighted median 1.53 nM, see Table 3).

Marine DMS emissions
The flux of DMS to the atmosphere has been assessed by several authors, but the resulting datasets are not readily available.
However, a recent work from Granier et al. (2019), as part of the European Copernicus Atmosphere Service (CAMS) project, provides a climatology of DMS flux (hereafter referred as CAMS19). We used the latest version (3.1) of this dataset. This climatology is computed using the DMS sea surface concentration from L11, the gas exchange coefficient of Nightingale et al. The calculated flux also accounts for the sea-ice cover, which is usually assumed to linearly reduce the DMS emission. Indeed, gas transfer across the sea-ice and in partly ice-covered areas involves a series of complex processes and is subject to debate (Lovely et al., 2015;Rutgers van der Loeff et al., 2014), however the linear scaling of the flux to the open-water fraction is 245 generally accepted as a good first-order approximation (Prytherch et al., 2017). The resulting DMS flux is provided as daily means on a 0.5°grid, and can be accessed from https://eccad3.sedoo.fr/. The annual mean DMS concentration over the period 1980-2009 is plotted in Fig. 1 for the 4 studied models, the MMM, L11, G18 and W20. The global area-weighted mean and median are also provided in Table 3. In Fig. 1, an additional hatching is added on the models to show the location where the modelled DMS concentration is not in the range of the three climatologies (the range being defined by the minimun and the maximum of the three climatologies, including the minimum and maximum range given by L11). Note that the Arctic region which is undocumented in G18 throughout the year is excluded from this 255 treatment so that the range is always built on three DMS concentration values.
Overall, the striking observation that can be made is the lack of agreement between models, and between models and observational products. Table 3 shows that L11 stands out with the highest median (and mean) DMS concentration. With respect to the lowest global median DMS concentration (UKESM1-0-LL), that of L11 is over 60 % higher. The models generally agree on enhanced DMS concentration in the eastern equatorial Pacific. Most models also predict higher DMS 260 concentration in coastal regions, with some exceptions: for instance, MIROC-ES2L shows little or no enhancement along the southern coasts of Australia and South America (south of the 30°S latitude). Conversely, this model predicts significant DMS  latitude band can likely be explained by the inverse relationship defined in the parameterisation of Aranami and Tsunogai (2004) between the DMS concentration and the MLD, which is deep in the Southern Ocean. This major discrepancy between models had already been highlighted by Tesdal et al. (2016), and seems even more pronounced in these CMIP6 results.
Individual model patterns shown in Fig. 1 have not changed significantly compared to previous publications presenting these models, or the parameterisations they are based on. For instance, the DMS concentration modelled by previous versions As expected, the output of MIROC-ES2L resembles those of the parameterisations of Simó and Dachs (2002) and Aranami 275 and Tsunogai (2004), the latter being the closest (Tesdal et al., 2016, Fig. 2). As compared to the latter, the enhancement in the eastern equatorial Pacific is stronger in MIROC-ES2L. Interestingly, such strong enhancement was also observed when the parameterisation of Simó and Dachs (2002) was embedded in HadOCC (Tesdal et al., 2016, Fig. 2).
NorESM2-LM results can be compared to those obtained with the rather similar ocean biogeochemistry model HAMOCC, within the MPI-ESM-LR model, as presented by Tesdal et al. (2016). Both models present similar concentration enhancement 280 off the coasts of western Africa, and in the equatorial Pacific, plus in a thin latitude band between 30°S and 40°S. However, there is little agreement at higher latitudes, especially in the northern Pacific and the Arctic, plus around Antarctica, where NorESM2-LM simulates smaller DMS concentration than HAMOCC.
UKESM1-0-LL features specific patterns that are distinct from the other three models, especially regarding the uniform low concentration over vast areas. This reflects the threshold set to 1 nM in the parameterisation based on that of Anderson west as compared to the other models. In that respect, the output of UKESM1-0-LL shares several common features with the HadOCC results shown in Tesdal et al. (2016). Lastly, as mentioned above, UKESM1-0-LL also stands out by the high DMS concentration in the Southern Ocean (40°S-60°S), which can be explained by the high bias in Chl in this region (Séférian 290 et al., 2020) and the positive relationship between DMS and Chl in the parameterisation of Anderson et al. (2001). The patterns of DMS concentration are also very similar to the patterns of net primary production (NPP), which feature summer maximums in both hemispheres (Sellar et al., 2019, Fig. 19). This suggest a strong relationship between both variables in UKESM1-0-LL. Table 4. Spatial correlation coefficients between L11, G18 and W20 and the models (first three rows), and between the individual models (rows 4-7), derived from the data displayed in Fig. 1 The general findings outlined above are strengthened by the analysis of the Pearson spatial pattern correlation, which is presented in Table 4. When compared to L11, the annual pattern correlation ranges from 0.08 to 0.26 for individual models.

295
When compared to G18 and W20, this pattern correlation is improved for all models, ranging from 0.13 to 0.46 for the latter. Note that because of the year-round undocumented area in the Arctic in G18 due to the lack of satellite observations, the G18 pattern correlation is computed on a slightly smaller area than the ones of the other two climatologies. However, computing the pattern correlation with L11 and W20 after masking them by the same extent as in G18 has a minor effect, with correlation coefficients reduced by only 0.01 or 0.02, which does not change the conclusions of this comparison. Regardless 300 the observational product, UKESM1-0-LL has the lowest pattern correlation amongst the four models, which can likely be attributed to the constant minimum DMS concentration prescribed in the parameterisation of Anderson et al. (2001). It is also worth noting that whatever the climatology used to compare with, the MMM has the highest pattern correlation (or second highest when comparing with L11), even if the improvement as compared to individual models is small.
The cross correlation between models, presented in the second part of Table 4, is rather small as well, and even slightly 305 negative between MIROC-ES2L and UKESM1-0-LL. Interestingly, the highest correlation (0.62) is between both prognostic models, CNRM-ESM2-1 and NorESM2-LM. Conversely, as in the comparison with the climatologies, UKESM1-0-LL has the lowest correlations with each of the other three models.
The last row in Table 4 shows the area fraction, for each model, that lies in the range of values of the three climatologies. Up to 84 % of the total area falls in this range for CNRM-ESM2-1, which can be related to the smoother patterns displayed by this 310 model ( Fig. 1). Conversely, the DMS concentration modelled in UKESM1-0-LL lies in the range of climatologies over only 37 % of the surface. Again, the constant minimum value seems to be responsible for this poorer agreement: as can be seen in

325
Further investigations would be required to explain these discrepancies between measurements and models or climatologies.
Some specific processes, such as the DMS concentration enhancement following sea-ice break-up (Webb et al., 2019) are not accounted for in the models, but are not sufficient to explain all discrepancies. Overall, assessing the relevance of high DMS events at the global scale and the spatial resolution of climate models is mandatory to improve them.
The monthly mean   on the 30°latitude bands of both hemispheres was also present, while this feature is much less clear in CNRM-ESM2-1.
The zonal mean seasonal cycle of MIROC-ES2L is very similar to that of both Simó and Dachs (2002) and Aranami and Tsunogai (2004). The summer maximum in the northern hemisphere occurs only in June and July in MIROC-ES2L, while it  NorESM2-LM shows a noticeable double maximum in both hemispheres, around 40°and at latitudes higher than 60°. In the northern hemisphere, the maximum around 40°N is mostly driven by high concentration in the Atlantic from May to August, and to a lesser extent in the Pacific in June and July (see UKESM1-0-LL shows a northern maximum starting earlier than the other models, in agreement with L11. However, as 345 pointed out by Hayashida et al. (2020), this feature could be an extrapolation artefact in L11 (see Sect. 2.2.1). Fig. 3 also shows that the already noticed feature of high DMS concentration in a 40°S to 60°S latitude band is mostly an austral summer feature, as for the NPP (Sellar et al., 2019, Fig. 19). The fixed minimum DMS concentration of 1 nM is also striking on this figure, with large zonal bands with uniform DMS concentration.  We further extended the pattern correlation analysis presented above for annual mean to monthly data. Figure 4 presents 350 three spatial monthly statistics between the models and L11, namely correlation coefficients, biases (in % of the reference) and root mean square errors (RMSE). The same statistics using G18 and W20 as reference are also provided in SI Figs. SI-4. All models outputs have been gridded to the 1°grid of the reference to compute these statistics. All models show similar seasonal variations across all three statistics. Biases (negative except for two months of CNRM-ESM2-1) and RMSE are significantly higher during the austral summer (November to February), probably due to the high DMS concentration featured in L11

355
around Antarctica, that no model agrees with. The same explanation holds for northern hemisphere summer months (May to September), where the bias and RMSE are slightly larger than during the other months of the year, because the summer maximum of DMS concentration in all models is less intense and has a smaller spatial extent (Fig.3).
Pattern correlations have low values (lower than 0.2) in February-March and again in September to November, which can be related to the smoother features and inter-hemisphere DMS gradients during these months. Conversely, during the summer 360 months in both hemispheres, even if the models do not reach such elevated DMS values as in L11, the stronger concentration gradients contribute to a higher correlation. When compared to G18 and W20 ( Fig. SI-4), the pattern correlations show smoother seasonal variations.
Overall, the MMM has the lowest RMSE in nearly all months, with significantly lower values than each model when compared to G18 and W20 ( Fig. SI-4). The MMM and NorESM2-LM also have the best correlation coefficients with L11 in 365 almost all months, with coefficients higher than 0.4 during seven months of the year. They also have the best pattern correlations when compared to G18 and W20.

Seasonal variation analysis
Further to the spatial analysis presented in the previous section, we now focus on the analysis of seasonal variations. For that purpose, we used the same ocean partitioning into 54 biogeographical provinces (Longhurst, 2007) that Lana et al. (2011) used 370 to built the L11 climatology (see the description in Sect. 2.2.1 and Appendix A2). These 54 provinces are also grouped into six biomes (polar N and S, westerlies N and S, trades and coastal), to bring further information regarding the models' strengths and weaknesses.
In Fig. 5 we present the seasonal variations, in each of the 54 ocean provinces, of the 4 models and MMM, along with L11, with its min-max range as a shaded envelope. This figure is inspired from that in Lana et al. (2011, Fig. 2 Table 5. This figure shows the skills of each model in reproducing the seasonal cycle for each region. The seasonal cycle in the northern Atlantic (regions 2-6 and 11) is relatively well reproduced regarding the timing and duration of seasonal maximum, but the amplitude is generally lower in the models. The agreement is also satisfying in the northern Pacific (regions 30, 32, 33, 44) and in the regions located south of ∼ 40°S (regions 51-54). Equatorial regions are characterised by a weak seasonality and 385 low DMS concentrations, and models poorly reproduce the climatology in these provinces (regions 9, 17 and 41 for instance).
In Table 5, the regions are sorted according to the six biomes, which helps at identifying general model behaviour. It is noteworthy that the seasonal cycle in most polar and westerly provinces of both hemispheres is well captured by all models.
Conversely, all models poorly reproduce the seasonal cycle in most trade wind regions. Regarding coastal regions, some are rather well reproduced regarding their seasonal cycles, while others not. There is a relatively good consistency between models: 390 in coastal regions where the seasonal cycle is well (respectively poorly) reproduced, this is generally similar for all (or all but one) models. When looking in detail at the coastal regions, it is clear that model skills are better in coastal regions located in high latitudes (for instance region 11, 15, 20, 43, 44, 50) than in those located at low latitudes, which is consistent with the conclusions regarding the other biomes.
Among the four models, none appears to have significantly better skills than the other regarding the seasonal cycle. The   (2007), for the models as in Fig. 1 (color lines) and L11 (black line). Also plotted are other data of L11: (1) minimum and maximum as light grey envelopes; and (2) "Pixel" data, i.e. raw data binned on the 1°×1°grid, as diamonds (note that they are not identical to the data plotted in Fig. 2 of Lana et al. (2011), see text for details). The oceanic region numbering (54 regions) is repeated on the lower right corner map, together with colours according to Longhurst (2007) for the six biomes (dark and light blue: polar N and S, dark and light green: westerlies N and S, orange: trades, yellow: coastal). Several hypotheses can be proposed to explain the poorer correlation of the modelled seasonal cycles with the observations at low latitudes. First, it was already shown that models present some features at low latitudes that do not agree with climatologies.

400
For instance, the strong enhancement of DMS concentration in the eastern equatorial Pacific found in all models except CNRM-ESM2-1. While this model feature leaves an imprint in the annual mean (see Fig. 1), the monthly analysis shows that it is mostly a spring (February-May) phenomenon (see Fig. 3). Indeed, when looking at the corresponding regions 39 and 40 (to a lesser extent, the Atlantic region 7), it is clear that this typical model behaviour can explain the poor temporal correlation. The second argument to explain the low correlation in equatorial/subtropical latitudes, is that the seasonality is very weak around 405 the Equator (Fig .3). As noted by previous authors (see for instance Lana et al., 2012, Fig. 4), this weak seasonality can be responsible for the low or even negative correlation, since the model uncertainties may have a larger amplitude than the seasonal variability. The third explanation is that uncertainties also originate from the climatology itself. To investigate this effect, we performed the same province-based seasonality analysis using W20. The corresponding table built with the same presentation as Table 5 is shown in the SI (Table SI-5). While trades regions still show lower correlations than higher latitudes provinces 410 (polar and westerlies), the overall agreement is significantly better than with L11, which suggests that the disagreement visible in Table 5 can be partly attributed to L11 uncertainties. The flux of DMS features spatial patterns and seasonal cycles that stem from both the surface ocean concentration and the 435 wind speed. To help understand how these main drivers impact the resulting flux, the maps of annual wind speed and the zonal monthly wind speed are shown in the Supplementary (Fig. SI-6).
Overall, the maps of annual mean flux show a large spatial variation, which mirror in part the patterns of annual mean concentration (see Fig. 1 global emission as compared to other models, and will be further explained in the next subsection. Lastly, Fig. 7 also confirms that the DMS emission in UKESM1-0-LL across the 30°N -30°S band is significantly lower than the other models, but also lower than all the other estimates based on climatologies. This results in global annual DMS emissions lower than those of the other models (see also Table 6). Min, area-weighted median and maximum are provided for each panel.

Seasonal cycle analysis
465 Figure 8 shows the seasonal cycle of the DMS flux. First, a year-round low emission in a thin latitude band around the Equator is featured by all models and CAMS19. This is clearly related to the low wind speed in the ITCZ (Inter Tropical Convergence Zone), which is slightly more pronounced in CNRM-ESM2-1 and MIROC-ES2L (see Fig. SI-6). This figure also shows that  30°N and 50°N, we see that CNRM-ESM2-1 and MIROC-ES2L compute significantly higher surface wind speed during winter months than the other two models. Apart from that of Liss and Merlivat (1986), this difference is further amplified 475 by the quadratic dependence of wind speed in the flux parameterisations. In this latitude band and during winter months, CNRM-ESM2-1 also predicts DMS concentration (around 2 nM) which is nearly twice as much as in the other models. The combination of this relatively high concentration and high wind speed thus results in the highest emission in winter for CNRM-  Fig. 7, are also given. a : the range results from various wind and SST products used in the flux calculation. Data from Table 4 of Kettle and Andreae (2000). b : the estimated range depends on the assigned bias in DMS concentration as derived from remote-sensing measurements (Galí et al., 2018); c : the flux estimated by Galí et al. (2018) (2000)

Comparison with other studies
To conclude this section, Table 6 summarises global mean emissions of the four CMIP6 models along with several estimates from other studies for the modern period. Figure 9 also provides a convenient way to compare models in a glance regarding their annual global mean concentration, and emission. This Table does  other studies also using the flux parameterisation of Nightingale et al. (2000). This is consistent with the rather low global mean DMS concentration in MIROC-ES2L (1.77 nM, Table 3 4 Historical and future evolution 4.1 Global and regional trends of DMS concentration and flux Figure 9 shows the evolution of global annual mean DMS concentration, and global total DMS flux over the entire historical 515 and ssp585 period (1850-2100). Both variables are relatively stable over the historical period, with the onset of significant trends between 1970 and 1980. At the end of the century, the spread of these two quantities is larger than at the end of the historical period (more than twice as large for the concentration). The interannual variability of the ensemble mean is larger for NorESM2-LM due to its lower ensemble size (see Table 2). The ensemble spread increases with the number of realisations.  Table 2). The Pearson correlation coefficient between both time series is indicated for each model on the bottom plot.
Each model also has distinct variability, which is especially noticeable in the case of CNRM-ESM2-1, whose spread in the 520 DMS flux is significantly larger than that for UKESM1-0-LL although the latter has more realisations.   Table 7. It reveals that for each model, 530 whatever the sign of the trend, the relative change in flux (−4 to 13.8 %) is shifted towards positive values compared to the relative change in concentration (−12.3 to 10 %), thus mitigating negative flux trends, but reinforcing positive ones. This is likely explained by the positive dependence of the gas-exchange parameterisations on the SST, which is sharply increasing in ssp585 simulations (see the timeseries of modelled SST in Fig. SI-7). Conversely, the mean annual wind speed over the oceans shows no or a weakly negative trend, depending on the model (Fig. SI-7), and thus cannot explain an increased DMS flux.

535
Further investigations would be required to evaluate these effects on a finer spatial and temporal scale, however a more detailed analysis of the drivers involved in flux parameterisations is beyond the scope of this study. Notwithstanding the uncertainties with regards to the other drivers, the major role of the DMS concentration to explain the trend in DMS flux is an important result which confirms and goes farther than the conclusion of Tesdal et al. (2016), who showed that in modern climate the global total flux of DMS depends primarily on the global mean DMS concentration. To gain further insight into the respective trends of DMS concentration and flux, Fig. 10 compares both trends on the six ocean biomes. Overall, the behaviour depicted at the regional scale is similar to what is seen at a global scale: when a trend of DMS concentration is present, the trend of DMS flux has generally the same sign for most models and biomes. A few exceptions to this occur when other processes are at play, and overcome the DMS concentration as main driver of the flux.
This is namely the case in polar biomes, where the fraction of sea-ice cover is expected to be an important driver due to the 545 linear scaling of DMS emission to the ice-free water fraction (see Sect. 3.2). To help understand the underlying mechanisms, Figure  are not necessarily comparable to ours with respect to scenarios, model type and methods, we summarize the results of these studies regarding global DMS trends. An early study by Gabric et al. (2004) using a 3×CO 2 scenario found an annual mean  DMS flux increase by 14 %. Bopp et al. (2003) conclude that under 2×CO 2 (assuming a 1 % increase per year), the global DMS emission increases by 2 %, and they further observe that there are large spatial heterogeneities (flux change ranging from −15 to 30 % in zonal mean).  addressed the effect of upper mixing shoaling induced by global warming 580 on the flux of DMS, and found a small increase by 1.2 %. Kloster et al. (2007) performed a global analysis with HAMOCC, under scenario SRES A1B, and found a decrease in the global DMS flux by 10 %. They also concluded that the simulated changes have large spatial variations. Their results (see Kloster et al., 2007, Fig. 4) indeed show negative trends almost all over the oceans, apart from the Arctic and in a thin band around Antarctica. Some features are slightly similar to those displayed by NorESM2-LM, which is not surprising given the similarity in the ocean carbon cycle models. Wang et al. (2018)  focused specifically on regional trends, such as in the Southern Ocean (Cameron-Smith et al., 2011;Qu et al., 2020). Both studies reported a very large increase in DMS flux, however this regional behaviour, which strongly depends on the sea-ice cover, cannot be generalised at the global scale. Last, we can also mention the sensitivity analysis performed by Wang et al.
(2020, Fig. 9). After they built the W20 climatology using the ANN (see Sect. 2.2.1), they individually changed the most influential variables (SST, MLD, PAR, SSS and three nutrients) and assessed the resulting change in concentration. The ANN 605 has a non-linear, positive or negative response to each of these variables, and this sensitivity analysis reveals that the overall trend would be a complex combination of multiple factors.
To conclude, there is no agreement in the literature with regards to the sign and amplitude of the trends of DMS concentration and flux in the future (Hopkins et al., 2020). Numerous processes are at play, and there is considerable uncertainty in the overall result. There is no general consensus either in the CMIP6 models regarding the DMS trends, neither on the regional patterns, The aim of this section is to answer whether other variables could give further insight on the most likely trend of DMS concentration, especially in low latitudes regions which determine the global trend as found in the previous section.
The biological control of the change in DMS concentration has been highlighted in previous modelling studies (e.g. Bopp et al., 2003;Kloster et al., 2007) and merits to be re-assessed here in a multi-model perspective. In the following, we investigate this relationship for the CMIP6 models using marine net primary productivity (NPP). Figure 13 shows how changes in ocean 620 DMS are associated with changes in NPP across in the four CMIP6 models over 1980-2009 and 2071-2100 (the total annual NPP is also plotted in Fig. SI-9, to compare with the similar plot for DMS concentration, Fig. 9). Table 8. Metrics of the scatter plots of Fig. 13, change in DMS concentration versus change in NPP: slopes of the linear regressions (10 −2 nM per g C year −1 ) and determination coefficients. Metrics are shown for the four models, over two periods (1980-2009 and 2071-2100) in global mean and for the 6 biomes, as in Fig. 13 Figure 13 and Table 8  confirmed for these models in the future (2071-2100), with larger differences and less scatter, which depict a robust relationship between both variables. Beside the qualitative agreement between these three models, the slopes of linear regressions are highly variable between models, between biomes, and between both periods (Table 8), thus considerably limiting the predictive capability of such relationship. In contrast, MIROC-ES2L shows no clear correlation between DMS and NPP, and even a weak negative correlation across the biomes in the future. This specific behaviour, contrasting with that of the other models, 630 stems from the parameterisation of Aranami and Tsunogai (2004), in which there is a loose biological control on the DMS  Table 8. Mind the different scales between both columns. concentration, through the positive relationship with the Chl which only occurs in high productivity zones (see Sect. 2.1.1 and Fig. SI-9 showing the timeseries of annual mean Chl, to compare with the similar plot for DMS concentration, Fig. 9).
Although the limited current knowledge about the NPP-DMSP-DMS relationships hampers our ability to constrain this emergent property, several lines of evidence tend to suggest that there is a positive correlation between NPP and DMS con-635 centration. Firstly, noting that some studies observed no correlation between DMS and Chl a (e.g., Wang et al., 2020, and references therein), a number of other studies showed positive correlations between NPP and DMS production: the link between NPP and DMSP is highlithed at the local scale (e.g.,  and at a basin wide scale (e.g., Uhlig et al., 2019), that between NPP and DMS concentration again at a basin wide scale in Osman et al. (2019), and the link between DMSP and DMS concentration has been described in several studies (e.g., Stefels, 2000;Yoch, 2002;Asher et al., 2017;Lizotte et al., 640 2017). Secondly, factorial experiments conducted by Wang et al. (2020) using an artificial neural network show that a 10 % decrease of Chl a leads to a reduction in DMS concentration in large open-ocean domains. Finally, previous modelling work of Bopp et al. (2003) and of Kloster et al. (2007) show that the response of the marine biology (i.e., declining NPP) is one of the prominent drivers of changes in DMS emissions. The first group of models (CNRM-ESM2-1, NorESM2-LM and UKESM1-0-LL) thus captures a relationship which is consistent with such ocean field experiments, while the response simulated in 645 MIROC-ES2L is not consistent with the current understanding of the DMSP production pathways by marine phytoplankton (Stefels et al., 2007).
Such relationship between changes in NPP and changes in DMS has consequences for future projections because it suggests that the overall model response in DMS concentration and emission will mirror changes in NPP. A recent study by Kwiatkowski et al. (2020, see Figs. 1e and 2o) synthesised the prediction of ten CMIP5 and thirteen CMIP6 models for several diagnostics, 650 and concluded to a relative change in NPP of −2.99 ± 9.11 % in ssp585 scenario (2080-2099) relative to the 1870-1899 mean (Kwiatkowski et al., 2020, Table 4). As compared to the CMIP5 generation of models, which predicted a relative change of −8.54 ± 5.88 % in scenario RCP8.5, the uncertainty has thus increased, and now covers positive trends. This confirms that the response of models in the low-latitude oceans remains highly uncertain, even regarding the sign of the trends, thus limiting the current ability to predict future changes of DMS concentration and emissions in these regions, and thus on a global scale. which are cells where 90 % or more of the surface is sea-ice free. We also show results considering all pixels of a given region to evaluate the importance of the sea-ice masking. Time series are shown over part of the CMIP6 historical period, 1950-2014, and this extends the G19 time series (1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016).
Several of the time series presented in Figure 14 show interannual variations whose amplitude depends on the model, with larger interannual variations for CNRM-ESM2-1 and NorESM2-LM, and smaller ones for the other two models. Time series 670 of ice-free extent are similar among models except for MIROC-ES2L which predicts significantly less sea-ice. However, the dynamics of sea-ice retreat from 1980 onwards is roughly consistent between models, with a median rate of +0.23×10 6 km 2 per decade. This Figure provides new insights regarding the behaviour of models in the Arctic. First, all models show higher DMS concentration over the ice-free pixels (dashed lines) as compared to the entire domains (full lines). This means that the models consistently predict lower DMS concentration below the sea-ice, in line with reduced photosynthetically active 675 radiation. We want to note here however that a number of recent studies highlighted the large DMS production of ice algae and acknowledged that models likely underestimate the contribution of bottom ice DMS (Hayashida et al., 2020, and references therein). The only exception is seen in the Atlantic sector for MIROC-ES2L, where the mean DMS concentration over ice-free pixels is lower than that in the whole sector. This is likely due to the specific negative correlation between DMS and MLD in the parameterisation of Aranami and Tsunogai (2004) (see Sect. 2.1.1). The MLD is expected to be thicker in the absence of 680 sea-ice, thus leading to lower DMS concentration. Because the fraction of ice-free water is much higher in the Atlantic sector (∼70 %) than in the non-Atlantic sector (∼30 %), this specific effect of the MLD is expected to leave a stronger imprint on the DMS concentration in the latter, thus explaining this exception.
As compared to the DMS concentration values found by G19 between 1998 and 2016 (in the 2.7-3.0 nM range in the Atlantic sector, in the 3.5-4.5 nM range in the non-Atlantic sector), MIROC-ES2L shows a rather good agreement, while 685 CNRM-ESM2-1 and NorESM2-LM display much lower and higher concentration values, respectively. UKESM1-0-LL has a concentration value in the Atlantic sector which agrees with that in G19, but the concentration in the non-Atlantic sector is roughly half that in G19. G19 further discussed the differences in biogeochemical and meteorological characteristics in both Arctic sub-sectors, to explain why DMS concentration is larger in the non-Atlantic sector. In particular, as the non-Atlantic sector includes the Siberian shelves, which seem to be quite productive owing to nutrient inputs from large rivers (Terhaar 690 et al., 2021), the G19 data may be biased high in the Siberian shelves due to optical interference of continental materials (Hayashida et al., 2020). While uncertainty in satellite DMS appears higher in the non-Atlantic sector, ESMs possibly struggle to capture the biogeochemical functioning in shallow Arctic seas, due to both too-low resolution and non-represented processes.
Notewithstanding the biases in models as compared to G19, only MIROC-ES2L and NorESM2-LM correctly capture this difference between sectors with higher DMS concentration in the non-Atlantic sector. Scatter plots of DMS emissions versus ice-free extent over the 1950-2014 period are shown in Figure 15, with related metrics 710 in Table 9, to provide further insight into this relationship. As in G19 (see Figs. 4H and 4I), information is also given over a fourth domain, the central basin with heavier sea-ice cover, whose area is subtracted here from the Atlantic and non-Atlantic sectors. Over the pan-Arctic region, determination coefficients (R 2 ) are largely higher in all CMIP6 models than those of G19.
This reflects the linear dependence of the flux to the free-water fraction in the models, though in observations additional factors such as ocean productivity can be invoked to explain scatter in the DMS emission versus sea-ice extent relationship (Galí et al.,715 2018; Lewis et al., 2020). Smaller interannual variability in models compared to satellite observations can also contribute to higher R 2 . Slopes of the relationship differ for the four models, with the smallest slope for UKESM1-0-LL (10.4 in Gg S per 10 % of free-water) and the largest slope for NorESM2-LM (31.8 Gg S per 10 % of free-water). To first order, these slopes result from the combination of the mean DMS flux and of the trend in sea-ice retreat. In total, the CMIP6 summertime DMS emissions extrapolated at 100 % sea-ice free water vary between 72 and 310 Gg S, enlarging the corresponding estimation of 720 144 ± 66 in G19. Because the current ice-free extent is significantly larger (in the 60-85 % range) over the Atlantic sector, the error at 100 % extrapolation is reduced, with emission estimates ranging between 27-50 Gg S. Conversely there is a huge discrepancy in the central sector (11-151 Gg S).
We further extended this analysis to the ssp585 scenario, using the 65-year long period from 2036 to 2100 so that both analyses rely onto the same time windows. Scatter plots of DMS emissions versus ice-free extent (see right column of Figure   725 15) confirm the linear relationships determined from the years 1950-2014 between these two fields (see previous paragraph).
Unlike in the 1950-2014 years, the Atlantic sector appears to be mostly free of ice during the summer months, thus the relationship between DMS flux (and thus emission) and ice-free extent is almost lost. Extrapolations of annual DMS emissions at 100 % ice-free extent for the 2036-2100 period (from 86 to 282 Gg S for the pan-Arctic region) are comparable to projections inferred from the 1950-2014 period (72 to 310 Gg S, see Table 9). In the non-Atlantic sector and central basin, the disagreement 730 in extrapolated values at 100 % ice-free water is reduced following the reduced extrapolation errors. All CMIP6 models thus depict a consistent evolution throughout the 21 st century in the Arctic region, where DMS emission is determined to first order by the ice-free extent, while other factors are of secondary importance. This modelled behaviour agrees well with the conclusions of Hayashida et al. (2020), who found that the decline of Arctic sea ice is associated with a quasi-linear positive trend of DMS flux.

Conclusions
In this study, we analyse surface ocean DMS concentration and flux into the atmosphere from four CMIP6 ESMs (CNRM-  2036-2100 years. DMS emissions at 100 % ice-free extent are indicated for better clarity when the extrapolated values exceed the y-axis maximum. The related metrics of these scatter plots are displayed in Table 9) Table 9. Metrics of the scatter plots of Fig. 15, DMS emissions versus ice free extent: slopes of the regression lines (Gg S per 10 percent of free-water extent), projected DMS emissions at 100 % free-water (Gg S year −1 ), and determination coefficients. Metrics are shown for the pan-Arctic (with sums of the individual regions in parentheses), Atlantic and non-Atlantic sectors (excluding the central basin), and the central basin as in Galí et al. (2019), for the four models. Values from Galí et al. (2019) are also presented. Our analysis of contemporary  climatologies of simulated surface DMS concentration shows that, overall, agreement is poor between models, and also between models and reference datasets, such as the L11 dataset. This is consistent with previous work (Tesdal et al., 2016). The use of multiple modern observational climatologies, L11, G18 and W20 sheds ad-745 ditional light on the multi-model performance analysis. As concluded by previous authors (see for instance Galí et al., 2018, Sect. 4.1), the widely used L11 climatology likely overestimates climatological surface DMS concentration at the spatial resolution of climate models due to the combination of scarce and biased sampling. This could in part explain the unanimous low bias of the CMIP6 models when compared to L11. Models show a better agreement with observation based estimates when compared to W20. The range of model global annual median DMS concentrations (1.39-1.90 nM) encompasses the W20 me-750 dian (1.65 nM), whereas that of L11 is 2.25 nM. Our work also shows that models have better spatial correlation with W20 as a reference dataset (coefficients from 0.13 to 0.46 for annual fields) than with L11 as the reference (coefficients from 0.08 to 0.26).
Analysis of the annual cycles in each of the 54 biogeographical provinces defined by Longhurst (2007) reveals that models better reproduce the annual cycles in mid to high latitudes (polar and westerlies biomes) than in low latitudes (trades biomes),

755
in agreement with past studies (e.g., Le Clainche et al., 2010). Coastal provinces seem to respond in a similar way, with the ones located in mid to high latitudes often displaying a better agreement between models and observations. We note, however, that annual cycles in low latitudes are less pronounced and this may partly explain weaker correlations. Coastal model deficiencies may also be associated to the coarse model grid resolutions and poor process representation of coastal marine biota and sediments.

760
The multimodel ensemble mean (MMM) shows good skills in reproducing spatial patterns and seasonal variability, and compares generally better with observational climatologies than individual models.
The comparison of marine DMS emissions confirms the importance of the air-sea flux parameterisation on the resulting flux. The estimates of DMS emissions relying on observed surface ocean DMS concentrations and state-of-the-art air-sea flux parameterisations range between 16-28 Tg S year −1 , while CMIP6 model estimates show a smaller consistent range (i.e., 765 16-24 Tg S year −1 ). As a consequence, the multi-model best estimate is 19±3 Tg S year −1 , which is 10 % lower than the 18-24 Tg S year −1 best estimate proposed by Tesdal et al. (2016), with an identical uncertainty range.
The comparison of trends of DMS fluxes and of DMS concentrations over the whole simulation period (historical + scenario; 1850-2100) reveals that the current generation of CMIP6 ESMs disagree on the sign of these trends. Two models (CNRM-ESM2-1 and MIROC-ES2L) simulate an increase in ocean DMS concentrations and emissions, whereas two other models 770 (NorESM2-LM and UKESM1-0-LL) predict a moderate decrease. As a consequence, our work shows that the lead-order uncertainty in the future evolution of marine DMS emissions has not been reduced in the current generation of models compared to that of previous modelling experiments (e.g. Bopp et al., 2003;Kloster et al., 2007;Halloran et al., 2010). On the contrary, there is no consensus on how the current generation of models simulate the long-term trends in DMS concentrations and emissions in low-latitude biomes. Further investigating the relationship between DMS concentration and biological productivity reveals that three models (CNRM-ESM2-1, NorESM2-LM and UKESM1-0-LL) predict a positive correlation between the trend in ocean surface DMS concentrations and the trend in marine primary production, while the 790 fourth model (MIROC-ES2L) displays no strong correlation between these variables. This raises questions regarding the ability of empirical parameterisations of DMS concentration to predict future evolution, since they have been calibrated in present conditions. Despite the qualitative agreement of the three models regarding the NPP-DMS relationship, the predictive ability is limited given the large uncertainties in the future evolution of marine primary production. The modelling challenge here is particularly vast as Kwiatkowski et al. (2020) shows that this uncertainty is larger in CMIP6 models than in CMIP5 models.

795
Although none of the marine biogeochemical models studied here are currently intended to represent specific taxa of marine phytoplankton, it is interesting to connect the most likely behavior of those taxa in response to climate change. For instance, Dani and Loreto (2017) suggest that climate change may reduce the range of latitudes where DMS-producer phytoplankton taxa thrive, and hence lead to a reduction in DMS emission to the atmosphere in warm low-latitudes oceans. Overall, our work shows that there is a major uncertainty in low-latitude ocean where the change in DMS concentration results from the interplay 800 of marine biology factors with many other environmental drivers (e.g., temperature, salinity, stratification, nutrient availability, acidification, large-scale circulation), which all may affect in both directions the trends in DMS concentration (Wang et al., 2020). Further analysis to disentangle the role of these factors is required, for instance along the lines of the meta-analysis of Galí and Simó (2015) that specifically addresses the issue of the "summer paradox". This would require important coordination among modellers to work in a multi-model perspective as only a few CMIP6 models include DMS and their DMS-related output 805 are limited and insufficient at present to conduct such analysis. In turn, this large uncertainty in DMS concentration results in uncertainty in marine DMS emission to the atmosphere. Progress in the representation of the biogeochemical, and particularly the lower trophic ecosystem, dynamics in the low-latitude oceans will improve our ability to tighten the range of uncertainty in marine DMS emissions and hence ultimately constrain the direction and the magnitude of the DMS-climate feedbacks.
The climatology of Lana et al. (2011) is available from the Surface Ocean -Lower Atmosphere Study (SOLAS) web site (https://www. bodc.ac.uk/solas_integration/implementation_products/group1/dms/, accessed 12 July 2019). Four datasets are provided (in .csv format): the monthly DMS climatology; its upper and lower bounds based on an assessment of the uncertainty in the climatology estimate; and the original in-situ measurement dataset binned on a 1°×1°grid. These files were converted in netCDF format for comparison with model outputs.

815
The climatology of Galí et al. (2018) is available at https://doi.org/10.5281/zenodo.2558511. Several grid resolution are provided, we only used the 1°×1°file (algorithm: globsat; chlorophyll product: CHL; euphotic layer product: KD490) provided in netCDF format. A few file structure adjustments were required to make it CF compliant and be able to compare with model outputs.
The climatology of Wang et al. (2020) is available at https://zenodo.org/record/3833233 on a regular 1°×1°grid. Apart from a format conversion from the provided .mat file to netCDF, this dataset was used without pre-processing.

820
The scripts used to process and plot the data are available upon request to the author. All plots presented in this paper have been produced with NCL v6.6.2.

43
Appendix A: Methods A1 Model data processing A1.1 Global analysis 825 The four studied models use tripolar grids in the oceans, none of the four being common to each other. Conversely, the climatologies are provided on regular 1°× 1°grids. Thus, using a common grid is required for all model-climatology comparisons, and MMM computation. The remapping is achieved using CDO (version 1.9.8: Schulzweida, 2019). While a conservative interpolation (operator remapcon) would be the most suited method to handle concentration or flux variables, it fails when applied to tripolar grids. Thus, we turned towards a distance-weighted interpolation (operator remapdis). A minor issue arises 830 after remapping ocean data, with pixels defined inland along the coastlines. Thus, a land-sea mask is systematically applied after any remapping step, in order to keep the surface integral correct. The relative error generated by these data processing steps has been evaluated to be in the ±0-2 % range.
The multimodel ensemble mean (MMM) is computed in pixels where at least three models have valid data. This criterion has been chosen so that no pixel in MMM is based on a single model (for instance, the Caspian Sea is only described in 835 UKESM1-0-LL), but to retain areas which would be described in all but one model (the Red Sea and Persian Gulf are not described in the ocean model of MIROC-ES2L).
Readers may note differences in metrics provided in this article and metrics provided in other literature (e.g., mean and median of the L11 climatology displayed in Table 3 and that in Galí et al. (2018, Sect. 3.2.1)). We checked the numbers we present and we are confident in our processing done with the CDO and NCL tools. Note that differences in median values may 840 arise as we compute area-weighted medians.

A1.2 Arctic analysis
In the study of Galí et al. (2019), the mean DMS concentration and flux is computed on ice-free pixels only, since both are derived from satellite observations. In order to provide comparable results, we also computed the mean over ice-free pixels only (with the same criterion as Galí et al. (2019) of >90 % free-water) along with the regular area-weighted mean. We used 845 monthly sea-ice concentration datasets from each model to compute the means over ice-free pixels. We paid special attention to the calculation of the mean, which is thus done on 3D arrays (2D in space and 4 months per year) with missing data in pixels where the sea-ice cover exceeds the chosen threshold. The mean is calculated in a single step using a weighting that combines pixel area and available months duration (if any).
A special case arose for NorESM2-LM, for which the ocean grid has one more row (j=385) than the sea-ice grid (j=384).

850
In order to match both grids to apply a sea-ice mask over DMS data, we were advised (Yanchun He, Mats Bentsen, personnal communication, June 2020) to remove the last row using cdo selindexbox,1,360,1,384. When doing so, a warning is displayed since both (resized) grids do not share exactly the same coordinates.
In Fig. 15, the central basin in the Arctic is defined by Ardyna et al. (2013), which is a revised partitioning based on the work by Spalding et al. (2007) and the WWF (World Wildlife Fund) agency. The mask file (in netcdf format) of this partitioning was 855 provided by M. Galí.

A2 Longhurst oceanic provinces
As presented in Sect. 2.2.1, the climatology of Lana et al. (2011) is built upon a "first-guess analysis", which consists of partitioning all available in-situ data into ocean biogeographic provinces. The partitioning used by Lana et al. (2011) is that proposed by Longhurst (2007) which consists of 54 provinces. A shapefile defining these provinces is available at https: 860 //www.marineregions.org/downloads.php#longhurst. Note that another version of this global ocean partitioning, including 56 provinces (plus a 57 th one located in the Chesapeake Bay) is often referred to, and corresponds to an older version (Longhurst et al., 1995). However, most provinces are identical in both versions, and only a handful of provinces differ (two provinces along the Californian coast, OCAL and CCAL, are merged in a single province in the latest version, and the partitioning of the northern North Pacific differs).

865
We are aware of more recent work about ocean partitioning into biomes that provides both updated static partitioning and dynamic partitioning (Reygondeau et al., 2013;Fay and McKinley, 2014). Such dynamic partitioning accounts for yearly variation of the environmental and biogeochemical characteristics of the oceans, and thus offers a more accurate representation of the biomes. However, since our study uses the climatology developed by Lana et al. (2011) as the main reference, we decided to keep the same static partitioning for the biomes-based analysis.

870
Author contributions. JB, MM, PN and RS designed the study; JB performed research and analyzed data; JB, MM and RS wrote the paper with significant inputs from all coauthors.
Competing interests. The authors declare no competing interests.