Increase in ocean acidity variability and extremes under increasing atmospheric CO 2

. Ocean acidity extreme events are short-term peri-ods of relatively high [H + ] concentrations. The uptake of anthropogenic CO 2 emissions by the ocean is expected to lead to more frequent and intense ocean acidity extreme events, not only due to changes in the long-term mean but also due to changes in short-term variability. Here, we use daily mean output from a ﬁve-member ensemble simulation of a comprehensive Earth system model under low- and high-CO 2 emission scenarios to quantify historical and future changes in ocean acidity extreme events. When deﬁning extremes relative to a ﬁxed preindustrial baseline, the projected increase in mean [H + ] causes the entire surface ocean to reach a near-permanent acidity extreme state by 2030 under both the low-and high-CO 2 -emission scenarios. When deﬁning extremes relative to a shifting baseline (i.e., neglecting the changes in mean [H + ]), ocean acidity extremes are also projected to increase because of the simulated increase in [H + ] variability; e.g., the number of days with extremely high surface [H + ] conditions is projected to increase by a factor of 14 by the end of the 21st century under the high-CO


Introduction
Since the beginning of the industrial revolution, the ocean has absorbed about a quarter of the carbon dioxide (CO 2 ) released by human activities through burning fossil fuel and altering land use (Friedlingstein et al., 2019). Oceanic uptake of anthropogenic CO 2 slows global warming by reducing atmospheric CO 2 but also leads to major changes in the chemical composition of seawater through acidification (Gattuso and Buddemeier, 2000;Caldeira and Wickett, 2003;Orr et al., 2005;Doney et al., 2009 sat , i.e., the product of calcium and carbonate ion concentrations relative to the product at saturation. Undersaturated waters with < 1 are corrosive for calcium carbonate minerals. Each type of calcium carbonate mineral has its individual saturation state due to different solubilities, e.g, C for calcite and A for arag-onite. Over the last four decades the surface ocean pH has declined by about 0.02 pH units per decade (Bindoff et al., 2020). Continued CO 2 uptake by the ocean will further exacerbate ocean acidification in the near future (Caldeira and Wickett, 2003;Orr et al., 2005;Bindoff et al., 2020;Terhaar et al., 2020), with potentially major consequences for marine life (Doney et al., 2009) and ocean biogeochemical cycling (Gehlen et al., 2012).
Superimposed onto the long-term decadal-to centennialscale ocean acidification trend are short-term extreme events on daily to monthly timescales during which ocean pH and are much lower than usual (Hofmann et al., 2011;Joint et al., 2011;Hauri et al., 2013). These events can be driven by different processes, such as ocean mixing, biological production and remineralization, mineral dissolution, temperature and air-sea gas exchange variations, or a combination thereof (Lauvset et al., 2020). In eastern boundary upwelling systems, for example, short-term upwelling events and mesoscale processes can lead to low-surface-pH events and to short-term shoaling of the saturation horizon (i.e., the depth between the supersaturated upper ocean and the undersaturated deep ocean; Feely et al., 2008;Leinweber and Gruber, 2013). Ocean pH can also rapidly change as a consequence of microbial activity (Joint et al., 2011). Phytoplankton blooms and accompanying respiration drastically increase the partial pressure of CO 2 (pCO 2 ) and reduce pH in the thermocline (Sarmiento and Gruber, 2006). Such extreme events may have pH levels that are much lower than the mean pH conditions projected for the near future (Hofmann et al., 2011).
Most of the scientific literature on ocean acidification has focused on gradual changes in the mean state in ocean chemistry (Orr et al., 2005;Bopp et al., 2013;Frölicher et al., 2016;Terhaar et al., 2019b). However, to understand the full consequences of ocean acidification on marine organisms and ecosystem services, it is also necessary to understand how variability and extremes in ocean acidity change under increasing atmospheric CO 2 (Kroeker et al., 2020). The ability of marine organisms and ecosystems to adapt to ocean acidification may depend on whether the species have evolved in a chemically stable or a highly variable environment (Rivest et al., 2017;Cornwall et al., 2020). Furthermore, if the frequency and intensity of short-term extreme events strongly increase, in addition to the longterm acidification, some organisms may have difficulties in adapting, especially if key CO 2 system variables cross some critical thresholds, e.g., from calcium carbonate supersaturation to undersaturation. Key plankton species such as coccolithophores (Riebesell et al., 2000), foraminifera, and pteropods (Bednaršek et al., 2012) were found to be adversely affected by low carbonate ion concentrations. After only several days of being exposed to waters which are undersaturated with respect to aragonite, some species such as pteropods already show reduced calcification, growth, and survival rates (Kroeker et al., 2013;Bednaršek et al., 2014).
Carbonate system variability also plays a role in shaping the diversity and biomass of benthic communities (Hall-Spencer et al., 2008;Kroeker et al., 2011). In laboratory experiments in which deep-water corals are exposed to low-pH waters for a week, some corals exhibit reduced calcification, while recovery may be possible when the low-pH condition persists for several months, stressing the importance of high-frequency variability and short-term acidification events (Form and Riebesell, 2012). There is also growing evidence that the organism response to variability in ocean acidity could change with ocean acidification (Britton et al., 2016). Therefore, understanding the temporal variability of ocean carbonate chemistry and how that will change is important for understanding the impacts of ocean acidification on marine organisms and ecosystems (Hofmann et al., 2011).
Changes in extremes can arise from changes in the mean, variability, or shape of the probability distribution (Coles, 2001). There exists no general accepted definition of an extreme event beyond the common understanding that an extreme is rare (Weyer, 2019). As a result, many different approaches exist to define extreme events (Smith, 2011). If a relative threshold (e.g., quantile) is used to define an extreme event, it is important to distinguish between extreme events that are defined with respect to a fixed reference period or baseline, or if the reference period or baseline moves with time. If the baseline is fixed, the changes in the mean state as well as changes in variability and higher moments of the distribution contribute to changes in extreme events (e.g., Fischer and Knutti, 2015;Frölicher et al., 2018;Oliver et al., 2018). However, if a shifting baseline is used, changes in the mean state do not contribute to changes in extreme events (e.g., Stephenson, 2008;Seneviratne et al., 2012;Zscheischler and Seneviratne, 2017;Cheung and Frölicher, 2020;Vogel et al., 2020). In this case, changes in extremes arise solely due to changes in variability and higher moments of the distribution . This latter definition ensures that values are not considered extreme solely because the baseline changes under climate change (Jacox, 2019;Oliver et al., 2019). Whether extreme events should be defined with respect to a fixed baseline or with respect to a shifting baseline depends on the scientific question. For example, the shifting-baseline approach may be more appropriate when the ecosystems under consideration are likely able to adapt to the mean changes but not to changes in variability (Seneviratne et al., 2012;Oliver et al., 2019). Here, we use both approaches, with a special focus on the analysis of ocean acidity extremes with respect to shifting baselines.
Under continued long-term ocean acidification (i.e., changes in the mean), one can expect that extreme events in [H + ] and , when defined with respect to a fixed reference period or baseline, will become more frequent and intense (Hauri et al., 2013). In addition to the changes in the mean, recent studies suggest that the seasonal cycles in [H + ] and are also strongly modulated under elevated atmospheric CO 2 . Higher background concentrations of dissolved inor-ganic carbon and warmer temperatures produce stronger departures from mean state values for a given change in pertinent physical or chemical drivers for [H + ] and weaker departures for (Kwiatkowski and Orr, 2018;Fassbender et al., 2018). Other studies have also addressed the changes in the seasonal cycle of pCO 2 (Landschützer et al., 2018;Gallego et al., 2018;McNeil and Sasse, 2016;Rodgers et al., 2008;Hauck and Völker, 2015). Over the 21st century and under a high-greenhouse-gas-emission scenario, Earth system model simulations project that the seasonal amplitude in surface [H + ] will increase by 81 %, whereas the seasonal amplitude for aragonite saturation state ( A ) is projected to decrease by 9 % globally on average (Kwiatkowski and Orr, 2018). Recent observation-based estimates as well as theoretical arguments support these projected increases in seasonality for [H + ] and pCO 2 (Landschützer et al., 2018;Fassbender et al., 2018). Thus, when extremes are defined with respect to a shifting baseline (i.e., mean state changes are neglected), the frequency and intensity of extreme [H + ] events will likely increase due to increases in variability.
Unlike for marine heatwaves Collins et al., 2020) and extreme sea level events (Oppenheimer et al., 2020), little is known about the characteristics and changes of extreme ocean acidity events and, if so, only on seasonal timescales (Kwiatkowski and Orr, 2018). A global view of how extreme events in ocean chemistry will unfold in time and space and a mechanistic understanding of the relevant processes is missing. This knowledge gap is of particular concern as it is expected that extreme events in ocean acidity, defined with respect to both a fixed and a shifting baseline, are likely to become more frequent and intense under increasing atmospheric CO 2 . Given the potential for profound impacts on marine ecosystems, quantifying trends and patterns of extreme events in ocean acidity is a pressing issue.
In this study, we use daily mean output of a five-member ensemble simulation under low-and high-CO 2 -emission scenarios of a comprehensive Earth system model to investigate how the occurrence, intensity, duration, and volume of [H + ] and extreme events change under rising atmospheric CO 2 levels. Extreme events defined with respect to both a fixed preindustrial and a shifting baseline are assessed, but the main focus is on extremes with respect to a shifting baseline and how these are affected by variability changes.

Model and experimental design
The simulations used in this study were made with the fully coupled carbon-climate Earth system model developed at the NOAA Geophysical Fluid Dynamics Laboratory (GFDL ESM2M) (Dunne et al., 2012. The GFDL ESM2M model consists of ocean, atmosphere, sea-ice, and land modules and includes land and ocean biogeochemistry. The ocean component is the Modular Ocean Model version 4p1 (MOM4p1), with a nominal 1 • horizontal resolution increasing to 1/3 • meridionally at the Equator, with a tripolar grid north of 65 • N, and with 50 vertical depth levels. The MOM4p1 model has a free surface, with the surface level centered around about 5 m depth, and the spacing between consecutive levels is about 10 m down to a depth of about 230 m (Griffies, 2009) with increasing spacing below. The dynamical sea-ice model uses the same tripolar grid as MOM4p1 (Winton, 2000). The atmospheric model version 2 (AM2) has a horizontal resolution of 2 • × 2.5 • with 24 vertical levels (Anderson et al., 2004). The land model version 3 (LM3) simulates the cycling of water, energy, and carbon dynamically and uses the same horizontal grid as AM2 (Shevliakova et al., 2009).
The ocean biogeochemical and ecological component is version two of the Tracers of Ocean Phytoplankton with Allometric Zooplankton (TOPAZv2) module that parameterizes the cycling of carbon, nitrogen, phosphorus, silicon, iron, oxygen, alkalinity, lithogenic material, and surface sediment calcite (see Supplement in Dunne et al., 2013). TOPAZv2 includes three explicit phytoplankton groupssmall, large, and diazotrophs -and one implicit zooplankton group. The ocean carbonate chemistry is based on the OCMIP2 parameterizations (Najjar and Orr, 1998). The dissociation constants for carbonic acid and bicarbonate ions are from Dickson and Millero (1987), which are based on Mehrbach et al. (1973), and the carbon dioxide solubility is calculated according to Weiss (1974). Total alkalinity in TOPAZv2 includes contributions from phosphoric and silicic acids and their conjugate bases. TOPAZv2 also simulates diurnal variability in ocean physics as well as in phytoplankton growth. While diurnal variations in open-ocean pH are therefore simulated to some extent, we do not expect the model to fully capture the high diurnal variability in seawater chemistry, especially in coastal regions with large biological activity (Kwiatkowski et al., 2016;Hofmann et al., 2011), due to its relatively coarse resolution and simple biogeochemical model.
We ran a five-member ensemble simulation covering the historical 1861-2005 period, followed by a high (RCP8.5; RCP: Representative Concentration Pathway) and a lowgreenhouse-gas-emission scenario (RCP2.6) over the 2006-2100 period with prescribed atmospheric CO 2 concentrations. RCP8.5 is a high-emission scenario without effective climate policies, leading to continued and sustained growth in greenhouse gas emissions (Riahi et al., 2011). In the GFDL ESM2M model, global atmospheric surface temperature in the RCP8.5 ensemble is projected to increase by 3.24 • C (ensemble minimum of 3.17 • C to ensemble maximum of 3.28 • C) between the preindustrial period and 2081-2100. The RCP2.6 scenario represents a low-emission, highmitigation future (van Vuuren et al., 2011) with a simulated warming in the GFDL ESM2M model of  Increase in ocean acidity variability and extremes 1.26) • C. The five ensemble members over the historical period were initialized from a multicentury preindustrial control simulation that was extended with historical land use over the 1700-1860 period (Sentman et al., 2011). The five ensemble members were generated by adding different sea surface temperature (SST) disturbances of the order 10 −5 • C to a surface grid cell in the Weddell Sea at 70.5 • S, 51.5 • W on 1 January 1861 (Wittenberg et al., 2014;Palter et al., 2018). Although the ocean biogeochemistry is not perturbed directly, [H + ] and differences between the ensemble members spread rapidly over the globe. On average, the ensemble members can be regarded as independent climate realizations after about 3 years of simulation for surface waters and after about 8 years at 200 m . Neither the choice of the perturbation location nor the choice of the perturbed variable has a discernible effect on the results presented here (Wittenberg et al., 2014). In addition, an accompanying 500-year preindustrial control simulation was performed.

Extreme-event definition and characterization
We analyze daily mean data of [H + ] and aragonite saturation state A in the upper 200 m of the water column. [H + ] is on the total scale and hence the sum of the concentrations of free protons and hydrogen sulfate ions. We define an event as a [H + ] extreme event when the daily mean [H + ] exceeds the 99th percentile, i.e., occurring once every 100 d. Similarly, we define a A extreme event when the daily mean A falls below the 1st percentile. The percentiles are calculated for each grid cell from daily mean data of the 500-year preindustrial control simulation. In contrast to absolute thresholds, relative thresholds, such as those used here, take into account regional differences in a variable's mean state, variance, and higher moments. Events that are defined based on relative thresholds have the same probability of occurrence across the globe in the period in which they are defined (e.g., preindustrial period; see also Frölicher et al., 2018).
We assess changes in [H + ] and A extreme events when they are defined with respect to both a fixed preindustrial baseline and a shifting baseline. Under the fixed baseline approach, the secular trends as well as changes in variability and the higher moments of the distribution impose changes in extreme events. Under the shifting-baseline approach, which is the focus of this study, a value is considered extreme when it is much higher or lower than the baseline that undergoes changes due to secular trends in the variable. Thus, changes in the different extreme-event characteristics are only caused by changes in variability and the higher moments of the distributions. To define the extreme events with respect to the shifting baselines, we subtract the secular trends in [H + ] and A at each grid cell and in each individual ensemble member prior to the calculation of the different extreme-event charac-teristics based on the preindustrial percentiles (depicted for one grid cell in Fig. 1). The secular trend is calculated as the five-member ensemble mean, which has been additionally smoothed with a 365 d running mean to keep the seasonal signal in the data (further information in Appendix A). The removal of the secular trend ensures that the mean state in the processed data stays approximately constant while dayto-day to interannual variability can change over the simulation period (Fig. 1).
We calculate four extreme-event metrics: (a) the yearly extreme days (in days; number of days per year above the 99th percentile for [H + ] and below the 1st percentile for A ), (b) the annual mean duration (in days; the average number of days above the 99th percentile for [H + ] and below the 1st percentile for A of single events within a year), (c) the annual mean maximal intensity (in nmol kg −1 or A unit; maximum [H + ] or A anomalies with respect to the percentile threshold over the duration of a single extreme event and then averaged over all events within a year), and (d) the mean volume covered by individual extreme events in the upper 200 m (in km 3 ; mean volume of 3D clusters of connected grid cells that are above the 99th percentile for [H + ] or below the 1st percentile for A , calculated using the measure.label function from the scikit-image library for Python for each day; these daily means are then averaged annually). The yearly extreme days, duration, and maximal intensity are calculated for individual grid cells at the surface and at 200 m. While the truncation of extremes between years alters the results for duration and maximal intensity, it allows for the calculation of annual extreme-event characteristics. We focus our analysis not only on the surface, but also on 200 m to study changes in extreme events within the seasonal thermocline. Most organisms susceptible to ocean acidification are found in the upper 200 m, such as reef-forming corals and calcifying phytoplankton.

Decomposition of [H + ] variability into different variability components
We use three steps to decompose the total temporal variability in [H + ] into interannual, seasonal, and subannual variability (Fig. 2). In a first step, we calculate the climatological seasonal cycle from the daily mean data by averaging each calendar day over all years in the time period of interest. Seasonal variability is then identified with the time-series variance of this 365 d long seasonal cycle. The secular trend in the daily mean data has been removed with the five-member ensemble mean before doing the analysis. In a second step, we subtract the seasonal cycle from the data and estimate the spectral density (Chatfield, 1996) of this residual time series using the periodogram function from the scipy.signal Python library. In a third step, we calculate the variance arising from variations on interannual and subannual timescales from the spectral density to obtain interannual and subannual variability (further information is given in Appendix B). Following this methodology, subannual variability comprises all variations in daily mean data with periodicities of less than a year that are not part of the seasonal cycle.

Taylor expansion of [H + ] and A variability changes
To understand the processes behind the simulated changes in the variabilities of [H + ] and A , we decompose these changes into contributions from changes in temperature (T ), salinity (S), total alkalinity (A T ), and total dissolved inorganic carbon (C T ). Assuming linearity, the difference of [H + ] from its mean at time step i can be decomposed into contributions from the drivers by employing a first-order Taylor expansion, and analogously for A . The partial derivatives are evaluated at T , S, C T , and A T , i.e., the temporal mean values of the drivers in the period of interest. While it is important to take into account the climatological total phosphate and total silicate concentrations for calculating the partial derivatives (Orr and Epitalon, 2015), one introduces only small errors by neglecting variations in phosphate and silicate. The partial derivatives in Eq. (1) are evaluated using mocsy 2.0 (Orr and Epitalon, 2015). Using the Taylor decomposition (Eq. 1), one can for example express the seasonal variation in [H + ] as a function of the drivers' seasonal variations (Kwiatkowski and Orr, 2018). In this study, however, we analyze the timeseries variance of [H + ] and A that also includes variability on other timescales (see Sect. 2.2.2) and the drivers of its changes. From the Taylor approximation (Eq. 1) and the definition of variance (e.g., Coles, 2001), it follows that the variance of [H + ] can be written as a function of the partial derivatives with respect to the drivers (sensitivities), the standard deviations of the drivers, and their pairwise correlation coefficients: where the pairwise covariances are functions of the standard deviations and correlation coefficients according to cov(x, y) = σ x σ y ρ x,y , and the partial derivatives are again evaluated at the temporal mean values T , S, C T , and A T . This methodology has also been used to propagate uncertainties in carbonate system calculations (Dickson and Riley, 1978;Orr et al., 2018) and to identify drivers of potential predictability in carbonate system variables . Based on Eq.
(2) and the analogous result for A , a change in variance of [H + ] and A can be attributed to changes in the sensitivities that arise from changes in the drivers' mean states, to changes in the drivers' standard deviations, and to changes in the pairwise correlations between the drivers. We do so by calculating the Taylor series of Eq.
(2) (further information in Appendix C). We then identify the [H + ] variance change from mean changes in the drivers as the sum of all terms in the expansion that describe the contributions of sensitivity changes to the overall change in variance ( s σ 2 H + ). Likewise, we identify the contribution from standard deviation changes in the drivers ( σ σ 2 H + ). We further group terms in the expansion that stem from simultaneous changes in the sensitivities and standard deviations ( sσ σ 2 H + ) and the remaining terms that arise either from correlation changes alone or mixed contributions from correlation changes and changes in sensitivities and standard deviations ( ρ+ σ 2 H + ). Since these four components contain all terms in the Taylor series, they exactly reproduce a change in variance represented by Eq. (2), We also assess the contributions to the four components from C T alone; from C T and A T ; and from C T , A T , and T . The equivalent procedure is also used to decompose variance change in A . Further information on the decomposition is given in Appendix C. For [H + ], the preindustrial 99th percentile threshold (horizontal blue line in panels a and b) is increasingly exceeded even when subtracting the ensemble-mean change, because [H + ] variability increases. In contrast, a reduction in A variability leads to a reduced undershooting of the preindustrial 1st percentile (d). Figure 2. The three-step decomposition of [H + ] variance into interannual, seasonal, and subannual variance, exemplified for a surface grid cell at 40 • N and 30 • W in the North Atlantic in the preindustrial control simulation. In a first step, the climatological seasonal cycle is determined (over the whole period, only 5 years are depicted here) and its variance is calculated. Note that the seasonal cycle in this grid cell has two minima and maxima. In a second step, the spectral density of the anomalies with respect to the seasonal cycle is calculated. In a third step, interannual and subannual variance is estimated from the spectral density.

Model evaluation
The focus of our analysis is on changes in variability in [H + ] and A . As observation-based daily mean data of the inorganic carbon chemistry at the global scale are not available, we limit the evaluation of the Earth system model simulation to the representation of the seasonal cycles of [H + ] and A , and especially on its changes over the 1982-2015 period. We developed an observation-based dataset for sur-face monthly [H + ] and A using monthly surface salinity, temperature, pCO 2 , and A T fields. Salinity and temperature data are taken from the Hadley Centre EN.4.2.1 analysis product (Good et al., 2013). A T is then calculated using the LIARv2 total alkalinity regression from salinity and temperature (Carter et al., 2018). For pCO 2 , we use the neuralnetwork-interpolated monthly data from Landschützer et al. (2016), which is based on SOCATv4 . Although not fully capturing pCO 2 variability in regions with only few observations (Landschützer et al., 2016), the pCO 2 dataset appears to be generally well suited for analyzing pCO 2 seasonality and changes therein (Landschützer et al., 2018). An exception is the Southern Ocean, where data-based pCO 2 products are uncertain due to sparse data in winter (Gray et al., 2018). [H + ] and A are then calculated from salinity, temperature, A T , and pCO 2 using the CO2SYS carbonate chemistry package (van Heuven et al., 2011). Uncertainties in the derived seasonal cycles for [H + ] and A that arise from uncertainties in the observation-based input variables are not quantified in this study.
In most regions, the GFDL ESM2M model captures the observation-based mean seasonal cycle in [H + ] and A quite well, in particular for A (the mean values of the seasonal amplitudes in Fig. 3). However, potential biases in the mean seasonal amplitudes do not directly have an effect on projected changes in extreme events, as we base the extremeevent definition on relative thresholds.
We then compare the simulated ensemble-mean trends in seasonal amplitude with the observation-based estimates (  Table 1), consistent with the observation-based estimates. While the estimates for the simulated trends are significantly larger than zero for all latitude bands, this is not the case for the observation-based trends in the equatorial region (10 • S-10 • N) and the northern low latitudes (10-40 • N) ( Table 1). The simulated [H + ] seasonality trends are significantly smaller (with 90 % confidence level) than estimated from observations in the northern high (40-80 • N; orange thick lines in Fig. 3a, b) and southern low latitudes (40-10 • S; blue thick lines in Fig. 3a, b), where the trends from the model ensemble are 0.031 ± 0.012 nmol kg −1 per decade and 0.035 ± 0.003 nmol kg −1 per decade, compared to the observation-based trends of 0.106 ± 0.040 nmol kg −1 per decade and 0.055 ± 0.014 nmol kg −1 per decade, respectively. The simulated ensemble-mean trends for the remaining latitude bands are not significantly different from the observation-based trend estimates.
For the seasonal amplitude of A , we find a significant negative trend in the observation-based data in the northern low latitudes and significant negative trends in the simulations in the northern and southern high latitudes (Table 1). The negative trends in seasonal amplitude in the simulations are significantly different from the observation-based trends in the northern high latitudes (−0.015 ± 0.004 vs. 0.002 ± 0.009 A units per decade) and in the southern high latitudes (−0.012 ± 0.002 vs. 0.000 ± 0.005 A units per decade).
In summary, taking into account previous evaluations of the mean states of [H + ] and A and the underlying drivers in the GFDL-ESM2M model Kwiatkowski and Orr, 2018), the model performs well against a number of key seasonal performance metrics. However, the model slightly underestimates past increases in seasonal amplitude of [H + ], especially in the northern and southern high latitudes. In contrast to the observation-based data, the model also projects negative trends in the A seasonal amplitude there. Nevertheless, the observation-based trends in the northern and especially southern high latitudes are uncertain because wintertime data are sparse there. Even though we lack the daily mean observation-based data to undertake a full assessment, it appears that the GFDL ESM2M model is adequate to assess changes in open-ocean variability of [H + ] and A and to assess changes in extreme events that arise thereof.

Results
We first briefly discuss the simulated changes in [H + ] and A extreme events when these events are defined with respect to a fixed preindustrial baseline period (Sect. 3.1). In Sect. 3.2 and 3.3, these results are contrasted with changes in extremes that are defined with respect to a shifting baseline, i.e., where the secular trends do not alter extreme events. In Sect. 3.4, variability changes are decomposed into subannual, seasonal, and interannual variability contributions. The processes leading to variability changes are analyzed in Sect. 3.5.
3.1 Global changes in extremes defined relative to a fixed preindustrial baseline  A1). Large increases in yearly extreme days are also projected for A when using a fixed preindustrial 1st percentile as a baseline (Fig. 4b). Similar to [H + ], the entire surface ocean is projected to approach a permanent A extreme state dur-   (Fig. 5a, ensemble ranges are given in Table 2). The maximal intensity is projected to increase from 0.08 to 0.12 nmol kg −1 (Fig. 5c, Table 2) and the duration from 11 to 15 d (Fig. 5e). Compared to preindustrial conditions, these changes correspond to a 173 % increase in the number of days per year, a 44 % increase in the maximal intensity, and a 45 % increase in the duration of [H + ] extreme events. The volume of individual events is projected to increase by 20 % over the historical period, from a typical volume of 2.7 × 10 3 km 3 , which is about 0.004 % of the total ocean volume in the upper 200 m (Fig. 5g), to 3.2 × 10 3 km 3 . Over the 21st century, extreme events in ocean acidity, defined with respect to a shifting baseline, are projected to further increase in frequency, intensity, duration, and volume (Fig. 5). By 2081-2100 under the RCP8.5 scenario, the number of [H + ] extreme days per year at the surface is projected to increase to 50 d (corresponding to a 1273 % increase relative to the preindustrial period). The maximal intensity is projected to increase to 0.38 nmol kg −1 (371 % increase), the duration to 32 d (199 % increase), and the volume to 13.9 × 10 3 km 3 (414 % increase).
At 200 m, the [H + ] extreme events in preindustrial conditions are in general more intense (0.17 nmol kg −1 ; Fig. 5d) and longer lasting (38 d; Fig. 5f) than at the surface. The stronger extreme events are caused by the overall larger variability at 200 m than at the surface in the preindustrial period. The longer duration is connected to the more pronounced contribution from interannual variability (see Sect. 3.4). However, projected relative changes over the historical period and the 21st century are smaller at 200 m than at the surface and with larger year-to-year variations across the ensembles. Under recent past conditions (1986)(1987)(1988)(1989)(1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005), the number of extreme days per year at 200 m is 4.3 d per year (corresponding to an 18 % increase since the preindustrial period), the maximal intensity 0.20 nmol kg −1 (18 % increase), and the duration 46 d (21 % increase). By the end of the 21st century under the RCP8.5 scenario, the number of [H + ] extreme days per year is projected to increase to 32 d per year, the maximal intensity to 0.34 nmol kg −1 , and the duration to 99 d. Notably, extreme events in [H + ] are projected to become slightly less intense at 200 m than at the surface (0.34 vs. 0.38 nmol kg −1 ) by the end of the century under RCP8.5, even though they were more intense in preindustrial times at depth. In contrast, surface [H + ] extreme events remain shorter in duration at the end of the century than at 200 m.
Under the RCP2.6 scenario, the magnitude of changes in the different [H + ] extreme-event characteristics by the end of the century is substantially smaller than in the RCP8.5 scenario. This difference is especially pronounced at the surface (blue lines in Fig. 5). There, the number of extreme days per year, maximal intensity, and duration under the RCP2.6 are projected to be 46 % (44-47), 43 % (43-44), and 75 % (73-77) of that under the RCP8.5 scenario. At depth, the differences between the RCP2.6 and RCP8.5 scenario are less pronounced and only emerge in the second half of the 21st century. As opposed to the surface, the number of [H + ] extreme days per year and the maximal intensity at 200 m as well as the volume of events are projected to increase significantly even after the atmospheric CO 2 concentration stabilizes in RCP2.6 around year 2050. This delayed response in the subsurface is due to the relatively slow surface-to-subsurface transport of carbon. However, this is not the case for the duration, which slightly decreases in the second half of the 21st century at depth (Fig. 5f). This decrease in duration mainly occurs in the subtropics, where events generally last longer (Fig. A3b). It is connected to an increase in the contribution from high-frequency variability to total variability in those regions over that period.
In contrast to [H + ] extreme events, the number of yearly extreme days in A is projected to decrease over the historical period and during the 21st century under both the RCP8.5 and RCP2.6 scenarios ( Fig. 6a- Table A1) when the extreme events are defined with respect to a shifting baseline. The number of surface A extreme days per year by the end of this century is projected to be 63 % smaller under RCP8.5 and 39 % smaller under RCP2.6 compared to preindustrial conditions (ensemble ranges are given in Table A1). Projected changes at depth are less pronounced than at the surface, again with larger decreases under RCP8.5 than under RCP2.6. Despite this decline in extreme events when defined with respect to a shifting baseline, the long-term decline in the mean state of A still leads to more frequent occurrence of extreme low A events when defined with respect to a fixed baseline (see Sect. 3.1). Figure 5. Simulated changes in globally averaged [H + ] extreme-event characteristics over the 1881-2100 period following historical (black lines) and future RCP8.5 (red) and RCP2.6 (blue) scenarios. The extreme events are defined with respect to a shifting baseline. Yearly extreme days, maximal intensity, and duration are shown for the surface (a, c, e) and for 200 m (b, d, f). Volume is shown in (g). The thick lines display the five-member ensemble means, and the shaded areas represent the maximum and minimum ranges of the individual ensemble members.  3.3 Regional changes in extremes defined relative to a shifting baseline Surface [H + ] extremes that are defined with respect to a shifting baseline are projected to become more frequent in 87 % of the surface ocean area by the end of the 21st century under the RCP8.5 scenario. However, the projected changes in these ocean acidity extremes are not uniform over the globe ( For example, events become up to 1 nmol kg −1 more intense in the subtropical North Pacific and Atlantic, corresponding roughly to a 10-fold increase in intensity with respect to the preindustrial period. The largest relative increases in intensity are projected for the Arctic Ocean, the North Atlantic, and around Antarctica, where more-than-10-fold increases with respect to the preindustrial period are projected. Re-gions with large increases in the number of yearly extreme days tend to also show large increases in the duration of extreme events (Fig. 7e). The Arctic Ocean is an exception. Although the number of yearly extreme days increases strongly, the increase in duration is not as pronounced. This discrepancy is because extremes are already long lasting but rare at preindustrial times (Fig. A3). So even though extreme events are projected to occur each year by the end of the century under RCP8.5, the increase in duration is relatively small. At 200 m, the projected pattern of changes in yearly extreme days generally resembles that at the surface (Fig. 7b). The largest increases in yearly extreme days are projected for parts of the subtropics, the Southern Ocean, and the Arctic Ocean. In contrast to the surface, [H + ] extremes at 200 m are projected to become less frequent in the equatorial Atlantic, the northern Indian Ocean, the North Pacific, and in large parts of the Southern Ocean. The regions indicating a decline in [H + ] extremes at depth include also some of the eastern boundary current systems, such as the Humboldt, California, and Benguela Current systems. In most of these regions, extreme events are projected to disappear in the RCP8.5 scenario by the end of this century (gray regions in Fig. 7b). The largest increases in subsurface event intensity are projected in the subtropics (Fig. 7d), whereas the duration of [H + ] extremes is projected to increase strongly in many regions of the mid-to-high latitudes of both hemispheres (Fig. 7f). The projected increases in duration at 200 m are much larger than at the surface.
The increase in the number of extreme days per year, the maximal intensity, and the duration is smaller under RCP2.6 compared to RCP8.5 for most of the ocean (Fig. A2). The largest increases in occurrence of extremes under RCP2.6 are simulated for the Arctic Ocean, similar to under RCP8.5, and for parts of the Southern Ocean. The regions in the Southern Ocean where the occurrence of extreme events in [H + ] is projected to decrease largely overlap with those for RCP8.5, at the surface and at depth. On the other hand, unlike under RCP8.5, a decrease in extreme-event occurrence is only projected for a small fraction of the tropical oceans under RCP2.6.
While the decline in mean A generally leads to lower values in A and thus extreme events are becoming more frequent when defined with respect to a fixed preindustrial baseline (Sect. 3.1), extreme events in A are projected to become less frequent throughout most of the ocean when defined with respect to a shifting baseline (89 % of surface area under RCP8.5 at the end of the 21st century; Fig. 6c). In many regions, extreme events in A are projected to disappear by 2081-2100 under the RCP8.5 scenario (gray regions in Fig. 6c) when defined with respect to a shifting baseline. However, the number of yearly extreme days in A is projected to increase by 10 or more in the subtropical gyres, especially in the western parts of the subtropical gyres. At 200 m, no extreme events are projected for most of the ocean during 2081-2100 under RCP8.5 (Fig. 6d).

Decomposition of temporal variability in [H + ]
The changes in [H + ] extreme events defined with respect to a shifting baseline mainly result from changes in [H + ] variability. These variability changes may arise from changes in interannual variability, seasonal variability, and subannual variability. Thus, we decomposed the total variability into these three components (see Sect. 2.2.2). For the preindustrial period, the model simulates generally larger [H + ] variance at depth than at the surface (0.42 vs. 0.15 nmol 2 kg −2 , not shown). Seasonality has the largest contribution at the surface (81 % of total variance). At 200 m, interannual variability has the largest contribution (63 %), and also subannual variability is more important compared to the surface (15 % vs. 8 %).
Over the 1861-2100 period under the historical-RCP8.5 forcing, changes in seasonality clearly dominate the overall change in variability at the surface with 87 % contribution to the overall variance change in the global mean (Fig. 8b,  d). Changes in interannual variability (3 % contribution to overall variance change; Fig. 8a, d) and subannual variability (10 %; Fig. 8c, d) play a minor role. The largest increases in variability for all three variability types are projected for the northern high latitudes. Around Antarctica and the southern end of South America, large increases in seasonal variabil-ity are projected (Fig. 8b). In the tropical Pacific and parts of the Southern Ocean, decreases in interannual and seasonal variability are projected (Fig. 8a, b).
In contrast to the surface, changes in interannual and to a lesser extent subannual variability at 200 m are also important for explaining the overall changes in [H + ] variability (Fig. 8e, g, h). Changes in interannual variability contribute most to overall variance change at the global scale (with 42 % contribution). Seasonal variability changes are almost equally important (37 %), and changes in subannual variability also contribute substantially to changes in total variability (20 %). The patterns of variability changes are very similar across the three temporal components of variability. The largest increases in [H + ] variability are simulated north and south of the Equator. These regions tend to be already more variable during the preindustrial period (see Fig. A3a). However, the model also projects an increase in variability for regions that are less variable during the preindustrial period, such as northern high latitudes. All three temporal components of variability are projected to decrease in the tropics and parts of the Southern Ocean. The variability decrease in those regions is most pronounced for interannual variability (Fig. 8e).

Drivers of [H + ] and A variability changes
In this section, we investigate the changes in the drivers that cause the variability changes in [H + ] and A . Drivers are carbon (C T ), alkalinity (A T ), temperature, and salinity. To do so, we attribute changes in [H + ] and A variability to four factors (see Sect. 2.2.3 for further details): (i) changes in the mean states of the drivers that control the sensitivities ( s σ 2 H + ), (ii) changes in the variabilities of the drivers ( σ σ 2 H + ), (iii) simultaneous changes in the mean states and variabilities of the drivers ( sσ σ 2 H + ; this contribution arises because both mean states and variabilities change and cannot be attributed to either (i) or (ii) alone), and (iv) changes in the correlations between the drivers, also including mixed contributions from correlation changes together with mean state and variability changes ( ρ+ σ 2 H + ). In other words, (iv) describes the change in variability that arises because the correlations between the drivers also change, and not only their mean states and variabilities.
The drivers' mean changes between the preindustrial period and 2081-2100 under RCP8.5 cause a strong increase in surface [H + ] variability, which is most pronounced in the high latitudes ( s σ 2 H + ; red line in Fig. 9a, black dashed line in Fig. 9b). On a global average, these variance changes due to the mean changes in the drivers ( s σ 2 H + = 1.3 nmol 2 kg −2 ) are much larger than the total simulated variance change in [H + ] ( σ 2 H + = 0.5 nmol 2 kg −2 , dashed gray or solid black line in Fig. 9a). In general, an increase in mean C T , temperature, and salinity would lead to an increase in s σ 2 H + , whereas an increase in mean A T would lead to a  decrease. The GFDL ESM2M model projects an increase in mean C T over the entire surface ocean (Fig. A5a) due to the uptake of anthropogenic CO 2 from the atmosphere and therefore an increase in s σ 2 H + (light blue line in Fig. 9b). In the high latitudes, a relatively small increase in mean C T leads to a large increase in s σ 2 H + , because [H + ] is more sensitive to changes in C T due to the low buffer capacity there. Decreases in mean A T further contribute to the increase in s σ 2 H + in the high latitudes (green line in Fig. 9b). In the low-to-mid latitudes and in particular in the Atlantic Ocean, mean surface H + ) is decomposed into the contribution from changes in the sensitivities that arise from changes in the drivers' mean values ( s σ 2 H + ), the contribution from changes in the drivers' standard deviations ( σ σ 2 H + ), the contribution from simultaneous changes in the sensitivities and the drivers' standard deviations ( sσ σ 2 H + ), and the contribution from correlation changes alone together with simultaneous changes in correlations and sensitivities and standard deviations ( ρ+ σ 2 H + ) (a). The small mismatch between the sum of the components (black line) and simulated variance change (gray dashed line) arises because the decomposition is based on Eq. (2), which is an approximation to simulated [H + ] variance. The contributions to these components from changes in C T alone (light blue lines); from changes in C T and A T (green lines); and from C T , A T , and temperature (gold lines) are shown in panels (b)-(e). The dashed black lines in panels (b)-(e) show the total components that contain contributions from all four drivers.
A T is projected to increase (Fig. A5) and therefore dampens the overall increase in s σ 2 H + (green line in Fig. 9b). The changes in A T are largely due to changes in freshwater cycling that also manifest in salinity changes (Fig. A5, Carter et al., 2016). Increases in temperature additionally increase s σ 2 H + , mainly in the northern mid-to-high latitudes (gold line in Fig. 9b), but the overall impact of mean changes in temperature, and especially salinity, is small.
Why is the increase in σ 2 H + (gray dashed or black solid line in Fig. 9a) smaller than the increase from the mean changes (i.e., s σ 2 H + ; red line in Fig. 9a)? In the high latitudes, the projected change in the variability of the drivers (Fig. A6) contributes negatively to the [H + ] variability change and counteracts to some degree the increase in s σ 2 H + . These variability changes alone have a small imprint on σ σ 2 H + (blue line in Fig. 9a; black dashed line in Fig. 9c), but the variability changes dampen the increases from the mean changes ( sσ σ 2 H + , magenta line in Fig. 9a, black dashed line in Fig. 9d). The latter contribution is large in the high latitudes, where mean changes alone would lead to a strong increase. In the high latitudes, decreases in C T variability (Fig. A6a) together with increases in mean C T (Fig. A5a) can explain much of the negative contribution from sσ σ 2 H + (light blue line in Fig. 9d). In the northern high latitudes, mean and variability changes in A T are also impor-tant for sσ σ 2 H + (green line in Fig. 9d). The additional contribution from changes in the correlations between the drivers ( ρ+ σ 2 H + ; cyan line in Fig. 9a) also tends to contribute negatively to [H + ] variability changes, especially in the North Atlantic, and changes in correlations with temperature play an important role (gold line in Fig. 9e). In summary, the increase in [H + ] variability at the surface is mainly caused by increases in mean C T attenuated by decreases in C T variability in the high latitudes. Mean changes in A T reinforce the increase in [H + ] variability in the northern high latitudes but dampen the increase in the low latitudes.
At 200 m, the projected increase in σ 2 H + (gray dashed or black solid line in Fig. 10a) is also a result of the large increase due to the mean changes in the drivers ( s σ 2 H + ; red line in Fig. 10a; dashed black line in Fig. 10b) and the decrease due to the interplay between mean changes and decreases in the variability ( sσ σ 2 H + ; magenta line in Fig. 10a, black dashed line in Fig. 10d). Similar to the surface, the changes in mean and variability of C T are the most important drivers of changes (light blue lines in Fig. 10b, d). Increases in mean A T partially compensate for the increase in [H + ] variability due to the increase in mean C T (green lines in Fig. 10b, d). Changes in [H + ] variability due to changes in temperature and salinity are small. In contrast to the surface, the individual compensating contributions to [H + ] variability change from mean changes and simultaneous mean and variability changes in the drivers, in particular those in C T , are much larger at 200 m. The global average variance change due to the mean changes in the drivers ( s σ 2 H + = 3.7 nmol 2 kg −2 ) is much larger than the overall simulated variance change ( σ 2 H + = 0.1 nmol 2 kg −2 ). The contribution from changes in the correlations between the drivers is overall small (cyan line in Fig. 10a) and stems mainly from changes in the correlation between C T and A T (Fig. 10e). Taken together, the increase in [H + ] variability at 200 m mainly arises from the balance between increases in mean C T and decreases in C T variability. Increases in mean A T dampen these changes.
Unlike for [H + ], both mean changes ( s σ 2 ; red lines in Fig. 11) and variability changes in the drivers ( σ σ 2 : blue lines in Fig. 11) lead to a decrease in A variability ( σ 2 ; black dashed lines in Fig. 11). At 200 m, variability changes are even the dominant driver for reductions in A variability. Simultaneous changes in means and variabilities ( sσ σ 2 ; purple lines in Fig. 11) contribute positively and dampen the reduction in A variability from mean and variability changes alone. Mean and variability changes in C T are the main drivers for changes in A variability as indicated by the tight relation between the dashed and solid red, blue, and purple lines in Fig. 11, in particular at 200 m. An exception is the northern high latitudes, where A T changes also play a substantial role at the surface (not shown). Correlation changes in the drivers ( ρ+ σ 2 ; cyan lines in Fig. 11) are of secondary importance and have the largest imprint in the northern midto-high latitudes at the surface.

Discussion and conclusions
We provide a first quantification of the historical and future changes in extreme events in ocean acidity by analyzing daily mean three-dimensional output from a five-member ensemble simulation of a comprehensive Earth system model. In our analysis, we focus on changes in high-[H + ] and low-A extreme events that are defined with respect to a shifting baseline, where changes in extremes arise from changes in daily to interannual variability. Secular trends in the mean state were removed from the model output before analyzing extremes under this approach. We show that such extreme events in [H + ] are projected to become more frequent, longer lasting, more intense, and spatially more extensive under increasing atmospheric CO 2 concentration, both at the surface and also within the thermocline. Under RCP2.6, the increase in these extreme-event characteristics is substantially smaller than under RCP8.5. The increase in [H + ] variability is a consequence of increased sensitivity of [H + ] to variations in its drivers. It is mainly driven by the projected increase in mean C T and additionally altered by changes in C T variability and A T mean and variability as well as changes in the correlations between the drivers. In contrast to [H + ], variability of A is projected to decline in the future. Therefore, extreme events in A are projected to become less frequent in the future when defined with respect to a shifting baseline. The reason for the decline in variability is that A , unlike [H + ], becomes less sensitive to variations in the drivers with the mean increase in C T . Furthermore, the projected reductions in the drivers' variabilities, mainly in C T , further reduce A variability.
The analysis of extreme events defined with respect to fixed preindustrial percentiles reveals that the secular trends in [H + ] and A are so large that they lead to year-round or almost-year-round extreme events in the upper 200 m over the entire globe by the end of the 21st century, even under the low-emission scenario RCP2.6. Extreme events are no longer temporally and spatially bounded events that arise due to the chaotic nature of the climate system but describe a permanent new state. Under the fixed baseline approach, the relative contribution of changes in variability or higher moments of the distribution to the changes in the number of extremes is small. For example, the number of yearly extreme days for surface [H + ] over the 1986-2005 period under the shiftingbaseline approach is only 3.8 % of that when defining the extreme events with respect to a fixed preindustrial baseline. This fraction differs regionally, reaching more than 10 % in the North Pacific, the North Atlantic, and the Arctic Ocean. However, we recall here that the changes in the number of [H + ] extremes when defined with respect to a shifting baseline are large. These changes in variability may need to be taken into account when assessing the impacts of ocean acidity changes on marine organisms, especially when organisms are likely to adapt to the long-term mean changes but not to changes in variability.
We use the 99th percentile of the distribution from a preindustrial simulation for the definition of the extreme [H + ] events (i.e., a one-in-a-hundred-days event at preindustrial levels), but the results may depend on the choice of this threshold. We tested the sensitivity of our results under the shifting-baseline approach by using also the 99.99th percentile threshold (i.e., a one-day-in-27.4-years event at preindustrial levels). The relative increase in the numbers of extreme [H + ] days per year is larger for these rare extremes (Fig. 12). For example, nearly every second day with [H + ] exceeding the 99th percentile (red solid lines in Fig. 12) is also a day with [H + ] exceeding the 99.99th percentile (red dotted lines in Fig. 12) by the end of the 21st century under RCP8.5, both at the surface and at depth. As a result of this large relative increase in rare extremes, the model projects as many days with [H + ] exceeding the 99.99th percentile by the end of the century under RCP8.5 (red dotted lines in Fig. 12) as it projects days exceeding the 99th percentile under RCP2.6 (blue solid lines in Fig. 12).
The projected increase in [H + ] variability and decrease in A variability also alters the occurrence of extreme events based on absolute thresholds. An often used threshold is A = 1, below which seawater is corrosive with re- H + ) is decomposed into the contribution from changes in the sensitivities that arise from changes in the drivers' mean values ( s σ 2 H + ), the contribution from changes in the drivers' standard deviations ( σ σ 2 H + ), the contribution from simultaneous changes in the sensitivities and the drivers' standard deviations ( sσ σ 2 H + ), and the contribution from correlation changes alone together with simultaneous changes in correlations and sensitivities and standard deviations ( ρ+ σ 2 H + ) (a). The contributions to these components from changes in C T alone (light blue lines); from changes in C T and A T (green lines); and from C T , A T , and temperature (gold lines) are shown in panels (b)-(e). The dashed black lines in panels (b)-(e) show the total components that contain contributions from all four drivers. Figure 11. Decomposition of A variability changes into different drivers. The simulated zonal mean contribution to variance changes in A (black dashed lines, σ 2 ) from the preindustrial period to 2081-2100 (RCP8.5) at the surface (a) and at 200 m (b). Shown is the contribution from sensitivity changes (due to mean changes in the drivers) (red lines, s σ 2 ), standard deviation changes in the drivers (blue lines, σ σ 2 ), simultaneous changes in sensitivities and standard deviations (purple lines, sσ σ 2 ), and all contributions that involve changes in the drivers' correlations (cyan lines, ρ+ σ 2 ). Furthermore, contributions from mean changes, standard deviation changes, and simultaneous mean and standard deviation changes in C T alone are shown (dashed red, blue, and purple lines, respectively). spect to the calcium carbonate mineral aragonite (Morse and Mackenzie, 1990). We assess the influence of the general decline in A variability at the time where a grid cell falls below A = 1 for the first time. To do so, we compare these times within the historical and RCP8.5 ensemble to those for the hypothetical case where A variability stays at the preindustrial level but mean A undergoes the ensemblemean evolution. We find that the decline in A variability, which is observed in the historical and RCP8.5 ensemble, leads to an average delay of the first occurrence of undersaturation by about 11 years at the surface and about 16 years at 200 m. At the surface, these delays of undersaturation occur throughout the high latitudes (Fig. 13a). At depth, the delays are most pronounced in the tropics (Fig. 13b), but delays also occur in the high latitudes. Assuming unchanged seasonality, McNeil and Matear (2008) found that seasonal aragonite undersaturation of surface waters in the Southern Ocean may occur 30 years earlier than annual mean aragonite undersat- uration. However, our simulation shows that the reduction in A variability delays the onset of undersaturation by about 10 to 15 years in the Southern Ocean relative to a hypothetical simulation where variability does not change. Therefore, changes in variability need to be taken into account when projecting the onset of seasonal undersaturation, especially in the high latitudes and in the thermocline of the tropics.
Previous studies have shown that the seasonal cycle of surface ocean pCO 2 will be strongly amplified under increasing atmospheric CO 2 (Gallego et al., 2018;Landschützer et al., 2018;McNeil and Sasse, 2016) and that a similar amplification is expected for surface [H + ] (Kwiatkowski and Orr, 2018). Here we show that the changes in the seasonal cycle of [H + ] translate into large increases in short-term extreme acidity events at the surface as well as at 200 m, when these events are defined with respect to a shifting baseline. In addition to earlier studies, we also show that changes in subannual variability, which are only partially resolved by monthly mean data, contribute to changes in extreme events in [H + ] under increasing atmospheric CO 2 . Furthermore, we show that the average duration of extreme events at the surface and in recent past conditions (1986)(1987)(1988)(1989)(1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005) is about 15 d. To resolve such events that last for days to weeks, it is necessary to use daily mean output. Currently, ocean carbonate system variables from models that participate in the sixth phase of the Coupled Model Intercomparison Project are routinely stored with a monthly frequency on the Earth system grid . We therefore recommend storing and using high-frequency output to study extreme events in the ocean carbonate systems.
Even though we consider our results as robust, a number of potential caveats remain. First, the horizontal resolution of the ocean model in the GFDL ESM2M model is rather coarse and cannot represent critical scales of smallscale circulation structures (e.g., Turi et al., 2018). In addition, the biogeochemical processes included in the GFDL ESM2M model are designed for the open ocean but do not capture the highly variable coastal processes (Hofmann et al., 2011). High-resolution ocean models with improved pro-cess representations are therefore needed to explore variability in ocean carbonate chemistry, especially in coastal regions and smaller ocean basins, such as the Arctic (Terhaar et al., 2019a, b). Observation-based carbonate system data with daily mean resolution would also be necessary to thoroughly evaluate the models' capability to represent day-today variations in carbonate chemistry. Secondly, our results, in particular at the local scale, might depend on the model formulation. As the mean increases in C T mainly drive the increases in [H + ] variability (see Fig. 9b), we expect that models with larger oceanic uptake of anthropogenic carbon show larger increases in [H + ] variability than models with lower anthropogenic carbon uptake. The GFDL ESM2M model matches observation-based estimates of historical global anthropogenic CO 2 uptake relatively well but still has difficulties in representing the regional patterns in storage (Frölicher et al., 2015). Therefore, the exact regional patterns of C T changes may differ from model to model. Further studies focusing on the physical processes that lead to the regional C T changes may help to better constrain the regional patterns in variability changes. In addition, it is currently rather uncertain how [H + ] and A variability changes as a result of changes in the drivers' variabilities. We have demonstrated that this factor is particularly important for A and for [H + ] at depth. It is well known that current Earth system models have imperfect or uncertain representations of ocean variability over a range of timescales (Keller et al., 2014;Resplandy et al., 2015;Frölicher et al., 2016). A possible way forward would be to assess variability changes and changes in ocean acidity extreme events within a multimodel ensemble, which would likely provide upper and lower bounds. Finally, it is assumed that physical and biogeochemical changes in the ocean will also increase diurnal variability. In particular in coastal areas, such diurnal variations can have amplitudes that are much larger than the projected changes over the 21st century (Hofmann et al., 2011). However, the GFDL ESM2M model does not fully resolve the diurnal variability. Future studies with Earth system models that resolve diurnal processes are needed to quantify changes in diurnal vari- Figure 13. The difference in years between the first occurrence of aragonite undersaturation in the historical and RCP8.5 ensemble and a hypothetical simulation where variability does not change over the 1861-2100 period, but only the mean changes. Positive values (yellow and red) indicate a delayed onset of undersaturation resulting from declines in A variability. ability and the impacts of these changes on extreme acidity events.
Our results may also have important consequences for our understanding of the impacts of ocean acidification on marine organisms and ecosystems. The projected increase in the frequency and the duration of ocean acidity extremes implies that marine organisms will have less time to recover from high-[H + ] events in the future. Organisms that cannot adapt to the large long-term changes in mean [H + ] will likely be the most impacted. However, even if organisms may be able to adapt to the long-term increase in [H + ], the large projected increase in [H + ] extreme events due to changes in variability may push organisms and ecosystems to the limits of their resilience, especially those organisms that are commonly accustomed to a more steady environment. The risks for substantial ecosystem impacts are aggravated by the fact that the frequency and intensity of marine heatwaves are also projected to substantially increase , which also negatively impact marine ecosystems (Wernberg et al., 2016;Smale et al., 2019). The interactions of intensified multiple stressors have the potential to influence marine ecosystems and the ocean's biogeochemical cycles in an unprecedented manner (Gruber, 2011). However, further research is needed to understand the combined impacts of short-term ocean acidity extremes and marine heatwaves on marine ecosystems.
In conclusion, our analysis shows that [H + ] and A in the upper 200 m are projected to be almost permanently under extreme conditions by the end of the 21st century when extremes are defined relative to preindustrial baselines. Even when accounting for the changes in the long-term mean, short-term extreme events in [H + ] are projected to become more frequent, to last longer, to be more intense, and to cover larger volumes of seawater due to increases in [H + ] variability, potentially adding to the stress on organisms and ecosystems from the long-term increase in ocean acidity. In Sect. 3.2 and 3.3, we analyze the changes in extreme events in [H + ] and A that arise from day-to-day to interannual variability changes in these variables. We therefore need to remove the secular trends from the data prior to analysis. We estimate the secular trend in a simulation from the five-member ensemble mean, assuming that subannual and interannual to decadal variations in the individual ensemble members are phased randomly and do not imprint on the ensemble mean because they average out. A larger ensemble size would be necessary for this assumption to perfectly hold. However, this potential source of error does not qualitatively alter our results. We remove the seasonal cycle, here defined as the 365 d long mean evolution over the course of a year, from the ensemble mean by smoothing the ensemble mean with a 365 d running mean filter, i.e., by calculating the convolution of the time series with a rectangular window of length 365 and height 1/365. This filter also removes variability on subannual and interannual timescales and thereby also reduces the error we make due to the small ensemble size that is discussed above. We then subtract the runningmean-filtered ensemble mean from the five ensemble members to remove the secular trend in the individual ensemble members. 3.3 5.0 (3.9-6.7) 7.9 (6.1-11.1) 6.0 (2.9-9.1) Volume (×10 3 km 3 ) 3.6 3.2 (2.9-3.5) 3.7 (3.0-4.2) 3.4 (3.1-3.7) Figure A1. Simulated globally averaged changes in [H + ] extreme events defined with respect to the fixed preindustrial baseline. Shown are changes over the 1861-2100 period following historical (black lines) and future RCP8.5 (red) and RCP2.6 (blue) scenarios for maximal intensity at the surface (       The spectral density describes how the variance in a time series is distributed over different frequencies ν j . It is proportional to the absolute value squared of the discrete Fourier transformation (DFT) of the time series. Defining the spectral density only for positive frequencies, it is given by with N the number of time steps, x k the values of the time series at each time step, t the time interval between two time steps, T = N · t, and the frequencies ν j = j/T . The autocovariance is the inverse Fourier transform of the spectral density (Wiener-Khintchine theorem, Chatfield, 1996).
In the continuous case, the theorem states with the autocovariance function γ (τ ) and the spectral densityf defined for positive and negative frequencies. Since the two-sided spectral density,f , is a real and even function, one can also use γ (τ ) = ∞ 0 f (ν) cos(2π ντ )dν (B3) with the one-sided spectral density f = 2 ·f that is used in this text. As a consequence, the variance within the time series, given by the autocovariance at lag zero, is obtained by integrating the spectral density over all positive frequencies, σ 2 = ∞ 0 f (ν)dν. For a discrete time series, where the maximal resolved frequency is given by ν max = 1/2 t, the identity reads Based on this equation, one can separate the contributions to variance from low-frequency and high-frequency variations.
In this study, we determine interannual variability and subannual variability. Interannual variability is calculated by summing over the contributions to variance from all frequencies up to a cycle of once per year, i.e., by evaluating the sum up to i cut for which ν cut = 1/365 d −1 . Accordingly, subannual variability is obtained by evaluating the sum from i cut + 1 to N/2. Prior to this separation, the seasonal variability is removed from the data by subtracting the 365 d climatology.
and the six pairwise correlation coefficients, in matrix notation given by (C3) Based on this notation, we can rewrite Eq.
(2) of the main text as s i s j σ i σ j ρ ij .
We use Eq. (C4) and decompose the variability change between the preindustrial period and 2081-2100 into the contributions from changes in s, σ , and ρ based on a Taylor expansion. Since [H + ] variance represented by Eq. (C4) is a polynomial of fifth order in these variables, its Taylor series has five nonvanishing orders. We use the drivers' standard deviations instead of their variances for the decomposition. With the latter, the Taylor expansion would have infinite terms and could not be decomposed exactly as it is done in the following. However, it would asymptotically lead to the same decomposition of [H + ] variance change into s σ 2 H + , σ σ 2 H + , sσ σ 2 H + , and ρ+ σ 2 H + that is presented below. Furthermore, it should be noted that the resulting decomposition of [H + ] variance change only approximates the simulated variance change because it is based on Eq. (C4), which itself is based on a first-order Taylor expansion of [H + ] with respect to the drivers.
In the following, all terms of the Taylor series are given. We denote the sum of first-order terms that contain changes in the four sensitivities s 1,...4 by (1) s σ 2 H + , the sum of second-order terms that contain changes in the sensitivities and standard deviations by (2) sσ σ 2 H + , and so on.
Appendix D: Comparison of simulated ensemble-mean trends in seasonal amplitude to observation-based trends We construct confidence intervals for the observation-based slope estimates following Hartmann et al. (2013). For the simulations, we use the arithmetic average of the five ensemble-member slope estimates,b k , as the estimator, b = 1 5 5 k=1b k , with estimated variancê We then construct the confidence interval forb as with q the (1 + p)/2 quantile (we use p = 0.9) of the t distribution with 5 · (N − 2) degrees of freedom. We correct the sample size N (34, the number of years we use for the fits) to a reduced sample size N r when we find positive lag-one autocorrelation in the residuals of the fits (data -linear regression model). Lag-one autocorrelation is estimated as the average of the five ensemble-member lag-one autocorrelation estimates, and we obtain N r = N · (ρ − 1)/(ρ + 1). Positiveρ is only found in the northern high latitudes. This is in contrast to the observation-based case, where we find large positiveρ o (up to 0.7) in the residuals of all latitude bands besides the tropical region.
For testing the significance of a difference between the simulation slope estimateb and the observation-based estimateb o , we use Welch's test, which assumes different variances for the two estimates (Andrade and Estévez-Pérez, 2014). The variance of the simulation slope estimate is calculated by dividing the ensemble-averaged slope variance by the ensemble size (Eq. D2) and is hence smaller than the observation-based slope variance. If the absolute value of the test statistiĉ is larger than the (1 + p)/2 quantile of the t distribution with (Andrade and Estévez-Pérez, 2014) degrees of freedom, we consider the observation-based and simulation slope to be different from each other with a confidence level of p = 0.9.