Transient Earth system responses to cumulative carbon dioxide emissions: linearities, uncertainties, and probabilities in an observation-constrained model ensemble

Information on the relationship between cumulative fossil CO 2 emissions and multiple climate targets is essential to design emission mitigation and climate adaptation strategies. In this study, the transient response of a climate or environmental variable per trillion tonnes of CO 2 emissions, termed TRE, is quantified for a set of impact-relevant climate variables and from a large set of multi-forcing scenarios extended to year 2300 towards stabilization. An ∼ 1000-member ensemble of the Bern3D-LPJ carbon–climate model is applied and model outcomes are constrained by 26 physical and biogeochemical observational data sets in a Bayesian, Monte Carlo-type framework. Uncertainties in TRE estimates include both scenario uncertainty and model response uncertainty. Cumulative fossil emissions of 1000 Gt C result in a global mean surface air temperature change of 1.9 °C (68 % confidence interval (c.i.): 1.3 to 2.7 °C), a decrease in surface ocean pH of


Introduction
How multiple climate targets are related to allowable CO 2 emissions provides basic information to design policies aimed to minimize severe or irreversible damage from anthropogenic climate change (Steinacher et al., 2013). The emission of carbon dioxide from burning of fossil fuels is by far the dominant driver of the ongoing anthropogenic climate change and of ocean acidification (IPCC, 2013; Gattuso et al., 2015). The increase in a broad set of climate variables such as atmospheric carbon dioxide (CO 2 ), CO 2 radiative forcing, global surface air temperature, or ocean acidification depends on cumulative CO 2 emissions (Allen et al., 2009; IPCC, 1995). It is thus informative to quantify the link between cumulative, total CO 2 emissions and different climate variables. It is advantageous to represent a climate target, such as the United Nations' 2 °C global mean surface air temperature target, in terms of allowable total CO 2 emissions because this is an easily communicable emission mitigation goal. While the link between cumulative CO 2 emissions and global mean surface air temperature has been extensively studied (IPCC, 2013), relatively little attention has been paid to the relationship between cumulative CO 2 emissions and other impact-relevant variables such as ocean acidification or sea level rise (Zickfeld et al., 2012; Herrington and Zickfeld, 2014). However, considering the link to emissions for other variables and from the global to the regional scale appears important as many impact-relevant changes are not directly related to global mean surface air temperature.
It is also important to quantify the uncertainty in these links with CO 2 emissions by using probabilistic, observation-constrained approaches or multi-model ensembles. This enables one to establish a budget for the amount of allowable carbon emissions if a given climate target or a set of targets is to be met with a given probability. Such budgets in probabilistic terms have been established for surface air temperature, but only recently for a set of multiple climate-impact-relevant variables (Steinacher et al., 2013).
A climate target that is currently recognized by most world governments (United Nations, 2010) places a limit of 2 °C on the global mean warming since preindustrial times. An objective of the recent Paris agreement (United Nations, 2015) is to hold "the increase in the global average temperature to well below 2 °C above pre-industrial levels and to pursue efforts to limit the temperature increase to 1.5 °C above pre-industrial levels". This target emerged from the international negotiation process following the United Nations Framework Convention on Climate Change (United Nations, 1992) that entered into force in 1994. However, the United Nations Framework Convention on Climate Change has multiple objectives. It calls for the avoidance of dangerous anthropogenic interference with the climate system as well as allowing for ecosystems to adapt naturally to climate change, ensuring food production, and enabling sustainable economic development. These objectives cannot be encapsulated in one single target, e.g., a global mean surface air temperature target, but may require multiple targets. These may be specific for individual regions and components of the climate system, which includes the atmosphere, hydrosphere, biosphere, and geosphere and their interactions (United Nations, 1992). For example, targets may include bounds for sea level rise, ocean acidification, and ocean warming that threaten marine ecosystem functioning and services (IPCC, 2014; Gattuso et al., 2015; Howes et al., 2015). Ocean acidification is, like global warming, progressing with anthropogenic CO 2 emissions but, unlike global warming, largely independent of the emissions and atmospheric abundance of non-CO 2 forcing agents. It is thus expected that the quantitative link to cumulative CO 2 emissions is different for ocean acidification variables, e.g., surface ocean pH, than for global mean surface air temperature. In general, the quantitative relationship to emissions and its uncertainty ranges are distinct for different individual target variables.
Climate projections are associated with two fundamentally distinct types of uncertainties (e.g., Hawkins and Sutton, 2009). First, the scenario uncertainty arises from the fact that future anthropogenic emissions are not known because they depend largely on human actions and decisions, such as climate policies, technological advances, and other socioeconomic factors. Second, limitations in our process understanding likely lead to differences between the simulated response to emissions and the response of the actual system the model is intended to describe. This constitutes an additional uncertainty, termed the model or response uncertainty.
Well-defined metrics that summarize the Earth system response to a given forcing by a single or a few values are useful in many aspects. They allow one to quantify the response uncertainty and to compare results from different sources, such as ensemble model simulations, model intercomparisons, or observation-based estimates. Due to their relative simplicity, metrics also ease the communication among scientists and between scientists, stakeholders, and the public. The transient climate response (TCR) and the equilibrium climate sensitivity (ECS) are such metrics, which are used to quantify the global mean surface air temperature (SAT) change associated with a doubling of atmospheric CO 2 (e.g., Knutti and Hegerl, 2008). The TCR measures the short-term response (i.e., the temperature increase at the time of doubling atmospheric CO 2 in a simulation with 1 % yr −1 increase), while the ECS quantifies the long-term response after reaching a new equilibrium of the system under the increased radiative forcing. TCR and ECS are metrics for the physical climate system and they do not depend on the carbon-cycle response (e.g., Huber and Knutti, 2014; Kummer and Dessler, 2014). TCR and ECS both depend on multiple physical feedbacks such as the water vapor, the ice-albedo, or the cloud feedbacks. TCR also depends on the rate of ocean heat uptake. ECS itself does not depend on the rate of ocean heat uptake, while observationally constrained estimates of ECS do.
Certain metrics are helpful to reduce the scenario dependency of results, which may facilitate the communication in a mitigation policy context (Allen et al., 2009). One such metric is the response to a pulse-like emission of CO 2 and other forcing agents as applied to compute global warming potentials used in the greenhouse gas basket approach of the Kyoto protocol (Joos et al., 2013; Myhre et al., 2013). Another metric is the transient climate response to cumulative CO 2 emissions (TCRE), which links the global mean surface air temperature increase to the total amount of CO 2 emissions. In addition to the physical climate response, these metrics also depend on the response of the carbon cycle. TCRE is a useful metric because it has been shown that global warming is largely proportional to cumulative CO 2 emissions and almost independent of the emission pathway (Allen et al., 2009; Matthews et al., 2009; Zickfeld et al., 2009; IPCC, 2013; Gillett et al., 2013). It essentially represents the combination of the TCR and the cumulative airborne fraction of CO 2 (Gregory et al., 2009; Collins et al., 2013). More recently, additional metrics, i.e., the equilibrium and multimillennial climate response to cumulative CO 2 emissions, have been proposed to evaluate the long-term link between global mean surface air temperature and emissions (Frölicher and Paynter, 2015).
There is an apparent discrepancy between the TCR estimated with the most recent set of Earth system models (ESMs) versus some recent studies that invoke observational constraints (Otto et al., 2013) and simplified models (Schwartz, 2012; Collins et al., 2013). These latter studies suggest the possibility of a TCR below 1 °C, i.e., outside the very likely range given in the Fourth Assessment Report of the IPCC (Collins et al., 2013). Shindell (2014a, b) suggests that there are biases in simple models that do not adequately account for the spatial distribution of forcings. Shindell found by analyzing ESM output that the transient climate sensitivity to historical aerosol and ozone forcing is substantially greater than to CO 2 forcing due to their spatial differences. Taking this into account resolves the discrepancies in TCR estimates. Stainforth (2014) concluded from the study by Shindell (2014b) that probabilistic 21st century projections based on simple models and observational constraints under-weight the possibility of high impacts and over-weight low impacts on multi-decadal timescales. Huber and Knutti (2014) find that the TCR and ECS of the ESMs are consistent with recent climate observations when natural variability and updated forcing data are considered. Kummer and Dessler (2014) concluded that considering a ≈ 33 % higher efficacy of aerosol and ozone forcing than for CO 2 forcing would resolve the disagreement between estimates of ECS based on the 20th century observational record and those based on climate models, the paleoclimate record, and interannual variations. Van der Werf and Dolman (2014) applied a multiple regression approach using historical temperature and radiative forcing data to find that recent temperature trends are influenced by natural modes of variability such as the Atlantic Multidecadal Oscillation. They estimated TCR to be above 1 °C using century-long records. However, an updated probabilistic quantification of the TCR, ECS, and TCRE with a spatially explicit model and constrained by a broad set of observations is missing.
The goals of this study are (i) to establish the relation between cumulative CO 2 emissions and changes in illustrative, impact-relevant Earth system variables; (ii) to quantify TCRE, TCR, and ECS; and (iii) to establish the response of different Earth system variables to an emission pulse, i.e., the impulse response function (IRF). In analogy to TCRE, we introduce a new metric, the transient response to cumulative CO 2 emissions (TRE). TRE X is the change in a climate variable, X, in response to cumulative CO 2 emissions of 1000 Gt C. To this end, we analyzed TRE for variables that we deemed impact-relevant and also reasonably well represented in our model, including surface air temperature, sea surface temperature, steric sea level change, ocean acidity, carbon storage in soils, and ocean overturning.
The link and the linearity between the responses in the different variables and cumulative CO 2 emissions are investigated in a structured way with an observation-constrained model ensemble and a large set of emissions scenarios. This allows us to address not only the scenario uncertainty but also the model uncertainty. We quantify uncertainties related to specific greenhouse gas emission trajectories, i.e., scenario uncertainty, by analyzing responses to CO 2 emission pulses as well as to a set of 55 scenarios representing the evolution of carbon dioxide and other radiative agents. The response uncertainties for these scenarios are quantified with an ∼ 1000-member model ensemble constrained by 26 observational data sets in a Bayesian, Monte Carlo-type framework with an ESM of intermediate complexity (EMIC). The model features spatially explicit representations of land-use forcing, vegetation, and carbon dynamics, as well as physically consistent surface-to-deep transport of heat and carbon by a 3-D, dynamic model ocean, thereby partly overcoming deficiencies identified for box-type models used in earlier probabilistic assessments (Shindell, 2014a, b). This allows us also to reassess the probability density distribution, including best estimates and confidence ranges, for the ECS, the TCR, and the TCRE.
This paper is structured in the following way. In the Methods section, first the modeling framework is introduced (Sect. 2.1). Specific subsections deal with model parameter selection and sampling (Sect. 2.1.1), observational constraints and the calculation of model skill scores (Sect. 2.1.2), the procedure for model spin-up (Sect. 2.1.3), and scenario choices and model simulations (Sect. 2.1.4). The following sections then cover the definition of TCRE and TRE X (Sect. 2.2), the calculation of probability density functions (PDFs; Sect. 2.3), and how the linearity of the responses to cumulative CO 2 emissions is tested (Sect. 2.4). Finally, we discuss the selection of the analyzed climate variables (X) in Sect. 2.5. In the results section, we first discuss the response in various climate variables to CO 2 emission pulses of various magnitude to gain insight into the extent to which we may expect linearity in the response to emissions (Sect. 3.1). In Sect. 3.2, we present results for the TRE of the global mean surface air and surface ocean temperatures, steric sea level rise (SSLR), the Atlantic meridional overturning circulation (AMOC), global mean surface ocean pH, the saturation of surface waters in the Southern Ocean and the tropics with respect to calcium carbonate, as used to build coral reefs and shells and other structures of marine organisms, and finally global soil carbon stocks. In Sect. 3.3, results for the transient and equilibrium climate sensitivity are presented. Discussion and conclusions complete the paper.
We rely here on simulations presented by Steinacher et al. (2013) as described in the following subsections and illustrated in Fig. 1. Uncertainties in physical and carbon-cycle model parameters, radiative efficiencies, climate sensitivity, and carbon-cycle feedbacks are taken into account by varying 19 key model parameters to generate a model ensemble with 5000 members (Appendix Table A1). Each ensemble member is assigned a skill score based on how well the model version is able to represent the observational constraints. This skill score is used as a weight to compute PDFs and ensemble means for different model outcomes.

Model parameter sampling
Nineteen model parameters are sampled for the generation of the model ensemble (Table A1). The selection of these parameters has to balance computational costs vs. maximum coverage of the parameter space that is relevant for the model variables we are interested in.
Three parameters are sampled from the energy and moisture balance model of the atmosphere. Most importantly, the nominal ECS determines the equilibrium warming per change in radiative forcing. Technically, this is implemented by translating a given value for ECS to a value for the feedback parameter λ (Ritz et al., 2011a, b) using a calibration curve. λ accounts for all feedbacks in the model that are not explicitly resolved. Diffusivity coefficients, diff zonal and diff merid,scale, control the depth-integrated heat fluxes (Ritz et al., 2011a, b). The uniform zonal diffusivity is specified directly and diff merid,scale is a scaling factor for the latitude-dependent meridional diffusivity.
The selection of the most relevant parameters for terrestrial photosynthesis, hydrology, vegetation dynamics, soil organic matter decomposition, and turnover is largely based on a previous study by Zaehle et al. (2005). They analyzed an earlier version of the model by sampling 36 parameters and identified the most important ones in controlling carbon fluxes and pool sizes. Perhaps not surprisingly, the most influential parameters directly govern either the input flux of carbon into a carbon pool or the timescale of carbon overturning for individual pools.
Four parameters are sampled that govern carbon assimilation and transpiration of water. These are a scaling parameter to upscale assimilation from the leaf to the canopy level (α a ), the intrinsic quantum efficiency of CO 2 uptake for C3 plants (α C3 ), a shape parameter specifying the degree of co-limitation by light and RuBisCO activity (θ), and a parameter that influences the link between canopy conductance and evapotranspiration (g m ), and thereby soil hydrology and water limitation of photosynthesis. These parameters were identified as the four most important ones controlling net primary production and heterotrophic respiration and they are among the eight most important parameters controlling carbon pool sizes (Zaehle et al., 2005).
Two parameters are sampled that control the turnover of carbon in vegetation. These are the timescale governing the conversion of sapwood to heartwood (τ sapwood ) and the maximum mortality rate of trees (mort max ). Four parameters are sampled that govern the carbon turnover in mineral soils. The fractions f soil and f slow determine how much decomposing litter enters the fast and slow overturning soil pools and how much is released directly to the atmosphere. k soil,scale is a global scaling factor applied to the spatially and temporally variable decomposition rates of organic carbon in the fast and slow soil pools. Litter and soil decomposition rates depend on soil temperature and thus are influenced by global warming. The parameter governing the temperature sensitivity of these rates is also sampled. Finally, C peat,scale determines the initial amount of carbon stored in northern peatlands.
Three parameters are sampled from the Bern3D ocean component. diff dia and diff iso are the diapycnal and isopycnal diffusivities that control the ocean circulation and thus the transport and vertical mixing of heat, carbon, and other tracers (Müller et al., 2006; Schmittner et al., 2009). k gas,scale is a scaling factor applied to the OCMIP-2 air-sea gas transfer velocity field (Müller et al., 2008) and affects the oceanic uptake of anthropogenic carbon. The ocean carbonate chemistry and marine biology parameters are not perturbed in this study in order to save computational costs. The ocean chemistry is very well understood and the relevant parameters are already well constrained (Dickson, 2002). The marine biology parameters are considered of secondary importance for this study when compared to the parameters affecting the physical transport and uptake of anthropogenic carbon (Joos et al., 1999; Plattner et al., 2001; Heinze, 2004; Gangstø et al., 2008; Kwon et al., 2009).
Finally, two parameters are sampled to modulate the radiative forcing from well-mixed greenhouse gases (RF GHG,scale ) and aerosols (RF aerosol,scale ). They are applied as scaling factors to the prescribed time series (or to the simulated radiative forcing in the case of atmospheric CO 2 ) and reflect the uncertainties given by Forster et al. (2007).
We generate a 5000-member ensemble from the prior distributions of those 19 key model parameters using the Latin hypercube sampling method (McKay et al., 1979). The prior distributions are selected such that the median matches the standard model configuration and the standard deviation is one-fourth of the plausible parameter range based on the literature and/or expert judgment (Table A1). Normal prior distributions are chosen for ranges that are basically symmetric with respect to the standard parameter value and log-normal priors are used for asymmetric ranges.
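The sampling step can be illustrated with a minimal sketch. It is not the ensemble-generation code used for the study: the function names, the clamping constant, and the example parameter ranges are hypothetical, and treating the log-normal width as the standard deviation of the logarithm is an assumption made here for illustration.

```python
import math
import random
from statistics import NormalDist


def latin_hypercube(n_members, n_params, rng):
    """Return n_members rows of n_params stratified uniform samples in [0, 1)."""
    columns = []
    for _ in range(n_params):
        # One draw inside each of n_members equal-probability strata,
        # then shuffle so strata are paired randomly across parameters.
        column = [(i + rng.random()) / n_members for i in range(n_members)]
        rng.shuffle(column)
        columns.append(column)
    return [[columns[p][m] for p in range(n_params)] for m in range(n_members)]


def to_prior(u, median, sigma, lognormal=False):
    """Map a uniform sample u to a normal or log-normal prior via the inverse CDF."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf away from 0 and 1
    if lognormal:
        # Assumption: sigma is the standard deviation of log(parameter).
        return median * math.exp(NormalDist(0.0, sigma).inv_cdf(u))
    return NormalDist(median, sigma).inv_cdf(u)


# Hypothetical example: a symmetric parameter with plausible range 1.0 to 3.0,
# so the median is 2.0 and the standard deviation is one-fourth of the range.
rng = random.Random(42)
unit_samples = latin_hypercube(n_members=8, n_params=2, rng=rng)
values = [to_prior(row[0], median=2.0, sigma=(3.0 - 1.0) / 4.0) for row in unit_samples]
```

Stratifying each parameter separately guarantees that every equal-probability interval of each prior is sampled exactly once, which is why this design covers a 19-dimensional space more evenly than plain random draws of the same size.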

Observational constraints and the computation of skill scores
Twenty-six observation-based data sets are used to constrain the model results including projected Earth system changes, allowable carbon emissions to meet a climate target, or metrics such as the transient and equilibrium climate sensitivity.
A single skill score is computed by comparing observations and model outcomes for each ensemble member and across all data sets. The data sets are organized in a hierarchical structure to balance the weight of individual data sets and groups of data. The skill scores are used to weight results from individual ensemble members for the computation of ensemble mean and uncertainties (PDF). Figure A1 summarizes the observation-based data sets and their hierarchical arrangement to compute skill scores (adapted from Fig. S3 and Table S2 in Steinacher et al., 2013).
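How skill scores can act as weights is sketched below for an ensemble mean and for percentiles. This is a simplified illustration rather than the PDF calculation of the paper; the function names are hypothetical and the quantile rule (smallest value whose cumulative weight reaches the requested fraction) is one of several possible conventions.

```python
def weighted_mean(values, weights):
    """Skill-weighted ensemble mean."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative normalized weight reaches q (0 < q <= 1)."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative / total >= q:
            return value
    return pairs[-1][0]


# Hypothetical example: four members with equal skill.
members = [1.0, 2.0, 3.0, 4.0]
skill = [1.0, 1.0, 1.0, 1.0]
mean = weighted_mean(members, skill)
median = weighted_quantile(members, skill, 0.5)
```

A 68 % interval in this scheme would be bounded by the quantiles at 0.16 and 0.84.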
The observational data sets combine information from satellite, ship-based, ice-core, and in situ measurements to probe both the mean state and transient responses in space and time. The energy balance and its change over time is probed by annual mean time series of Southern and Northern Hemisphere temperature from 1850 to 2010 (Brohan et al., 2006), upper (0-700 m) ocean heat content anomalies from 1955 to 2011 (Levitus et al., 2012) and from 1993 to 2008 (Lyman et al., 2010), and the ocean heat uptake over the period 2005 to 2010 (von Schuckmann and Le Traon, 2011). The atmospheric carbon balance is probed by the reconstructed atmospheric CO 2 history from ice cores (1850 to 1958; Etheridge et al., 1996) and direct atmospheric measurements (1959 to 2010; Keeling and Whorf, 2005; Conway and Tans, 2011) as well as global and temporal means of net carbon uptake by the land and by the ocean for the periods 1959 to 2006, 1990 to 1999, and 2000 to 2006 (Canadell et al., 2007). Oceanic processes, which are key for the uptake of both heat and carbon, are probed using gridded data from the World Ocean Atlas (Locarnini et al., 2010; Antonov et al., 2010; Garcia et al., 2010) and the Global Ocean Data Analysis Project (GLODAP; Key et al., 2004); surface fields and whole ocean fields are considered separately for individual tracers. Ocean temperature (T) and salinity (S) fields probe the water mass distribution, and T and S influence CO 2 solubility and carbonate chemistry. The transient tracer CFC-11 (distribution for 1995) and radiocarbon (preindustrial) probe the ventilation timescales and thus the surface-to-deep transport rates for carbon, heat, and other tracers. The marine biological cycle is probed by comparing modeled with observed fields of the major nutrient phosphate, as well as dissolved inorganic carbon (preindustrial) and alkalinity (1995). Temperature, salinity, and phosphate fields from the World Ocean Atlas include seasonal variations in the upper ocean. Land biosphere processes are constrained by comparing modeled and observation-derived carbon stocks and fluxes. Vegetation carbon stock data include two different data sets for about 140 sites each (Luyssaert et al., 2007; Keith et al., 2009) and an estimate for the global preindustrial inventory (550 ± 200 Gt C; Prentice et al., 2001). Gridded soil carbon fields for low and mid-latitudes (south of 50° N; Global Soil Data Task Group, 2000) and high-latitude North America (Tarnocai et al., 2007, 2009) and an estimate for the global soil carbon content in the top 100 cm (1950 ± 550 Gt C; Batjes, 1996) are used for soil carbon pools. Net primary productivity is probed using observation-based estimates from around 80 sites (Olson et al., 2001) and 140 sites (Luyssaert et al., 2007), as well as a gridded seasonal climatology of the fraction of absorbed photosynthetic active radiation (Gobron et al., 2006). Finally, we probe the seasonal cycle of the net terrestrial carbon balance by prescribing modeled net land-to-atmosphere fluxes in the TM2 transport model to compute the average seasonal cycle of atmospheric CO 2 at nine sites as monitored by the GLOBALVIEW atmospheric CO 2 network (GLOBALVIEW-CO2, 2011).
A hierarchical structuring of the data sets is applied for the computation of the skill scores. Individual data sets consist of single numbers, site data, time series, and gridded two- and three-dimensional fields. The number of values included in a data set ranges from one to many thousands. In addition, different data sets sometimes probe closely related quantities. It is thus necessary to implement a formalism that prevents the data sets with the largest number of data from dominating the outcome. The task is to attribute a weight to each individual data set that is appropriate in comparison with the other data sets. Here, this is done by organizing the data in a hierarchical structure for aggregating the scores of individual data sets to the total score. We consider four main data groups probing the energy balance, termed "heat" in Appendix Fig. A1; the atmospheric carbon balance, "CO 2 "; ocean processes and inventories, "ocean"; and land biosphere fluxes and stocks, "land". Each of these groups has the same weight for the computation of the overall skill scores. The individual data sets are further arranged in additional subgroups.
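The effect of hierarchical aggregation can be sketched with a small recursive average. The nested layout and the data-set names below are hypothetical placeholders rather than the actual structure of Fig. A1, and the data points within a data set are averaged without area or volume weights for brevity.

```python
def aggregate(node):
    """Average data points within a data set, then average children at each level."""
    if isinstance(node, dict):
        children = [aggregate(child) for child in node.values()]
        return sum(children) / len(children)
    # Leaf: list of error-weighted squared discrepancies for one data set.
    return sum(node) / len(node)


# Hypothetical example: a gridded field with 1000 points next to two small
# data sets. Each top-level group contributes equally to the total.
tree = {
    "heat": {"hemispheric_sat": [1.0, 3.0], "ocean_heat_content": [4.0]},
    "ocean": [2.0] * 1000,
}
hierarchical = aggregate(tree)

# Pooling all points instead lets the large field dominate.
pooled_points = [1.0, 3.0, 4.0] + [2.0] * 1000
pooled = sum(pooled_points) / len(pooled_points)
```

In this toy case the hierarchical average is 2.5, whereas the pooled average stays close to 2.0 because the 1000-point field swamps the other data.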
From the simulation results over the historical period ("mod") and the set of observational constraints ("obs"), we assign a score to each ensemble member m as

$$S_m \propto \exp\left(-\frac{1}{2}\,\overline{\frac{\left(X_m^{\mathrm{mod}} - X^{\mathrm{obs}}\right)^2}{\sigma^2}}\right).$$

This likelihood-type function basically corresponds to a Gaussian distribution of the data-model discrepancy $(X_m^{\mathrm{mod}} - X^{\mathrm{obs}})$ with zero mean and variance $\sigma^2$. The overbar indicates that the error-weighted data-model discrepancy is first averaged over all data points of each observational variable (volume- or area-weighted) and then aggregated in the hierarchical structure by averaging variables belonging to the same group. Out of the 5000 ensemble members, 3931 contribute less than a percent to the cumulative skill $\sum_m S_m$ of all members m and are not used any further.
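A minimal sketch of the scoring and the subsequent reduction of the ensemble follows. It assumes the score has the Gaussian form exp(−D/2), where D is the aggregated error-weighted squared discrepancy of a member, and it reads the cutoff as "drop the lowest-scoring members whose summed skill stays below 1 % of the total"; both the function names and that reading of the cutoff are assumptions.

```python
import math


def skill_scores(discrepancies):
    """Normalized scores S_m proportional to exp(-0.5 * D_m)."""
    d_min = min(discrepancies)
    # Subtracting d_min only rescales all scores and avoids underflow.
    raw = [math.exp(-0.5 * (d - d_min)) for d in discrepancies]
    total = sum(raw)
    return [r / total for r in raw]


def keep_members(scores, cutoff=0.01):
    """Indices kept after dropping the weakest members holding < cutoff of total skill."""
    order = sorted(range(len(scores)), key=lambda m: scores[m])
    total = sum(scores)
    cumulative = 0.0
    dropped = set()
    for m in order:
        if cumulative + scores[m] >= cutoff * total:
            break
        cumulative += scores[m]
        dropped.add(m)
    return [m for m in range(len(scores)) if m not in dropped]


# Hypothetical example with five members.
scores = [0.5, 0.3, 0.195, 0.003, 0.002]
kept = keep_members(scores)  # the two weakest members sum to 0.5 % and are dropped
```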
The variance σ 2 represents the combined observational error and model discrepancy and needs to be specified. The model discrepancy is the inherent model error that cannot be eliminated even with the best parameter settings and input data. While most of the observational data sets come with estimates of the observational errors, the model discrepancy is difficult to specify. Here we estimate the combined observational and model error with the variance of the model-data difference for the best fitting model realization (i.e., the model with the smallest mean squared error). In the few cases where the observational error is larger than this estimate (and thus the combined error is clearly underestimated), the observational error is taken as total error.
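The estimate of the combined error can be sketched as follows for a single data set. The function name and the input layout are hypothetical, and the use of the population variance of the residuals is an assumption; the text does not specify the estimator.

```python
from statistics import pvariance


def combined_variance(model_runs, obs, obs_variance):
    """Variance of the model-data difference of the best-fitting member,
    replaced by the observational variance where that is larger."""
    def mean_squared_error(run):
        return sum((x - o) ** 2 for x, o in zip(run, obs)) / len(obs)

    best = min(model_runs, key=mean_squared_error)
    residuals = [x - o for x, o in zip(best, obs)]
    return max(pvariance(residuals), obs_variance)


# Hypothetical example: two members compared against three observations.
obs = [1.0, 2.0, 3.0]
runs = [[1.1, 1.9, 3.2], [2.0, 3.0, 4.0]]
sigma2 = combined_variance(runs, obs, obs_variance=0.001)
```

Note that the second member has zero residual variance but a constant offset, so selecting the best fit by mean squared error (as the text specifies) rather than by residual variance matters.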

Spin-up procedure
The spin-up procedure for the 5000 ensemble members is tailored to keep computational costs low, while at the same time achieving small model drift after completion of the spin-up. First, a very long spin-up over more than 20 000 years is carried out with standard model parameters and preindustrial (year 1800) boundary conditions. The spin-up is then continued for all individual members from this initial steady state to adjust the model to the perturbed parameters. In this way, the new equilibrium for the perturbed parameter set is reached faster than when starting from scratch.
The adjustment spin-up is done in a sequence where not all model components are active in all steps to further decrease the computational costs. First, the physical ocean component is run stand-alone for 1500 years. Then, the atmospheric energy balance model is coupled again to the ocean and the model is run for 1000 more years. Next, the oceanic biogeochemistry module is activated with initial tracer fields from the standard model configuration. The model is run for another 1000 years to allow the biogeochemical fields to adjust to the new physics. In parallel, the terrestrial component is run stand-alone for 400 years with the perturbed parameter settings, including an instantaneous adjustment of the soil carbon pools after 200 years by calculating the new pool sizes analytically from the adjusted fluxes. Finally, the fully coupled model is run for another 200 years and all transient simulations are started from this state.
The reliability of the spin-up procedure is verified by performing a 500-year-long control run without additional forcing and checking for unacceptable drift. Slight drifts in deep ocean tracers are accepted.
Modern peat carbon stocks are not in equilibrium with the current climate and boreal peatlands still sequestered about 0.1 Gt C yr −1 during the last millennium (Charman et al., 2013; Yu et al., 2010). Peat carbon distribution for our transient simulations is initialized with the output from a transient simulation starting at the Last Glacial Maximum as described in Spahni et al. (2013). This initial pattern, and thus the total peat carbon inventory, is uniformly scaled with the value sampled for the parameter C peat,scale .
After the spin-up, the 5000-member ensemble is run over the industrial period under prescribed CO 2 and non-CO 2 forcing. The model output is compared with the observational data and the ensemble is reduced to the 1069 simulations with the highest skill, as described in the previous section.

Model simulations
In a next step we run the constrained model ensemble for 55 greenhouse gas scenarios spanning from high business-as-usual to low-mitigation pathways. The set of scenarios consists of economically feasible multi-gas emission scenarios from the integrated assessment modeling community. In addition to the four RCP scenarios (Moss et al., 2010) that were selected for the Fifth Assessment Report (AR5) of the IPCC, we included 51 scenarios from the EMF-21 (Weyant et al., 2006), GGI (Grübler et al., 2007), and AME (Calvin et al., 2012) projects. For these simulations, we prescribe atmospheric CO 2 and the non-CO 2 radiative forcing derived from the emission scenarios (Fig. A2) as described in Joos et al. (2001) and Strassmann et al. (2009). We note that the AME scenarios are less complete than the others because they do not provide emission paths for aerosols and some minor greenhouse gases. We therefore make the conservative assumption of constant aerosol emissions at the level of the year 2005 (−1.17 W m −2 ), which implies a significant cooling effect continued into the future in those 23 (out of 55) scenarios (Fig. A2f). Note that this effect does not affect our estimates of TCR, ECS, and TCRE, which are based on the atmospheric CO 2 -only simulations described below, nor does it affect the constraining of the model ensemble with observation-based data over the historical period. The scenarios are extended from 2100 to 2300 by stabilizing atmospheric CO 2 and the non-CO 2 forcing by the year 2150 (see Steinacher et al., 2013, for details). Note that the extensions of the RCP scenarios beyond 2100 CE as used in the AR5 (extended concentration pathways, ECPs; Meinshausen et al., 2011) are not identical to the extensions applied here. Our extensions of RCP4.5 and RCP6 are similar to ECP4.5 and ECP6, but ECP8.5 differs significantly from our extension of RCP8.5, where atmospheric CO 2 is stabilized by 2150.
In addition to these multi-gas scenarios used by Steinacher et al. (2013), we run the model ensemble for an idealized "2xCO 2 " scenario to determine TCR, ECS, and TCRE and an emission pulse experiment. In the 2xCO 2 simulation, atmospheric CO 2 is increased by 1 % yr −1 from its preindustrial level until a doubling of the concentration is reached. After that, the atmospheric CO 2 concentration is held fixed. All other forcings remain constant at preindustrial levels.
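The timing of the idealized concentration pathway follows from simple arithmetic, sketched below. The preindustrial concentration of 278 ppm is an illustrative assumption (the text gives no value), and the function name is hypothetical.

```python
import math

# Years of 1 % yr-1 compound growth needed to double the concentration.
years_to_doubling = math.log(2.0) / math.log(1.01)


def co2_pathway(c0, n_years, rate=0.01):
    """Annual CO2 concentration: compound growth, then held fixed at 2 * c0."""
    return [min(c0 * (1.0 + rate) ** t, 2.0 * c0) for t in range(n_years + 1)]


# Assumed preindustrial level of 278 ppm, for illustration only.
pathway = co2_pathway(278.0, 100)
```

Doubling is reached after roughly 70 years, which is the time at which the TCR is diagnosed; the ECS refers to the warming once the system has equilibrated at the fixed doubled concentration.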
The emission pulse simulations are conducted as described by Joos et al. (2013). A pulse input of 100 Gt C is added to a constant background atmospheric CO 2 concentration of 389 ppm in year 2010, while all other forcings are held constant at 2010 levels. The impulse response function (IRF) is then derived from the difference between simulations with and without emission pulse. Additionally, experiments with pulse sizes of 1000, 3000, and 5000 Gt C were performed to test the sensitivity of the response to the pulse size. These additional pulse experiments were run for a model configuration with median parameter settings, which is able to reproduce the median response of the ensemble for the 100 Gt C pulse (Fig. 2).
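The difference-based derivation of the IRF can be sketched as below. Dividing by the pulse size, so that responses to pulses of different magnitude can be compared directly, is an assumption made here for illustration; the function name and the example numbers are hypothetical.

```python
def impulse_response(with_pulse, without_pulse, pulse_size):
    """Response per unit emission: (pulse run - control run) / pulse size."""
    return [(a - b) / pulse_size for a, b in zip(with_pulse, without_pulse)]


# Hypothetical surface air temperature anomalies (deg C) for a control run
# and a run with a 100 Gt C pulse released in the first year.
control = [0.80, 0.80, 0.80]
pulse_run = [0.80, 0.95, 1.00]
irf = impulse_response(pulse_run, control, pulse_size=100.0)
```

If the response were strictly linear in the emissions, the normalized IRFs for the 100, 1000, 3000, and 5000 Gt C pulses would coincide, so their spread gives a direct measure of nonlinearity.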

Definition of TCRE and TRE X
There are slightly different definitions of the TCRE in the literature. Matthews et al. (2009) define it similarly to the TCR, i.e., as the ratio of warming to cumulative CO 2 emissions in a simulation with prescribed 1 % yr −1 increase in atmospheric CO 2 at the time when atmospheric CO 2 reaches double its preindustrial concentration. In the AR5 of the IPCC, TCRE is defined more generally as the annual mean global surface temperature change per unit of cumulated CO 2 emissions in a scenario with continuing emissions (Collins et al., 2013). In scenarios with non-CO 2 forcings, such as the representative concentration pathways (RCPs), the diagnosed TCRE thus also depends on the non-CO 2 forcing. Further, the transient response should be distinguished from the peak response to cumulative emissions as defined in Allen et al. (2009), although the TCRE is nearly identical to the peak climate response to cumulative CO 2 emissions in many cases (Collins et al., 2013). The responses, TCR and TCRE, are defined for surface air temperature in previous studies. Here, we extend the definition of TCRE to any climate variable X(t). We define the transient response, TRE X , and peak response, TRE X peak , per cumulative CO 2 emissions at a given time t as

$$\mathrm{TRE}^{X}(t) = \frac{X(t)}{E(t)}, \qquad \mathrm{TRE}^{X}_{\mathrm{peak}}(t) = \frac{X(t_{\max})}{E(t)},$$

where E(t) are the cumulative CO 2 emissions (either total or fossil-fuel-only emissions; see Appendix A) and t max is the time at which the change in X is largest in absolute value.
For the transient response analyses, TRE X (t) is computed for every year t in the range 2000 < t ≤ 2300 (i.e., 300 data points per simulation). In contrast, the peak response is represented by only one data point per simulation. It is the value of X(t) at the time t max , i.e., where the maximum change in the absolute value of X(t) between the years 2000 and 2300 occurs, divided by the cumulative emissions in the year 2300, E(t = 2300), and denoted TRE X peak (t = 2300). Note that the actual peak response might occur after 2300 CE, in which case TRE X peak is only an approximation. Surface air temperature usually peaks before 2300 CE in the applied scenarios. Steric sea level rise, on the other hand, continues to increase after 2300 CE due to the large thermal inertia of the oceans.
TCRE is used in this study as defined by Gillett et al. (2013). Thus, TCRE is equivalent to TRE SAT derived from a simulation with prescribed 1 % yr −1 atmospheric CO 2 increase and no other forcings.
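
As an illustration of these definitions, the following minimal sketch computes the transient and the peak response from a pair of annual series. The function names and the example values are hypothetical; the only assumption is that the change X(t) and the cumulative emissions E(t) are given on the same time axis, with E(t) nonzero.

```python
def tre_transient(x, e_cum):
    """Transient response X(t) / E(t) for every year of the series."""
    return [xi / ei for xi, ei in zip(x, e_cum)]


def tre_peak(x, e_cum):
    """Largest absolute change in X divided by the final cumulative emissions."""
    i_max = max(range(len(x)), key=lambda i: abs(x[i]))
    return x[i_max] / e_cum[-1]


# Hypothetical warming (deg C) and cumulative emissions (Tt C) at four times.
warming = [0.8, 1.5, 2.1, 2.0]
emissions = [0.4, 0.8, 1.0, 1.05]

transient = tre_transient(warming, emissions)
peak = tre_peak(warming, emissions)
print(transient, peak)
```

In this example the peak warming occurs before the final year, so the peak response uses X at that earlier time but E at the end of the series.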

Calculation of PDFs
Cumulative CO 2 emissions E m,s (t) and climate response X m,s (t) are diagnosed for each model configuration 1 ≤ m ≤ N m (N m = 1069), greenhouse gas scenario 1 ≤ s ≤ N s (N s = 55), and simulation year 2000 < t ≤ 2300. For a given model configuration m and year t we obtain 55 points in the two-dimensional (E, X) space, representing the response under different scenarios (E m,s (t), X m,s (t)). These points are considered to span the range of plausible emission-response combinations for this model configuration. Technically, we use the convex hull, which is the smallest region containing all points such that, for any pair of points within the region, the straight-line segment that joins the pair of points is also within the region.
By combining the convex hulls from all model configurations m in the (E, X) space we can derive a two-dimensional PDF, p(E, X), of the plausible emission-response combinations. The model ensemble is constrained in this step by weighting the contribution of an individual model to the PDF with the model score S m :

$$p(E, X) = \sum_{m=1}^{N_m} S_m \sum_{t} \mathbb{1}\!\left[(E, X) \in C\big(\{(E_{m,s}(t), X_{m,s}(t)) : 1 \le s \le N_s\}\big)\right],$$

where C(P) denotes the convex hull of the set of points P and the indicator $\mathbb{1}[\cdot]$ equals 1 if the condition holds and 0 otherwise. Finally, the resulting field is normalized for each emission E to obtain the relative probability map p rel (E, X), as shown, for example, in Fig. 3a:

$$p_{\mathrm{rel}}(E, X) = \frac{p(E, X)}{\max_{X'} p(E, X')}.$$

Alternatively, p(E, X) is normalized by its integral over X, which then represents the PDF of the response in X for given emissions E. The probability that the change X remains smaller than a target value, X target , given emission E is then in percent:

$$p_{\mathrm{cum}}(E, X_{\mathrm{target}}) = 100\,\% \cdot \frac{\int_{-\infty}^{X_{\mathrm{target}}} p(E, X)\,\mathrm{d}X}{\int_{-\infty}^{\infty} p(E, X)\,\mathrm{d}X}.$$

The allowable CO 2 emissions, E allowable , to not exceed the climate target X target with a probability of 68 % are then implicitly given by p cum (E, X target ) = 68 %.
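
The convex-hull weighting can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the code used for this study: the hull is built with Andrew's monotone chain algorithm, the field is evaluated on a regular grid, and all points, scores, and function names are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def in_hull(hull, point):
    """True if point lies inside or on the edge of a counter-clockwise hull."""
    n = len(hull)
    if n < 3:
        return False  # degenerate hulls (point or segment) are ignored here
    return all(cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))


def probability_map(hulls, scores, e_grid, x_grid):
    """p[i][j]: sum of the scores of all hulls covering (e_grid[i], x_grid[j])."""
    p = [[0.0] * len(x_grid) for _ in e_grid]
    for hull, score in zip(hulls, scores):
        for i, e in enumerate(e_grid):
            for j, x in enumerate(x_grid):
                if in_hull(hull, (e, x)):
                    p[i][j] += score
    return p


def percent_below(p_row, x_grid, x_target):
    """Percent probability that X stays at or below x_target (uniform x_grid)."""
    total = sum(p_row)
    below = sum(w for w, x in zip(p_row, x_grid) if x <= x_target)
    return 100.0 * below / total


# Hypothetical (E, X) points in Tt C and deg C for two model configurations.
points_a = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]
points_b = [(0.0, 1.0), (2.0, 1.0), (2.0, 3.0), (0.0, 3.0)]
hulls = [convex_hull(points_a), convex_hull(points_b)]
scores = [0.75, 0.25]  # hypothetical model scores S_m

e_grid = [1.0]
x_grid = [0.5, 1.5, 2.5]
p = probability_map(hulls, scores, e_grid, x_grid)
prob = percent_below(p[0], x_grid, 1.5)
print(p[0], prob)  # [0.75, 1.0, 0.25] 87.5
```

Scanning such a map along E for the emission level at which the probability drops to 68 % gives the allowable emissions for a chosen target.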

Testing the linearity of the response
From the probability maps in the (E, X) space, PDFs are extracted at E = 1000, 2000, and 3000 Gt C. To compare the response at different emission levels the PDFs at 2000 and 3000 Gt C are rescaled to the response per 1000 Gt C. In a perfectly linear system we would expect that the rescaled PDFs are identical for the different emission levels. To test the linearity of the response further, we fit a linear function X(E) = a X,m · E to the points (E m,s (t), X m,s (t)) for each model configuration m. The linear function is forced through zero because we require X(E = 0) = 0 at preindustrial (t = 1800). From the obtained coefficients a X,m of the model ensemble, we then calculate a PDF for the sensitivity a X of the response to cumulative emissions under the assumption that a linear fit is reasonable. The goodness of fit is quantified by the correlation coefficients, r m , and standard errors, σ m , for each model setup m. In Table 2, the ensemble median and 68 % range of a X , of r m , as well as the ensemble median standard error (expressed as a percentage of the median linear slope), σ , are reported.
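
A linear fit forced through zero has a closed-form least-squares slope, a = Σ E·X / Σ E². The following minimal sketch computes this slope together with the Pearson correlation coefficient for one hypothetical model configuration. The function names and data are illustrative assumptions, and the exact standard-error definition used for Table 2 is not reproduced here.

```python
import math


def fit_through_zero(e, x):
    """Least-squares slope a of X = a * E with the intercept fixed at zero."""
    return sum(ei * xi for ei, xi in zip(e, x)) / sum(ei * ei for ei in e)


def pearson_r(e, x):
    """Pearson correlation coefficient between the two series."""
    n = len(e)
    mean_e, mean_x = sum(e) / n, sum(x) / n
    cov = sum((ei - mean_e) * (xi - mean_x) for ei, xi in zip(e, x))
    var_e = sum((ei - mean_e) ** 2 for ei in e)
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov / math.sqrt(var_e * var_x)


# Hypothetical cumulative emissions (Tt C) and warming (deg C).
e = [0.5, 1.0, 1.5, 2.0]
x = [1.0, 1.9, 3.1, 3.9]

slope = fit_through_zero(e, x)  # deg C per Tt C
r = pearson_r(e, x)
print(slope, r)
```

Repeating the fit for every ensemble member and weighting the slopes by the model scores would then give the PDF of the sensitivity.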

Selection of the climate variables for analysis and computation of TRE
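We compute TRE X and IRFs for global mean surface air temperature (SAT), sea surface temperature (SST), steric sea level rise (SSLR), the strength of the Atlantic meridional overturning circulation (AMOC), surface ocean pH, the saturation state of surface waters in the Southern Ocean and in the tropics (30° N to 30° S) with respect to calcium carbonate in the mineral form of aragonite, and global soil carbon stocks (C soil ). These variables are deemed both impact-relevant and reasonably well represented in the Bern3D-LPJ EMIC. We stress the illustrative nature of this selection. TRE X may be computed for other climate variables and regions in future work.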
In particular the determination of TRE X for extreme events, such as droughts, heat waves, and floods, or for food production and fishery may be relevant for policy makers and stakeholders.
Changes in SAT and sea level are known to impact biological and physical systems (e.g. IPCC, 2014; Sherwood and Huber, 2010). Sea level rise is expected to affect coastal ecosystems such as mangroves, reduce coastal protection, and increase flood occurrence, possibly affecting hundreds of millions of people living in low-lying cities and along the coast.
The uptake of CO 2 by the ocean fundamentally changes the chemical composition of ocean waters (Orr, 2011), generally referred to as "ocean acidification". The reaction of dissolved CO 2 with H 2 O to H 2 CO 3 and the dissociation of the latter lead to an increase in the hydrogen ion concentration ([H + ]); a decrease in pH (pH = − log[H + ]); and, through shifting acid-base equilibria, a decrease in the concentration of carbonate ions, CO₃²⁻. The decrease in CO₃²⁻ is associated with a decrease in the saturation state of water with respect to calcium carbonate (CaCO 3 ).
Ocean acidification in conjunction with warming waters poses large risks to marine species, marine ecosystems such as corals, sea grass meadows, and marine ecosystem services such as tropical fisheries (e.g. Gattuso et al., 2015; Howes et al., 2015). Warming waters affect the aerobic scope of marine organisms and constrain marine habitats (Deutsch et al., 2015; Pörtner et al., 2011). The saturation state of water with respect to aragonite and other mineral forms of calcium carbonate determines whether water is corrosive (in the absence of protective mechanisms) to shells and structures made out of calcium carbonate. Model projections reveal large and sustained changes in the saturation state of surface and deep waters for a range of emission scenarios (Orr et al., 2005; Joos et al., 2011). Waters in the Arctic Ocean, coastal upwelling zones, and the Southern Ocean are becoming increasingly undersaturated with respect to aragonite (Steinacher et al., 2009; Gruber et al., 2012) and ongoing changes in saturation state are largest in the tropics, possibly adversely affecting net calcification rates of coral systems.
The AMOC contributes to the net heat transport into the North Atlantic region and changes in the AMOC may affect climate patterns in Europe and worldwide. Paleo-data reveal a southward shift of the Intertropical Convergence Zone linked to a decrease or collapse of the AMOC with related terrestrial ecosystem impacts (e.g. Bozbiyik et al., 2011). Finally, changes in the global soil carbon inventory may be taken here as an indication of the strength of the land carbon-climate feedback.

Climate response to a CO 2 emission pulse: testing the linearity of the emission-response relationship
In a first step, we explore how different climatic variables respond to a pulse-like input of carbon into the atmosphere (Fig. 2) and determine the so-called IRF for the different climate variables. The IRF experiments provide a framework to discuss path dependency and linearity in responses without the need to run many independent scenarios. IRFs for atmospheric CO 2 , SAT, SSLR, and ocean and land carbon uptake are given elsewhere and we refer the reader to the literature for a general discussion on IRFs, underlying carbon-cycle and climate processes, and timescales (e.g. Archer et al., 1998; Joos et al., 2013; Maier-Reimer and Hasselmann, 1987; Shine et al., 2005).
A main goal of this section on IRFs is to discuss to what extent one may expect a close-to-linear relationship between cumulative CO 2 emissions and a climate variable of interest. A linear relationship between emissions and variable X has the advantage that the determination of TRE X depends on neither the choice of scenario nor the magnitude of CO 2 emissions. In addition, TRE X would, in the case of linearity, fully describe the response to any CO 2 emissions. We start with a description of the model setup, followed by theoretical considerations. Then we discuss linearity in the context of CO 2 -only scenario uncertainty by analyzing median model responses. After that, we investigate the response uncertainty by relying on the full model ensemble and compare scenario and response uncertainty. Finally, we briefly address the scenario uncertainty due to non-CO 2 forcing.

Model simulations to determine IRFs
CO 2 is added instantaneously to the model atmosphere to determine IRFs. This results in a sudden increase in atmospheric CO 2 and radiative forcing. Afterwards, the evolution in the perturbation of atmospheric CO 2 and in any climate variable of interest, e.g., global mean surface air temperature, is monitored in the model. The resulting curve is the impulse response function (Fig. 2). Here, 1069 runs were carried out in different model configurations by adding emissions of 100 Gt C to an atmospheric CO 2 background concentration of 389 ppm, which corresponds to the concentration in the year 2010. Additionally, simulations with emission pulses of 1000, 3000, and 5000 Gt C were run for a median model configuration (Methods). For comparability, all IRFs are normalized to a carbon input of 100 Gt C.

The link between IRF and TRE: theoretical considerations
The motivation to analyze IRFs is twofold. First, the dynamics of a linear (or approximately linear) system are fully characterized by its response to a pulse-like perturbation; i.e., the response of variable X at year t to earlier annual emissions, e, at year t′ can be represented as the weighted sum of all earlier annual emissions. The weights are the values of the IRF curve at emission age t − t′:

$$X(t) = \sum_{t' \le t} e(t')\,\mathrm{IRF}_X(t - t'), \tag{8}$$

where the sum runs over all years t′ with annual emissions up to year t. IRFs thus provide a convenient and comprehensive quantitative characterization of the response of a model. IRFs also form the basis for the metrics used to compare different greenhouse gases in the Kyoto basket approach and to compute CO 2 equivalent concentrations (Joos et al., 2013; Myhre et al., 2013) and are used to build substitute models of comprehensive models (Joos et al., 1996). Second, and relevant for the TRE and for this study, IRFs allow us to gauge whether there is a roughly linear relationship between cumulative CO 2 emissions and the change in a climate variable of interest, X(t). The transient response for variable X to cumulative CO 2 emissions is in this notation:

$$E(t) = \sum_{t' \le t} e(t'), \tag{9}$$

$$\mathrm{TRE}^{X}(t) = \frac{X(t)}{E(t)} = \frac{\sum_{t' \le t} e(t')\,\mathrm{IRF}_X(t - t')}{\sum_{t' \le t} e(t')}. \tag{10}$$

We note that there is a close relationship between Eqs. (8) to (10) and thus between cumulative CO 2 emissions E(t), response X(t), and TRE X . The IRF provides the link between these quantities. Three conditions are to be met for a strict linear relationship between cumulative CO 2 emissions E and response X for any emission pathway: (i) the response is independent of the magnitude of the emissions, (ii) the response is independent of the age of the emission, i.e., the time passed since emissions occurred (in this case the IRF, and hence TRE X , is a constant and all emissions are weighted equally in Eq. 8), and (iii) non-CO 2 forcing factors play no role, a point that will be discussed later in Sect. 3.1.4. While these conditions are not fully met for climate variables, they may still approximately hold for plausible emission pathways. For the range of RCP scenarios, the mean age of the CO 2 emissions varies from a few decades to about 100 years for the industrial period and up to year 2100, and then it increases up to 300 years until 2300 CE (Fig. 2c). More than half of the cumulative CO 2 emissions typically have an age older than 30 years (Fig. 2c). If the IRF curve is approximately flat after a few decades and independent of the pulse size, then the vast majority of emissions is weighted by a similar value in Eq. (8). Consequently, the relationship between response X(t) and cumulative emissions, E(t), is approximately linear and path-independent. This response sensitivity per unit emission, X(t)/E(t), corresponds to an "effective" (emission-weighted) mean value of the IRF and is the transient response to cumulative CO 2 emissions TRE X . Indeed, the IRF for many variables varies within a limited range after a few decades (Fig. 2). Then, an approximately linear relationship between E(t) and X(t) holds and TRE X is approximately scenario-independent.
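
The weighted-sum representation and its consequence for path dependency can be illustrated with a short numerical sketch. The IRF values and emission pathways below are hypothetical; the point is only that a flat IRF yields the same X/E for any pathway, whereas a decaying IRF does not.

```python
def response(emissions, irf):
    """X(t) as the sum over t' <= t of e(t') * IRF(t - t')."""
    return [
        sum(emissions[tp] * irf[t - tp] for tp in range(t + 1))
        for t in range(len(emissions))
    ]


def cumulative(emissions):
    """Running total E(t) of the annual emissions."""
    total, out = 0.0, []
    for e in emissions:
        total += e
        out.append(total)
    return out


# Two hypothetical pathways with identical cumulative emissions (4 units).
steady = [1.0, 1.0, 1.0, 1.0]
early = [4.0, 0.0, 0.0, 0.0]

flat_irf = [2.0, 2.0, 2.0, 2.0]      # response independent of emission age
decaying_irf = [2.0, 1.5, 1.2, 1.0]  # response declines with emission age

tre = {}
for name, irf in (("flat", flat_irf), ("decaying", decaying_irf)):
    for label, path in (("steady", steady), ("early", early)):
        x_end = response(path, irf)[-1]
        tre[(name, label)] = x_end / cumulative(path)[-1]

print(tre)
```

With the flat IRF both pathways give the same ratio, while with the decaying IRF the early-emission pathway gives a smaller ratio because all of its emissions are weighted by the late, low part of the IRF.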

IRFs: median results
The median values of the (normalized) IRFs (Fig. 2, solid and dashed lines; Table 1) vary within a limited range over the period from 30 years to the end of the simulation (500 years) and for the different pulse sizes of 100 to 3000 Gt C for SAT, SSLR, AMOC, and surface ocean pH. For a given pulse size, the median of the IRF for the saturation with respect to aragonite in the tropical (Ω arag,trop. ) and Southern Ocean (Ω arag,S.O. ) surface waters and for the global soil carbon inventory varies within a limited range. However, the normalized IRFs for these variables vary substantially with the magnitude of the emission pulse. Thus, we expect a nonlinear relationship between the ensemble median responses and cumulative CO 2 emissions for these quantities.
The atmospheric CO 2 perturbation declines by about a factor of 2 within the first 100 years for an emission pulse of 100 Gt C. This means that the atmospheric CO 2 concentration at a specific time depends strongly on the emission path of the previous 100 years. In addition, the IRFs differ for different pulse sizes because the efficiency of the oceanic and terrestrial carbon sinks decreases with higher atmospheric CO 2 concentrations and warming. The fraction remaining airborne after 500 years is about 75 % for a pulse input of 3000 Gt C, about 2.5 times larger than the fraction remaining for a pulse of 100 Gt C (Fig. 2a). Thus, we do not expect a scenario-independent, linear relationship between atmospheric CO 2 and cumulative emissions.
At first glance, it may be surprising that the responses in SAT, SSLR, AMOC, and pH do not depend much on the size of the emission pulse given the strong sensitivity of the atmospheric CO 2 response to the pulse size. For the physical variables, this is a consequence of near-cancellation of nonlinearity in the carbon cycle and in the relationship between radiative forcing and atmospheric CO 2 (Joos et al., 2013). The long-term response in atmospheric CO 2 (Fig. 2a) increases with increasing emissions and the fraction remaining airborne is substantially larger for large than for small emission pulses. On the other hand, radiative forcing depends logarithmically on atmospheric CO 2 and the change in forcing per unit change in CO 2 is smaller at high than at low atmospheric CO 2 concentrations. As a consequence, the response in radiative forcing is rather insensitive to the magnitude of the emission pulse and so is the response in climate variables forced by CO 2 radiative forcing. A similar effect applies for pH. Changes in dissolved [CO 2 ] and [H + ] in the surface ocean closely follow changes in atmospheric CO 2 as the typical timescale to equilibrate the ocean mixed layer with an atmospheric CO 2 perturbation is of the order of a year and because changes in [H + ] are roughly proportional to [CO 2 ] (Orr, 2011). pH is by definition the (negative) logarithm of the H + concentration. As for radiative forcing, nonlinearities in the CO 2 and thus H + response roughly cancel out when applying the logarithm to compute pH.
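
The near-cancellation can be made concrete with a back-of-the-envelope calculation. The sketch below takes the airborne fractions after 500 years quoted above (about 75 % for 3000 Gt C, and hence about 30 % for 100 Gt C) and the 389 ppm background. The conversion of 2.12 Gt C per ppm and the simplified forcing expression 5.35 · ln(C/C0) W m −2 are common approximations assumed here for illustration and are not taken from the model.

```python
import math

GTC_PER_PPM = 2.12      # assumed conversion between Gt C and ppm CO2
RF_COEFF = 5.35         # W m-2, assumed simplified CO2 forcing coefficient
BACKGROUND_PPM = 389.0  # background concentration of the pulse experiments


def forcing_per_100_gtc(pulse_gtc, airborne_fraction):
    """Long-term CO2 forcing of a pulse, normalized to 100 Gt C of emissions."""
    delta_ppm = airborne_fraction * pulse_gtc / GTC_PER_PPM
    forcing = RF_COEFF * math.log((BACKGROUND_PPM + delta_ppm) / BACKGROUND_PPM)
    return forcing * 100.0 / pulse_gtc


small = forcing_per_100_gtc(100.0, 0.30)
large = forcing_per_100_gtc(3000.0, 0.75)

airborne_ratio = 0.75 / 0.30
forcing_ratio = large / small
print(airborne_ratio, forcing_ratio)
```

Under these assumptions the normalized forcing differs by only roughly a quarter between the two pulse sizes, although the airborne fraction differs by a factor of 2.5.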

Response vs. scenario uncertainty
The Monte Carlo IRF experiments also allow us to assess the response or model uncertainty (Fig. 2, orange range). The 90 % confidence ranges in the IRFs are substantially larger than the variation of the (normalized) median IRF for the variables SAT, SSLR, AMOC, and soil carbon inventory. Consequently, the model uncertainty will dominate the uncertainty in TRE X and is larger than uncertainties arising from dependencies on the carbon emission pathway. On the other hand, the response uncertainty from our 1069 Monte Carlo model setups is more comparable to the variation in the median IRFs for atmospheric CO 2 and for surface water saturation with respect to aragonite in the tropical ocean and Southern Ocean.
In addition to the path dependency and the response uncertainty in TRE X discussed above, forcing from non-CO 2 agents will affect TRE X . We expect a notable influence of non-CO 2 agents on the physical climate variables SAT, SSLR, and AMOC. For example, Strassmann et al. (2009) attributed simulated surface warming to individual forcing components for a range of mitigation and non-mitigation scenarios. They find that non-CO 2 greenhouse gas forcing causes up to 50 % as much warming as CO 2 forcing and that the non-CO 2 forcing is only partly offset by aerosol cooling by 2100. On the other hand, we expect a small influence of non-CO 2 forcing on pH and saturation state, which are predominantly driven by the atmospheric CO 2 perturbation (Steinacher et al., 2009; McNeil and Matear, 2007).
In summary, uncertainty in the response dominates over the uncertainty arising from path dependency for SAT, SSLR, AMOC, and soil carbon. For CO 2 -only or CO 2 -dominated scenarios, we expect a close-to-linear relationship between cumulative CO 2 emissions and SAT, surface ocean pH, AMOC, and to some extent SSLR. In other words, the concept of TRE X should work particularly well for these variables. On the other hand, less well expressed linear behavior is found for global soil carbon and surface water saturation with respect to aragonite. In the next section, we will elaborate on these findings and quantify TRE X .

The transient response to cumulative CO 2 emissions
We investigate the response in multiple climate variables, X(t), as a function of cumulative fossil or total CO 2 emissions E(t). We used the model ensemble presented in Steinacher et al. (2013) for 55 multi-gas emission scenarios from the integrated assessment modeling community which range from very optimistic mitigation to high business-as-usual scenarios (Methods). From those simulations we determine the transient response to cumulative CO 2 emissions TRE X (t) = X(t)/E(t) (Tables 2 and A2; Figs. 3-5).
The discussion of results is guided by the results shown in Figs. 3 to 5. These show the relative probability of change in the variable X for given cumulative CO 2 emissions (e.g., colors in Fig. 3a) together with the linear regression slopes (black dashed lines). These graphs allow one to visually inspect the linearity in response to cumulative CO 2 emissions and results include both scenario and response uncertainty. In accompanying panels (e.g., Fig. 3b), the focus is on scenario uncertainty versus response uncertainty. The relationship between change and cumulative emissions is plotted for the ensemble median and for the 55 scenarios (55 colored lines). In addition, the 68 and 90 % confidence intervals for the response (or model) uncertainty are given for one scenario, RCP8.5, by red dashed and dotted lines, respectively. These graphs allow one to infer scenario and response uncertainty individually.

TRE SAT
We find a largely linear relationship between cumulative CO 2 emissions and both transient and peak warming (Fig. 3a and c) for the set of emission scenarios considered here. These linear relationships confirm the finding from the pulse experiment above, i.e., that the response in the global SAT change is largely independent of the pathway of CO 2 emissions in our model. We note, however, that some low-end scenarios show a nonlinear behavior due to non-CO 2 forcing (Fig. 3b). Some AME scenarios show a decrease in temperature due to a strong reduction in the non-CO 2 forcing while cumulative emissions continue to increase slightly. Other scenarios (mostly from GGI) deviate from the linear relationship when negative emissions decrease the cumulative emissions while the increased temperature is largely sustained. These nonlinearities are evident as large changes in the slope between SAT and cumulative emissions towards the end of the individual simulations, that is, after ≈ 2150 CE, when atmospheric CO 2 is stabilized and emissions are low (Fig. 3b). Yet those deviations are not large enough to eliminate the generally linear relationship found for this set of scenarios.
The projected warming for a given amount of CO 2 emissions is associated with a considerable uncertainty which increases with higher cumulative emissions. This uncertainty arises both from the response uncertainty of the model ensemble, such as the uncertain climate sensitivity or oceanic carbon uptake, and from the scenario uncertainty. The scenario uncertainty is mainly due to different assumptions for the non-CO 2 forcing in the scenarios. The AME scenarios, for example, assume a relatively strong negative forcing from aerosols which leads to a consistently smaller warming than in the other scenarios (Fig. 3b). The response and scenario uncertainty appear to be of the same order of magnitude (Fig. 3b).
We fitted a linear function through zero to the results of each ensemble member and then calculated the PDFs from the individual slopes. The median slope is 1.8 °C (Tt C) −1 for the peak response and values are similar for the transient response (Table 2). These slopes are somewhat lower than the direct results, but in general the linear regression approach is able to reproduce the distribution of the peak and transient warming response per 1000 Gt C CO 2 emissions, although the confidence interval is narrower and the long tail of the distribution might be underestimated.

TCRE
Following Matthews et al. (2009) and Gillett et al. (2013), we also determined the TCRE for our model ensemble from a scenario where atmospheric CO 2 is increasing by 1 % yr −1 until twice the preindustrial concentration is reached. No other forcing agents are included. Correspondingly, we find a slightly lower median TCRE of 1.7 °C (Tt C) −1 than for the SAT response in the multi-agent scenarios. The 68 % c.i. includes the scenario uncertainty range in TCRE (1.5 to 2.0 °C) obtained by Herrington and Zickfeld (2014) with a single model setup and for a range of CO 2 -only scenarios (with constant future non-CO 2 forcing). Gillett et al. (2013) report a TCRE of 0.8-2.4 °C (Tt C) −1 (5-95 % range) from 15 models of the Coupled Model Intercomparison Project (CMIP5) for a 2xCO 2 scenario and a range of 0.7-2.0 °C (Tt C) −1 estimated from observations. In IPCC AR5, TCRE is estimated to be likely in the range of 0.8 °C to 2.5 °C for cumulative emissions up to about 2000 Gt C (IPCC, 2013). Those ranges are somewhat lower than our 5-95 % ranges of 0.9-3.1 °C (Tt C) −1 obtained by linear regression from the scenarios that include non-CO 2 forcing and 1.0-2.7 °C (Tt C) −1 from the 2xCO 2 simulations.

TRE SST
The transient response in sea surface temperature (SST) shows the same characteristics as the response in SAT (Fig. 3e, f). The response is 1.5 °C per 1000 Gt C emissions, and 1.3 °C (Tt C) −1 (0.9-1.8 °C (Tt C) −1 ) for the linear regression approach.

TRE SSLR and TRE AMOC
Compared to global mean warming, the responses in SSLR and in the strength of the AMOC are more emission-path-dependent (Fig. 4b, d). In all scenarios applied here, it is assumed that atmospheric CO 2 and total radiative forcing are stabilized after 2150. This yields a slow additional growth in cumulative emissions after 2150, whereas SSLR continues largely unabated and the AMOC continues to recover. This results in a steep slope in the relationship between cumulative CO 2 emissions and these variables after 2150, as is clearly visible in Fig. 4b. The path dependency also results in larger differences between transient and peak responses (Table 2).
The projected peak SSLR is described remarkably well by a linear regression (Table 2). However, these results for the peak SSLR response are somewhat fortuitous and influenced by our choice to stabilize atmospheric CO 2 and forcings after 2150 in all scenarios and by the stopping of simulations in year 2300. We emphasize that SSLR would continue to increase beyond the end of the simulation and TRE SSLR is thus only indicative for the period from today to year 2300. For AMOC, the response is somewhat stronger for low-emission than high-emission paths (Fig. 4d). For 1000 Gt C total emissions, we find a peak reduction in AMOC of −24 % (−35 to −15 %) (Table 2). This sensitivity is larger than found by Herrington and Zickfeld (2014), but simulated changes in AMOC are known to be model-dependent.

TRE pH , TRE Ω arag,S.O. , and TRE Ω arag,tropics
Surface pH shows a very tight and linear relationship with cumulative CO 2 emissions (Fig. 4e, f). This is consistent with a small influence of non-CO 2 forcing agents, a small response uncertainty, and a relatively small dependency on the CO 2 emission pathway as revealed by the IRF experiments. Both scenario uncertainty and response uncertainty are smaller than for other variables. pH decreases by about 0.2 units per 1000 Gt C emissions from fossil sources.
For Ω arag , the nonlinearities are more pronounced than for the physical variables and pH, with a proportionally stronger response at low total emissions (Ω arag = −0.68 to −0.54 (Tt C) −1 at 1000 Gt C total emissions) and weaker response at higher total emissions (Ω arag = −0.43 to −0.35 (Tt C) −1 at 3000 Gt C total emissions, Fig. 5b, d). Again, results for fossil-fuel emissions only are provided in Fig. A3 and Table A2.

TRE C soil
Finally, the change in global soil carbon (Fig. 5e, f) shows a similar response to SSLR, with continued carbon release from soils after stabilization of greenhouse gas concentrations in medium- to high-emission scenarios. Like the ocean heat uptake, the respiration of soil carbon can be slow, particularly in deep soil layers at high latitudes, and it takes some time to reach a new equilibrium at a higher temperature. The response uncertainty represented by the model spread for a given scenario, however, is even larger than the spread from the scenarios. For the same scenario, the 90 % confidence interval ranges from a very high loss of up to 40 % to increases in global soil carbon by a few percent (Fig. 5f).
In summary, we find that not only global mean surface air temperature but also the other target variables investigated here show a monotonic relationship with cumulative CO 2 emissions in multi-gas scenarios. The relationship with cumulative CO 2 emission is highly linear for pH as evidenced by the high correlation coefficient and the invariance in the ensemble median and confidence range from total emissions (Table 2). Changes in steric sea level, meridional overturning circulation, and aragonite saturation are generally less linearly related to cumulative emissions than global pH and surface air temperature. These variables show a substantial nonlinear response after stabilization of atmospheric CO 2 . Nevertheless, the PDF of the peak response for all these variables can be reproduced relatively well with a linear regression yielding correlations of r = 0.8-0.98 and standard errors of σ = 30-40 % (Table 2).

Transient and equilibrium climate sensitivity
TCR is estimated from the ensemble simulations with 1 % yr −1 increase until doubling of atmospheric CO 2 and in combination with the observational constraints (Methods).
TCR is constrained to a median value of 1.7 °C with 68 and 90 % c.i. of 1.3-2.2 °C and 1.1-2.6 °C, respectively. The 68 % range is somewhat narrower than the corresponding IPCC AR5 range of 1.0-2.5 °C (Collins et al., 2013). The CMIP5 model mean and 90 % uncertainty range of 1.8 and 1.2-2.4 °C (Flato et al., 2013) are fully consistent with our observation-constrained estimates.
ECS is estimated by extending the 2xCO 2 simulations by 1500 years (at constant radiative forcing) and fitting a sum of exponentials to the resulting temperature response. Median ECS is 2.9 °C with constrained 68 and 90 % c.i. of 2.0-4.2 °C and 1.5-6.0 °C. Again, the CMIP5 model mean and 90 % range of 3.2 and 1.9-4.5 °C are well within our observation-constrained estimates. However, our 68 % confidence interval is narrower than the IPCC AR5 estimate of 1.5-4.5 °C, particularly on the low end.
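
A constrained median and confidence interval can be read from a score-weighted ensemble with a weighted quantile. The sketch below shows one simple variant (the smallest value whose cumulative normalized weight reaches the requested level); it is an assumption for illustration and not necessarily the estimator applied here, and the five TCR values and scores are hypothetical.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative normalized weight reaches q (0 < q <= 1)."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running / total >= q:
            return value
    return pairs[-1][0]


# Hypothetical TCR values (deg C) of five ensemble members and their scores.
tcr = [1.0, 1.4, 1.7, 2.1, 2.8]
scores = [0.05, 0.25, 0.40, 0.25, 0.05]

median = weighted_quantile(tcr, scores, 0.50)
lower_68 = weighted_quantile(tcr, scores, 0.16)
upper_68 = weighted_quantile(tcr, scores, 0.84)
print(median, lower_68, upper_68)
```

With a large ensemble, interpolating between neighboring members would give smoother interval bounds than this step-function variant.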

Influence of individual observational data on the probability distribution
Twenty-six different observational data sets are applied to constrain carbon-cycle and physical climate responses. This raises the question of to what extent an individual data set or a group of data sets constrains the model responses and whether some data sets may unintentionally deteriorate estimates. Uncertainties in the carbon cycle are irrelevant for the physical metrics TCR and ECS. Correspondingly, data sets aimed at constraining the carbon-cycle response, e.g., land carbon inventory data, should not affect estimates of TCR and ECS. The effect of the different observational constraints on the constrained, posterior distribution for TCR and ECS is estimated by applying only subsets of the observational data. First, each subset of constraints is given the full weight as if it were the only available data (Fig. 6a, c). As expected, the data groups "land" and "ocean", targeted at carbon-cycle responses, do not influence the outcomes for TCR and ECS. The subgroups "heat" (SAT and ocean heat uptake records) and "CO 2 " both constrain TCR and ECS and shift the prior PDF towards the fully constrained PDF when applied alone (Fig. 6a, c). The SAT record tends to constrain TCR and ECS to slightly higher values and the ocean heat uptake data to slightly lower values than the full constraint.
Interestingly, the "CO 2 " subgroup also narrows the probability distribution for TCR and ECS, although less than the SAT and ocean heat records. The "CO 2 " subgroup includes data sets of the atmospheric CO 2 increase over the industrial period and observation-based estimates of the ocean and land carbon uptake for recent periods. Ocean carbon and heat uptake are governed by similar processes, namely the surface-to-deep transport of excess carbon and heat. Apparently, model members that are not able to describe the ocean carbon uptake and the evolution in atmospheric CO 2 reasonably well also fail to match observational records for SAT and ocean heat content. The PDF for the "CO 2 " subgroup displays several maxima for ECS and similarly for TCR. We are not in a position to provide a firm explanation for these maxima, but we speculate that this result may be related to the limited number of members in our ensemble and that the multi-dimensional model parameter space is not completely sampled.
Second, the subsets of constraints are added successively (Fig. 6b, d). Unlike above, weights associated with each subgroup are now set to correspond to the weights they will have in the fully constrained set (i.e., after adding all the subsets). Note that the fully constrained posterior distribution does not depend on the order of applying the individual constraints. When applied sequentially with their corresponding weights in the full constraint, ocean heat uptake represents the strongest constraint. In contrast, the SAT record changes the prior PDF only slightly (dashed magenta line in Fig. 6b, d) when applied with its corresponding weight in the full constraint. Similarly, adding the group "CO2" after the ocean heat uptake data shifts the PDF only slightly (solid magenta vs. cyan line in Fig. 6b, d). This suggests that the CO2 data do not add substantial information with respect to TCR and ECS that is not already captured by the temperature data. In summary, the subgroup "heat" represents the strongest constraint for TCR and ECS. In particular, the ocean heat uptake data are important for constraining these metrics and exert the dominant influence on the final PDFs.
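The order independence noted above can be illustrated with a minimal sketch. It assumes, as a simplification, that each constraint group contributes a multiplicative skill score to the weight of every ensemble member; the hierarchical aggregation of skill scores actually used for Bern3D-LPJ is not reproduced here. The function names, the Gaussian skill score, and all numbers are hypothetical and purely synthetic.

```python
import math
import random

def apply_constraint(weights, skill):
    """Multiply member weights by one constraint's skill scores and renormalize."""
    updated = [w * s for w, s in zip(weights, skill)]
    total = sum(updated)
    return [u / total for u in updated]

def gaussian_skill(modelled, observed, sigma):
    """Hypothetical likelihood-type skill score (assumption, not the paper's score)."""
    return [math.exp(-0.5 * ((m - observed) / sigma) ** 2) for m in modelled]

random.seed(0)
n_members = 1000
# Synthetic stand-ins for two modelled quantities per ensemble member.
heat = [random.gauss(1.0, 0.4) for _ in range(n_members)]
co2 = [random.gauss(1.0, 0.3) for _ in range(n_members)]

skill_heat = gaussian_skill(heat, observed=1.1, sigma=0.2)
skill_co2 = gaussian_skill(co2, observed=0.9, sigma=0.3)

prior = [1.0 / n_members] * n_members
heat_then_co2 = apply_constraint(apply_constraint(prior, skill_heat), skill_co2)
co2_then_heat = apply_constraint(apply_constraint(prior, skill_co2), skill_heat)

# Multiplication commutes, so both orderings give the same posterior weights
# up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(heat_then_co2, co2_then_heat))
print(f"largest weight difference between orderings: {max_diff:.2e}")
```

Intermediate distributions do depend on the order, which is why the sequential curves in Fig. 6b, d are informative only for the stated sequence.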

Discussion
We have quantified the transient response to cumulative CO2 emissions, TRE X, for multiple Earth system variables, the responses to a CO2 emission pulse defining the IRF, and three other important climate metrics, the ECS, the TCR, and the TCRE. TRE X and IRF are evaluated for global and regional changes in physical and biogeochemical variables. The linearity and path dependency in responses and scenario uncertainties as well as model response uncertainties are quantified. Our probabilistic results are derived with an observationally constrained ∼ 1000-member ensemble of the Bern3D-LPJ model and for 55 different greenhouse gas scenarios and additional idealized simulations.
A caveat is that we apply a cost-efficient EMIC with limitations in spatial and temporal model resolution and mechanistic representation of important climate processes. However, and in contrast to reduced-form, box-type, two-dimensional, linear response models; expert assumptions; or component models applied in many earlier probabilistic assessments (e.g. Wigley and Raper, 2001; Knutti et al., 2002, 2003, 2005; Schleussner et al., 2014; Bodman et al., 2013; Little et al., 2013; Harris et al., 2013; Holden et al., 2013; Bhat et al., 2012), Bern3D-LPJ features a dynamic three-dimensional ocean with physically consistent formulations for the transport of heat, carbon, and other biogeochemical tracers, similar to work by Holden et al. (2010) and Olson et al. (2012), and includes a state-of-the-art dynamic global vegetation model, peat carbon, and anthropogenic land-use dynamics. The model is applied directly without using an emulator (Holden et al., 2010, 2015; Olson et al., 2012). Further, we note that no ocean carbonate chemistry or marine biology parameters were varied in this study. Results for changes in AMOC are known to vary considerably among different models and our ensemble may not represent the full uncertainty in AMOC response. Important processes are not represented in Bern3D-LPJ. Most notably, the melting of ice sheets and glaciers and its impacts on sea level and AMOC are not included. Consequently, only results for the steric component of sea level rise are reported and results for changes in AMOC should be considered with caution. Potential climatic "surprises" such as the massive release of methane from clathrates or permafrost are also not considered.

TRE X : The emission-response relationship
A main focus of this study is on TRE X and thus on the probabilistic relationship between cumulative CO2 emissions and the transient or peak response in individual, illustrative climate variables. TRE X was evaluated both by using the response and emission data for each year of a simulation and, in the case of TRE X peak, by considering only the peak (or maximum) in response over a transient simulation. For simplicity, the term TRE X is often used to refer to both quantities in the following discussion. In this study, probability distributions are always determined for the climate variable response for a fixed, given amount of emissions. For example, for 1000 Gt C of total emissions, the peak response in global mean surface temperature change (ΔSAT) is estimated at 2.31 °C and lies with a probability of 68 % between 1.49 and 3.81 °C (Table 2).
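As a rough illustration of how a median and a 68 % range can be read from a weighted ensemble at a fixed emission level, the following sketch computes weighted percentiles. It is a simplification of the relative probability maps used in this study; the function name and the small example values are hypothetical and are not taken from the model ensemble.

```python
def weighted_percentile(values, weights, q):
    """Smallest value at which the cumulative normalized weight reaches q (0 to 1)."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight / total
        if cumulative >= q:
            return value
    return pairs[-1][0]

# Hypothetical peak warming (degC) of a few ensemble members at a fixed
# emission level, with hypothetical observation-based weights.
responses = [1.4, 1.9, 2.3, 2.6, 3.5]
weights = [0.5, 1.5, 2.0, 1.5, 0.5]

median = weighted_percentile(responses, weights, 0.50)
lower = weighted_percentile(responses, weights, 0.16)
upper = weighted_percentile(responses, weights, 0.84)
print(f"median {median} degC, 68 % range {lower} to {upper} degC")
```

Members with a low skill score contribute little to the cumulative weight, so they barely shift the reported percentiles.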
The magnitude of the response is in general nonlinearly related to cumulative CO2 emissions. This may present no fundamental problem. Yet, nonlinearity in responses adds to the scenario uncertainty, and extrapolation beyond the considered scenario space may not provide reliable results. Nonlinear relationships cannot be precisely summarized with one single number. For convenience, we have approximated responses for the investigated variables by linear fits (Tables 2 and A2). A close to linear relationship is found for pH. Consistent with earlier studies, we also find an approximately linear relation between transient surface temperature increase and cumulative CO2 emissions of about 1-3 °C (Tt C)⁻¹ over our set of multi-agent scenarios. There are some nonlinear temperature responses in strong mitigation scenarios (particularly those with negative emissions).
Within Bern3D-LPJ, TRE SAT is higher when evaluated at 1000 Gt C than when evaluated at 2000 or 3000 Gt C (see Table 2). This may be related to non-CO2 forcing as it potentially has a relatively smaller role in high-emission scenarios. It may also be model-specific as similar tendencies are found for not only the other physical variables but also the ocean acidification variables, which are hardly influenced by non-CO2 forcing. A tendency for the TCRE to decrease with increasing cumulative emissions is noted in earlier studies (Herrington and Zickfeld, 2014; Gillett et al., 2013; Matthews et al., 2009), while Krasting et al. (2014) find TCRE to be large for low and high emission rates and low for modern emission rates in idealized scenarios in the GFDL model.

Climate targets, allowable emissions, and TRE X
Next we address climate targets and allowable emissions, widely discussed in the literature for global mean surface temperature (e.g. Siegenthaler and Oeschger, 1978; Friedlingstein et al., 2011; Rogelj et al., 2011; Peters et al., 2013). The link between a climate target, e.g., the 2 °C target, and allowable emissions is closely related to TRE X and TRE X peak. The probabilistic, quantitative relationship between a climate variable of choice and cumulative CO2 emissions permits one to assess the ceiling in cumulative CO2 emissions if a specific individual limit is not to be exceeded with a given probability, P. This quantification of allowable emissions is possible irrespective of whether the emission-response relationship is linear or not. Estimates of allowable emissions may be inferred from the full model ensemble results or approximated graphically from Figs. 3 to 5. Even simpler, TRE X(P) (or TRE X peak(P)) is a convenient measure to link a given climate target with allowable fossil-fuel CO2 emissions, E allowable. It holds that
$$E_{\mathrm{allowable}} = \frac{X_{\mathrm{target}}(P)}{\mathrm{TRE}^{X}(P)},$$
where X target(P) is a limit in variable X not to be exceeded with probability P. TRE X(P) is then the numerical value determined from the probability distribution (e.g., Fig. 3d) of TRE X for a given cumulative probability P (or (1 − P)).
In the case of an approximately linear emission-response relationship, a single value of TRE X(P) applies for different target levels. For example, TRE SAT peak is 2.85 °C (Tt C)⁻¹ at the 68th percentile (evaluated for total emissions of 1000 Gt C). Then, allowable total carbon emissions to keep global mean surface temperature warming below 2 °C at any time with a 68 % probability are estimated at 702 Gt C (2 °C/(2.85 °C (Tt C)⁻¹)). Correspondingly, allowable total carbon emissions to meet the 1.5 °C target mentioned in the Paris Agreement (United Nations, 2015) are 526 Gt C. Numerical values of TRE X vary with the magnitude of emissions (Tables 2 and A2) as mentioned above. Cumulative fossil and land-use emissions up to year 2100 are typically lower than 1500 Gt C for the mitigation scenarios of the Energy Modeling Forum Project 21 (Van Vuuren et al., 2008). Thus, in the context of emission mitigation, the numerical values (median and confidence interval) determined at 1000 Gt C cumulative fossil-fuel emissions appear best suited (Tables 2 and A2). For convenience, we provide the inverse values of TRE X and TRE X peak for the different climate variables for the 68th and 90th percentiles of the cumulative, integrated probability distribution in Table 3. Multiplying the appropriate value by the climate target of choice yields the allowable emissions to meet this target with 68 and 90 % probability, respectively.
Some aspects are not explicitly considered here. First, meeting a set of multiple targets requires lower cumulative CO2 emissions than required to meet the most stringent target within the set in probabilistic assessments (Steinacher et al., 2013). Thus, the evaluation of allowable cumulative emissions to meet multiple climate targets requires their joint evaluation. In practical terms, the joint evaluation of the 2 °C target and the Southern Ocean saturation target would yield lower allowable emissions than indicated in the above paragraph.
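The single-target arithmetic quoted above for the 2 °C and 1.5 °C limits can be reproduced with a short sketch. It assumes a linear emission-response relationship and uses the 68th-percentile value of TRE SAT peak given in the text; the function name is hypothetical.

```python
def allowable_emissions(target, tre_per_ttc):
    """Allowable cumulative emissions in Gt C under the linearity assumption.

    target:      limit in the climate variable (here degC of peak warming)
    tre_per_ttc: response per Tt C (1000 Gt C) at the chosen percentile
    """
    return 1000.0 * target / tre_per_ttc

TRE_SAT_PEAK_P68 = 2.85  # degC per Tt C, 68th percentile, value quoted in the text

for target in (2.0, 1.5):
    budget = allowable_emissions(target, TRE_SAT_PEAK_P68)
    print(f"{target} degC target: about {budget:.0f} Gt C")
```

The same division applies to any other variable in Tables 2 and A2, but only for a single target; a joint evaluation of several targets needs the full ensemble.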
Second, inertia in the socioeconomic system limits the rate of carbon emission reduction. In other words, carbon emissions are committed for the future through existing infrastructure. The committed peak change in a climate variable X (relative to preindustrial) under a limited, constant rate of emission reduction s is easily evaluated using the tabulated values of TRE X (Allen and Stocker, 2014):
$$\Delta X_{\mathrm{peak}} = \mathrm{TRE}^{X}_{\mathrm{peak}} \cdot \left( \frac{e(t)}{s} + E(t) \right).$$
Here, e(t) denotes the CO2 emissions at time t, e.g., today, e(t)/s is the cumulative sum of future emissions (given exponentially decreasing emissions with rate s), and E(t) is the cumulative emissions over the historical period up to time t. Economically feasible emission reduction rates are considered to be in the range of a few percent per year. In 2015, total CO2 emissions are about 10 Gt C per year and cumulative realized emissions from fossil-fuel burning, land use, and cement production are about 600 Gt C. This yields a committed (median) change in SAT of 2.5 °C (2.31 °C (Tt C)⁻¹ × (10/0.02 + 600) Gt C) and 1.8 °C when assuming immediate emission reduction with a rate of 2 and 5 %, respectively. The corresponding commitments in pH decrease are 0.22 and 0.16.
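The committed-warming numbers above follow from a one-line calculation, sketched here with the values quoted in the text (median TRE SAT peak of 2.31 °C (Tt C)⁻¹, emissions of 10 Gt C per year, and 600 Gt C emitted so far). The function and argument names are hypothetical.

```python
def committed_peak_change(tre_per_ttc, emissions_now, reduction_rate, emitted_so_far):
    """Committed peak change for exponentially decreasing emissions.

    tre_per_ttc:     response per Tt C (1000 Gt C)
    emissions_now:   current emissions e(t) in Gt C per year
    reduction_rate:  fractional reduction rate s per year (0.02 means 2 %)
    emitted_so_far:  cumulative historical emissions E(t) in Gt C
    """
    # Integral of e(t) * exp(-s * tau) over all future times tau.
    future_emissions = emissions_now / reduction_rate
    return tre_per_ttc * (future_emissions + emitted_so_far) / 1000.0

for rate in (0.02, 0.05):
    warming = committed_peak_change(2.31, 10.0, rate, 600.0)
    print(f"reduction rate {rate:.0%}: committed warming about {warming:.1f} degC")
```

Lowering the reduction rate s increases the future term e(t)/s, so the committed change rises quickly as mitigation becomes slower.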
Climate targets may become out of reach when the transition to a decarbonized economy is delayed. This is quantitatively illustrated by the mitigation delay sensitivity (MDS; Stocker, 2013; Pfister and Stocker, 2016), a metric that captures the additional, committed increase in a climate variable due to a delay in emission reduction. Again, the values of TRE X given in Tables 2 and A2 allow one to compute the median and the 68 % confidence interval for the MDS following Allen and Stocker (2014).

Impulse response functions (IRFs)
The responses to a pulse-like input of carbon into the atmosphere for atmospheric CO2, ocean and land carbon, surface air temperature, and steric sea level rise are discussed elsewhere (e.g. Archer et al., 1998; Frölicher et al., 2014; Joos et al., 2013; Shine et al., 2005). Here we provide, in addition, IRFs for surface ocean pH and calcium carbonate saturation states as well as soil carbon. A substantial fraction of carbon emitted today will remain airborne for centuries and millennia. The impact of today's carbon emissions on surface air temperature accrues within only about 20 years but persists for many centuries. In Bern3D-LPJ, as in many other models, surface air temperature remains approximately constant after the first ∼ 20 years after the pulse input. As found in earlier studies, the normalized IRF in SAT depends relatively weakly on the magnitude of the emission pulse. However, the peak warming is realized later for larger than for smaller emission pulses in Bern3D and in a range of other models (Joos et al., 2013; Zickfeld and Herrington, 2015). Interestingly, Frölicher et al. (2014) find that surface air temperature increases for several centuries in their CO2 pulse experiment with the GFDL model. Steric sea level rise accrues slowly on multi-decadal to century timescales. Similar to atmospheric CO2, peak impacts in surface ocean pH and saturation states occur almost immediately after emissions and these changes will persist for centuries to millennia. Thus, the environment and the socioeconomic system will experience the impact of our current carbon emissions more or less immediately and these impacts are irreversible on human timescales.

Transient and equilibrium climate sensitivity
Another focus of this study is to provide observation-constrained estimates of the TCR, the ECS, and the TCRE as determined from CO2-only scenarios. The recent slow-down in global surface air temperature warming (Hartmann et al., 2013; Roberts et al., 2015; Nieves et al., 2015; Karl et al., 2015; Marotzke and Forster, 2015), termed hiatus, has provoked discussion of whether climate models react too sensitively to radiative forcing. Here, the observation-constrained TCR and ECS are estimated at 1.7 and 2.9 °C (ensemble mean) with 68 % uncertainty ranges of 1.3 to 2.2 and 2.0 to 4.2 °C, respectively. TCRE is estimated at 1.7 °C (Tt C)⁻¹. Our results for ECS, TCR, and TCRE are consistent with the CMIP5 estimates in terms of multi-model mean and uncertainty ranges (Flato et al., 2013) and there are no apparent discrepancies between our observation-constrained TCR and CMIP5 models. On the other hand, our results do not confirm some recent studies (Otto et al., 2013; Schwartz, 2012; Collins et al., 2013) that suggest the possibility of a TCR below 1 °C. Such low values for TCR are outside the very likely range given in the Fourth Assessment Report of IPCC (discussed by Collins et al., 2013) and of this study. The choice and record length of observational constraints may bias results for TCR and ECS. In particular, internal climate variability, e.g., associated with the Atlantic Multidecadal Oscillation, may obscure the link between anthropogenic forcing and response (van der Werf and Dolman, 2014). Ocean heat content data provide the strongest constraint on ECS and TCR in our analysis. The influence of the applied long-term hemispheric SAT records is smaller. This is not surprising as ocean heat content represents the time-integrated anthropogenic forcing signal both in the observations and in our model. Roemmich et al. (2015) analyzed a large set of ocean temperature measurements from floats covering the top 2000 m of the water column and concluded that ocean heat uptake continued steadily and unabated over the recent period 2006 to 2013. The significant variability in surface temperature and upper 100 m heat content was offset by opposing variability from 100 to 500 m. The high variability in the SAT and SST records as evidenced by the hiatus serves to emphasize that these records are poor indicators of the steadier subsurface-ocean and climate warming signal on the decadal timescale. These findings appear to support our approach where ocean heat data provide the strongest constraint on TCR and ECS, complemented by hemispheric century-scale (1850 to 2010) SAT records. Studies that rely on decadal-scale SAT (or SST) changes as included in the most recent assessment by the IPCC may be affected by large and unavoidable uncertainties due to the chaotic nature of natural, internal variability (van der Werf and Dolman, 2014). These findings suggest that the downward revision of the ECS range from the IPCC's AR4 to AR5 may, in hindsight, appear somewhat cautious and that the AR4 range may be more reliable.

Summary and conclusions
We have quantified the transient response to cumulative CO2 emissions, TRE X, for multiple Earth system variables, the responses to a CO2 emission pulse defining the impulse response function (IRF), and three other important climate metrics, the equilibrium climate sensitivity (ECS), the transient climate response (TCR), and the transient climate response to cumulative CO2 emissions (TCRE). Our results are based on (i) a large number of simulations carried out in a probabilistic framework for the industrial period and for the future using 55 different greenhouse gas scenarios, different emission pulses, and an ∼ 1000-member model ensemble and (ii) a diverse and large set of observational data as constraints. The observation-constrained PDFs provide both best estimates and uncertainty ranges for risk analyses and for determining allowable emissions to meet a climate target.
The 68 % confidence intervals for TCR and ECS are constrained to 1.3 to 2.2 and 2.0 to 4.2 °C, respectively. This is fully consistent with the range found by the CMIP5 models, but in conflict with suggestions of the possibility of a TCR below 1 °C. Ocean heat content data provide the most stringent constraint on these estimates, while observation-based records of surface air temperature and of the atmospheric CO2 budget are of secondary importance in our analysis. TRE X and IRF are evaluated for changes in physical variables including surface air and ocean temperature, sea level, and Atlantic meridional overturning circulation and changes in ocean acidification variables and terrestrial soil carbon stocks. Path dependency in responses and scenario uncertainties as well as model response uncertainties are quantified.
The IRF analysis provides a theoretical framework to discuss path dependency and linearity in response without the need to run many independent scenarios. It reveals that a perfect linearity between cumulative CO2 emissions and Earth system variables is not to be expected. Nevertheless, the median values of the (normalized) IRFs vary within a limited range for an emission age range between 30 and 500 years and for pulse sizes between 100 and 3000 Gt C for global mean surface air temperature, surface ocean pH, AMOC, and to a somewhat lesser degree for SSLR. This implies a close-to-linear relationship between these variables and cumulative CO2 emissions and relatively little influence of the CO2 emission scenario choice for these variables. On the other hand, the IRFs for atmospheric CO2, global soil carbon inventory, and aragonite saturation in the tropics and Southern Ocean are shown to vary with the size of the emission pulse, implying some nonlinearity in the emission-response relationship.
TRE X provides a convenient metric (i) to characterize responses of different climate variables to CO2 emissions and (ii) to estimate the link between an individual climate target and allowable emissions. A close to linear relationship between cumulative CO2 emissions and modeled change is found for the Earth system variables investigated here when considering both scenario and response uncertainty and total emissions of up to 3000 Gt C. These findings suggest that the emission-response and emission-climate target relationships described by TRE X should be further evaluated and quantified for additional impact-relevant climate variables and using the full Earth system model hierarchy.

Figure 2. Response to an emission pulse of 100 Gt C added to an atmospheric concentration of 389 ppm. Ensemble median (solid red line) and 68/90 % ranges (dark/light orange) of changes in (a) atmospheric CO2, (b) surface air temperature, (d) steric sea level rise, (e) Atlantic meridional overturning circulation, (f) global soil carbon stocks, (g) global mean surface ocean pH, and (h) southern and (i) tropical ocean surface aragonite saturation are shown. The dashed lines show the response (per 100 Gt C) for median parameters and pulse sizes of 100 (red), 1000 (black), 3000 (blue), and 5000 Gt C (green). Panel (c) shows the mean age of past emissions over the historical period and for the four RCP scenarios (left axis), and the fraction of the emissions older than 30 years (right axis) versus calendar years. More than half of the emissions are older than ∼ 30 years. The bulk of the emissions at any calendar year is thus in the age range (x axis in the other panels) where the pulse response function varies within a limited range for surface air temperature (b), surface pH (g), steric sea level rise (d), and Atlantic meridional overturning (e). This implies an approximately linear relationship between cumulative emissions and responses in these variables.

Figure 3. Transient and peak warming as a function of cumulative emissions: (a) relative probability of transient surface air temperature change (ΔSAT) for given cumulative CO2 emissions (fossil fuel and deforestation), derived from annual values from ensemble model simulations for 55 greenhouse gas emission scenarios. Black dashed lines show the median and 68 % range of the linear regression slope. The red line indicates the coverage of the emission range by the model ensemble. High and low emission ranges with a coverage of less than 90 % are shaded and considered not robust. (b) Transient ΔSAT response (ensemble median) for the 55 different scenarios. The dashed/dotted lines show the 68/90 % range of the ensemble for the RCP8.5 scenario to indicate the model spread. Note that our extensions of the RCP scenarios beyond 2100 are not identical to the extended concentration pathways (ECPs; see Methods). (c) Same as (a) but for the peak warming for given total cumulative emissions. (d) PDFs of the peak warming for 1000 (blue), 2000 (green), and 3000 Gt C (red) cumulative emissions, and for the linear regression (black). The dashed lines indicate the unscaled PDFs and solid lines the normalized response per 1000 Gt C. (e, f) Same as (a, b) but for transient sea surface temperature change (ΔSST).

Figure 5. Same as Fig. 4 but for the transient response in surface aragonite saturation state in the Southern Ocean and in the tropics, as well as in global soil carbon stocks.

Figure 6. PDFs of transient climate response (a, b) and equilibrium climate sensitivity (c, d) derived from the model ensemble and for different observation-based constraints. In (a, c) the PDFs are shown for the ensemble without constraints (prior, black line), for the case when each of the constraint groups "heat" (magenta), "CO2" (cyan), "ocean" (blue), and "land" (green) is applied alone with equal weights, and for all constraints (red). The group "heat" is split up further into SAT anomaly (dashed magenta) and ocean heat uptake observations (dotted magenta). In (b, d) the constraints are added sequentially with their corresponding weights in the full constraint in the following order: SAT anomaly (magenta dashed), ocean heat uptake (magenta solid), CO2 (cyan), ocean (blue), and land (red, corresponding to the full constraint).

Figure A1. Observation-based data sets used to constrain the model ensemble. The data sets are organized in a hierarchical structure to balance the weight of individual data sets, and model skill scores are aggregated by averaging over the group of constraints at the same level in the hierarchy (Steinacher et al., 2013).

Figure A3. Response as function of fossil-fuel CO2 emissions: (a) same as Fig. 3a but for cumulative fossil-fuel emissions only, i.e., CO2 emissions from deforestation are not included in this figure (see Methods). (b-d) Same as (a) but for the transient SSLR, sea surface temperature change (ΔSST), and global annual mean surface ocean pH (ΔpH surf). The responses of the remaining variables to fossil-fuel-only CO2 emissions are given in Table A2.

Table 1. Response to a 100 Gt C CO2 emission pulse on different timescales as simulated by the Bern3D-LPJ model (see also Fig. 2).

Table 2. Transient (TRE X) and peak (TRE X peak) response per 1000 Gt C total CO2 emissions estimated with different methods. Ensemble medians and 68 % ranges (i.e., the 16th and 84th percentiles) are taken from the relative probability maps derived from all model configurations and scenarios at 1000, 2000, and 3000 Gt C total emissions as well as from the linear regression slope (see Methods). The correlation coefficient (r, median, and 68 % range) and the median standard error as a percentage of the median regression slope (σ) are given for the linear fit of the peak response.

Table 3. Inverse values of TRE X and TRE X peak for the different climate variables. The values are determined at 1000 Gt C total and fossil-fuel CO2 emissions, respectively, and are given for the 68th and 90th percentiles of the cumulative probability distribution. Under the assumption of linearity, the allowable emissions to meet a given target with 68 or 90 % probability can be estimated by multiplying the corresponding value in the table with the target value of the climate variable. C soil is omitted in this table due to its nonlinear response and large uncertainty.

Table A2. Same as Table 2 but for fossil-fuel CO2 emissions only, i.e., the gross emissions from deforestation are not included when regressing the responses against cumulative CO2 emissions (see Methods).