Risk of crop failure due to compound dry and hot extremes estimated with nested copulas

. The interaction between co-occurring drought and hot conditions is often particularly damaging to crop’s health and may cause crop failure. Climate change exacerbates such risks due to an increase in the intensity and frequency of dry and hot events in many land regions. Hence, here we model the trivariate dependence between spring maximum temperature and spring precipitation and wheat and barley yields over two province regions in Spain with nested copulas. Based on the full trivariate joint distribution, we (i) estimate the impact of compound hot and dry conditions on wheat and barley loss and (ii) estimate the additional impact due to compound hazards compared to individual hazards. We ﬁnd that crop loss increases when drought or heat stress is aggravated to form compound dry and hot conditions and that an increase in the severity of compound conditions leads to larger damage. For instance, compared to moderate drought only, moderate compound dry and hot conditions increase the likelihood of crop loss by 8 % to 11 %, while when starting with moderate heat, the increase is between 19 % to 29 % (depending on the ce-real and region). These ﬁndings suggest that the likelihood of crop loss is driven primarily by drought stress rather than by heat stress, suggesting that drought plays the dominant role in the compound event; that is, drought stress is not required to be as extreme as heat stress to cause similar damage. Furthermore, when compound dry and hot conditions aggravate stress from moderate to severe or extreme levels, crop loss probabilities increase 5 % to 6 % and 6 % to 8 %, respectively (depending on the cereal and region). Our results highlight the additional value of a trivariate approach for estimating the compounding effects of dry and hot extremes on crop failure risk. Therefore, this approach can effectively contribute to design management options and guide the decision-making process in agricultural practices

approach. In particular, we are interested in quantifying the additional risk associated with compound dry and hot conditions compared to only dry or only hot conditions. Wheat and barley are chosen as they are two of the major rainfed crops in the Iberian Peninsula (Peña-Gallardo et al., 2019;Vicente-Serrano et al., 2006). Moreover, we here build on prior work which has estimated wheat and barley losses in the same area, but related to a single hazard, namely droughts (Ribeiro et al., 2019a, b). 60 Using NACs, we estimate the conditional probabilities of crop loss under different severity levels of dry and hot conditions based on the full trivariate joint distribution. We focus on annual wheat and barley yield data at the sub-national scale, thus overcoming drawbacks related to assessing climate related crop risks at the national scale. Based on the proposed approach we (i) characterize the dependence structures between the dry and hot conditions and the crop yields; (ii) estimate the conditional probability of crop loss under different compound dry and hot severity levels; and (iii) evaluate how much the compound dry

Weather data
The vegetative cycle of the winter crops in Spain is mainly driven by precipitation and temperature: sowing occurs around 90 autumn (from September through November, SON), followed by the vegetative phase in winter (from December through January, DJF), reproductive phase (more photosynthetically active phase) in spring (from March through May, MAM) and crop harvest occurs in the early summer (around June). Therefore, monthly accumulated precipitation (P) and monthly maximum temperature (Tmax) were extracted from the Climate Research Unit (CRU) TS4.01 dataset (Harris et al., 2014) spanning the same time period.Given the importance of assessing crop's water and temperature requirements at different moments of the 95 vegetative cycle we conducted a correlation analysis between the annual yields and the 3-monthly means of P and 3-monthly means of Tmax during the whole growing season, as shown in Fig. 2. The identification of the moment of the vegetative cycle of the highest crop's water and temperature requirements was assessed based on the strongest statistically significant correlation value (denoted by filled circles in Fig. 2). Figure 2 suggests that the greatest influence of P and Tmax in crop yields is observed during spring (MAM in both regions and cereals) corresponding to the reproductive phase of plant development, 100 when vegetation is photosynthetically more active. Therefore, for the remaining analysis we focus on 3-monthly means of Tmax and 3-monthly means of P during spring (P MAM and Tmax MAM , respectively), which has also been identified in previous studies as a growth stage sensitive to the effects of water content and high temperatures (Ferrise et al., 2011;Del Moral et al., 2003;Iglesias and Quiroga, 2007;Ribeiro et al., 2019c). This selection of climate variables allows to maximize the dependence between climate conditions and yields as also shown by previous work based on the same data (Ribeiro et al., 2019c). 105 We considered three severity levels of dry and/or hot conditions: Moderate (+), Severe (++) and Extreme (+++) based on percentile thresholds as shown in Table 1. Besides these three severity levels, we further considered all combinations of 10 categories of severity levels of dry and hot conditions exceeding the 50 th to 5 th and 50 th to 95 th percentiles for P MAM and Tmax MAM , respectively. We further considered the 20 th percentile of the crop anomaly time-series as lower exceedance threshold for crop failure (Ben-Ari et al., 2016;Ribeiro et al., 2019a, b . Kendall correlation τ between three-monthly means of maximum temperature (Tmax, red) and precipitation (blue) with wheat (filled lines) and barley (dashed lines) yield anomalies, respectively. Correlations were computed during the crop growing period (September to June) over 1986-2016 for Region 1 (a) and 2 (b) (Figure 1). The letters on the x-axis denote the three-month averaging periods. Circles indicate statistically significant correlations at α = 0.05. The strongest correlation (positive or negative) is denoted by filled circles (PMAM and TmaxMAM).

Modelling trivariate distributions with nested Archimedean copulas
We model the trivariate relationship between temperature, precipitation and crop yields with nested copulas. Consider a vector of crop yield annual anomalies Y and the climate variables X 1 =P MAM and X 2 =Tmax MAM with marginal cumulative distribution functions (CDF) F Y , F X1 and F X2 , respectively. We aim to estimate and compare three conditional cumulative distribution functions (CDFs) with the scalars x * 1 and x * 2 corresponding to the dry and hot thresholds, respectively: suggests higher probabilities of crop loss (i.e., y = y * for a low y * ) than F Y |X1 or F Y |X2 . Furthermore, we can study the relative role of P MAM and Tmax MAM for crop loss with Equations 1 and 2.
To compare the additional impact of compound dry and hot conditions with the impacts caused by the individual hazards, Relative change from heat-stress = where 0.2 is the threshold of crop loss (y * ) corresponding to the 20 th percentile of the crop yield anomalies. These changes can be estimated for different severity levels of dry (x * 1 ) and hot (x * 2 ) conditions. Following the theorem of Sklar (1959) we can decompose a multivariate probability distribution into its marginals and a cop-130 ula C which describes the dependence structure between the margins. To estimate the multivariate distribution P (Y, X 2 , X 3 ), the respective copula C is fitted, which is then a joint CDF whose marginal distributions are uniform in the interval [0, 1] (Durante and Sempi, 2015;Nelsen, 2006;Salvadori and De Michele, 2007). Transforming the margins to uniform variables through their CDFs, that is, u 1 = F Y , u 2 = F X1 and u 3 = F X2 , the trivariate CDF can be written as (Sklar, 1959):

135
Within the copula families, AC are extensively used due to their flexibility and applicability to a variety of tail dependence structures, as well as their analytical tractability. AC can be written in terms of the respective generator function ϕ, which belongs to a parametric family ϕ θ dependent on the parameter θ, e.g. for the three-dimensional case: Due to the symmetry of bivariate AC, the above trivariate form can be expressed in terms of NAC, where two of the margins 140 are first coupled by their bivariate copula and then coupled with the third margin, via the same generator on each level but different parameters θ 12 and θ 1 , respectively, e.g.: C(u 1 , u 2 , u 3 ; θ 12 ; θ 1 ) = C 1 (C 12 (u 1 , u 2 ; θ 12 ), u 3 ; θ 1 ) Equation 8 can also be expressed in terms of the other possible pair copulas C 13 (u 1 , u 3 ; θ 13 ) and C 23 (u 2 , u 3 ; θ 23 ) that are coupled with u 2 and u 1 by C 2 and C 3 , with expressions C 2 (C 13 (u 1 , u 3 ; θ 13 ), u 2 ; θ 2 ) and C 3 (C 23 (u 2 , u 3 ; θ 23 ), u 1 ; θ 3 ), respec-145 tively. Like Equation 8, among each structure of NAC the same generator is required for each level but with potentially different parameters. Hence, both the optimal structure and respective parameters must be determined.
Most structures of NACs require decreasing parameters from the inner to the outer hierarchical level to attain a properly fitted copula. As for most ACs, the larger the parameter the stronger the dependence, this means that most structures of NAC require that the marginal copulas in the inner level should correspond to the pair with the strongest dependence, i.e., satisfying 150 θ 12 ≥ θ 1 in the case of Equation 8. This requirement applies to NAC with generators from the same family, providing a flexible estimation of the NAC, which allows for specifying the full distribution with at most d − 1 parameters, where d is the number of copula dimensions or marginal distributions (Okhrin and Ristig, 2014).
In our study we focus on a total of four Archimedean families that capture different kinds of joint dependence structures: Clayton, Gumbel, Frank and Joe. The Clayton, Gumbel and Joe copulas describe an asymmetrical tail behaviour, while the 155 Frank copula, in a similar way to the Gaussian copula, captures joint symmetric dependence. While Gumbel and Joe copulas can represent upper tail dependence, Clayton copulas can represent lower tail dependence. The estimation of the copula parameters is based on maximum likelihood based on the R package HAC (Okhrin and Ristig, 2014) .
The main steps of the trivariate approach used in this study can be summarized as follows (Okhrin and Ristig, 2014). First, the marginal distributions u 1 , u 2 and u 3 are estimated non-parametrically by simple ranking, using the empirical distribution 160 functions of the data through the pobs function in the R package copula (Ivan Kojadinovic and Jun Yan, 2010), a common approach for copula modelling. Afterwards, the fit of bivariate copula models is performed to every pair of variables to estimate C θ12 , C θ13 and C θ23 . For each pair, the copula selection is performed based on the Akaike's information criterion (AIC) and checking the goodness-of-fit by comparing the empirical copula based on the Cramer-von Mises distance (Sn). The bivariate copula with the strongest dependence, with the lowest AIC and the lowest Sn, is selected to define the structure of the NAC.

165
Afterwards, the marginal distribution that is not part of the selected bivariate copula is joined and the parameter of the upper level copula of the same family is estimated (Equation 8). As a final step, the estimated NAC with two parameters is compared with the same Archimedean family with one parameter (Equation 7) in terms of the AIC, which penalizes the number of estimated parameters.

Diagnostics and uncertainties in the estimation procedure 170
The visual diagnostics of the quality of the selected models are performed analogously to a QQ-plot by comparing the empirical estimate of the Kendall function (cumulative distribution of the copula) with the theoretical estimate of the Kendall function based on the selected parametric trivariate copulas (Okhrin and Ristig, 2014).
Best estimates of all conditional probabilities (i.e., Equations 1-5) are estimated by drawing N = 100, 000 samples from the fitted trivariate copula. Using the same single-parameter generator function on each level of NAC (but with a potentially 175 different value of θ) satisfies the required complete monotonicity of the ACs generators to construct NAC following Okhrin and Ristig (2014), which also implies that the possible pairs are positively dependent. Therefore, due to the negative dependence between Tmax MAM and both crop yields and P MAM , we inverted the margins of Tmax MAM for copula modelling (i.e. multiplication by -1). For more details on complete monotonicity of the ACs generators and NAC constructions see e.g. Górecki et al. (2017).

180
Uncertainties of the statistical modelling are estimated by repeated sampling (10,000 times) of the fitted model with sample sizes equal to the number of observations (i.e., N 1 in the case of Region 1 and N 2 in the case of Region 2). From these samples, 95% confidence intervals of Kendall's rank correlation are estimated and compared with the observed pairs (u 1 , u 2 ), (u 1 , u 3 ) and (u 2 , u 3 ). This validation step intends to verify if the generated pairs of copula-based samples preserve the level of dependence found in the observations. Furthermore, this approach is used to estimate uncertainties related to the conditional 185 probabilities (Equations 1-5).

Results
In both cereals and both regions the most dependent pair of variables corresponds to crop yields and P MAM , hence the pair of variables u 1 , u 2 defines the optimal NAC structure ( Figure 3). Results for all possible variable pairs and the respective bivariate copulas are shown in Table A.1. Once the bivariate copula C 12 (u 1 , u 2 ) of yields and P MAM is known, the NAC models are constructed ( Table 2). The Frank copula provides the best fit of C 12 (u 1 , u 2 ) ( Table A.1) for both cereals and both regions and thus the parameters of the trivariate nested copulas are all from the Frank family. Nevertheless, despite Frank being the best family to characterize the nested copulas, we also constructed NAC models with Gumbel, Clayton and Joe copulas for comparison, as well as trivariate Archimedean copulas with one parameter where we selected the best structure between one-parameter and two-parameter AC copulas via the 195 AIC (Table 2). In all but one case the NAC models with Frank copulas is the best model. The only exception is barley in Region 2 whose AIC of C θ (u 1 , u 2 , u 3 ) is slightly lower than the AIC of C θ1 (u 3 , C θ12 (u 1 , u 2 )) ( Table 2). This feature may suggest that a structure favouring the dependence between yield and precipitation (u 1 , u 2 ) may not be as relevant as in the other regions and yields due to a less dominant role of drought individually in this case. Nevertheless, in terms of Cramer-von Mises distance (Sn) the nested copula is closer to the empirical trivariate copula. For this reason, we modelled the trivariate joint distribution 200 based on nested Frank copulas for all cases. For all fitted models, the empirical cumulative distribution corresponds well to the theoretical cumulative distributions (Figure 4).
Bivariate dependencies as measured by Kendall's τ are captured well by the fitted models ( Figure 5 for wheat, Figure A.1 for barley). Among all possible pairs, the correlation between Tmax and yield is the weakest for both cereals (Table A.1), and likely for this reason it is the pair in Figure 5 and Figure  dependence is inside the 95% confidence level and the magnitude of correlations among the pairs is also reasonably preserved by the models i.e., such that τ u1,u2 > τ u2,u3 > τ u1,u3 .  is 68% and 71% for wheat and barley, respectively, in Region 2, in contrast to 62% and 63% in Region 1 (Figure 6e, purple bars). In addition, the differences in crop loss are higher between moderate (+dry+hot) and severe (++dry++hot) conditions compared to the differences between severe and extreme (+++dry+++hot) conditions. More precisely, when the compound dry 215 and hot conditions aggravate from moderate to severe stress, crop loss increases 5 to 6% and when the compound dry and hot conditions aggravate from moderate to extreme stress, crop loss increases 6 to 8% (depending on the cereal and region). For comparison, conditional cumulative probability distributions for single stressors compared with the compound stressors are shown in Figure A.2 for all three severity levels.
While Figure 6 illustrates the same severity levels for the different hazards, Figure 7 illustrates crop loss for a range of 220 different combinations of severity levels of dry and hot conditions (e.g., extreme dry conditions combined with moderate, severe and extreme hot conditions, and vice-versa) starting from the 50 th percentile of P MAM and Tmax MAM . When P MAM /Tmax MAM are below/above the median, the probability of crop loss is always higher than 40%. Similarly to Figure 6, the increase of crop loss with the severity of drought-and heat-stress is evident (Figure 7). The higher likelihood of crop loss in Region 2, particularly for barley, is also consistent with Figure 6. Moreover, the results indicate that droughts are typically associated 225 with higher probabilities of crop loss than heatwaves at the same severity level. This finding suggests that drought stress causes more damage to crop yields than heat stress, even for lower values of stress.

235
We have modelled the trivariate relationship between Tmax MAM and P MAM and wheat and barley yields in two regions in Spain using nested copulas. We found that the likelihood of crop loss increases with the severity of the compound dry and hot conditions and that compound drought and heat always increases the likelihood of crop loss. Moreover, our findings suggest  (1) and barley in Region 2 (d)) under moderate (+dry+hot, yellow), severe (++dry++hot, orange) and extreme (+++dry+++hot, purple) compound dry and hot conditions (see Table 1). (e) Conditional probabilities of non-exceeding the crop loss threshold (20 th percentile -vertical dashed line in a-d)) for each severity level of compound dry and hot conditions given by F Y |X 1 ,X 2 (0.2).
Uncertainty ranges illustrate the 95% confidence intervals.
that drought stress does not require to be so extreme as heat stress to cause the same adverse impact on crop yields. Hence drought is the more stressful driver of crop loss, when considering compound drought and heat. Although previous studies have discussed that maximum temperature might be the best predictor variable for yield variability 255 in most countries , our study highlights that in Spain crop loss of wheat and barley is more sensitive to dryness than to hot conditions. This finding agrees with the rainfed practices adopted in the wheat and barley cultivation in Spain. In fact, the nesting structure of the trivariate models adopted in the present study privileges the stronger dependency between yields and precipitation, rather than between yields and temperature or between precipitation and temperature ( Figure 3). Though irrigated crops typically produce higher yields, the pressure in water resources is already increasing the deficit between 260 water supplies and water demand in Spain (Rodríguez Díaz et al., 2007). Hence, understanding climate risks for rainfed crops is crucial to address the current water management challenges for agricultural practices in Mediterranean regions.
Higher probabilities of crop loss under drought and/or heat stress are generally expected in the southern region of Spain, in comparison to the northern region (Figures 6 and 7), in agreement with the higher temperatures and lower rainfall amounts observed in the south (Ribeiro et al., 2019a;IM and AEMET, 2011). In the case of wheat losses, this finding is in agreement 265 with previous work which focused on drought risks for the same crops and the same region (assessed based on remote sensing and hydro-meteorological drought indicators, Ribeiro et al. (2019b). However, Ribeiro et al. (2019b) identified a higher likelihood of barley loss with drought in the northern region. This discrepancy underlines importance of addressing the interaction between compound dry and hot conditions and the associated impacts on vegetation. For instance, compound dry and hot conditions have a larger impact on the carbon uptake potential than the sum of the individual impacts (Zscheischler et al., 2014), 270 highlighting the relevance of interactions between multiple stressors.
We found that for barley in Region 2, drought is the least dominant driver in comparison to the other cereals and regions.
Barley in Region 2 shows the highest difference between drought and compound dry and hot conditions, and the lowest difference between heat stress and compound conditions (Figure 8). This suggests that among both cereals and both regions, barley in Region 2 is the case where the compound and possibly interacting effects of drought and heat are most relevant. Note 275 that in this case also the CDF's between the dry and hot and dry or hot conditions are more differentiated from each other for the severe and extreme stress ( Figure A.2). This is consistent with a recent study at the province level, which recommended that crop production in Spain should focus more on wheat production given that most provinces displayed lower levels of wheat loss with drought in comparison to barley loss (Ribeiro et al., 2019a). This finding is also consistent with Figures 6 and 7.
The uncertainties associated to the parametric statistical model were assessed with a large number of sampled distributions 280 with the same sample size as the observations. In some of these distributions, drought or heat alone may cause more damage than concurrent drought and heat (lower uncertainty bound is below 0 in Figures 8 and A.3). This highlights the challenges of estimating the likelihood of rare events in two-or three-dimensional probability distribution with limited sample size (Serinaldi, 2013(Serinaldi, , 2016Zscheischler and Fischer, 2020). For the same reason, the wheat loss in Region 2 when P MAM is below the 5 th percentile in Figure 7 slightly decreases when the threshold of Tmax MAM change from the 10 th percentile to the 5 th percentile 285 (while an increase would be expected like in the other cases). These features are associated with the uncertainties in the estimation procedure, which may be particularly large for extreme values and it would be difficult to find a physical explanation for such a feature. Note that the uncertainties increase with the increasing severity of the compound dry and hot conditions ( Figure A.3) due the rapid decrease of available samples in the corners of the three-dimensional probability distribution.
increase yield loss. In the general sense, the biophysiological explanation for the combination of environmental drivers leading to stronger yield reductions relates with the crop's requirements of water and thermal conditions during the key phenological stage in the analysis. The selection of the climate variables during spring corresponds to the reproductive phase of the plants and when vegetation is photosynthetically more active, and the combined effect of water and heat stress during this period is critical for the crop's health leading to yield decrease. During this stage of formation of the grains the dry and hot extremes may 295 accelerate the maturation affecting the size, number and weight of the grains and consequently affecting the crop's harvests in quantity and quality (Balla et al., 2011;COPA-COGECA, 2003;Nicolas et al., 1984;Qaseem et al., 2019;Talukder et al., 2014).
Following the work by Okhrin and Ristig (2014), here we considered nesting copulas of the same family only, as more complex structures would be difficult to implement in general. Vine copulas might offer an alternative that is also appropriate 300 for higher dimensions (Bevacqua et al., 2017), when considering for instance more driver variables. Nevertheless, in comparison with previous studies based on bivariate models only, we argue that the statistical modelling based on NAC is a good compromise between complexity and the trivariate dimension.

Conclusions
The present study assessed how compound drought and heat enhance losses of wheat and barley in two major dryland areas 305 in Spain. We showed that nested Archimedian copulas can successfully model the trivariate joint distribution between spring maximum temperature, spring precipitation and yields to estimate conditional probabilities of crop loss under different severity levels of hot and dry conditions. The strongest dependence exists between spring precipitation and yields and is best captured by a Frank copula. Our results demonstrate that the probability of crop loss increases with the severity of compound dry and hot conditions. Furthermore, the likelihood of wheat and barley loss increases when drought or heat, respectively, aggravate 310 to compound dry and hot conditions in both regions. Overall, the likelihood of crop loss in the southern region is larger, in particularly for barley. For both cereals and regions, the likelihood of crop loss increase more with increasing drought stress than with heat stress, suggesting that drought plays a dominant role in the compound event. Our results illustrate the additional value of using trivariate copula modelling to estimate the compounding effects of dry and hot extremes on the risk of crop failure. In operational practice, this research will allow contributing to design supporting tools and provide guidance in the 315 decision-making process in agricultural practices to minimize crop losses related to climate hazards.
Code and data availability. The statistical analysis was performed under R software using the packages copula (Ivan Kojadinovic and Jun Yan, 2010) and HAC (Okhrin and Ristig, 2014). The R scripts are available at http://impecaf.rd.ciencias.ulisboa.pt/. The precipitation-and maximum temperature gridded values are publicly available from the Climate Research Unit (CRU) TS4.01 dataset by Harris et al. (2014).
The Spanish crop yield is published by the Spanish Agriculture, Fishing and Environment Ministry in their Statistical Yearbooks, which can 320 authors discussed the results, provided critical feedback and helped shape the research, analysis and contributed to the final manuscript.