Summarizing the state of the terrestrial biosphere in few dimensions

In times of global change, we must closely monitor the state of the planet in order to understand the full complexity of these changes. In fact, each of the Earth’s subsystems—i.e. the biosphere, atmosphere, hydrosphere, and cryosphere—can be analyzed from a multitude of data streams. However, since it is very hard to jointly interpret multiple monitoring data streams in parallel, one often aims for some summarizing indicator. Climate indices, for example, summarize the state of atmospheric circulation in a region. Although such approaches are also used in other fields of science, they are rarely used 5 to describe land surface dynamics. Here, we propose a robust method to create global indicators for the terrestrial biosphere using principal component analysis based on a high-dimensional set of relevant global data streams. The concept was tested using 12 explanatory variables representing the biophysical state of ecosystems and land-atmosphere water, energy, and carbon fluxes. We find that three indicators account for 82% of the variance of the state of the biosphere in space and time across the globe. While the first indicator summarizes productivity patterns, the second indicator summarizes variables representing water 10 and energy availability. The third indicator represents mostly changes in albedo. Anomalies in the indicators clearly identify extreme events, such as the Amazon droughts (2005 and 2010) and the Russian heatwave (2010), they also allow us to interpret the impacts of these events. The indicators can also be used to detect and quantify changes in seasonal dynamics. Here we report for instance increasing seasonal amplitudes of productivity in agricultural areas and arctic regions. We assume that this generic approach has great potential for the analysis of land-surface dynamics from observational or model data. 15


Introduction
Today, humanity faces negative global impacts of land use and land cover change (Song et al., 2018), global warming (IPCC, 2014), and associated losses of biodiversity (IPBES, 2019), to only mention the most prominent transformations. Over the past decades, new satellite missions (e.g. Berger et al., 2012;Schimel and Schneider, 2019), along with the continuous collection of ground based measurements (e.g. Wingate et al., 2015;Nasahara and Nagai, 2015;Baldocchi, 2020), and the integration of both 20 (Papale et al., 2015;Babst et al., 2017;Jung et al., 2019) have increased our capacity to monitor the Earth's surface enormously.
However, there are still large knowledge gaps limiting our capacity to monitor and understand the current transformations of the Earth system (Steffen et al., 2015;Rosenfeld et al., 2019;Yan et al., 2019;Piao et al., 2020).
Many of recent changes due to increasing anthropogenic activity are manifested in long-term transformations. One prominent example is "global greening" that has been attributed to fertilization effects, temperature increases, and land-use intensification 25 (de Jong et al., 2011;Zhu et al., 2016;Piao et al., 2019). It is also known that phenological patterns change in the wake of climate change (Schwartz, 1998;Parmesan, 2006). However, these phenological patterns vary regionally. In "cold" ecosystems one may find decreased seasonal amplitudes on primary production due to warmer winters (Stine et al., 2009). Elsewhere, seasonal amplitude may increase e.g. in agricultural areas due to the so called "green revolution" (Zeng et al., 2014;Chen et al., 2019). Another change in terrestrial land-surface dynamics is induced by increasing frequencies and magnitudes of 30 extreme events (Barriopedro et al., 2011;Reichstein et al., 2013). The consequences for land-ecosystems have yet to be fully understood (Flach et al., 2018;Sippel et al., 2018), and require novel detection and attribution methods tailored to the problem (Flach et al., 2017;Mahecha et al., 2007a). While extreme events are typically only temporary deviations from a normal trajectory, ecosystems may changes their qualitative state permanently. Such shifts or tipping points can be induced by changing environmental conditions or direct human influence, and pose yet another problem that needs to be considered (Lenton et al.,35 2008). The question we address here is, how to uncover and summarize changes in land-surface dynamics in a consistent framework. The idea is to simultaneously take advantage of a large array of global data streams, without addressing each observed phenomenon in a specific domain only. We seek to develop an integrated approach to uncover changes in the landsurface dynamics based on a very generic approach.
The problem of identifying patterns of change in high dimensional data streams is not new. Extracting the dominant features 40 from high-dimensional observations is a well-known problem in many disciplines. One approach is to manually define indicators that are know to represent important properties such as the "Bowen Ratio" (Bowen, 1926), another one consists in using machine learning to extract unique, and ideally independent features from the data. In the climate sciences, for instance, it is common to summarize atmospheric states using Empirical Orthogonal Functions (EOF), also known as Principal Component Analysis (PCA; Pearson, 1901). The rationale is that dimensionality reduction only retains the main data features, which makes 45 them easier accessible for analysis. One of the most prominent examples is the description of the El Niño Southern Oscillation (ENSO) dynamics in the multivariate ENSO index (MEI; Wolter and Timlin, 2011), an indicator describing the state of the regional circulation patterns at a certain point in time. The MEI is a very successful index that can be easily interpreted and used in a variety of ways, most basically it provides a measure for the intensity and duration of the different quasi-cyclic ENSO events but it can also be associated with its characteristic impacts: E.g. seasonal warming, changes in seasonal temperatures 50 and overall dryness in the Pacific Northwest of the United States (Abatzoglou et al., 2014), drought related fires in the Brazilian Amazon (Aragão et al., 2018), and crop yield anomalies (Najafi et al., 2019).
In plant ecology, indicators based on dimensionality reduction methods are used to describe changes to species assemblages along unknown gradients (Legendre and Legendre, 1998;Mahecha et al., 2007a). The emerging gradients can be interpreted using additional environmental constraints, or based on internal plant community dynamics (van der Maaten et al., 2012). It 55 is also common to compress satellite based Earth Observations via dimensionality reduction to get a notion of the underlying dynamics of terrestrial ecosystems. For instance, Ivits et al. (2014) showed that one can understand the impacts of droughts and heatwaves based on a compressed view of the relevant vegetation indices. In general, dimensionality reduction is the method of choice to compress high-dimensional observations in a few (ideally) independent components with little loss of information (Van Der Maaten et al., 2009;Kraemer et al., 2018).
Understanding changes in land-atmosphere interactions is a complex problem, as all aforementioned patterns of change may occur and interact: Land cover change may alter biophysical properties of the land surface such as albedo with consequences for the energy balance. Long-term trends in temperature, water availability, or fertilization may impact productivity patterns and biogeochemical processes. In fact, these land surface dynamics have implications on multiple dimensions and require monitoring of biophysical state variables such as leaf area index, albedo, etc., as well as associated land-atmosphere fluxes of carbon, water, and energy.
Here, we aim to summarize these high-dimensional surface dynamics and make them accessible to subsequent interpretations and similar analyses as the original variables, such as mean seasonal cycles (MSC), anomalies, trend analyses, breakpoint analyses, and the characterization of ecosystems. Specifically, we seek a set of uncorrelated, yet comprehensive, state indicators. We want to have a set of very few indicators that represent the most dominant features of the above described temporal 70 ecosystem dynamics. These indicators should also be uncorrelated, so that one can study the system state by looking and interpreting each indicator independently. The approach should also give an idea of the general complexity contained in the available data streams. If more than a single indicator is required to describe land surface dynamics accurately, then these indicators shall describe very different aspects. While one indicator may describe global patterns of change, others could be only relevant in certain regions, for certain types of ecosystems, or for specific types of impacts. The indicators shall have a number 75 of desirable properties: (1) Representing the overall state of observations comprising the system in space and time.
(2) Carrying sufficient information to allow for reconstructing the original observations faithfully from these indicators. (3) Being of much lower dimensionality than the number of observed variables. (4) Allowing intuitive interpretations.
In this work, we first introduce a method to create such indicators, then we apply the method to a global set of variables describing the biosphere. Finally, to prove the effectiveness of the method, we interpret the resulting set of indicators and 80 explore the information contained in the indicators by analyzing them in different ways and relating them to well known phenomena. Table 1 gives an overview of the data streams used in this analysis (for a more detailed description see Appendix A). For an 85 effective joint analysis of more than a single variable, the variables have to be harmonized and brought to a single grid in space and time. The Earth System Data Lab (ESDL; www.earthsystemdatalab.net; Mahecha et al., 2019) curates a comprehensive set of data streams to describe multiple facets of the terrestrial biosphere and associated climate system. The data streams are harmonized as analysis ready data on a common spatiotemporal grid (equirectangular 0.25°grid in space and 8 days in time, [2001][2002][2003][2004][2005][2006][2007][2008][2009][2010][2011], forming a 4d hypercube, which we call a "data cube". The ESDL not only curates Earth system data, but also 90 comes with a toolbox to analyze this data efficiently. For this study we chose all variables available in the ESDL v1.0 (the most recent version available at the time of analysis), divided the available variable into meteorological and biospheric variables and discarded the atmospheric variables. We also discarded variables with distributions that are badly suited for a linear PCA (e.g. The fundamental idea of PCA is to project the data to a space of lower dimensionality that preseserves the covariance structure of the data. Hence, the fundament of a PCA is the computation of a covariance matrix, Q. When all variables are centered to global zero mean and normalized to unit variance, the covariance matrix can be in principle estimated as

Data
However, in our case the data cube lies on a regular 0.25°grid and estimating Q as above would lead to overestimating the influence of dynamics in relatively small pixels of high latitudes compared to lower latitudes where each data point represent larger areas. Hence, one needs a weighted approach to calculate the covariance matrix, where w i = cos(lat i ) and lat i is the latitude of observation i, w = n i=1 w i is the total weight, and n the total number of 120 observations. Equation (2) has the additional property that it can be computed sequentially on very big datasets, such as our Earth System Data Cube, by an consecutively adding observations to an initial estimate.
Note that the actual calculation of the covariance matrix is even more complicated, because summing up many floatingpoint numbers one-by-one can lead to large inaccuracies due to precision issues of floating-point numbers and instabilities of the naive algorithm (Higham, 1993; the same hols for the implementations of the sum function in most software used for 125 numerical computing). Here, we used the Julia package WeightedOnlineStats.jl 1 (implemented by the first author of this paper), which uses numerically stable algorithms for summation, higher precision numbers, and a map-reduce scheme that further minimizes floating point errors.
Based on this weighted and numerically stable covariance matrix, the PCA can be computed used an eigendecomposition of the covariance matrix, In this case, the covariance matrix Q is equal to the correlation matrix because we standardized the variables to unit variance.
Λ is a diagonal matrix with the the eigenvalues, λ 1 , . . . , λ d , in the diagonal in decreasing order and V ∈ R d×d , the matrix with the corresponding eigenvectors in columns. V can project the new incoming input data x i (centered and standardized) onto the retained PCs, where y i is the projection of the observation x i onto the d PCs.
The canonical measure of the quality of a PCA is the fraction of explained variance by each component, σ 2 i , calculated as To get a more complete measure of the accuracy of the PCA, we used the "reconstruction error" in addition to the fraction of explained variance. PCA allows a simple projection of an observation onto the first p PCs and a consecutive reconstruction of the observations from this p-dimensional projection. This is achieved by where Y p is the projection onto the first p PCs, V p the matrix with columns consisting of the eigenvectors belonging to the p largest eigenvalues, and X p the observations reconstructed from the first p PCs.

145
The reconstruction error, e i , was calculated for every point, x i in the space-time domain based on the reconstructions from the first p principal components: As this error is explicit in space, time and variable, it allows for disentangling the contribution of each of these domains to the total error. This can be achieved by estimating e.g. the (weighed) mean square error, This approach can give a better insight into the compositions of the error than a single global error estimate based on the eigenvalues.

Pixel-wise analyses of time series
The principal components estimated as described above are ideally low-dimensional representations of the land-surface dy-155 namics that require further interpretation. These components have a temporal dynamics that needs to be understood in detail.
One crucial question is how the dynamics of a system of interest deviates from it's expected behaviour at some point in time.
A classical approach is inspecting the "anomalies" of a time series, i.e. the deviatiosn from the mean seasonal cycle at a certain day of year.
Another key description of such system dynamics are trends. We estimated trends of the indicators as well as of their seasonal 160 amplitude using the Theil-Sen estimator. The advantage of the Theil-Sen estimator is its robustness to up to 29.3% of outliers (Theil, 1950;Sen, 1968), while ordinary least squares regression is highly sensitive to such values. The calculation of the estimator consists simply on computing the median of the slopes spanned by all possible pairs of points where z i is the value of the response variable at time step i and t i the time at time step i. In our experiments, we computed the To test the slopes for significance, we used the Mann-Kendall statistics (Mann, 1945;Kendall, 1970) and adjusted the resulting p-values with the Benjamini-Hochberg method to control for the false discovery rate (Benjamini and Hochberg, 1995). Slopes with an adjusted p < 0.05 were deemed significant.
To identify disruptions in trajectories, breakpoint detection provides a good framework for analysis. For the estimation of breakpoints, the generalized fluctuation test framework (Kuan and Hornik, 1995). The framework uses recursive residuals (Brown et al., 1975) such that a breakpoint is identified when the mean of the recursive residuals deviates from zero. We used the implementation in Zeileis et al. (2002). For practical reasons, here we only focus on the largest breakpoint. Counterclockwise polygon, has a positive area. (c) Chaotic polygon, has a very low area. (d) Polygon with a single intersection, has both a clockwise and counterclockwise portion. The clockwise portion is slightly larger than the counterclockwise portion, therefore the area is slightly negative.
A very different type of dynamic is considering bivariate relations. In the context of oscillating signals it is particularly 175 instructive to quantify their degree of phase shift and direction-even if booth signals are not linearily related. A "hysteresis" would be such a pattern describing that the pathways A → B and B → A between states A and B differ (Beisner et al., 2003).
We estimated hysteresis by calculating the area inside the polygon formed by the mean seasonal cycle of the combinations of two components (PC 1 to PC 3 ).

180
where n = 46, the number of time steps in a year, x i and y i the mean seasonal cycle of two of PC 1 to PC 3 at time step i, respectively. The polygon is circular, i.e. the indices wrap around the edges of the polygon so that x 0 = x n and x n+1 = x 1 .
This formula gives the actual area inside the polygon only if it is non-self-intersecting and the vertices run counterclockwise.
If the vertices run clockwise, the area is negative. If the polygon is shaped like an 8, the clockwise and counterclockwise parts will cancel each other (partially) out. Trajectories that have a larger amplitudes will also tend to have larger areas as illustrated 185 in fig. 1.

Results and Discussion
In the following we first briefly present and discuss the quality of the global dimensionality reduction (Sect. 3.1), we then interpret the individual components from an ecological point of view (Sect. 3.2), before we turn to summarize the global dynamics we can uncover in the low-dimensional space (Sect. 3.3) and characterize the contained seasonal dynamics (Sect. 3.4), 190 including spatial patterns of hysteresis (Sect. 3.5). We then describe global anomalies of the identified trajectories (Sect. 3.6), and discuss the identified anomalies in depth based on local phenomena (Sect. 3.7). Finally, we turn to global trends and their breakpoints (Sect. 3.7).  components explain 73% of the variance from the 12 variables; additional components contribute relatively little additional variance (PC 3 contributes 9%, all subsequent PCs less than 7%) each. This results in a "knee" at component 3, which suggests that two indicators are sufficient to capture the major global dynamics of the terrestrial land surface, but we will also consider the third components in the following analyses (Cattell, 1966).

Quality of the PCA
We estimated the reconstruction error sequentially up to the first three principal components (

Interpretation of the PCA
The first PC summarizes variables that are closely related to to primary production (GPP, LE, NEE, fAPAR), and therefore highly interrelated (see fig. 2b). The energy for photosynthesis comes from solar radiation, and fAPAR is an indicator for 205 the fraction of light used for photosynthesis. The available photosynthetic radiation is used by photosynthesis to fix CO 2 and producing sugars that maintain the metabolism of the plant. The total uptake of CO 2 is reflected in GPP, whiche is also closely related to water consumption. The actual uplift of water within the plant is not only essential to enable photosynthesis, but also drives the transport of nutrients from the roots and is ultimately reflected in transpiration-together with evaporation from soil surfaces one can observe the integrated latent energy needed for the phase transition (LE). However, ecosystems also 210 respire; CO 2 is produced by plants in energy consuming processes as well as by the decomposition of dead organic materials via soil microbes and other heterotrophic organisms. This total respiration can be observed as terrestrial ecosystem respiration (TER). The difference between GPP and TER is the net ecosystem exchange (NEE) rate of CO 2 between ecosystems and the atmosphere (Chapin et al., 2006), and both variables are also well represented by the first dimension.
The second component represents variables related to the surface hydrology of ecosystems (see fig. 2b). Surface moisture, 215 evaporative stress, root-zone soil moisture, and sensible heat, are all essential indicators for the state of plant available water.
While surface moisture is a rather direct measure, evaporative stress is a modeled quantity summarizing the level of plant stress: A value of zero means that there is no water available for transpiration, while a value of one means that transpiration equals the potential transpiration (Martens et al., 2017). Root-zone soil moisture is the moisture content of the root zone in the soil, the moisture directly available for root uptake. If this quantity is below the wilting point, there is no water available for uptake 220 by the plants. Sensible heat is the exchange of energy by a change of temperature, if there is enough water available, then most of the surface heat will be lost due to evaporation (latent heat), with decreasing water availability more of the surface heat will be lost due to sensible heat, making this also an indicator of dryness.
We observe that the third component is most strongly related to albedo ( fig. 2b). Albedo describes the overall reflectiveness of a surface. Light surfaces, such as snow and sand, reflect most of the incoming radiation, while surfaces that have a high 225 liquid water content or active vegetation absorb most of the incoming radiation. Local changes to albedo can be caused by a large array of reasons, e.g. snow fall, vegetation greening/browning, or land use change.
The relation of PC 3 to productivity and hydrology is opposite to what we would expect from an albedo axis. Because vegetation uses radiation as an energy source, albedo is negatively correlated with the productivity of vegetation, hence the negative correlation of albedo with PC 1 . Given that water also absorbs radiation we can observe a negative correlation of 230 albedo with PC 2 (see fig. 2b). We observe that PC 1 and PC 2 are positively correlated with PC 3 on the positive portion of their axes (see fig. 4d and f), which means counterintuitively that the index representing albedo is positively correlated with primary productivity and moisture content. Finally we can observe that PC 1 and PC 2 have a much higher reconstruction error in snow covered regions, which is strongly improved by adding PC 3 (see fig. 3f). Therefore the third component should be regarded mostly as binary variable that introduces snow cover, as the other information that is usually associated with albedo is already 235 contained in the first two components.

Distribution of points in PCA space
The bivariate distribution of the first two principal components form a "triangle" (gray background in fig. 4a). At the high end onf PC 1 we find one point of the triangle in which ecosystems have a high high state of primary productivity ( fig. 4a). This pattern reflects the two essential global limitations of GPP in terrestrial ecosystems (Anav et al., 2015). 250 To further interpret the "triangle" we analyze how the Bowen ratio embeds into the space of the first two dimensions. Energy fluxes from the surface into the atmosphere can either represent a radiative transfer (sensible heat) or evaporation (latent heat).
Their ratio is the "Bowen ratio", B = H LE , (Bowen, 1926; see also fig. 5). When water is available most of the available energy will be dissipated by evaporation, B < 1, resulting in a high latent heat flux. Otherwise, the transfer by latent heat will be low and most of the incoming energy has to be dissipated via sensible heat, B > 1. In higher latitudes, there is relatively limited 255 incoming radiation and temperatures are low, therefore there is not much energy to be dissipated and both heat fluxes are low.
A high sensible heat flux is an indicator for water limitation.

Seasonal Dynamics
The leading principal components represent most of the variability of the space spanned by the observed variables, summarizing the state of a spatiotemporal pixel efficiently. This means that the PCs track the state of a local ecosystem over time (     as low GPP) and many dry areas such as deserts show similar characteristics to areas with a pronounced dry season, e.g. the Mediterranean.

270
Depending on their position on Earth, ecosystem states can shift from limitation to growth during the year ( fig. 4b, e.g. Forkel et al., 2015). For example, the orange trajectory in fig. 4, an area close to Moscow, shifts from a temperature limited state in winter to a state of very high productivity during summer. Other ecosystems remain in a single limitation state with only slight shifts, such as the red trajectory in fig. 4. In the corner of maximum productivity of the distribution, we find tropical forests characterized by a very low seasonality. We also observe that very different ecosystems can have very similar characteristics 275 during their peak growing season, e.g. green (located in north east India), blue (north west Germany), and orange (located close to Moscow) trajectories have very similar characteristics during peak growing season compared to the red trajectory.
The third components shows a different picture. Due to a consistent winter snow cover in higher latitudes the albedo is much wave (Schwartz, 1994(Schwartz, , 1998 because the first dimension integrates over all variables that correlate with plant productivity.  Although the principal components are globally uncorrelated, they covary locally (see fig. D1). Ecosystems with a dry season 295 have a negative covariance between PC 1 and PC 2 while ecosystems that cease productivity in winter have a positive covariance.
Cold arid steppes and boreal climates show a negative covariance between the PC 1 and PC 3 , while other ecosystems that have a strong seasonal cycle show a positive correlation, many tropical ecosystems don't show a large covariance. A very similar picture paints the covariance between PC 2 and PC 3 , boreal and and steppe ecosystems show a negative covariance, while most other ecosystems show a more or less pronounced positive covariance, again depending on the strength of the seasonality.

300
Observing the mean seasonal cycle of the principal components gives us a tool to characterize ecosystems and may also serve as a basis for further analysis, such as a global comparison of ecosystems (Metzger et al., 2013;Mahecha et al., 2017).

Hysteresis
The alternative return path between ecosystem states forming the hysteresis loops arise from the ecosystem tracking seasonal changes in the environmental condition, e.g. summer-winter or dry-rainy seasons ( fig. 4b). Hysteresis is a common occurrence Here we look at the mean seasonal cycles of pairs of indicators and the area they enclose.
The orange trajectory (area close to Moscow) in fig. 4b shows that the paths between maximum and minimum productivity can be very different, in contrast to the blue trajectory located in the north west of Germany which also has a very pronounced yearly cycle but shows no such effect. Fig. 4 also indicates that the area inside the means seasonal cycles of PC 1 -PC 2 and 315 PC 1 -PC 3 show important characteristics while hysteresis in PC 2 -PC 3 is a much less pronounced feature, i.e. we can only see a pronounced area inside the yellow curve in fig. 4f.
The trajectories that show a more pronounced anticlockwise hysteresis effect in PC 1 -PC 2 ( fig. 7a) are areas with a warm and temperate climate and partially those that have a snow climate with warm summers, i.e. areas that have pronounced growing, dry, and wet seasons and therefore shift their limitations more strongly during the year, i.e. the moisture reserves deplete 320 during growing season and therefore the return path has higher values on the second principal component (the climatic zones are taken from the Köppen-Geiger classification; Kottek et al., 2006). We can also see that areas with dry winters tend to have a clockwise hysteresis effect, e.g. many areas in East Asia, due to the humid summers there is no increasing water limitation during the summer months which causes a decrease on PC 2 instead on an increase. Other areas with clockwise hysteresis can be found in winter dry areas in the Andes and the winter dry areas north and south of the African rainforests. Tropical rainforests 325 do not show any hysteresis effect due to their low seasonality. In general we can say that the area inside the mean seasonal cycle trajectory of PC 1 -PC 2 depends mostly on water availability in the growing and non-growing season, i.e. the contrast of wet summer and dry winter vs. dry summer and wet winter.
The hysteresis effect on PC 1 -PC 3 ( fig. 7b) shows a pronounced counterclockwise MSC trajectory mostly in warm temperate climates with dry summers, while it shows a clockwise MSC trajectory in most other areas, again tropical rainforests are an 330 exception due to their low seasonality. The most pronounced clockwise MSC trajectories are are found in tundra climates in arctic latitudes, where there is a consistent winter snow cover and a very short growing period. The lower end of PC 3 is positively correlated with ecosystem productivity, but there are still enough differences to PC 1 to distinguish the start and the end of the growing season and show different trajectories. A counterclockwise rotation can be found in summer dry areas, such as the Mediterranean and and California, but also some more more humid areas, such as the south east United States and the 335 south east coast of Australia. In these areas we can find a decrease on PC 3 in during the non-growing phase which probably corresponds to a drying out of the vegetation and soils.
The hysteresis effect on PC 2 -PC 3 ( fig. 7c) mostly depends on latitude, there is a large counterclockwise effect in the very northern parts, due to the large amplitude of PC 3 , the amplitude gets smaller further south until the rotation reverses in winter dry areas at the the northern and southern extremes of the tropics and disappears on the equatorial humid rain forests. 340 We can see that the hysteresis of pairs of indicators represents large scale properties of climatic zones. Not only the area enclosed gives interesting information, but also the direction of the rotation. Hysteresis can give information on the seasonal availability of water, seasonal dry periods or snowfall. With the method presented here, we can not observe intersecting trajectories, which would probably give even more interesting insights (e.g. the green trajectory in fig. 4b).

345
The deviation of the trajectories from their mean seasonal cycle should reveal anomalies and extreme events. These anomalies have a directional component which makes them interpretable the same way as the original PCs, therefore one can infer the state of the ecosystem during an anomaly. For instance the well-known Russian heatwave in summer 2010 (Flach et al., 2018) appears in fig. 8 as a dark brown spot in the southern part of the affected area, indicating lower productivity, and as a thin green line in the northern parts, indicating an increased productivity. This confirms earlier reports in which only the southern 350 agricultural ecosystems were negatively affected by the heatwave, while the northern predominantly forest ecosystems rather benefited from the heatwave in terms of primary productivity (Flach et al., 2018).
Another example of an extreme event that we find in the PCs is the very wet November rainy season of 2006 in the Horn of Africa after a very dry rainy season in the previous year. This event was reported to bring heavy rainfall and flooding events which caused an emergency for the local population but also an increased ecosystem productivity (Nicholson, 2014). The 355 rainfall event appears as green and blue spots in fig. 8b and c, preceded by the drought events which appear as red and brown spots. not show these strong events, because the observable response of the ecosystem was buffered due to the large water storage 360 capacity in the central Amazon basin.
Another extreme event that can be seen is the extreme snow and cold event affecting Central and South China in January 2008, causing the temporary displacement of 1.7 million people and economic losses of approximately US $ 21 billion (Hao et al., 2011). This event shows up clearly on PC 2 and PC 3 as cold and light anomalies respectively (see fig. 8k and f).

365
Observing single trajectories can give insight into past events that happened at a certain place, such as extreme events or permanent changes in ecosystems. The creation of trajectories is an old method used by ecologists, mostly on species assembly data of local communities, to observe how the composition changes over time (e.g. Legendre et al., 1984;Ardisson et al., 1990).
In this context, we observe how the states of the ecosystems inside the grid-cell shift over time, which comprises a much larger area than a local community but is probably also less sensitive to very localized impacts than a community level analysis. One 370 of the main differences of the method applied here to the classical ecological indicators is that the trajectories observed here are embedded into the space spanned by a single global PCA and therefore we can compare a much broader range of ecosystems directly.
The seasonal amplitude of the trajectory in the Brazilian Amazon increases due to deforestation and crop growth cycles. heatwave was stronger than the 2003 heatwave but the strongest parts of the 2010 heatwave were in eastern Europe (cf., fig. 8), while the center of the 2003 heatwave was located in France.
As we have seen here, observing single trajectories in reduced space can give us important insights into ecosystem states and changes that occur. While the trajectories can point us towards abnormal events, they can only be the starting points for 400 deeper analysis to understand the details of such state changes.

Trends in Trajectories
The accumulation of CO 2 in the atmosphere should cause an increase in global productivity of plants due to CO 2 fertilization, while larger and more frequent droughts and other extremes may counteract this trend. Satellite observations and models have shown that during the last decades the world's ecosystems have greened up during growing seasons. This is explained by CO 2 fertilization, nitrogen deposition, climate change and land cover change (Zhu et al., 2016;Huang et al., 2018;Anav et al., 2015). Tropical forests showed especially strong greening trends during growing season.
General patterns of trends that can be observed are a positive trend (higher productivity) on the first principal component in many arctic regions, many of these regions also show a wetness trend, with the notable exception of the western parts of Alaska which have become dryer, this is important, because wildfires play a major role in these ecosystems (Jolly et al., 2015;410 Foster et al., 2019), these changes are also accompanied by a decrease on PC 3 due to a loss in snow cover. A large scale dryness trend can also be observed across large parts of western Russia. Increasing productivity can also be observed on large parts of the the Indian subcontinent and eastern Australia. Negative trends in the first component can also be observed: they are generally smaller and appear in regions around the Amazon and the Congo basin, but also in parts of western Australia. The main difference from previous analyses on the observations presented here is that e.g. Zhu et al. (2016) looked only at trends 415 during the growing season while this analysis uses the entire time series to calculate the slope.
In the Amazon basin, we find a dryness trend accompanied by a decrease in productivity and a slight increase in PC 3 ; In the Congo basin, we find a wetness trend and an increasing productivity in the northern parts, while the southern part and woodland south of the Congo basin show a strong dryness trend with decreased productivity. This is different to the findings of In eastern Australia we find a strong wetness and greenness trend which is due to Australia having a "milennium drought" since the mid nineties with a peak in 2002 (Nicholls, 2004;Horridge et al., 2005) and extreme floods in 2010-2011 (Hendon 425 et al., 2014).
Large parts of the Indian subcontinent shows a trend towards higher productivity and an overall wetter climate. The greening trend in India happens mostly over irrigated cropland. However browning trends over natural vegetation have been observed but do not emerge in our analysis (Sarmah et al., 2018). A very notable greening and wetness trend can be observed in Myanmar due to an increase in intense rainfall events and storms, although the central part experienced some strong droughts at the same 430 time (Rao et al., 2013). In Myanmar we also find one of the strongest trends in PC 3 outside of the Artic.
In large parts of the Arctic, a trend towards higher productivity can be observed, vegetation models attribute this general increase in productivity to CO 2 fertilization and climate change. The changes also cause changes to the characteristics of the seasonal cycles (Forkel et al., 2016). Stine et al. (2009) found a decreased seasonal amplitude of surface temperature over norther latitudes due to winter warming.

435
The seasonal amplitude of atmospheric CO 2 concentrations has been increasing due to climate change causing longer growing seasons and changing vegetation cover in northern ecosystems (Forkel et al., 2016;Graven et al., 2013;Keeling et al., 1996). Therefore we checked for trends in the seasonal amplitude, but because each time series only consists of 11 values (one amplitude per year), after adjusting the p-values for false discovery rate, we could not find a significant slope. However, there were many significant slopes with the unadjusted p-values, see the appendix, fig. E1.

440
Another way to detect changes to the biosphere consists in the detection of breakpoints, which has been applied successfully to detect changes in global NDVI time series (de Jong et al., 2011;Forkel et al., 2013), or generally to detect changes in time series (Verbesselt et al., 2010). A proof of concept analysis can be found in fig. F1, we hope that applying this method to indicators instead of variables can detect a wider range of breakpoints analyzing a single time series.
3.9 Relations to other PCA-type analyses 445 One of the most popular applications of PCA in meteorology are EOFs, which applies PCA typically on a single variables, i.e. on a data set with the dimensions lat × lon × time, although EOFs can be calculated from multiple variables. EOFs can be calculated in S-mode and R-mode. If we matricize our data cube so that we have time in rows and lat × lon × variables in columns, then S-mode PCA works on the correlation matrix of the combined variable and space dimension. In T -mode, the PCA works on the correlation matrix formed the time dimension (Wilks, 2011). The PCA presented here works slightly 450 different: (1) We did a different matricization (lat × lon × time in rows and variables in columns) and then (2) the PCA works on the correlation matrix formed by the variables, therefore in this framework we could call this a V -mode PCA.
Ecological analyses use PCA usually with matrices of the shape object × descriptors, when calculating the PCA on the correlation matrix formed by the objects, then it is called a Q-mode analysis, when the PCA is applied on the correlation matrix formed by the variables, then it is called an R-mode analysis (Legendre and Legendre, 1998). The PCA done in this 455 study is closest to an R-mode analysis, in the present case the descriptors are the various data streams and the objects are the spatiotemporal pixels.
Using PCA as a method for dimensionality reduction means that we are assuming linear relations among features. A nonlinear method could possibly be more efficient in reducing the number of variables, but would also have significant disadvantages. In particular: nonlinear methods typically require tuning specific parameters, objective criteria are often lacking, a proper 460 weighting of observations is difficult, the methods are often not reversible, and it is harder to interpret the resulting indicators due to their nonlinear nature (Kraemer et al., 2018). The salient feature of PCA is that an inverse projection is well defined and allows for a deeper inspection of the errors, which is not the case for nonlinear methods which learn a highly flexible transformation that is hard to invert. Therefore interpretability of the transform in meaningful physical units in the input space is often not possible. In the machine learning community, this problem is known as the "pre-imaging problem" (Mika et al., 465 1999;Arenas-Garcia et al., 2013) and is a matter of current research.

Conclusions
To monitor the complexity of the changes occurring in times of an increasing human impact on the environment, we used PCA to construct indicators from a large number of data streams that track ecosystem state in space and time on a global scale.
We showed that a large part of the variability of the terrestrial biosphere can be summarized using two indicators. The first emerging indicator represents carbon exchange, while the second indicator shows the availability of water in the ecosystem, the third indicator represents mostly a binary variable that indicates the presence of snow cover. The distribution in the space of the first two principal components reflects the general limitations of ecosystem productivity. Ecosystem production can either be limited by water or energy.
The first three indicators can detect many well-known phenomena without analyzing variables separately due to their com- Using multivariate indicators we gain a high level overview of phenomena in ecosystems and the method therefore provides an interesting tool for analyses where it is required to capture a wide range of phenomena which are not necessarily known a priori. Future research should consider nonlinearities, adding data streams that represent different aspects (e.g. biodiversity, and habitat quality), and work to include different subsystems, such as the atmosphere or the anthroposphere.

485
Code and data availability. The data are available and can be processed at https://www.earthsystemdatalab.net/index.php/interact/data-lab/, last accessed 28 June 2019 Appendix A: Description of variables Variables used describing the biosphere can be found in tab. 1, here we provide a more complete description of all variables: Black Sky Albedo is the reflected fraction of total incoming radiation under direct hermispherical reflectance, i.e. direct 490 illumination (Muller et al., 2011). This dataset is derived from the SPOT4-VEGETATION, SPOT5-VEGETATION2, and the

MERIS satellite sensors.
White Sky Albedo is the reflected fraction of total incoming radiation under bihemispherical reflectance, i.e. diffuse illumination (Muller et al., 2011). Together with black sky albedo it can be used to estimate the albedo under different illumination conditions. This dataset is derived from the SPOT4-VEGETATION, SPOT5-VEGETATION2, and the MERIS satellite sensors.

495
Evaporation [mm/day] is the amount of water evaporated per day, depends on the amount of available water and energy.
This dataset is based on the GLEAMv3 model (Martens et al., 2017), using satellite data from ESA CCI and SMOS to derive a number of variables.
Evaporative Stress modeled water stress for plants, zero means that the vegetation has no water available for transpiration and one means that transpiration equals potential transpiration. This dataset is based on the GLEAMv3 model (Martens et al.,500 2017), using satellite data from ESA CCI and SMOS to derive a number of variables.
fAPAR the fraction of absorbed photosynthetically active radiation, a proxy for plant productivity (Disney et al., 2016). This dataset is based on the GlobAlbedo dataset (http://globalbedo.org) and the MODIS fAPAR and LAI products.
Gross Primary Productivity (GPP) [gCm −2 day −1 ] the total amount of carbon fixed by photosynthesis (Tramontana et al., 2016). This dataset is derived from upscaling eddy covariance tower observations to a global scale using machine learning 505 methods.
Terrestrial Ecosystem Respiration (TER) [gCm −2 day −1 ] the total amount of carbon respired by the ecosystem, includes autotrophic and heterotropic respiration (Tramontana et al., 2016). This dataset is derived from upscaling eddy covariance tower observations to a global scale using machine learning methods.
Net Ecosystem Exchange (NEE) [gCm −2 day −1 ] The total exchange of carbon of the ecosystem with the atmosphere 510 NEE = GPP − TER (Tramontana et al., 2016). This dataset is derived from upscaling eddy covariance tower observations to a global scale using machine learning methods.
Latent energy (LE) [Wm −2 ] the amount of energy lost by the surface due to evaporation (Tramontana et al., 2016). This dataset is derived from upscaling eddy covariance tower observations to a global scale using machine learning methods.
Sensible Heat (H) [Wm −2 ] the amount of energy lost by the surface due to radiation (Tramontana et al., 2016). This dataset 515 is derived from upscaling eddy covariance tower observations to a global scale using machine learning methods.
Root-Zone Soil Moisture [m 3 m −3 ] the moisture content of the root zone. This dataset is based on the GLEAMv3 model (Martens et al., 2017), using satellite data from ESA CCI and SMOS to derive a number of variables.
Surface Soil Moisture [mm 3 mm −3 ] the soil moisture content at the soil surface. This dataset is based on the GLEAMv3 model (Martens et al., 2017), using satellite data from ESA CCI and SMOS to derive a number of variables.