In times of global change, we must closely monitor the state of the planet in order to understand the full complexity of these changes. In fact, each of the Earth's subsystems – i.e., the biosphere, atmosphere, hydrosphere, and cryosphere – can be analyzed from a multitude of data streams. However, since it is very hard to jointly interpret multiple monitoring data streams in parallel, one often aims for some summarizing indicator. Climate indices, for example, summarize the state of atmospheric circulation in a region. Although such approaches are also used in other fields of science, they are rarely used to describe land surface dynamics. Here, we propose a robust method to create global indicators for the terrestrial biosphere using principal component analysis based on a high-dimensional set of relevant global data streams. The concept was tested using 12 explanatory variables representing the biophysical state of ecosystems and land–atmosphere fluxes of water, energy, and carbon. We find that three indicators account for 82 % of the variance of the selected biosphere variables in space and time across the globe. While the first indicator summarizes productivity patterns, the second indicator summarizes variables representing water and energy availability. The third indicator represents mostly changes in surface albedo. Anomalies in the indicators clearly identify extreme events, such as the Amazon droughts (2005 and 2010) and the Russian heat wave (2010). The anomalies also allow us to interpret the impacts of these events. The indicators can also be used to detect and quantify changes in seasonal dynamics. Here we report, for instance, increasing seasonal amplitudes of productivity in agricultural areas and arctic regions. We assume that this generic approach has great potential for the analysis of land surface dynamics from observational or model data.

Today, humanity faces negative global impacts of land use and land cover change

Many recent changes due to increasing anthropogenic activity are manifested in long-term transformations. One prominent example is “global greening” that has been attributed to fertilization effects, temperature increases, and land use intensification

The problem of identifying patterns of change in high-dimensional data streams
is not new. Extracting the dominant features from high-dimensional observations
is a well-known problem in many disciplines. One approach is to manually define
indicators that are known to represent important properties such as the “Bowen
ratio”

In plant ecology, indicators based on dimensionality reduction methods are used
to describe changes to species assemblages along unknown gradients

Understanding changes in land–atmosphere interactions is a complex problem, as
all aforementioned patterns of change may occur and interact: land cover change
may alter biophysical properties of the land surface such as (surface) albedo
with consequences for the energy balance

Here, we aim to summarize these high-dimensional surface dynamics and make them accessible for subsequent interpretations and analyses such as mean seasonal cycles (MSCs), anomalies, trend analyses, breakpoint analyses, and the characterization of ecosystems. Specifically, we seek a set of uncorrelated, yet comprehensive, state indicators. We want to have a set of very few indicators that represent the most dominant features of the above-described temporal ecosystem dynamics. These indicators should also be uncorrelated, so that one can study the system state by looking at and interpreting each indicator independently. The approach should also give an idea of the general complexity contained in the available data streams. If more than a single indicator is required to describe land surface dynamics accurately, then these indicators shall describe very different aspects. While one indicator may describe global patterns of change, others could be only relevant in certain regions, for certain types of ecosystems, or for specific types of impacts. The indicators shall have a number of desirable properties: (1) represent the overall state of observations comprising the system in space and time, (2) carry sufficient information to allow for reconstructing the original observations faithfully from these indicators, (3) be of much lower dimensionality than the number of observed variables, and (4) allow intuitive interpretations.

In this work, we first introduce a method to create such indicators, and then we apply the method to a global set of variables describing the biosphere. Finally, to prove the effectiveness of the method, we interpret the resulting set of indicators and explore the information contained in the indicators by analyzing them in different ways and relating them to well-known phenomena.

Variables used to describe the biosphere. For a description of the variables, see Appendix

Table

The datasets taken from

In this study, each variable was normalized globally to zero mean and unit variance to account for the different units of the variables, i.e., transform the variables to have standard deviations from the mean as the common unit. Because the area of the pixel changes with latitude in the equirectangular coordinate system used by the ESDL, the pixels were weighted according to the represented surface area. Only spatiotemporal pixels without any missing values were considered in the calculation of the covariance matrix.
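The normalization and weighting described above can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes that pixel area on an equirectangular grid is proportional to the cosine of latitude, that rows with missing values are dropped before the moments are estimated, and that the weighted variance is normalized by the sum of the weights. The function names `area_weights` and `weighted_standardize` and the synthetic data are hypothetical.

```python
import numpy as np

def area_weights(lat_deg):
    """Relative pixel area on an equirectangular grid (assumed proportional to cos(latitude))."""
    return np.cos(np.deg2rad(lat_deg))

def weighted_standardize(X, w):
    """Standardize the columns of X (observations x variables) using observation weights w.

    Rows containing any missing value are dropped first (assumption for this sketch).
    Returns the standardized data and the weights rescaled to sum to one.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    keep = ~np.isnan(X).any(axis=1)      # complete observations only
    X, w = X[keep], w[keep]
    w = w / w.sum()                      # weights now sum to one
    mean = w @ X                         # area-weighted mean per variable
    var = w @ (X - mean) ** 2            # area-weighted variance per variable
    return (X - mean) / np.sqrt(var), w

# Synthetic example: 500 "pixels", three variables with very different units.
rng = np.random.default_rng(0)
lat = rng.uniform(-60, 80, size=500)
X = rng.normal(loc=[5.0, -2.0, 100.0], scale=[1.0, 10.0, 0.1], size=(500, 3))
X[::50, 1] = np.nan                      # ten observations with a missing value

Z, w = weighted_standardize(X, area_weights(lat))
cov = (Z * w[:, None]).T @ Z             # weighted covariance of the standardized data
```

After this step every variable is expressed in standard deviations from its area-weighted mean, so the diagonal of `cov` is one and the off-diagonal entries are weighted correlations.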

As a method for dimensionality reduction, we used a modified principal component
analysis to summarize the information contained in the observed variables. PCA
transforms the set of

The data streams consist of

The fundamental idea of PCA is to project the data to a space of lower
dimensionality that preserves the covariance structure of the data. Hence, the
foundation of a PCA is the computation of a covariance matrix,

Note that the actual calculation of the covariance matrix is even more
complicated, because summing up many floating-point numbers one by one can lead
to large inaccuracies due to precision issues of floating-point numbers and
instabilities of the naive algorithm (
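One common way to avoid these precision problems is a single-pass, Welford-style update of the weighted mean and co-moment matrix. The sketch below illustrates that general technique; whether it matches the algorithm referenced in the text is an assumption, and the name `online_weighted_cov` and the test data are hypothetical.

```python
import numpy as np

def online_weighted_cov(rows, weights):
    """Single-pass weighted mean and covariance (Welford-style update).

    Each observation updates the running mean first; the co-moment matrix is
    then updated with the deviations from the old and the new mean, which
    avoids subtracting two large, nearly equal sums.
    """
    rows = np.asarray(rows, dtype=float)
    n_vars = rows.shape[1]
    w_sum = 0.0
    mean = np.zeros(n_vars)
    comoment = np.zeros((n_vars, n_vars))
    for x, w in zip(rows, weights):
        w_sum += w
        delta = x - mean                       # deviation from the old mean
        mean = mean + (w / w_sum) * delta      # updated weighted mean
        comoment += w * np.outer(delta, x - mean)
    return mean, comoment / w_sum

# Data with a large offset, where the textbook formula E[xy] - E[x]E[y]
# is prone to catastrophic cancellation.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3)) + 1e8
w = rng.uniform(0.1, 1.0, size=2000)

mean, cov_online = online_weighted_cov(X, w)

# Two-pass reference from NumPy with the same weighting convention.
cov_two_pass = np.cov(X, rowvar=False, aweights=w, ddof=0)

# Naive one-pass formula, shown only for comparison.
wn = w / w.sum()
cov_naive = (X * wn[:, None]).T @ X - np.outer(wn @ X, wn @ X)
```

Comparing `cov_online` and `cov_naive` against `cov_two_pass` shows how much accuracy each approach retains when the data have a large mean relative to their spread.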

Based on this weighted and numerically stable covariance matrix, the PCA can be
computed using an eigendecomposition of the covariance matrix,

The canonical measure of the quality of a PCA is the fraction of variance
explained by each component,

The reconstruction error,
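The chain from covariance matrix to components, explained variance, and reconstruction error can be sketched as below. This is an illustration under stated assumptions rather than the study's code: the data are synthetic, the weights are uniform for brevity, and the reconstruction error is aggregated as a plain mean squared error. The names `pca_from_cov` and `reconstruction_mse` are hypothetical.

```python
import numpy as np

def pca_from_cov(cov):
    """Eigendecomposition of a symmetric covariance matrix, sorted by decreasing eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

def reconstruction_mse(Z, eigvecs, k):
    """Mean squared error when Z is rebuilt from its first k principal components."""
    scores = Z @ eigvecs[:, :k]              # project onto the first k components
    Z_hat = scores @ eigvecs[:, :k].T        # map back to the variable space
    return np.mean((Z - Z_hat) ** 2)

# Synthetic standardized data: five variables driven by two latent factors.
rng = np.random.default_rng(2)
latent = rng.normal(size=(1000, 2))
mixing = rng.normal(size=(2, 5))
Z = latent @ mixing + 0.1 * rng.normal(size=(1000, 5))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)

w = np.full(1000, 1 / 1000)                  # uniform weights for brevity
cov = (Z * w[:, None]).T @ Z

eigvals, eigvecs = pca_from_cov(cov)
explained = eigvals / eigvals.sum()          # fraction of variance per component
errors = [reconstruction_mse(Z, eigvecs, k) for k in range(1, 6)]
```

With uniform weights, the mean squared reconstruction error for `k` components equals the sum of the discarded eigenvalues divided by the number of variables, which links the two quality measures directly.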

The principal components estimated as described above are ideally low-dimensional representations of the land surface dynamics that require further interpretation. These components have temporal dynamics that need to be understood in detail. One crucial question is how the dynamics of a system of interest deviate from its expected behavior at some point in time. A classical approach is inspecting the “anomalies” of a time series, i.e., the deviation from the mean seasonal cycle at a certain day of year.
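A minimal sketch of this anomaly calculation for a single pixel is given below. It assumes a regular time axis with the same number of time steps in every year and a series that covers whole years; the value of 46 steps per year, the synthetic series, and the function names are illustrative assumptions.

```python
import numpy as np

def mean_seasonal_cycle(series, steps_per_year):
    """Average value at each position within the year, taken over all years."""
    years = series.reshape(-1, steps_per_year)   # one row per year
    return np.nanmean(years, axis=0)

def anomalies(series, steps_per_year):
    """Deviation of every time step from the mean seasonal cycle."""
    msc = mean_seasonal_cycle(series, steps_per_year)
    n_years = series.size // steps_per_year
    return series - np.tile(msc, n_years)

# Synthetic indicator: a seasonal cycle plus noise over 11 years.
steps_per_year, n_years = 46, 11
t = np.arange(steps_per_year * n_years)
rng = np.random.default_rng(3)
pc1 = np.sin(2 * np.pi * t / steps_per_year) + 0.05 * rng.normal(size=t.size)
pc1[9 * steps_per_year + 25] -= 2.0              # synthetic extreme event in year 10

anom = anomalies(pc1, steps_per_year)
event_index = int(np.argmin(anom))               # time step of the strongest negative anomaly
```

Because the event itself enters the mean seasonal cycle, the recovered anomaly is slightly smaller in magnitude than the imposed disturbance; with only a few years of data this damping can be noticeable.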

Trends are another key description of such system dynamics. We estimated trends
of the indicators as well as of their seasonal amplitude using the Theil–Sen
estimator. The advantage of the Theil–Sen estimator is its robustness to up to
29.3 % of outliers
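The Theil–Sen slope is the median of the slopes between all pairs of points, which is what makes it insensitive to a limited share of outliers. The sketch below illustrates this on a synthetic series of yearly amplitudes; the function name `theil_sen_slope` and the data are hypothetical, and an ordinary least-squares fit is included only for contrast.

```python
import numpy as np

def theil_sen_slope(t, y):
    """Median of the slopes between all pairs of observations."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(t.size, k=1)      # all index pairs with i < j
    slopes = (y[j] - y[i]) / (t[j] - t[i])
    return np.median(slopes)

# Synthetic seasonal amplitudes with a true trend of 0.03 per year.
years = np.arange(2001, 2012)
amplitude = 1.0 + 0.03 * (years - 2001)
amplitude[[2, 7]] += [1.5, -2.0]             # two gross outliers out of eleven values

slope_ts = theil_sen_slope(years, amplitude)
slope_ols = np.polyfit(years, amplitude, 1)[0]   # least-squares slope for comparison
```

In this example the pairwise-median estimate stays at the true trend, whereas the least-squares slope is pulled to a negative value by the two outliers. For real analyses, `scipy.stats.theilslopes` offers a tested implementation that also returns confidence bounds.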

To test the slopes for significance, we used the Mann–Kendall statistics
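A minimal sketch of the Mann–Kendall trend test is shown below. It uses the normal approximation with continuity correction and assumes a series without tied values, so the tie correction of the variance is omitted; the function name `mann_kendall` and the example series are hypothetical.

```python
import math
import numpy as np

def mann_kendall(y):
    """Mann-Kendall S statistic, z score, and two-sided p value (no tie correction)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    i, j = np.triu_indices(n, k=1)               # all pairs with i < j
    s = float(np.sign(y[j] - y[i]).sum())        # concordant minus discordant pairs
    var_s = n * (n - 1) * (2 * n + 5) / 18.0     # variance of S under the null, no ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)           # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))         # two-sided p value from the normal distribution
    return s, z, p

# Eleven yearly values with a monotonic increase, and the same values reversed.
increasing = np.arange(11) * 0.03 + 1.0
s_inc, z_inc, p_inc = mann_kendall(increasing)
s_dec, z_dec, p_dec = mann_kendall(increasing[::-1])
```

Because the statistic depends only on the signs of pairwise differences, it pairs naturally with the Theil–Sen slope: both are rank-based and make no assumption about the distribution of the residuals.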

To identify disruptions in trajectories, breakpoint detection provides a good
framework for analysis. For the estimation of breakpoints, the generalized
fluctuation test framework

Example polygons and their areas,
Eq. (

The analysis of a different type of dynamic considers bivariate relations. In
the context of oscillating signals it is particularly instructive to quantify
their degree of phase shift and direction – even if both signals are not
linearly related. A “hysteresis” would be such a pattern describing how the
pathways
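One way to quantify both the size and the direction of such a loop is the signed area of the polygon traced by the mean seasonal cycle of two indicators. The sketch below uses the shoelace formula; the sign convention (positive for counterclockwise rotation), the function name `signed_polygon_area`, and the synthetic cycles are assumptions for illustration.

```python
import numpy as np

def signed_polygon_area(x, y):
    """Shoelace formula for a closed polygon given by its vertices in order.

    The result is positive for counterclockwise and negative for clockwise
    traversal; the last vertex is implicitly connected back to the first.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 0.5 * np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y)

# Synthetic mean seasonal cycles of two indicators with a phase shift.
phase = np.linspace(0, 2 * np.pi, 46, endpoint=False)
pc1 = np.cos(phase)
pc2 = 0.5 * np.sin(phase)                                 # traces a counterclockwise ellipse

area_ccw = signed_polygon_area(pc1, pc2)
area_cw = signed_polygon_area(pc1[::-1], pc2[::-1])       # same loop, opposite direction
area_line = signed_polygon_area(pc1, 2 * pc1)             # no phase shift, no enclosed area
```

Two signals that move in phase trace a line back and forth and enclose no area, while a phase shift opens the loop; reversing the order of the time steps flips the sign but not the magnitude.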

In the following, we first briefly present and discuss the quality of the global
dimensionality reduction (Sect.

Figure

Reconstruction error of the data cube using varying numbers of principal components aggregated by the mean squared error. Reconstruction errors aggregated over all time steps and variables are shown in the left column:

We estimated the reconstruction error sequentially up to the first three principal components (Fig.

The first PC summarizes variables that are closely related to primary
productivity (GPP, LE, NEE, fAPAR) and therefore are highly interrelated (see
Fig.

The second component represents variables related to the surface hydrology of
ecosystems (see Fig.

We observe that the third component is most strongly related to albedo
(Fig.

The relation of PC

Trajectories of some points (colored lines) and the area-weighted density
over principal components one and two (the gray background shading shows the
density) for

The bivariate distribution of the first two principal components forms a
“triangle” (gray background in Fig.

Both components form a subspace in which most of the variability of ecosystems
takes place. Component one describes productivity and component two the limiting
factors to productivity. Therefore, we can see that most ecosystems with high
values on component one (a high productivity) are at the approximate center of
component two. When ecosystems are found outside the center of component two,
they have lower values on component one (lower productivity) because they are
limited by water or temperature (see Fig.

The background shading shows the distribution of the mean seasonal cycle of
the spatial points (see Fig.

To further interpret the triangle, we analyze how the Bowen ratio embeds in
the space of the first two dimensions.
Energy fluxes from the surface into the atmosphere can take the form of either
heat transfer that raises air temperature (sensible heat) or evaporation
(latent heat). Their ratio is
the “Bowen ratio”,

The leading principal components represent most of the variability of the space
spanned by the observed variables, summarizing the state of a spatiotemporal
pixel efficiently. This means that the PCs track the state of a local ecosystem
over time (Fig.

A first inspection reveals a substantial overlap of seasonal cycles of very
different regions of the world. We also see that very different ecosystems may
reach very similar states in the course of the season, even though their
seasonal dynamics are very different. For instance, a midlatitude pixel (blue
trajectory in Fig.

Depending on their position on Earth, ecosystem states can shift from limitation
to growth during the year (Fig.

The third component shows a different picture. Due to a consistent winter snow cover in higher latitudes, the albedo is much higher and the amplitude of the mean seasonal cycle is much larger than in other ecosystems. Other areas show comparatively little variance on the third component; there, productivity and moisture content are even positively correlated with the third component, which is the opposite of what is expected from an albedo axis.

Mean seasonal cycle of the first three principal components (in columns)
during the seasons (in rows).
Left column: first principal component.
Middle column: second principal component.
Right column: third principal component.
Rows from top to bottom: equally spaced intervals during the year.
Values have been clamped to

The global pattern of the first principal component follows the productivity
cycles during summer and winter (Fig.

The second principal component (Fig.

The third principal component (Fig.

Although the principal components are globally uncorrelated, they covary locally
(see Fig.

Observing the mean seasonal cycle of the principal components gives us a tool to
characterize ecosystems and may also serve as a basis for further analysis, such
as a global comparison of ecosystems

The area inside the mean seasonal cycles of

The alternative return path between ecosystem states forming the hysteresis
loops arises from the ecosystem tracking seasonal changes in the environmental
condition, e.g. summer–winter or dry–rainy seasons
(Fig.

The orange trajectory (area close to Moscow) in Fig.

The trajectories that show a more pronounced counterclockwise hysteresis effect in
PC

The hysteresis effect on PC

The hysteresis effect on PC

We can see that the hysteresis of pairs of indicators represents large-scale
properties of climatic zones. The enclosed area and the direction of the rotation provide interesting
information. Hysteresis can provide
information on the seasonal availability of water, seasonal dry periods, or
snowfall. With the method presented here, we cannot observe intersecting
trajectories, which would probably provide even more interesting insights (e.g. the green trajectory in Fig.

Anomalies of the first three principal
components. The brown–green contrast shows the anomalies on
PC

The deviation of the trajectories from their mean seasonal cycle should reveal
anomalies and extreme events. These anomalies have a directional component, which
makes them interpretable in the same way as the original PCs. Therefore, one can
infer the state of the ecosystem during an anomaly. For instance, the well-known
Russian heat wave in summer 2010

Another example of an extreme event that we find in the PCs is the very wet
November rainy season of 2006 in the Horn of Africa after a very dry rainy
season in the previous year. This event was reported to bring heavy rainfall and
flooding events which caused an emergency for the local population but also
increased ecosystem productivity

Figure

Another extreme event that can be seen is the extreme snow and cold event
affecting central and south China in January 2008, causing the temporary
displacement of 1.7 million people and economic losses of approximately USD 21 billion

Trajectories of the first two principal components for single pixels.

Observing single trajectories can give insight into past events that happened at
a certain place, such as extreme events or permanent changes in ecosystems. The
creation of trajectories is an old method used by ecologists, mostly on species
assemblage data of local communities, to observe how the composition changes over
time (e.g.

The seasonal amplitude of the trajectory in the Brazilian Amazon increases due
to deforestation and crop growth cycles. Figure

The 2010 Russian heat wave has a very clear signal in the trajectories.
Figure

Variability of rainfall during the November rainy season in the Horn of Africa
(3

The 2003 European heat wave is reflected in the trajectories just like the 2010
Russian heat wave. Figure

As we have seen here, observing single trajectories in reduced space can give us important insights into ecosystem states and changes that occur. While the trajectories can point us towards abnormal events, they can only be the starting points for deeper analysis to understand the details of such state changes.

The accumulation of

General patterns of trends that can be observed are a positive trend (higher
productivity) on the first principal component in many arctic regions. Many of
these regions also show a wetness trend, with the notable exception of the
western parts of Alaska, which have become drier.
This is important, because wildfires play a major role in these ecosystems

In the Amazon basin, we find a dryness trend accompanied by a decrease in
productivity and a slight increase in PC

In eastern Australia, we find a strong wetness and greenness trend. This trend
is due to the “millennium drought” that Australia has experienced since the
mid-1990s, with a peak in 2002

Large parts of the Indian subcontinent show a trend towards higher productivity
and an overall wetter climate. The greening trend in India happens mostly over
irrigated cropland. However, browning trends over natural vegetation have been
observed but do not emerge in our analysis

In large parts of the Arctic, a trend towards higher productivity can be
observed. Vegetation models attribute this general increase in productivity to

The seasonal amplitude of atmospheric

Another way to detect changes to the biosphere consists in the detection of
breakpoints, which has been applied successfully to detect changes in global
normalized difference vegetation index (NDVI) time series

One of the most popular applications of PCA in meteorology is the calculation of
empirical orthogonal functions (EOFs), which typically applies PCA to a single
variable, i.e., to a dataset with the
dimensions

Ecological analyses usually use PCA with matrices of the shape

Using PCA as a method for dimensionality reduction means that we are assuming
linear relations among features. A nonlinear method could possibly be more
efficient in reducing the number of variables but would also have significant
disadvantages. In particular, nonlinear methods typically require tuning
specific parameters, objective criteria are often lacking, a proper weighting of
observations is difficult, the methods are often not reversible, and it is
harder to interpret the resulting indicators due to their nonlinear nature

To monitor the complexity of the changes occurring in times of an increasing human impact on the environment, we used PCA to construct indicators from a large number of data streams that track ecosystem state in space and time on a global scale. We showed that a large part of the variability of the terrestrial biosphere can be summarized using three indicators. The first emerging indicator represents carbon exchange, the second indicator shows the availability of water in the ecosystem, while the third indicator mostly represents a binary variable that indicates the presence of snow cover. The distribution in the space of the first two principal components reflects the general limitations of ecosystem productivity. Ecosystem production can be limited by either water or energy.

The first three indicators can detect many well-known phenomena without analyzing variables separately due to their compound nature. We showed that the indicators are capable of detecting seasonal hysteresis effects in ecosystems, as well as breakpoints, e.g. large-scale deforestation. The indicators can also track other changes to the seasonal cycle such as patterns of changes to the seasonal amplitudes and trends in ecosystems. Deviations from the mean seasonal cycle of the trajectories indicate extreme events such as the large-scale droughts in the Amazon during 2005 and 2010 and the Russian heat wave of 2010. The events are detected in a similar fashion as with classical multivariate anomaly detection methods while directly providing information on the underlying variables.

Using multivariate indicators, we gain a high-level overview of phenomena in ecosystems, and the method therefore provides an interesting tool for analyses that need to capture a wide range of phenomena which are not necessarily known a priori. Future research should consider nonlinearities, adding data streams describing other important biosphere variables (e.g. related to biodiversity and habitat quality), and including different subsystems, such as the atmosphere or the anthroposphere.

The variables used to describe the biosphere can be found in Table

Time and space patterns of PC

The minimum

Pairwise covariances of the mean seasonal cycles of the first three principal components, by space.

Trends in the amplitude of the yearly cycle, 2001–2011. Only Theil–Sen
estimators for significant slopes (

Breakpoint detection,

As the environmental conditions change, due to climate change and human
intervention, the local ecosystems may change gradually or abruptly. Detecting
these changes is very important for monitoring the impact of climate change and
land use change on the ecosystems. We applied breakpoint detection to the
trajectories (Fig.

Breakpoints on the first component were found in the entire Amazon, and the largest breakpoint is dated to the year 2005 during the large drought event. The entire eastern part of Australia shows its largest breakpoint towards the end of the time series because of a La Niña event, which caused lower temperatures and higher rainfall than usual during the years 2010 and 2011.

The data are available and can be processed at

GK and MDM designed the study in collaboration with MR and GCV. GK conducted the analysis and wrote the manuscript with contributions from all co-authors.

The authors declare that they have no conflict of interest.

We thank Fabian Gans and German Poveda for useful discussions. We thank Jake Nelson for proofreading a previous version of the manuscript. We thank Gregory Duveiller and the three anonymous reviewers for very helpful suggestions and Kirsten Thonicke for editorial advice that improved the manuscript greatly.

This study is funded by the Earth System Data Lab – a project by the European Space Agency. Miguel D. Mahecha and Markus Reichstein have been supported by the Horizon 2020 EU project BACI under grant agreement no. 640176. Gustau Camps-Valls' work has been supported by the EU under the ERC consolidator grant SEDAL-647423. The article processing charges for this open-access publication were covered by the Max Planck Society.

This paper was edited by Kirsten Thonicke and reviewed by Gregory Duveiller and three anonymous referees.
