Lack of direct carbon, water, and energy flux observations at global scales makes it difficult to calibrate land surface models (LSMs). The increasing number of remote-sensing-based products provides an alternative way to verify or constrain land models given their global coverage and satisfactory spatial and temporal resolutions. However, these products and LSMs often differ in their assumptions and model setups, for example, the canopy model complexity. The disagreements hamper the fusion of global-scale datasets with LSMs. To evaluate how much the canopy complexity affects predicted canopy fluxes, we simulated and compared the carbon, water, and solar-induced chlorophyll fluorescence (SIF) fluxes using five different canopy complexity setups from a one-layered canopy to a multi-layered canopy with leaf angular distributions. We modeled the canopy fluxes using the recently developed land model by the Climate Modeling Alliance, CliMA Land. Our model results suggested that (1) when using the same model inputs, model-predicted carbon, water, and SIF fluxes were all higher for simpler canopy setups; (2) when accounting for vertical photosynthetic capacity heterogeneity, differences between canopy complexity levels increased compared to the scenario of a uniform canopy; and (3) SIF fluxes modeled with different canopy complexity levels changed with sun-sensor geometry. Given the different modeled canopy fluxes with different canopy complexities, we recommend (1) not misusing parameters inverted with different canopy complexities or assumptions to avoid biases in model outputs and (2) using a complex canopy model with angular distribution and a hyperspectral radiation transfer scheme when linking land processes to remotely sensed spectra.

Land surface models (LSMs) simulate the carbon, water, and energy fluxes at the land–atmosphere interface at regional and global scales and are a key
component for Earth system models (ESMs). The ability of LSMs to accurately model the carbon, water, and energy fluxes within vegetation canopy
largely determines the predictive skills of the ESMs. Modeling canopy carbon, water, and energy fluxes dates back to the early 20th century, and
various canopy models have different complexities from a single layer to multiple layers (see

It should be noted that a big-leaf model may refer to different models within the last decades given their interchangeable uses

The biggest advantage of the big-leaf model family is computational efficiency given the simple mathematical formulation. The potential disadvantages
of the big-leaf model family are also obvious; e.g., the model is too simplified and thus not able to resolve vertically varying profiles and
microclimates in the canopy, such as air temperature, humidity, and wind speed

One of the most important functions of canopy models is to predict carbon, water, and energy fluxes globally in the future to determine whether
the land will remain a carbon sink. Though canopy models with different complexity levels have been extended to a global scale in different LSMs,
researchers are facing a key problem: a lack of direct global-scale carbon, water, and energy flux observations. The lack of data makes it difficult
to calibrate the LSMs at global scales, particularly those using more complex canopy setups, which require more parameters. As a result, though it
is shown that a multi-layered canopy model better resolves energy fluxes in the canopy

To better constrain LSMs with data, researchers have recognized the promise of remote sensing data given their global coverage and satisfactory spatial and temporal
resolutions. Regarding carbon, research has shown that solar-induced chlorophyll fluorescence (SIF) and the near-infrared reflection of vegetation (NIRv)
are correlated with plant gross primary productivity

Despite the increasing number of inverted fluxes and plant trait datasets, there is limited research into testing the capability of these data in improving LSM
predictions. Among the various reasons that hamper the fusion of large-scale datasets into LSMs, incompatibility between model and data assumptions
seems to be the major reason. For example, the disagreement in canopy complexity may introduce errors into modeling if one uses the data inverted from
a canopy complexity level (e.g., one-layered canopy) in a model with a different canopy complexity level (e.g., multi-layered canopy). Further, the
flux and trait maps inverted from remote sensing data often use simplified plant physiological representations, which are, however, key processes in
land modeling. For example, studies that derive GPP from SIF or NIRv often assume linear correlation between them, whereas vegetation models must
account for light saturation

Ideally, LSMs can be constrained using raw reflection and fluorescence spectra. This, nevertheless, requires the LSMs to move from broadband canopy
radiation to a hyperspectral representation and from sunlit and shaded fractions to leaf angular distributions

Canopy complexity levels. 1X: single-layer canopy without sunlit or shaded fractions. 2X: single-layer canopy with sunlit and shaded fractions. KX: multiple-layer canopy without sunlit or shaded fractions. 2KX: multiple-layer canopy with sunlit and shaded fractions per layer. IJKX: multiple-layer canopy with sunlit and shaded fractions per layer, with the sunlit fraction being further partitioned based on leaf inclination and azimuth angular distributions.
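As a rough illustration of what the five labels imply, the sketch below counts how many distinct leaf classes (and hence APAR values) each scheme would track. The function name `apar_class_count`, its arguments, and the reading of "I" and "J" as the numbers of inclination and azimuth classes are assumptions made for illustration; the bin counts in the usage lines are hypothetical and are not taken from the CliMA Land configuration.

```python
def apar_class_count(level, n_layers=1, n_incl=1, n_azi=1):
    """Number of leaf classes (distinct APAR values) for a canopy scheme.

    level    : one of "1X", "2X", "KX", "2KX", "IJKX"
    n_layers : number of canopy layers (K), hypothetical
    n_incl   : number of leaf inclination classes (I), hypothetical
    n_azi    : number of leaf azimuth classes (J), hypothetical
    """
    counts = {
        "1X": 1,                 # one big leaf
        "2X": 2,                 # one sunlit + one shaded class
        "KX": n_layers,          # one class per layer
        "2KX": 2 * n_layers,     # sunlit + shaded per layer
        # per layer: I*J sunlit angle classes plus one shaded class
        "IJKX": (n_incl * n_azi + 1) * n_layers,
    }
    return counts[level]


# Hypothetical discretization: 20 layers, 9 inclination and 36 azimuth bins.
n_2kx = apar_class_count("2KX", n_layers=20)
n_ijkx = apar_class_count("IJKX", n_layers=20, n_incl=9, n_azi=36)
print(n_2kx, n_ijkx)
```

The count grows quickly for IJKX, which is one way to see why the most complex scheme is also the most expensive to run.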

To resolve the problems of a complicated canopy, we examined how much carbon, water, and SIF fluxes may differ when using different canopy complexity
representations in the CliMA Land model, spanning from a one-layered canopy to a multi-layered canopy with hyperspectral radiation and leaf angular
distributions (Fig.

We used the CliMA Land model (v0.1) to evaluate how canopy model complexity impacts the simulated carbon, water, and SIF fluxes. The CliMA Land model
mechanistically addresses soil–plant–air continuum processes and is able to simulate canopy carbon and water fluxes as well as SIF simultaneously

Non-linear leaf responses to the environmental and physiological parameters.

Leaf physiological responses to light are highly non-linear, such as stomatal conductance to water vapor (

A list of vegetation models corresponding to our tested canopy complexity schemes. CLM: Community Land Model. ISBA: Interactions between soil–biosphere–atmosphere. JULES: Joint UK Land Environment Simulator. ORCHIDEE: Organising Carbon and Hydrology In Dynamic Ecosystems. SCOPE: Soil Canopy Observation, Photochemistry and Energy fluxes. 2X: single-layer canopy with sunlit and shaded fractions. KX: multiple-layer canopy without sunlit or shaded fractions. 2KX: multiple-layer canopy with sunlit and shaded fractions per layer. IJKX: multiple-layer canopy with sunlit and shaded fractions per layer, with the sunlit fraction being further partitioned based on leaf inclination and azimuth angular distributions.

To evaluate how much canopy model complexity matters, we modeled the canopy using five different levels of complexity, denoted as “1X”,
“2X”, “KX”, “2KX”, and “IJKX” (Fig.

For IJKX, we simulated the canopy radiative transfer using the CliMA Land-adapted SCOPE model

Corresponding APAR values for the shaded and sunlit leaves are

The 2KX fraction and APAR were derived from IJKX by weighting APAR for sunlit leaves per canopy layer:

The KX fraction and APAR were derived from 2KX by weighting APAR for all sunlit and shaded leaves per canopy layer:

The 2X fraction and APAR were derived from 2KX by weighting APAR for sunlit and shaded leaves, respectively, across all canopy layers as follows:

1X APAR was derived from KX by weighting APAR for all layers:

Comparison of profiles of mean absorbed photosynthetically active radiation (APAR) for four different canopy complexity levels. 1X: single-layer canopy without sunlit or shaded fractions. 2X: single-layer canopy with sunlit and shaded fractions. KX: multiple-layer canopy without sunlit or shaded fractions. 2KX: multiple-layer canopy with sunlit and shaded fractions per layer. IJKX: multiple-layer canopy with sunlit and shaded fractions per layer, with the sunlit fraction being further partitioned based on leaf inclination and azimuth angular distributions. The abbreviations “sl” and “sh” stand for sunlit and shaded leaves, respectively.
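The chain of derivations described above (IJKX to 2KX, 2KX to KX and 2X, KX to 1X) can be sketched as successive leaf-area-weighted averages. This is a minimal illustration under stated assumptions, not the CliMA Land implementation: the function names, the per-layer leaf area `lai`, the angle-class weights `w_ij`, and all numerical values are hypothetical, and the weights are assumed to be leaf-area fractions that sum to one within each layer.

```python
def to_2kx(apar_sl_ij, w_ij):
    """IJKX -> 2KX: average sunlit APAR over angle classes in each layer."""
    return [sum(w * a for w, a in zip(ws, aps))
            for ws, aps in zip(w_ij, apar_sl_ij)]


def to_kx(f_sl, apar_sl, apar_sh):
    """2KX -> KX: mix sunlit and shaded APAR within each layer."""
    return [f * sl + (1.0 - f) * sh
            for f, sl, sh in zip(f_sl, apar_sl, apar_sh)]


def to_2x(f_sl, apar_sl, apar_sh, lai):
    """2KX -> 2X: pool sunlit leaves and shaded leaves across all layers."""
    area_sl = sum(l * f for l, f in zip(lai, f_sl))
    area_sh = sum(lai) - area_sl
    mean_sl = sum(l * f * a for l, f, a in zip(lai, f_sl, apar_sl)) / area_sl
    mean_sh = sum(l * (1.0 - f) * a
                  for l, f, a in zip(lai, f_sl, apar_sh)) / area_sh
    return area_sl / sum(lai), mean_sl, mean_sh


def to_1x(apar_kx, lai):
    """KX -> 1X: leaf-area-weighted mean APAR over all layers."""
    return sum(l * a for l, a in zip(lai, apar_kx)) / sum(lai)


# Hypothetical 3-layer canopy with 2 sunlit angle classes per layer.
lai = [1.0, 1.0, 1.0]                 # leaf area per layer
f_sl = [0.8, 0.5, 0.2]                # sunlit fraction per layer
w_ij = [[0.5, 0.5]] * 3               # angle-class weights per layer
apar_sl_ij = [[1200.0, 800.0], [1100.0, 700.0], [1000.0, 600.0]]
apar_sh = [200.0, 120.0, 60.0]

apar_sl_2kx = to_2kx(apar_sl_ij, w_ij)
apar_kx = to_kx(f_sl, apar_sl_2kx, apar_sh)
frac_2x, sl_2x, sh_2x = to_2x(f_sl, apar_sl_2kx, apar_sh, lai)
apar_1x = to_1x(apar_kx, lai)
mean_2x = frac_2x * sl_2x + (1.0 - frac_2x) * sh_2x
print(apar_1x, mean_2x)
```

With weights chosen this way, every level carries the same canopy-mean APAR; the levels differ only in how that total is distributed among leaf classes, which is what matters once APAR is passed through non-linear leaf responses.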

We emphasize here that to derive canopy fluorescence spectrum and its sun-sensor geometry, we need to simulate the canopy radiative transfer using
hyperspectral reflectance, transmittance, and fluorescence. Due to the high spectral resolution and multiple layers required, radiative transfer and
canopy fractions in complex models such as SCOPE are computed numerically. In comparison, radiative transfer and sunlit/shaded fractions are computed
analytically in the two-leaf radiation scheme, as the model is single layered and uses broadband reflectance and transmittance

Leaf traits in the canopy are not uniform among the canopy layers. Typically, leaf photosynthetic capacity (usually represented by the maximum
carboxylation rate at 25

To examine how much the vertical

The

Note that we tuned the maximum electron transport rate and leaf respiration rate in the same manner as

We simulated the canopy carbon and water fluxes using a stomatal optimization model developed in

At each canopy complexity level, for a given environmental condition set, we were able to obtain the steady-state stomatal conductance for each APAR,
from which we computed steady-state

We remind the reader here that soil evaporation is a function of soil water content, soil surface temperature, and atmospheric-vapor-pressure deficit and that soil evaporation should be the same for all tested canopy complexity models; this is also the case for evaporation from intercepted water on plant surfaces. Therefore, the modeled ET difference is caused entirely by canopy transpiration, and using transpiration would not result in any biases in the relative difference of modeled ET.

For IJKX, we used

Despite the importance of vertical microclimate heterogeneity in modeled canopy energy fluxes

We ran a sensitivity analysis to environmental cues for all five complexity levels to examine how much they differ in predicted carbon, water, and SIF
fluxes. The tested cues included solar radiation, atmospheric-vapor-pressure deficit (VPD), temperature, soil water potential (

To evaluate how much the canopy complexity models differ in real-world simulations, we ran the model using weather data from a flux tower located at
Ozark, Missouri, USA

We compared the model-predicted carbon, water, and SIF fluxes. Note here that observed SIF depends on the sun-sensor geometry and that SIF
retrievals often have different sun-sensor geometries

Given that averaging APAR theoretically results in overestimated carbon and water fluxes, we expected the differences among canopy
complexity levels to follow these trends: (a) 1X
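The reasoning that averaging APAR overestimates fluxes follows from the concave (saturating) shape of leaf light responses. The sketch below illustrates this with a generic rectangular-hyperbola light response; this curve, its parameters `a_max` and `k_half`, and the APAR values are hypothetical stand-ins chosen for illustration and are not the photosynthesis or stomatal scheme used in CliMA Land.

```python
def light_response(apar, a_max=30.0, k_half=400.0):
    """Generic saturating light response (hypothetical stand-in curve)."""
    return a_max * apar / (apar + k_half)


# Hypothetical single-layer canopy: half sunlit, half shaded.
f_sl = 0.5
apar_sl = 940.0   # APAR of sunlit leaves
apar_sh = 100.0   # APAR of shaded leaves

# Two-leaf approach: evaluate the response per class, then average.
flux_two_leaf = (f_sl * light_response(apar_sl)
                 + (1.0 - f_sl) * light_response(apar_sh))

# Big-leaf approach: average APAR first, then evaluate the response once.
apar_mean = f_sl * apar_sl + (1.0 - f_sl) * apar_sh
flux_big_leaf = light_response(apar_mean)

print(flux_two_leaf, flux_big_leaf)
```

For any concave response, evaluating the function at the mean APAR gives a value at least as large as the mean of the function over the individual classes (Jensen's inequality), so under these assumptions the big-leaf estimate exceeds the two-leaf estimate.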

Net ecosystem exchange of

When a uniform

The SIF responses to changing environmental cues in general agreed in trends among tested complexity levels (Fig.

Net ecosystem exchange of

When an exponential vertical

Relative differences between IJKX, 2KX, and 2X for the net ecosystem exchange of

2KX and 2X had a lower difference from IJKX compared to KX and 1X, and 2KX had the lowest error given the better-resolved APAR
fractions (Figs.

When accounting for a vertically heterogeneous

Diurnal cycle of carbon and water fluxes using five different canopy complexity levels.

Difference between five different canopy complexity levels in a diurnal-cycle simulation of carbon and water fluxes. The carbon flux was represented by the site-level net ecosystem exchange of

Our model simulations suggest that all tested canopy complexity levels can qualitatively capture the trends of carbon and water fluxes at the tested
flux tower site (Fig.

Difference between five different canopy complexity levels in modeled solar-induced chlorophyll fluorescence (SIF) at different viewing zenith angles and relative azimuth angles. The color indicates SIF at 740

Difference between five different canopy complexity levels in modeled solar-induced chlorophyll fluorescence (SIF) at different viewing zenith angles and relative azimuth angles. The color indicates SIF at 740

Using simpler canopy representations (namely 2KX, KX, 2X, and 1X) affected the observed SIF, with the effect depending on the sun-sensor geometry
(Figs.

Leaf fluorescence responses to radiation,

While simpler canopy models in general predicted higher carbon, water, and SIF fluxes, there were some scenarios in which the simpler models predicted
contrasting SIF responses compared to IJKX: (a) when total radiation increased, SIF of the simpler canopy models was lower than that of IJKX at low
radiation but was higher than that of IJKX at high radiation (Figs.

When total radiation was higher, the product of

The 1X model SIF response to atmospheric

The KX model SIF response to atmospheric

Our model simulations showed that different canopy complexity levels predicted divergent carbon, water, and SIF fluxes. In particular, 1X and KX, which do not
partition the canopy into sunlit and shaded fractions, showed very high biases compared to the other three levels of complexity,
namely 2X, 2KX, and IJKX. Further, as we expected, IJKX, which has the most complex canopy, had the lowest predicted carbon and water
fluxes, followed by 2KX and 2X and then KX and 1X. Moreover, when we accounted for a profile of vertical canopy photosynthetic capacity,
the difference among canopy complexity levels increased. Though 2KX and 2X were, in general, close to IJKX in predicted canopy fluxes, the
disagreements may range up to

The disagreements among canopy complexity levels make it difficult to parameterize a land model using a complex canopy setup and thus hamper the fusion
of large-scale remote-sensing-based datasets with land models at a global scale. Thus, it is necessary to revisit the flux and plant trait inversions
using more applicable land model setups to make sure the inverted datasets and land models are consistent in their assumptions. This is the only way
to ensure that inverted parameters are quantitatively useful in future land surface modeling. Moreover, it is also possible for land models to go
without the inverted fluxes or traits if the land model runs using a complex canopy such as IJKX. This way, the model can be directly compared
against satellite observations

As suggested by

We recognize that increasing model complexity can make a model (a) less user-friendly for researchers (e.g., when implementing the model into their
research projects) and (b) slower to run, particularly when using less efficient programming languages such as Python and R (compared to C). In
our highly modularized CliMA Land model, we use Julia, a just-in-time compiled programming language that allows for the versatility of a scripting
language like Python but with the speed of fully compiled languages such as C and Fortran (see

We evaluated how much canopy carbon, water, and SIF fluxes differ when using five different canopy complexity levels in a land model. We found that when using the same model inputs, simpler canopy models predicted higher carbon, water, and SIF fluxes, and when we accounted for a profile of vertically heterogeneous photosynthetic capacity, we found more disagreements among canopy models with varying complexity levels. We also found that the modeled SIF varied with sun-sensor geometry among tested canopy complexity levels. Our model results suggest that misusing parameters inverted from different canopy complexities and assumptions may result in biases in predicted canopy fluxes, and thus we recommend more cautious model parameterization regarding canopy complexity levels. Further, we recommend using complex canopy models with leaf angular distribution and a hyperspectral radiation transfer scheme to compare against remote sensing data in order to accurately mimic observed radiation. However, the use of complex canopy models in land surface modeling may be less efficient and less user-friendly for researchers. We believe more open and modular land models like CliMA Land will help lower the barrier for researchers.

We coded our model and did the analysis using Julia (version 1.6.2), and the current version of the CliMA Land model is available from the project website at

YW and CF designed and conducted the research, performed the general data analysis, and wrote the paper.

The contact author has declared that neither they nor their co-author has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We gratefully acknowledge the generous support of Eric and Wendy Schmidt (by recommendation of the Schmidt Futures) and the Heising-Simons Foundation.

This research has been supported by the National Aeronautics and Space Administration (grant nos. 80NSSC18K0895 and 80NSSC21K1712).

This paper was edited by Martin De Kauwe and reviewed by Xiangzhong Luo and one anonymous referee.