Wide Discrepancies in the Magnitude and Direction of Modelled SIF in Response to Light Conditions

Abstract. Recent successes in passive remote sensing of far-red solar-induced chlorophyll fluorescence (SIF) have spurred the development and integration of canopy-level fluorescence models in global terrestrial biosphere models (TBMs) for climate and carbon cycle research. The interaction of fluorescence with photochemistry at the leaf and canopy scale provides opportunities to diagnose and constrain model simulations of photosynthesis and related processes, through direct comparison to and assimilation of tower, airborne, and satellite data. TBMs describe key processes relating to the absorption of sunlight, leaf-level fluorescence emission, and scattering and reabsorption throughout the canopy. Here, we analyze simulations from an ensemble of process-based TBM-SIF models (SiB3, SiB4, CLM4.5, CLM5.0, BETHY, ORCHIDEE, BEPS) at a subalpine evergreen needleleaf forest near Niwot Ridge, Colorado. These models are forced with tower-observed meteorological data and analyzed against continuous far-red SIF and gross primary productivity (GPP) partitioned from eddy covariance data, at diurnal and synoptic scales during the growing season (July–August 2017). Our primary objective is to summarize the site-level state of the art in TBM-SIF modeling over a relatively short time period (summer) when light, structure, and pigments are similar, setting the stage for regional- to global-scale analyses. We find that these models are generally well constrained in simulating photosynthetic yield, but show strongly divergent patterns in the simulation of absorbed photosynthetically active radiation (APAR), absolute GPP and fluorescence, quantum yields, and light response at leaf and canopy scale. This study highlights the need for mechanistic modeling of non-photochemical quenching in stressed and unstressed environments, and for improved representation of light absorption (APAR), distribution of light across sunlit and shaded leaves, and radiative transfer from leaf to canopy scale.


Comments and Author Response:

The authors have done a great job in replying to my comments and changing the manuscript accordingly. Especially, the addition of SCOPE model simulations benefits the analysis a lot, I find. I have just a few minor comments not requiring further reviewer assessment:

We agree the addition of SCOPE was extremely beneficial, thank you for the suggestion.

l. 45: "distribution of light across sunlit and shaded leaves" or "within-canopy distribution of direct and diffuse radiation"?!

We changed to the former, thanks.

Canopy scanning spectrometers such as PhotoSpec thus provide an opportunity to understand the physical processes that lead to a breakdown of SIF-GPP linearity at leaf to canopy scale (or, conversely, the emergence of linearity at increasing scale), and to evaluate and diagnose TBM performance in detail. This study provides a preliminary site-level benchmarking assessment of SIF simulations within a TBM framework and across an ensemble of TBMs, with the primary purpose being an initial investigation into the response of modelled SIF and GPP to light during peak summer. We leverage continuous measurements of SIF and GPP at the Niwot Ridge US-NR1 AmeriFlux tower in Colorado from July–August 2017 (Magney et al., 2019b), and simulations of canopy radiative transfer, photosynthesis, and fluorescence from a stand-alone version of SCOPE, to (1) benchmark TBM-SIF modeling, (2) evaluate sensitivity to underlying processes and scaling techniques, (3) identify strengths and weaknesses in current modeling strategies, and (4) recommend strategies for models and observations.
The paper is organized as follows: Section 2 describes SCOPE and the seven TBM-SIF models (SiB3, SiB4, ORCHIDEE, BEPS, BETHY, CLM4.5, CLM5), which have recently been published or are in review, and provides more details on the site-level benchmarking observations.
Section 3 summarizes 174 results comparing modelled and predicted SIF and GPP at hourly and daily scales, as they relate 175 to absorbed light, GPP and SIF yields, and quantum yields. Section 4 discusses results in more 176 detail, including attribution of SIF magnitude and temporal phasing biases and sensitivities to 177 absorbed light, and areas for improvement. However, factors such as observation angle, fraction of sunlit/shaded canopy components, and 238 difference in footprint from APAR, necessitates an additional diagnostic variable defined as 239 relative SIF (SIFrel). SIFrel is emitted SIF per reflected radiance in the far red spectrum where SIF 240 retrievals occur (SIF/Reffr). This is useful because is normalizes for the exact amount of 241 'illuminated' canopy components within the sensor field of view, whereas APAR measurements 242 are integrated for the entire canopy. 243 These quantities represent different but equally important versions of reality. It is difficult for 244 models to exactly reproduce the distribution and timing of sunlight in the canopy as observed by 245 PhotoSpec. While SIFrel removes model-observation differences in illumination, it confounds our 246 interpretation of the relationship with GPPyield, which is derived from APAR. As such, we provide 247 both results to be comprehensive, but note the temporal stability associated with SIFrel as the 248 more physical interpretation of canopy yield for this short period of study. for data access. We focus our model-data analysis on the 2017 growing season (July-August, 266 2017) to maximize overlap between observations of SIF, GPP, and APAR. 267 Diurnal composites of PhotoSpec SIF in 2017 show a late morning peak and afternoon dip (Fig  268   S3A). The afternoon dip is consistent with decreased incoming shortwave, PAR and APAR (Figs S1 269 and S2, respectively). 
However, we note that the signal retrieved by PhotoSpec is also affected by (1) viewing geometry, (2) the fraction of sunlit vs shaded leaves (sun/shade fraction, i.e., the quantity of needles illuminated by incident sunlight) due to self-shading within the canopy, and (3) the direct/diffuse fraction due to cloud cover. Structural and bidirectional effects lead to different SIF emission patterns depending on view angle and scanning pattern (Yang and van der Tol, 2018). The viewing geometry of PhotoSpec (as implemented at NR1 in 2017) causes a higher fraction of illuminated vegetation in the morning, which shifts the peak of SIF (Fig S3A) and of incoming far-red reflected radiance within the retrieval window (Fig S3B) by 2 to 3 hours, from solar noon (minimum solar zenith angle, coinciding with the expected peak in PAR) to late morning. Normalizing SIF by far-red reflected radiance as relative SIF (SIFrel, Fig S3C) and rescaling to SIF (Fig S3D) shifts the peak back to noon and preserves the afternoon dip (albeit with reduced magnitude). SIFrel helps to account for factors 1-3 listed above because it accounts for the amount of reflected radiation in the field of view of PhotoSpec, which is affected by canopy structure, sun angle, and direct/diffuse light. As discussed above, SIFrel is likely a better approximation of SIFyield because it normalizes for the exact amount of 'illuminated' canopy components in each retrieval, whereas APAR integrates over the entire canopy. As such, we expect SIFrel to show a strong seasonal change associated with downregulation of photosynthesis and a more subtle diurnal change, as during mid-summer the SIF signal is primarily driven by light intensity.
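To make the normalization concrete, the following Python sketch computes SIFrel as SIF divided by far-red reflected radiance (Reffr) for each retrieval, rescales it back to SIF units, and builds hourly diurnal composites of the kind shown in Fig S3. The rescaling rule (multiplying by the record-mean reflected radiance) is our assumption, since the exact procedure is not specified here; all function names and inputs are hypothetical.

```python
import numpy as np


def relative_sif(sif, ref_fr):
    """Relative SIF (SIFrel): SIF per unit far-red reflected radiance.

    Returns NaN where the reflected radiance is not positive (e.g. dark scans).
    """
    sif = np.asarray(sif, dtype=float)
    ref_fr = np.asarray(ref_fr, dtype=float)
    valid = ref_fr > 0
    safe_ref = np.where(valid, ref_fr, 1.0)  # avoid division by zero
    return np.where(valid, sif / safe_ref, np.nan)


def rescale_to_sif(sif_rel, ref_fr):
    """Hypothetical rescaling: SIFrel times the record-mean reflected radiance."""
    return np.asarray(sif_rel, dtype=float) * np.nanmean(ref_fr)


def diurnal_composite(hours, values):
    """Mean of `values` for each local hour 0-23, ignoring NaNs."""
    hours = np.asarray(hours, dtype=int)
    values = np.asarray(values, dtype=float)
    out = np.full(24, np.nan)
    for h in range(24):
        sel = (hours == h) & ~np.isnan(values)
        if sel.any():
            out[h] = values[sel].mean()
    return out
```

If SIF were exactly proportional to reflected radiance, SIFrel would be flat; any diurnal structure that survives the normalization therefore reflects changes in emission per illuminated canopy element rather than changes in how much illuminated canopy is in view.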
It is important to note that PhotoSpec is highly sensitive to the sun/shade fraction in the canopy (factor 2) because of the narrow FOV of the PhotoSpec telescope. Increased afternoon cloud cover during summer causes diurnal asymmetry in incident PAR (Fig S1A)

for eddy covariance during nighttime, but the relatively flat area around the tower reduces the impact on daytime flux measurements. Details on the flux measurements, data processing, and quality control are provided in Burns et al. (2015).

TBM-SIF Overview
The parent TBMs are designed to simulate the exchanges of carbon, water, and energy between biosphere and atmosphere, from global to local scales, depending on inputs from meteorological forcing, soil texture, and plant functional type. The addition of a fluorescence model that simulates SIF enables direct comparison to remotely sensed observations for benchmarking, process diagnostics, and parameter/state optimization (data fusion) for improved GPP estimation. The TBM-SIF models analyzed here differ in ways too numerous to discuss; we refer the reader to the references in Section 2.3.4 for more detailed model descriptions. Instead, we focus on key differences affecting the joint simulation of GPP and leaf/canopy-level SIF at diurnal and synoptic scales during the peak of summer. These differences, summarized in Table 1, include the representation of stomatal conductance (all use Ball-Berry except CLM5.0, BEPS, and ORCHIDEE), canopy absorption of incoming radiation (all account for sunlit/shaded radiation except ORCHIDEE, SiB3, and SiB4), limiting factors for photosynthesis (Vcmax, LAI, radiation, stress) and SIF (kN, fluorescence photon re-absorption), scaling and radiative transfer methods for transferring leaf-level SIF simulations to the top of canopy, and parameter optimization. Further details on (a) photosynthetic structural formulation and parameter choice, (b) representation of leaf-level processes important to SIF (kN and ΦP), and (c) the leaf-to-canopy scaling approach ( )*+,-. ) are provided in Sections 2.3.2 and 2.3.3.
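Several of the differences above hinge on how a canopy is split into sunlit and shaded fractions. A common textbook approach, sketched below, uses Beer's law under the assumptions of a spherical leaf angle distribution, no clumping, and non-scattering leaves. It is not the scheme of any specific TBM in Table 1, and it ignores clumping, which matters for needleleaf canopies; function names are hypothetical.

```python
import math


def beam_extinction_spherical(sza_deg):
    """Direct-beam extinction coefficient for a spherical leaf angle distribution."""
    return 0.5 / math.cos(math.radians(sza_deg))


def sunlit_shaded_lai(lai, kb):
    """Split LAI into sunlit and shaded parts with Beer's law (no clumping)."""
    lai_sun = (1.0 - math.exp(-kb * lai)) / kb
    return lai_sun, lai - lai_sun


def fapar_black_leaves(lai, k):
    """fAPAR of a canopy of non-scattering ('black') leaves: 1 - exp(-k * LAI)."""
    return 1.0 - math.exp(-k * lai)
```

With the sun at 60° zenith (kb = 1) and an LAI of 4, only about a quarter of the leaf area is sunlit in this idealized case, which illustrates why the sunlit/shaded split strongly conditions both APAR and the canopy SIF signal.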

Photosynthesis Models
All TBM-SIF models in this manuscript use enzyme-kinetic models to simulate the leaf assimilation rate (gross photosynthesis) as limited by the efficiency of the photosynthetic enzyme system, the amount of PAR captured by leaf chlorophyll, and the capacity of leaves to utilize the end products of photosynthesis (Farquhar et al., 1980; Collatz et al., 1991, 1992; Sellers et al., 1996). However, there are important differences in the representation of (a) stomatal conductance, which couples the carbon and water cycles, and (b) limiting factors on carbon assimilation due to leaf physiology (maximum carboxylation capacity, Vcmax), radiation (APAR or fAPAR), canopy structure (LAI, leaf angle distribution), and stress (water supply and demand, temperature), which affect plant physiological processes and canopy radiative transfer. The underlying stomatal conductance models in the TBMs analyzed here belong to the Ball-Berry family of empirical models, which are rooted in the leaf gas exchange equation but differ in their representation of atmospheric demand (relative humidity or vapor pressure deficit); they include the Ball-Berry-Woodrow model (Ball et al., 1987), the Leuning model (Leuning, 1995), the Yin-Struik model (Yin and Struik, 2009), and the Medlyn model (Medlyn et al., 2011). These structural and parametric differences also influence calculated quantities such as the degree of light saturation (Section 2.3.3), which affects both the fluorescence and quantum yields used by the fluorescence models. Differences in stomatal conductance, canopy type / radiation scheme, stress, Vcmax, and LAI are summarized in Table 1.
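For readers less familiar with the Ball-Berry family, the sketch below contrasts two members named above: the Ball-Berry form, driven by leaf-surface relative humidity, and the Medlyn form, driven by vapor pressure deficit. The parameter values (m, g1, g0) are illustrative placeholders, not the values used by any model in this study, and the CO2 concentration is simply supplied by the caller.

```python
import math


def ball_berry_gs(a_net, hs, cs, m=9.0, g0=0.01):
    """Ball-Berry stomatal conductance to water vapour (mol m-2 s-1).

    a_net: net assimilation (umol m-2 s-1); hs: leaf-surface relative
    humidity (0-1); cs: leaf-surface CO2 (umol mol-1). m, g0 illustrative.
    """
    return g0 + m * max(a_net, 0.0) * hs / cs


def medlyn_gs(a_net, vpd_kpa, ca, g1=4.0, g0=0.01):
    """Medlyn et al. (2011) form (mol m-2 s-1); VPD in kPa, ca in umol mol-1.

    The factor 1.6 converts a CO2 conductance into a water-vapour conductance.
    g1, g0 illustrative.
    """
    return g0 + 1.6 * (1.0 + g1 / math.sqrt(vpd_kpa)) * max(a_net, 0.0) / ca
```

The key structural difference is the humidity metric: Ball-Berry scales conductance linearly with relative humidity at the leaf surface, whereas Medlyn scales it with 1/√VPD, so the two forms can respond differently to dry afternoon air at the same assimilation rate.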

Leaf-level SIF emission
The 'quantum yield' approach has been used in SIF models to characterize the fraction of absorbed photons that are used for photochemical quenching (PQ), dissipated through non-photochemical quenching (NPQ), or re-emitted as fluorescence (van der Tol et al., 2014). It is important to note that this does not translate into the actual amount of SIF emission leaving the leaf, but is used as an approximation. TBM-SIF models typically represent ΦP using lake model formalism,

NPQ models can be classified as stressed (drought) or unstressed with respect to water availability, depending on the dataset from which the empirical fits are derived. The unstressed model is ideal for irrigated systems such as crops, and the stressed model is more appropriate for water-limited ecosystems such as Niwot Ridge. We examine each of these using the drought and unstressed models from van der Tol et al. (2014) and a drought-based model from Flexas et al. (2002). These models use different empirical fits but are otherwise identical. In general, kN increases more rapidly with APAR (light saturation), and ramps up to a higher level, in the drought-based model than in the unstressed model. Additionally, some models provide unique improvements, such as dependence on environmental conditions (e.g., water stress vs no water stress in ORCHIDEE) and equations for reversible and sustained NPQ that represent the different time scales (minutes to seasonal) at which NPQ regulation occurs (e.g., CLM4.5), influenced by pigmentation changes in the leaf.
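The energy partitioning described above can be sketched in Python as follows. We assume the empirical form kN = kN0 (1 + β) x^α / (β + x^α), with x = 1 − ΦP/ΦP0 the degree of light saturation, and split the non-photochemical fraction (1 − ΦP) among fluorescence, NPQ, and constitutive heat dissipation in proportion to their rate constants, which is how we understand the van der Tol et al. (2014) formulation. The two parameter sets are given for illustration only and should be checked against that paper before use; kF and kD are likewise illustrative, and all names are hypothetical.

```python
# Illustrative kN(x) parameter sets; verify against van der Tol et al. (2014).
KN_PARAMS = {
    "unstressed": {"kn0": 2.48, "alpha": 2.83, "beta": 0.114},
    "drought": {"kn0": 5.01, "alpha": 1.93, "beta": 10.0},
}


def light_saturation(phi_p, phi_p0):
    """Degree of light saturation x = 1 - PhiP / PhiP0 (0 dark, 1 saturated)."""
    return 1.0 - phi_p / phi_p0


def kn_rate(x, kn0, alpha, beta):
    """Empirical NPQ rate coefficient kN, rising from 0 (x = 0) to kn0 (x = 1)."""
    xa = x ** alpha
    return kn0 * (1.0 + beta) * xa / (beta + xa)


def partition(phi_p, kn, kf=0.05, kd=0.95):
    """Split absorbed energy into photochemistry (P), fluorescence (F),
    regulated NPQ (N) and constitutive heat loss (D); kf, kd illustrative."""
    rest = 1.0 - phi_p
    total = kf + kd + kn
    return {"P": phi_p, "F": rest * kf / total,
            "N": rest * kn / total, "D": rest * kd / total}
```

For example, `partition(0.4, kn_rate(light_saturation(0.4, 0.8), **KN_PARAMS["drought"]))` returns the four fractions for a half-saturated leaf. At full saturation (x = 1) the kN expression reduces to kN0, so the drought fit, with its larger kN0, saturates at a higher NPQ rate, and a higher kN lowers ΦF for a given ΦP.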

Leaf-to-Canopy scaling
The TBM-SIFs produce leaf-level fluorescence, which needs to be converted to canopy-level fluorescence (SIFcanopy) to be directly compared to PhotoSpec and satellite observations. Leaf-to-canopy conversion of SIF requires a representation of canopy radiative transfer, which in general is too computationally expensive to include within TBMs such as those in this study, which are designed for global-scale application. Therefore, most TBMs analyzed here account for canopy radiative transfer of SIF using some representation of SCOPE (van der Tol et al., 2009a, b). The most commonly used approach is to run independent simulations of SIF from SCOPE to create an empirical conversion factor ( FGH ) between leaf- and canopy-level SIF that is a function of Vcmax
Table 1.
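Because the functional form of the conversion factor is not spelled out here beyond its dependence on Vcmax, the following sketch simply assumes a low-order polynomial in Vcmax, fitted to paired leaf- and canopy-level SIF from offline SCOPE runs. The polynomial form, the function names, and any data used with them are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial


def fit_conversion_factor(vcmax, sif_leaf, sif_canopy, degree=1):
    """Fit SIFcanopy / SIFleaf as a polynomial in Vcmax (hypothetical form)."""
    ratio = np.asarray(sif_canopy, float) / np.asarray(sif_leaf, float)
    return Polynomial.fit(np.asarray(vcmax, float), ratio, degree)


def leaf_to_canopy(sif_leaf, vcmax, factor):
    """Apply a fitted conversion factor to leaf-level SIF."""
    return np.asarray(sif_leaf, float) * factor(np.asarray(vcmax, float))
```

The design keeps the expensive radiative transfer offline: SCOPE is run once over a range of Vcmax values, and the TBM only evaluates the cheap fitted factor at each time step.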

TBM-SIF Models
Here we provide a brief description of the individual TBM-SIF models and within-model experiments, pointing out key differences in the modeling of photosynthesis, fluorescence, and leaf-to-canopy scaling. We note that within-model experiments, labeled Experiment 1 (exp1), Experiment 2 (exp2), etc., represent increasing order of realism rather than a specific set of conditions common across models. As such, Experiment 1 in BETHY (BETHY-exp1) is not equivalent to Experiment 1 in CLM4.5 (CLM4.5-exp1).

Data Assimilation
Details of the data assimilation protocols for ORCHIDEE are provided in Bacour et al. (2019). An ensemble of parameters related to photosynthesis (including optimal Vcmax) and phenology was optimized for several plant functional types. Note that none of the assimilated pixels encompass the location of the US-NR1 tower. In ORCHIDEE, the study site is treated as boreal evergreen needleleaf forest (ENF); as such, the ORCHIDEE-exp3 simulations in this study are based on parameters optimized against OCO-2 SIF data using an ensemble of ENF pixels worldwide. Note that for BETHY, each experiment uses the same set of optimized parameters, whereas in ORCHIDEE the SIF simulations are performed separately for the standard parameters (ORCHIDEE-exp1/exp2) and the optimized parameters (ORCHIDEE-exp3), thus providing a test of sensitivity to parameter optimization, as discussed below.
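The actual ORCHIDEE assimilation system (Bacour et al., 2019) optimizes many parameters of a nonlinear model. As a deliberately simplified sketch of the underlying Bayesian idea, the snippet below minimizes the standard cost function (background term plus observation misfit) for a single multiplicative parameter x in a toy model y = x·a, for example SIF proportional to a model-predicted driver a, for which the minimum has a closed form. The toy model and all names are hypothetical.

```python
import numpy as np


def cost(x, a, y, xb, sigma_b, sigma_o):
    """Bayesian cost J(x): background term plus observation misfit."""
    a = np.asarray(a, float)
    y = np.asarray(y, float)
    return (0.5 * ((x - xb) / sigma_b) ** 2
            + 0.5 * np.sum(((y - x * a) / sigma_o) ** 2))


def analysis_scalar(a, y, xb, sigma_b, sigma_o):
    """Closed-form minimizer of `cost` for the linear model y = x * a.

    Returns the analysis value of x and its posterior standard deviation.
    """
    a = np.asarray(a, float)
    y = np.asarray(y, float)
    precision = 1.0 / sigma_b**2 + np.sum(a * a) / sigma_o**2
    x_a = (xb / sigma_b**2 + np.sum(a * y) / sigma_o**2) / precision
    return x_a, 1.0 / np.sqrt(precision)
```

Because x multiplies every time step equally, such an analysis can rescale the magnitude of a simulated series but cannot change its diurnal or synoptic shape; the same limitation applies, in a more complex form, to the optimized ORCHIDEE simulations discussed later in this paper.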

Illumination Conditions
In order to gain insight into how SIF emissions and quantum yields vary with illumination, we further analyze PhotoSpec and a subset of models with respect to (a) changes in incoming light

protocol. For example, CLM4.5 is better suited than the others to prescribing observed vegetation characteristics at the study site. One ORCHIDEE experiment (ORCHIDEE-exp3) is optimized, in a preliminary manner, by assimilating independent Orbiting Carbon Observatory 2 (OCO-2) SIF data at the global scale (Section 2.4). We emphasize that our point here is not to identify the best model but to identify common patterns in model behavior, through normalized SIF and deviations from observed behavior, in order to find the areas requiring the most attention.
The results are organized around two parallel themes. The first theme addresses three key processes driving canopy-level fluorescence: (1) incoming illumination, (2) partitioning of absorbed light between photochemistry, fluorescence, and NPQ, and (3) leaf-to-canopy emission of SIF, including the linearity of yields at leaf and canopy scale. The second theme addresses the sensitivity of these processes to environmental conditions at diurnal and synoptic scales. Here, synoptic scale refers to the impact of day-to-day changes in weather, including two storm events that brought sustained cool, wet, and cloudy conditions from July 22-31 and from August 6-10.

Incoming Illumination
Two key features dominate observed APAR variability: an afternoon depression (Fig 2A) and a reduction during two summer storms (Fig 2D). Both features are captured by the models. More generally, models capture synoptic variability with high correlation (r > 0.8) and low across-model spread ( = 10%). The exception is BETHY, which was simulated for a year outside our observation period (2015). High model fidelity is expected given that observed PAR is prescribed, and it is promising that models show a consistent response to changes in illumination. The primary shortcoming across TBM-SIFs and SCOPE is a systematic high bias in APAR magnitude (129%), with most models exceeding the upper range of observed APAR (as determined from the six within-canopy PAR sensors, Fig S2), and high model spread. These errors are likely related to differences in predicted fAPAR. In the case of ORCHIDEE, high APAR is expected due to the big-leaf assumption, in which all leaves are considered opaque and fully absorbing.
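The summary statistics quoted in this section (correlation, percentage bias, and across-model spread) can be computed along the lines below. This is a sketch under our own assumptions: we read a percentage such as 129% as the model mean expressed as a percentage of the observed mean, summarize the ensemble as the mean ± sample standard deviation of those percentages, and skip missing values; the exact definitions behind the figures may differ, and the function names are hypothetical.

```python
import numpy as np


def relative_bias_pct(model, obs):
    """Model mean as a percentage of the observed mean (100% = unbiased)."""
    return 100.0 * np.nanmean(model) / np.nanmean(obs)


def pearson_r(model, obs):
    """Pearson correlation over time steps where both series are valid."""
    m = np.asarray(model, float)
    o = np.asarray(obs, float)
    ok = ~np.isnan(m) & ~np.isnan(o)
    return np.corrcoef(m[ok], o[ok])[0, 1]


def ensemble_bias(models, obs):
    """Mean and sample standard deviation of per-model relative bias (%)."""
    b = np.array([relative_bias_pct(m, obs) for m in models])
    return b.mean(), b.std(ddof=1)
```

Applied to daily means, `pearson_r` measures synoptic fidelity, while `ensemble_bias` separates a shared high bias from disagreement among models.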

Canopy Photosynthesis
Observed GPP shows a broad peak from mid-morning to early afternoon (~9 am to 1 pm local time), followed by a slight decrease until 4 pm (Fig 2B), consistent with afternoon cooling and reduced light availability (Fig 1B-D). Over the two-month study period, observed GPP is relatively flat, with generally weak day-to-day variability ( = 17%) but a modest correlation with APAR (r = 0.61, Fig 2E). Some models capture the afternoon GPP depression, but all models strongly underestimate its magnitude, apparently independent of stomatal conductance formulation or more explicit accounting for plant hydraulic water stress, as in CLM5.0. SCOPE and BETHY, which do not account for water stress, show no afternoon depression. Models are mostly uncorrelated with observed GPP at synoptic scale (r ranges from -0.2 to 0.36, with the highest value in SiB4), are biased high, and show increased spread in predicted magnitude relative to APAR (143% +/-23%). SCOPE-exp2 shows a slight improvement in GPP magnitude with the larger Vcmax value in late summer. While observed GPPyield is mostly stable over the diurnal cycle, most models (except BEPS) show a distinct midday minimum (Fig 3A). Half of the models show a similar midday minimum in photochemical quantum yield (ΦP, Fig 4A), with the other half either increasing or decreasing in the afternoon (CLM5.0 and SiB3/SiB4, respectively). The midday dip in yield is likely associated with reduced photosynthetic efficiency at high light levels, as demonstrated by the reductions in GPP, GPPyield, and ΦP with APAR (Fig 5A, C, E). Observed GPPyield shows significant structure at synoptic scale (Fig 3C), most notably increased yield during the cool/rainy period (reduced heat and water stress) and decreased yield in mid- to late August (increased heat and water stress following the cooling pattern).
In contrast to predicted GPP, models show high fidelity in capturing the magnitude and variability of GPPyield 0.1, Fig 4B).
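The yields discussed in this section are ratios of a flux (GPP or SIF) to APAR. A minimal sketch, assuming regularly sampled series and a hypothetical low-light APAR threshold that avoids dividing by near-zero values, computes both per-time-step yields (for diurnal composites) and daily yields (for synoptic analysis); names and the threshold value are our assumptions.

```python
import numpy as np


def light_use_yield(flux, apar, min_apar=100.0):
    """Per-time-step yield flux/APAR; NaN below a hypothetical APAR threshold."""
    flux = np.asarray(flux, float)
    apar = np.asarray(apar, float)
    out = np.full(flux.shape, np.nan)
    ok = apar >= min_apar
    out[ok] = flux[ok] / apar[ok]
    return out


def daily_yield(flux, apar, day, min_apar=100.0):
    """Daily yield as the ratio of daily sums over well-lit time steps."""
    flux = np.asarray(flux, float)
    apar = np.asarray(apar, float)
    day = np.asarray(day)
    result = {}
    for d in np.unique(day):
        sel = (day == d) & (apar >= min_apar) & ~np.isnan(flux)
        if sel.any():
            result[d.item()] = flux[sel].sum() / apar[sel].sum()
    return result
```

Using a ratio of daily sums, rather than the mean of instantaneous ratios, weights bright periods more heavily and is less affected by noisy low-light samples; either choice is defensible, but it should be applied consistently to models and observations.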

Canopy Fluorescence
Observed SIFcanopy is strongly correlated with observed APAR at diurnal and synoptic scales (r = 0.77), with common features including an afternoon depression and a reduction during rainy periods (Fig 2C & 2F). Observed PAR also feeds into the fluorescence sub-models, and, unlike GPP, modelled SIFcanopy correlates strongly with observed SIFcanopy at synoptic scale (r ranges from 0.58 to 0.92, with the highest values in SCOPE and ORCHIDEE). However, we find a persistent positive model bias in SIFcanopy (170% +/-45%), consistent with, but not proportional in magnitude to, the APAR bias. We note that models are especially oversensitive to APAR at high light levels (Fig 5D). We investigate the high bias in SIFcanopy in more detail using SCOPE-exp2 and CLM4.5-exp3. Specifically, we examine leaf- and canopy-level SIF and quenching for sunlit and shaded leaves. Analysis of quantum yields in SCOPE-exp2 (Fig S5) shows a reversal in the fractional amounts of absorbed energy going to SIF and PQ vs NPQ in low- vs high-light conditions that is consistent with leaf-level data and theory (Porcar-Castell et al., 2014). More specifically, SCOPE-exp2 predicts low ΦF and ΦP and high kN in sunlit leaves relative to shaded leaves, with more energy going to fluorescence and photochemistry than to NPQ in shaded leaves, and more energy going to (shed by) NPQ in sunlit leaves (Fig S5). Likewise, total ΦF decreases with increasing APAR in SCOPE and BETHY-exp2/3 compared to BETHY-exp1, consistent with observed SIFyield (Fig 5E-F), as kN ramps up to higher levels in the drought-parameterized kN model. Moreover, in stark contrast to SIFyield and SIFcanopy, ΦF does not show high values relative to other models (Fig 4D). These results point to an issue in SCOPE and BETHY with leaf-to-canopy scaling in needleleaf forests. Analysis of CLM4.5-exp3 suggests several possible reasons for the oversensitivity to APAR.
First, we focus on emissions from the sunlit/shaded portions of the canopy (Fig S6). CLM4.5-exp3 and PhotoSpec both show higher SIF under "high light" conditions (sunlit leaves and direct radiation, respectively) than under "low light" conditions (shaded leaves and diffuse radiation, respectively), which is promising (Fig S6 A,D). Comparing the ratio of sunlit to shaded SIF in CLM4.5-exp3 with the ratio of direct to diffuse SIF in PhotoSpec (Fig S6 B,E) shows a higher ratio in CLM4.5-exp3 on average. The difference peaks at midday, when sunlit leaf area is maximized (self-shading minimized) in CLM4.5 while the amount of direct radiation shows no major difference, and decreases with increasing solar zenith angle (morning and afternoon) and with increasing rainfall (in the afternoon on average, and during the rainy period in late July / early August), both of which increase the shaded fraction. As such, accounting for view angle and the different illumination metrics used for PhotoSpec and CLM4.5 (most comparable in the morning, in the afternoon, and on rainy days) reduces, but does not entirely remove, the positive bias in high-light conditions. Second, the degree of light saturation (x) is twice as high in the sunlit canopy in CLM4.5 (Fig S7), which leads to low fluorescence efficiency in sunlit leaves and high fluorescence efficiency in shaded leaves. While this produces high photochemistry in shaded leaves, the shaded canopy contributes only a small fraction of the total canopy SIF (~20%) despite a higher fraction of shaded leaves (~2/3 at noon, and only three models (SCOPE, ORCHIDEE-exp3, and BETHY-exp2/3) produce a correlation greater than 0.5. These are the only three models that also capture a negative relationship between SIFyield and APAR (Fig 5E). In general, predicted SIFyield is stable during our short study period (Fig 3).
Half of the models show a significant positive correlation with GPPyield (r > 0.85) and half show zero or negative correlation (Fig S8). While these findings run counter to observed SIFyield, which shows a clear response during and following the storm event and a moderate positive correlation with observed GPPyield (r = 0.40), they show some consistency with observed SIFrel (grey line in Fig 3 and Fig S8A), which, like many models, is stable and uncorrelated with GPPyield. We refer the reader to Section 2.2.2 for clarification of the important difference between SIFyield and SIFrel.

At least at the diurnal scale, there is some evidence that leaf and canopy emissions look more similar for models adopting simplified empirical scaling functions (SiB3, SiB4, CLM4.5, CLM5.0, BEPS) than for models that more explicitly account for radiative transfer (SCOPE, BETHY, ORCHIDEE). For the more explicit models, the diurnal cycle of ΦF is out of phase with SIFyield: the former peaks in the afternoon and the latter in the morning. For ORCHIDEE, this produces reasonable agreement in phase and magnitude between modelled SIFyield and PhotoSpec SIFrel, but divergence in the magnitude of SIFcanopy. Model performance in leaf-to-canopy scaling is summarized in Figure S8. The only three models with a positive relationship both between yields (Fig S8B) and between quenching terms (Fig S8C) all include explicit representation of radiative transfer (i.e., SCOPE, BETHY, and ORCHIDEE). CLM4.5 is the only model with a positive relationship between yields but not between quenching terms. SiB3/SiB4 are the only models with a positive relationship between quenching terms but not between yields. Finally, we clarify an important difference between observed and predicted estimates of canopy-average SIF.
PhotoSpec scans direct emissions from sunlit and shaded leaves within the canopy, thus observing the 'total' emission from leaves in the instrument FOV. We then average these leaf-level scans and report them as canopy averages. Model output, in contrast, is reported at the top of canopy (TOC), which represents the 'net' emission from leaves after attenuation in the canopy (through canopy radiative transfer, re-absorption of SIF, and shading). Assuming sunlit and shaded leaves within the canopy emit at the same rate as TOC leaves, attenuation will reduce the effective signal from leaf-level emissions within the canopy. As such, the net emission of leaves reaching the top of canopy is expected to be lower than the average of leaf-level emissions (canopy average).
This is important because CLM4.5 shows strong attenuation of SIF from leaf level to TOC, decreasing by a factor of 2-3 at midday (Fig S7). The interpretation here is that the model bias in absolute SIF may actually be higher than reported here; however, we note that more quantitative

Different models match some observed quantities better than others (with respect to APAR and yield), but no model gets both the APAR and SIFyield magnitudes and/or sensitivities close to the observations. For example, BEPS closely matches the magnitude of APAR (Fig 2A), and BETHY captures the decline in SIFyield with APAR when NPQ is based on stressed species (Fig 5E), but both models overestimate observed yield by a factor of 2; hence SIF is overestimated (Fig 2). CLM4.5 correctly captures the diurnal SIFyield change but overestimates APAR; in this case, SIF and SIFyield are overestimated. Importantly, models diverge strongly from each other and from observations in the magnitude of SIFyield and its decline with APAR (Fig 5E), partially reflecting model variability in ΦF (Fig 5F), but in general show a characteristic pattern of weak SIFyield decline with APAR. GPPyield shows higher agreement between models and with observations (Fig 5B), despite divergent ΦP (Fig 5C), which could indicate that the primary uncertainty lies in the representation of fluorescence rather than in the photosynthesis model. Consequently, we find a positive linear relationship between observed SIFyield and GPPyield for absolute SIF, which is underestimated on average by models (Fig S8A-B). In contrast, models show quite strong positive relationships between ΦF and ΦP (Fig S8C). Our study highlights an apparent challenge for models in transferring leaf-level processes to canopy scale and, consequently, in linking the canopy SIF-GPP relationship mechanistically to processes at the leaf level.
The mismatch between multi-model simulations and tower-based observations of SIF and GPP at hourly and daily scales can be summarized as symptoms of five main factors: (1) PhotoSpec scan strategy, (2) radiative transfer of incoming PAR and its impact on APAR and sunlit/shaded fraction, (3) representation of photosynthesis and sensitivity to water limitation, especially during the afternoon, (4) representation of fluorescence and sensitivity to reversible NPQ response at Niwot Ridge, and (5) radiative transfer of fluorescence from leaf to canopy. Several persistent biases falling under these broad categories are discussed below.

Apples-to-Apples Comparison.
PhotoSpec is unique in its ability to scan entire canopies for signals that are largely hidden from nadir-oriented instruments. However, this creates unique challenges for the interpretation of data and comparison to models. For example, the diurnal cycle of observed SIF is highly sensitive to view angle. PhotoSpec was set up in 2017 to scan back and forth between northwest and northeast view angles, but the instrument was slightly biased to the northwest, causing a low phase angle in the morning (more aligned with the rising sun) and an increased phase angle in the afternoon (more opposed to the setting sun). As such, PhotoSpec observed predominantly illuminated canopies in the morning and shaded canopies in the afternoon (i.e., a larger shaded fraction), leading to the late-morning peak in reflected radiance (Fig S3).
It is critical that model evaluation against measured SIF data, and data assimilation studies, properly account for the specifics of the instrument (viewing geometry, spectral band, overpass time for space-borne instruments) and the representation of canopy emission, and that observations be corrected for directional variations in SIF relative to observation geometry. Although normalizing SIF by reflected radiance partially alleviates scan angle effects, this highlights the need for models to get canopy structure, radiative transfer, and sunlit/shaded fraction correct, as these feed all the way through to SIF and GPP. Further ground-based investigations of SIF anisotropy, sunlit/shaded fraction, and vertical distribution (within canopy, canopy integrated, and top of canopy) with PhotoSpec and SCOPE may help to inform models on the physical aspects of the signal.
Despite the issues we highlight in comparing observations to models, the potentially more interesting and important story here concerns model-model comparisons, which reveal wide divergence in response to light conditions and other factors, as discussed below.
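The phase-angle argument above can be made concrete with the standard spherical-geometry relation cos g = cos θs cos θv + sin θs sin θv cos(φs − φv), where θ and φ are the zenith and azimuth angles of the directions from the observed canopy element toward the sun (s) and toward the sensor (v). The sketch below implements this relation; the function name is hypothetical, and the angles in the usage note are illustrative rather than the actual PhotoSpec geometry at NR1.

```python
import math


def phase_angle_deg(sza, saa, vza, vaa):
    """Angle (degrees) between the target-to-sun and target-to-sensor directions.

    All inputs in degrees. Small values mean the sun is behind the sensor,
    so mostly sunlit canopy elements are in view.
    """
    s, v = math.radians(sza), math.radians(vza)
    dphi = math.radians(saa - vaa)
    cos_g = math.cos(s) * math.cos(v) + math.sin(s) * math.sin(v) * math.cos(dphi)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_g))))  # clamp rounding
```

For instance, with the target-to-sensor direction pointing southeast (azimuth 135°, zenith 60°) and the sun at 50° zenith, moving the sun from an eastern azimuth (100°) to a western one (260°) raises the phase angle from about 30° to about 93°, i.e., from viewing mostly sunlit canopy to viewing mostly shaded canopy.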

TBM-SIF is too sensitive to APAR.
Our results indicate a wide range of SIF responses to APAR: TBM-SIFs and SCOPE are usually far too sensitive to APAR, observations of absolute SIF are less sensitive, and observations of relative SIF (SIFrel) are the least sensitive (Fig. 5D). We remind the reader that SIFrel is normalized by the amount of far-red light reflected from leaves in the FOV of PhotoSpec, and thus has lower sensitivity to absorbed light than absolute SIF. The fact that SIFrel is the least sensitive to APAR means that other processes are driving changes in SIF under increased light absorption; in this case, it reveals a strong SIF response to changes in photochemical quenching. SIF models appear especially sensitive to sunlit leaves. In CLM4.5, SIF emissions from the sunlit portion of the canopy are a factor of 5 higher than emissions from shaded leaves, even though the sunlit canopy has half as many leaves (Fig S6C). In CLM4.5, the combination of higher-than-average ΦF (Fig 5F) with higher fluorescence efficiency in the sunlit portion of the canopy produces an increase in magnitude and in sensitivity to sunlit fraction, thus contributing to the high bias (a factor of 3 higher than observed) and the strong diurnal cycle (2-fold increase from morning to midday).
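The CLM4.5 numbers above imply an even larger contrast per unit leaf area. Taking the quoted factor of 5 and the halved sunlit leaf count at face value, emission per unit sunlit leaf area is 5 × 2 = 10 times that of shaded leaves; the small helper below (a hypothetical name) makes this arithmetic explicit.

```python
def per_leaf_emission_ratio(sif_sun, sif_shade, lai_sun, lai_shade):
    """Ratio of SIF emitted per unit sunlit leaf area to that per unit
    shaded leaf area, given canopy-total emissions and leaf areas."""
    return (sif_sun / lai_sun) / (sif_shade / lai_shade)
```

With the midday shares quoted earlier for CLM4.5-exp3 (about 80% of SIF from the roughly one third of leaves that are sunlit), the same helper gives a per-leaf contrast of about 8, of the same order as the estimate above.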

Linearity of SIF and GPP yields.
Observations show a positive but not significant linear relationship between SIFyield and GPPyield GPPyield, are also the only models that show a decline in SIFyield with APAR, as discussed below. These results are likely to change when we expand the study to several years; however, the purpose of this study was to provide an initial investigation into the response of modelled SIF and GPP to light during peak summer.

Insufficient decline in SIFyield with APAR.
In general, models show an insufficient decline in SIFyield with APAR when compared to observed SIFyield (Fig 5E). All models except SiB3 and SiB4 show some decline, with BETHY showing the best agreement in slope magnitude. SCOPE and BETHY are the only models with full radiative transfer, but this does not appear to have a substantial impact on SIFyield, which shows a similar (albeit suppressed) decline with APAR to that of 1 (Fig 5F). Within-model experiments show little to no sensitivity of the SIFyield or 1 decline with APAR to water stress (e.g., ORCHIDEE) or prescribed LAI (e.g., SiB3), but high sensitivity to the formulation of NPQ with respect to species calibration (e.g., BETHY) and reversibility (e.g., CLM4.5). decreases the magnitude (but not shape) of APAR closer to observed values (Fig 2), and leads to improvements in GPPyield and SIFyield (Fig 3). Nevertheless, after the assimilation there are still disagreements in SIFyield vs GPPyield relative to the measured quantities (Fig S8). For diurnal and synoptic cycles, the assimilation effectively acts to scale the magnitude of SIF, GPP and APAR (and related yields), but it does little to alter their variability. Although data assimilation (i.e., calibrating model parameters) is critical to improving modelled SIF and GPP, it should be done in conjunction with improvements in the model formulation (as summarized in Section 5); otherwise the estimated model parameters may be sub-optimal, compensating for missing processes.

Conclusions/Recommendations
Our results reveal systematic biases across TBM-SIF models affecting leaf-to-canopy simulations results from the inability to account for afternoon water stress to properly restrict stomatal conductance and hence GPP and SIF. Additional effort is needed to characterize SIF and GPP sensitivity to increased atmospheric demand and/or reduced soil moisture across a range of managed and unmanaged systems. We also recommend wider inclusion of stomatal optimization models (e.g., Eller et al., 2020) as optional parameterizations for TBMs, to better account for plant hydraulic functioning under water stress than the more widely used semi-empirical models.
3) Leaf Mechanism for Energy Partitioning. We provide evidence that many models fail to capture the correct reversible NPQ response to light saturation, leading to biases in SIFyield during high-light conditions and especially with increasing moisture limitation at the end of summer. Further investigation using models such as BETHY and CLM is needed to better characterize the sensitivity of NPQ formulations to PFT and environmental conditions. We also emphasize a need for more simultaneous measurements of active and passive chlorophyll other RTMs) to directly account for multiple observation angles.
6) Finally, we note that our focus on a water-limited subalpine evergreen needleleaf forest represents a challenging case study for models and observations. In many cases, there is strong covariance between LAI, SIF, APAR and GPP in cropping systems (Dechant et al., 2020), but because this study site experiences little change in canopy structure and APAR throughout the season (Magney et al., 2019b), our study sought to provide more explicit insight into the models' sensitivity to photosynthesis and fluorescence. As such, it is possible that we would see more convergence of results, and a reduction in confounding effects (e.g., decreased NPQ), in a well-watered high-LAI cropping system. We therefore recommend similar model-observation assessments across a wider range of ecosystems and climates.
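Item 3 (Leaf Mechanism for Energy Partitioning) can be made concrete with a generic rate-constant formulation in which absorbed energy not used photochemically is split between fluorescence (rate constant kF), constitutive heat loss (kD) and regulated NPQ (kN). The sketch below contrasts a reversible kN that rises with light saturation against a kN held fixed. The functional forms, parameter values, and helper names are hypothetical and are not taken from BETHY, CLM, or any other model in this study.

```python
"""Sketch: reversible NPQ vs. a fixed NPQ rate in a rate-constant fluorescence model.

Illustrative assumptions (not any specific TBM's code or parameters):
  PhiF = (1 - PhiP) * kF / (kF + kD + kN)
  PhiP = PHI_P0 * K_HALF / (K_HALF + APAR)   (hypothetical light response)
  x    = 1 - PhiP / PHI_P0                   (light saturation, 0..1)
  kN   = KN_MAX * x (reversible)  or  kN = kn_fixed (no light response)
"""

K_F, K_D = 0.05, 0.95  # hypothetical fluorescence and constitutive heat-loss rate constants
PHI_P0 = 0.8           # hypothetical maximum photochemical yield
K_HALF = 500.0         # hypothetical APAR (umol m-2 s-1) at which PhiP is halved
KN_MAX = 5.0           # hypothetical maximum NPQ rate constant


def phi_p(apar: float) -> float:
    """Photochemical yield declining with APAR (hypothetical hyperbola)."""
    return PHI_P0 * K_HALF / (K_HALF + apar)


def phi_f(apar: float, reversible: bool, kn_fixed: float = 1.0) -> float:
    """Fluorescence yield; kN tracks light saturation only if `reversible`."""
    p = phi_p(apar)
    x = 1.0 - p / PHI_P0  # 0 in darkness, approaching 1 at saturation
    k_n = KN_MAX * x if reversible else kn_fixed
    return (1.0 - p) * K_F / (K_F + K_D + k_n)


for apar in [200.0, 800.0, 1500.0]:
    print(apar, round(phi_f(apar, True), 5), round(phi_f(apar, False), 5))
```

With these made-up parameters, the reversible case gives a fluorescence yield that declines as APAR rises, whereas holding kN fixed lets the yield increase as photochemistry saturates: the qualitative contrast between a captured and a missing reversible NPQ response.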

Data availability
All observational data (APAR, SIF, GPP, and relative SIF) are provided as hourly time series. The data can be found at https://data.caltech.edu/records/1231 and are saved as a .csv file.

Table 1. Summary of TBM-SIF models and within-model experiments illustrating model components that may have led to differences in modeled SIF. These include a representation of stomatal conductance (column 3), canopy absorption of incoming radiation (column 4), limiting factors for photosynthesis (Stress, Vcmax, LAI; columns 5-7) and SIF (kN; column 8), leaf-to-canopy scaling of SIF (column 9), and parameter optimization (column 10).