Moderate forest disturbance as a stringent test for gap and big-leaf models

Disturbance-induced tree mortality is a key factor regulating the carbon balance of a forest, but tree mortality and its subsequent effects are poorly represented processes in terrestrial ecosystem models. It is thus unclear whether models can robustly simulate moderate (non-catastrophic) disturbances, which tend to increase biological and structural complexity and are increasingly common in aging US forests. We tested whether three forest ecosystem models – Biome-BGC (BioGeochemical Cycles), a classic big-leaf model, and the ZELIG and ED (Ecosystem Demography) gap-oriented models – could reproduce the resilience to moderate disturbance observed in an experimentally manipulated forest (the Forest Accelerated Succession Experiment in northern Michigan, USA, in which 38 % of canopy dominants were stem girdled and compared to control plots). Each model was parameterized, spun up, and disturbed following similar protocols and run for 5 years post-disturbance. The models replicated observed declines in aboveground biomass well. Biome-BGC captured the timing and rebound of observed leaf area index (LAI), while ZELIG and ED correctly estimated the magnitude of LAI decline. None of the models fully captured the observed post-disturbance C fluxes, in particular gross primary production or net primary production (NPP). Biome-BGC NPP was correctly resilient but for the wrong reasons, and could not match the absolute observational values. ZELIG and ED, in contrast, exhibited large, unobserved drops in NPP and net ecosystem production. The biological mechanisms proposed to explain the observed rapid resilience of the C cycle are typically not incorporated by these or other models. It is thus an open question whether most ecosystem models will simulate correctly the gradual and less extensive tree mortality characteristic of moderate disturbances.


Introduction
Natural and anthropogenic disturbances have numerous effects on the carbon (C) and energy dynamics in forested ecosystems and result in a variety of feedbacks between terrestrial ecosystems and climate (Goetz et al., 2012).In particular, disturbance-induced tree mortality is a key factor regulating the forest C balance but a complicated one due to high temporal and spatial heterogeneity (Vanderwel et al., 2013).Partly as a result, mortality and disturbance are poorly represented processes in terrestrial ecosystem models (Medvigy and Moorcroft, 2012;Peters et al., 2013;Dietze and Matthes, 2014).
Published by Copernicus Publications on behalf of the European Geosciences Union.

B. Bond-Lamberty et al.: Moderate forest disturbance as a stringent test
Most North American forests are at some stage of recovery from either natural or anthropogenic disturbance (Pan et al., 2011).In the US upper Midwest and northeast, low-severity disturbance is increasing in frequency and extent in regional forests, which have regrown following stand-replacing disturbances over a century ago (Frelich and Reich, 1995).The resulting cohort of fast-growing, deciduous trees is now past maturity and beginning to decline, while longer-lived species representation is increasing (Gough et al., 2010b).At the same time, forest disturbances in the region are transitioning away from severe events that historically caused complete stand replacement towards more subtle disturbances that result in only partial canopy defoliation or loss of selected species (Pregitzer and Euskirchen, 2004;Williams et al., 2012;Birdsey et al., 2006).These subtler disturbances include partial harvests, wind, pathogenic insects, diseases, and age-related senescence (e.g., Caspersen et al., 2000), which contribute to a gradient of disturbance intensities across the landscape.Unlike stand-replacing disturbance, moderate disturbances tend to increase biological and structural complexity and, consequently, are expected to have entirely different functional consequences for ecosystems (Nave et al., 2011;Peters et al., 2013).
Moderate disturbances have mixed effects on successional trajectories of forest C production and storage (Birdsey et al., 2006;Knohl et al., 2002;Vanderwel et al., 2013).In many forests, C storage shows unexpected resilience or even resistance to partial canopy defoliation (Hicke et al., 2011;Gough et al., 2013;Mathys et al., 2013) or thinning (Granier et al., 2008).The reasons and mechanisms for different functional responses to moderate disturbance are not clear, but these results have large potential implications, as the long-assumed future decline of production in aging stands is expected to reduce continental C sink strength (Birdsey et al., 2006).Recent empirical evidence indicates, however, that net ecosystem production (NEP, the ecosystem carbon balance) may be sustained or even increase in older forests that experience moderate disturbance (Luyssaert et al., 2008).For example, NEP in the ∼ 100 yr old Harvard Forest has more than doubled in the last 18 years (Keenan et al., 2012).More broadly, recent syntheses of North America's mixed temperate forests found no evidence for a substantial decline in NEP or net primary production (NPP) with age (He et al., 2012;Amiro et al., 2010).
Many ecosystem-scale models, designed for and tested in early to mid-successional forests with low biological and structural complexity, can be expected to have trouble reproducing these results (Landsberg and Waring, 1997;Raulier, 1999;Law et al., 2003;Li et al., 2003;Zhao et al., 2009).Such models are typically developed from, and tested most thoroughly against, classic primary-and secondarysuccession scenarios featuring stand-replacing or at least gap-size disturbances (Peters et al., 2013;Weng et al., 2012;Wang et al., 2014).Most model experiments using moderate (non-catastrophic) disturbance intensities have been per-formed in the context of timber management, e.g., assessing the sustainability of harvesting for a particular ecosystem or region (e.g., Peng et al., 2002;Rolff and Ågren, 1999).As a result, it is unclear whether most ecosystem models will be able to correctly simulate naturally occurring disturbances in mature forests, which may be spatially more heterogeneous and generally do not involve biomass removals.This is particularly important given the rapidly aging distribution of eastern US forests (USDA, 2013;Radeloff et al., 2012).
With moderate disturbances increasing in aging North American forests, and only an emerging understanding of the mechanisms underpinning such forests' resilience to disturbance, it is clearly important to understand how, and how well, forest models simulate these events.Doing so not only provides a quantitative assessment of model performance, but also may help identify knowledge gaps and processes missing or not properly implemented in ecosystem models more generally.This study tested three forest ecosystem models -a classic big-leaf model and two gap models -to understand how well they reproduce observed resilience to moderate disturbance in an experimentally manipulated forest and to explore specific mechanisms limiting model skill.

Site description
The study site is the University of Michigan Biological Station (UMBS, 45 • 35.5 N, 84 • 43 W), nested within a secondary successional forest that is comprised of bigtooth aspen (Populus grandidentata), northern red oak (Quercus rubra), red maple (Acer rubrum), paper birch (Betula papyrifera), and eastern white pine (Pinus strobus).Average overstory tree age in 2013 was 95 years.NEP in the unmanipulated footprint of the UMBS control tower (US-UMBS) was 0.80-1.98Mg C ha −1 yr −1 from 1999 to 2006, averaging 1.58 Mg C ha −1 yr −1 with substantial landscape variation (Gough et al., 2009).The forest was heavily logged in the late 1800s and early 1900s and disturbed by fire until 1923; its present-day plant composition is typical of many forests in the upper Great Lakes region (Gough et al., 2007).

The Forest Accelerated Succession Experiment
The Forest Accelerated Succession Experiment (FASET) is an ongoing experiment, in which > 6.700 aspen and birch trees (equivalent to 38 % of stand basal area) were stem girdled in 2008 within a 39 ha area.FASET is investigating how C storage and fluxes change following moderate disturbance as Great Lakes forests transition from an assemblage of early successional canopy trees to later successional canopy dominants.The experiment's overarching hypothesis is that forest NEP will be resilient following partial canopy defoliation and subsequently increase as canopies become more biologically and structurally complex and as nitrogen (N) not taken up by senescing aspen and birch trees is redistributed to other, longer-lived species assuming canopy dominance.The experiment employs a suite of paired C cycling measurements within separate treatment and control meteorological flux tower footprints.The C cycling parameters reported here for the control and treatment forests are aboveground biomass (AGB), gross primary production (GPP), ecosystem respiration (ER), leaf area index (LAI), total (aboveand belowground) NPP, and NEP.Site methodological approaches for the derivation of each are described by Gough et al. (2013Gough et al. ( , 2008)), but, briefly, AGB was estimated biometrically, using dendrometers and site-specific allometry; LAI from litter traps; NPP from biometry and fine root cores; and ER, GPP, and NEP (here treated as equivalent to net ecosystem exchange) from eddy covariance (Gough et al., 2013).
FASET results were most recently summarized by Gough et al. (2013).Briefly, the girdling treatment successfully expedited the mortality of early successional aspen and birch, promoting an emerging canopy that approximates projected regional changes in forest composition and structure (e.g., Wolter and White, 2002).In the first 4 years following disturbance, GPP and ER both initially rose in the treatment plots relative to the controls, while NPP and NEP were not significantly different in the control and treatment forests even though LAI in the latter declined by up to 44 % (summarized in Fig. 1).This high resilience of the C cycle was attributed to high N retention and rapid reallocation of this limiting resource in support of new leaf area production as aspen and birch declined (Nave et al., 2011).Decadal records of tree growth indicate that resilience to age-related declines in NPP is highest where a diversity of canopy tree species is present because later successional species rapidly compensate for the declining growth of early successional species (Gough et al., 2010b).Investigators are also finding that resilience of forest production to disturbance is dependent upon canopy structural reorganizations that enhance C uptake by increasing light use efficiency (Hardiman et al., 2011;Gough et al., 2013) and by hydrodynamic responses that increase postdisturbance water use efficiency in some species (Matheny et al., 2015).

Model descriptions
We tested three complementary models for their ability to replicate disturbance-related changes in production and LAI observed in FASET; model attributes and differences are summarized in Table 1.The first was a version of Biome-BGC (BioGeochemical Cycles) (Running and Hunt, 1993;Thornton et al., 2002).This model has coupled water, carbon, and nitrogen cycles (Thornton et al., 2007), uses a Farquhar photosynthesis submodel linked to prognostic leaf area, and runs on a daily timestep.The model partitions NPP into the leaves, roots and stems using dynamic allocation patterns, accounting for nitrogen and water limitations.It has been widely used for simulating carbon flows in forest ecosys-tems (Kimball et al., 1997;Pietsch et al., 2003;Tatarinov and Cienciala, 2009;Warren et al., 2011).We used a version of the model that incorporates an explicit disturbance mechanism (Bond-Lamberty et al., 2007).
The second model tested was ZELIG, a gap model based on the original principles of the JABOWA (Botkin et al., 1972) and FORET (Shugart and West, 1977) models.ZELIG simulates the growth, death, and regeneration of individual trees (Urban, 1990;Urban et al., 1991) in a two-dimensional grid of 400 m 2 cells (i.e., gaps) representing the forest canopy.Trees in each cell influence the availability of resources in adjacent cells, although direct tree-to-tree interactions are not represented (Taylor et al., 2009).ZELIG's main routines include growth, mortality, regeneration, and tracking environmental conditions.In each model timestep, forest processes (e.g., seedling establishment rate, diameter increment, survival rate) are reduced from their maximum potential rates based on available resources.Potential tree regeneration, growth, and survival are functions of light conditions, soil moisture, level of soil fertility resources, and temperature.The model runs on a monthly timestep.Specific details on the methodical approaches used in the model can be found in Urban (1990), Urban et al. (1991) and Larocque et al. (2006).ZELIG has been applied over many large-scale and diverse landscapes (see list and further references in Holm et al., 2012).
The third model was ED (Ecosystem Demography), a terrestrial biosphere model that uses size-and age-structure partial differential equations (PDEs) (Moorcroft et al., 2001) to approximate the behavior of a stochastic gap model on medium to large scales.It combines an individual-based gap model, describing a particular plant community, with a biogeochemical simulation of carbon, water, and nitrogen fluxes.Modeled processes include leaf-level photosynthesis, explicit competition for water and mortality, and C and N allocation above-and belowground (Moorcroft et al., 2001).Much of the soil model is based on that of CENTURY (Parton et al., 1987).ED then models subgrid (∼ 10 ha) disturbance heterogeneity using its PDEs to approximate the behavior of a spatially distributed ensemble of individual plants, and it has been used for a variety of optimization and data assimilation exercises (Medvigy et al., 2009).
It is important to note the complementary nature of these models: one is a classic "big-leaf" biogeochemical model focusing on process representation in a nonspatial framework, another a gap model representative of its class, and the third emphasizes mathematical scaling of a gap model across time and space.In addition, Biome-BGC's algorithms underlie the current version of the Community Land Model (CLM) (e.g., Bonan and Levis, 2010), while work is underway to make ED's algorithms an optional component in the next version of CLM.This provides a strong framework and motivation for examining whether the high C cycling resilience observed following FASET's moderate disturbance can be reproduced in modeling experiments.Biome-BGC was subjected to a pre-experiment optimization exercise, with the goal of algorithmically adjusting its parameters, within observational ranges, such that model output best matched the pre-experiment carbon stocks and pools of the UMBS forest.The choice of parameters to include was based on three factors: the known sensitivities of Biome-BGC (White et al., 2000); our a priori knowledge of the FASET research site and possible physiological mechanisms underlying forest resilience to disturbance (Gough et al., 2013); and known uncertainties in measured data (C.M. Gough et al., unpublished data).The final set of optimized parameters is shown in Table 2. Constraining against observed C stocks can provide significant improvements in model performance (Carvalhais et al., 2010); in this study, slow-turnover soil C, tree stem C, and NPP were used as constraining variables.For the parameter-space search itself we used a variant of the simplex algorithm (Nelder and Mead, 1965) that uses a randomly oriented set of basis vectors instead of fixed coordinate axes, as implemented (gsl_multimin_fminimizer_nmsimplex2rand) in Gnu Scientific Library version GSL-1.16 (Gough, 2009).For each combination of parameter values selected by the algorithm, Biome-BGC was "spun up", i.e., its slow soil pools were brought to equilibrium, and the C pools noted above compared to observed soil C values.A linear cost function ranked model performance, imposing a large penalty if a parameter varied more than 2σ (based on expert judgment) from its observed mean.ZELIG was parameterized with species-specific and sitespecific parameters representative of the UMBS study site.The silvicultural and biological parameters for each of the eight temperate tree species required for ZELIG are listed in Table 3, with species data collected in previous studies (Larocque et al., 2006;Leemans and Prentice, 1989;Holm et al., 2013).Soil field capacity (cm) and wilting point (cm) were determined from measurements at the study site (unpublished data).We used allometric equations to estimate aboveground biomass (AGB, Mg C ha −1 ), these were generated from on-site harvests at the UMBS site or from general allometric equations typical of northeastern trees (Gough et al., 2008).
ED's parameters were used from the versions developed for studying both anthropogenic and natural disturbance across US forests (Hurtt et al., 2002;Fisk et al., 2013).This configuration uses two tree functional types: a cold deciduous and an evergreen type.Allometric equations, leaf characteristics, and phenology parameters are described in Hurtt et al. (2002) and summarized in Table 4.
For the main modeling experiment, Biome-BGC and ZELIG were driven by identical reanalysis daily climate (NCEP (National Centers for Environmental Prediction); Kanamitsu et al., 2002) from 1970-2012 data, with mean values of air temperature (5.1 • C) and precipitation (575 mm yr −1 ).In contrast, ED used a climatology (i.e., with no year-to-year variation) comprised of the average monthly diurnal cycle for light, temperature and humidity, and mean monthly precipitation from the slightly warmer (mean 6.5 • C) North American Regional Reanalysis for 1979-2010(NARR, 2013)).We recognize that using different climatic inputs is not ideal, but Biome-BGC and ZELIG both took steps that made their results comparable to those of ED.For Biome-BGC, we used ensembling (Thornton et al., 2002) to characterize the mean climate and effect of interannual climate variability on model outputs, reporting model outputs as means ± standard deviation computed by running the model starting with each successive year in the climate data.For ZELIG, each year the model stochastically generated new monthly temperature and precipitation, based on the range provided by the NCEP data, thus also diminishing the effect of year-to-year variability in the input data.In summary, all model results are reported based on mean climatic conditions, not exact year-to-year changes.

Modeling experiment
As far as possible, we used the same experimental protocol with each of the three models.The models were spun up, i.e., brought to a steady state with a mature forest, and then the entire site was clear-cut, with all trees removed, i.e., harvested and the biomass taken away.This approximates the known stand-replacing disturbances of the early 20th century (Gough et al., 2007) in the UMBS forest.The models then allowed the forest to recover over 90 years before imposing 13-14 % harvests of basal area (ED and ZELIG) and biomass (Biome-BGC) in 2008, 2009, and 2010.This approach was used, as opposed to a single ∼ 40 % cut in 2008, to better mimic the slow death of girdled trees observed over 2-3 years in the FASET study, as lagged mortality has been shown to exert a strong influence on the modeling of forest disturbances (Dietze and Matthes, 2014).None of the models allowed for tree girdling, and we used harvests as a secondbest alternative; under this protocol, the models remove tree stems while allowing leaves and fine litter to decay on-site.This was consistent with our observations that girdled trees in FASET did not senesce at once and remained standing for multiple years without significantly decaying (Gough et al., 2013).
As ZELIG is an individual-based, species-specific forest demographic model, we had the ability to more precisely replicate the FASET experiment by only harvesting aspen and birch trees in the forest simulator.This allowed the remaining species to continue growing, starting from their trajectories prior to the harvest, but to be subject to less competition due to the removal of aspen and birch trees.Prior to beginning the girdling experiment, early successional aspen and birch accounted for 49 % of the basal area in ZELIG (versus 38 % in the FASET study site), and these species were preferentially removed to match the 13-14 % annual har-  vests used by the two other models.Although ED also tracks the dynamics of individual trees, the configuration used here was limited to two tree functional types (cf.Table 4).This precluded species-specific girdling; instead, 13-14 % of the basal area across all individuals was harvested annually for the 3-year period.
The disturbances occurred on 1 May in all models, replicating the timing of the girdling treatment just prior to spring leaf-out (Gough et al., 2013).We examined six primary model outputs at an annual resolution -GPP, ER, NPP, NEP (all these fluxes in Mg C ha −1 yr −1 ), maximum LAI (unitless), and aboveground biomass (Mg C ha −1 ) -comparing them to observed data for 0 to 4 years after disturbance.We particularly focused on the models' structure and flux dynamics, i.e., whether they could replicate the relative changes observed in FASET.

Results
Summarizing the models' absolute performance provides a useful context for evaluating their relative changes discussed below.Pretreatment (i.e., control plots in 2007-8) aboveground biomass and LAI were 81.2 ± 25.4 Mg C ha −1 and 4.3 ± 1.3, respectively (Fig. 1).The models' comparable values ranged from 51 (Biome-BGC) to 101 (ED) to 109 (ZELIG) Mg C ha −1 for biomass and 1.5 to 4.9 to 6.4 for LAI, respectively.Biome-BGC's forest, in other words, was significantly smaller than the observed data; ZELIG's slightly larger; and that simulated by ED roughly comparable.Observed pretreatment gross primary production (GPP) was 12.2 Mg C ha −1 yr −1 , and ecosystem respiration (ER) was 9.1 Mg C ha −1 yr −1 , resulting in net C fluxes of 6.6 and 2.2 Mg C ha −1 yr −1 for NPP and NEP, respectively (Fig. 1e, f).The models' pretreatment GPP values ranged from 2.2 (ED) to 6.8 (ZELIG) Mg C ha −1 yr −1 , with Biome-BGC roughly halfway between these two; all were thus much lower than observations.Con-  2a), although the decline occurred more slowly because of the protocol used in this modeling experiment (i.e., three successive years of 13-14 % cut instead of a single large girdling event).Leaf area index was less well reproduced: ED and ZELIG came close to capturing the magnitude of the observed decline (−30 and 33 %, respectively, compared to −37 to −44 % observed) but not the observed rebound of LAI by 2011 (Fig. 2d).Leaf area in Biome-BGC, in contrast, captured the timing and rebound of observed LAI but not its magnitude, as LAI only declined by 13 % in the model.
None of the models fully captured the main C flux dynamics observed in FASET.GPP initially rose in the treatment plots relative to the observed plots, but the models all simulated GPP declines (Fig. 1c) of up to 5 % (Biome-BGC), 10 % (ED), and 14 % (ZELIG).The models also all produced modest ER declines for 2008-2010, whereas observed ER rose by 10 % relative to control values; this is perhaps not surprising, given that our modeling protocol removed "girdled" trees from the ecosystem.Observed net primary production did not significantly differ between treatment and control plots (Fig. 1f), but the models all exhibited NPP declines, by up to 3 % (Biome-BGC), 10 % (ED), and 14 % (ZELIG).All models' treatment NPP had, however, recovered to control levels by 2012 (Fig. 2f).Net ecosystem production was also unchanged in the observations, while Biome-BGC NEP declined by 23-27 % (Fig. 2e).ED and ZELIG recorded even larger drops -of 79 and 43 %, respectively -although NEP had, like NPP, recovered to control levels 4 years following disturbance in all models.
The models' skill -i.e., how well they replicated both the magnitude and timing of all observed variables -is summarized in Fig. 3, a Taylor plot (Taylor, 2001) that is useful for summarizing both multiple aspects of complex models and relative skill.Here, all models exhibited low correlation (0.08-0.29) with observations, high root-mean-square difference (9-18 %) between simulated and observed values, and high standard deviation, implying overall low model skill.
In ZELIG, aspen and birch exhibited low to moderate resilience (i.e., full recovery to pretreatment basal area was not achieved) following moderate forest disturbance.The model also predicted which species thrived or declined postdisturbance (Fig. 4).Of the two treatment species that were girdled, aspen showed a stronger resilience and recovered to 71 % of pretreatment basal area after 4 years, increasing by 3.1 m 2 ha −1 .In contrast, birch remained at posttreatment basal area over the next 60 years, increasing by only 0.2 m 2 ha −1 .The ZELIG forest became dominated by red oak (Fig. 4), with that species' basal area increasing nearly 2-fold, followed by sugar maple and white pine, which increased by 72 and 6 %, respectively.Thirty years after disturbance, the total basal area as predicted by ZELIG was 33.6 vs. 32.7 m 2 ha −1 pretreatment, and recovery of basal area (a proxy for recovery of biomass) was achieved, even though ZELIG failed to capture the observed high resilience in C fluxes during the first 4 years after disturbance.
Similarly, the reduction in the number of individuals in ED resulted in a direct reduction in LAI, due to the strict allometric relationships used.Because NPP and NEP are so closely tied to LAI in ED, this resulted in low resistance to the disturbance event.

Discussion
Relatively few previous studies have examined how well models can simulate non-catastrophic forest disturbance.Peters et al. ( 2013 NPP for forest stands across the upper Midwest, and found that increasing intensity had no effect on deciduous species but decreased evergreen NPP.Wang et al. (2014) also used PnET-CN and reported that measured and modeled evergreen needleleaf forests had lower resilience to disturbance than deciduous forests.This agrees with Biome-BGC's behavior, in which broadleaf deciduous trees (such as simulated here) are less sensitive to moderate disturbance than are evergreen conifers (Thornton et al., 2002).The interaction of disturbance intensity and forest resilience thus has both short-and long-term effects, presenting significant challenges to models (Seidl et al., 2014;Dietze and Matthes, 2014).Gough et al. (2013) proposed several mechanisms supporting sustained C uptake and storage (in particular the fluxes in NPP and NEP) after the FASET disturbance: enhancement of canopy light use efficiency, maintenance of light absorption as later successional species take advantage of increased light availability, and redistribution of N from senescent to early successional trees (Nave et al., 2011).The three models used in this study are highly variable in their assumptions, parameters, and processes, and it is instructive to understand how and why each had difficulty reproducing the FASET results with respect to these proposed mechanisms.

Model mechanisms and behaviors
All the models here, along with most others (e.g., Potter et al., 2003), assume a fixed light use efficiency (LUE): trees in the model can produce more or less leaf area, intercepting more or less radiation, but that area will produce a fixed amount of photosynthate under particular environmental conditions of light, temperature, etc.In reality trees can produce leaves with different structural, chemical, and photosynthetic characteristics (e.g., Sardans et al., 2012).These changes, integrated across leaves within a forest canopy, would likely result in different post-disturbance biotic and   1.
abiotic dynamics; FASET has already shown the assumption of a fixed LUE not to be true at the stand level (Gough et al., 2013).Recent work has also shown that the use of a spatially variable LUE parameterization, using C flux measurements from the Fluxnet data set, can significantly improve the accuracy of modeled GPP (Madani et al., 2014).Maintenance of canopy light absorption in the FASET forest depends on a structurally heterogeneous canopy so that subdominant trees quickly increase their absorption following the girdling of canopy dominants (Gough et al., 2013).We would have expected, a priori, that ZELIG would be best able to simulate this dynamic, as it models a wide range of competing tree species, both early and late successional, competing in the same forest (Fig. 4).All models simulated small to moderate declines in both GPP and ER with disturbance, in contrast to the small observational increases.The differences were generally small, however, and both fluxes are not direct observations but rather derived from tower measurements of ecosystem exchange, and they thus use less well constrained than NPP and NEP.For this reason we consider the models' inability to replicate the absolute GPP and ER (which were 2-3 times higher than simulations) more troubling than their failure to exactly match the relative patterns shown in Fig. 2. NPP and NEP are better constrained observationally than are derived fluxes.Biome-BGC best maintained these fluxes with disturbance but for the wrong reason, i.e., too resilient a leaf area (Fig. 2b), rather than by increasing LUE when LAI declined in the FASET study.We note, however, that the Biome-BGC phenology submodel was quite accurate (cf.Gough et al., 2010a), a critical first step to accurately simulate stand C dynamics (Richardson et al., 2012).
The proximal reason for Biome-BGC's too strong resilience is that the fraction of photosynthetically active radiation absorbed by the canopy, FPAR, does not diminish change linearly with LAI changes.Radiation transmission and absorption through canopies is a complex, computationally expensive process, and the three models studied here all use a common simplification: Beer's law (Campbell and Norman, 1998), which models this process as an exponential decrease downwards through the canopy.Biome-BGC, ZELIG, and ED also all assume a (mostly) equal extinction coefficient, and this implies that the models' FPAR declines theoretically peaked at 3, 12, and 8 %, respectively (Fig. 5), compared to 6 % as measured in the field (Gough et al., 2013).The mathematical form of Beer's law means that FPAR declines are smallest at low and high LAI values.For Biome-BGC, with its low-biomass forest, this meant relatively small FPAR declines with disturbance, small to moderate quantities of stored C and N lost to disturbance, and enough stored C resources to fully leaf out the canopy and support photosynthesis over the growing season.
ZELIG and ED both matched the observed LAI decline, and reasonably approximated FPAR as well, but exhibited large declines in NPP and NEP for both models.In ZELIG, even with the post-disturbance increase in available light, the remaining subdominant species were not able to quickly increase their growth to make up the difference in NPP loss.This may be due to the inherent growth and life history strategies of these subdominant species, which is accounted for in the species parameterization and initialization of ZELIG (Table 3).Only one species, red oak, recovered quickly (Fig. 4 In a separate study, ZELIG-TROP, a modified version of ZELIG that simulates tropical forests, was successful at replicating a nonsignificant change in NPP as a result of gradual, less extensive tree mortality (Holm et al., 2014).That study used a continual low-level elevated mortality rate as a treatment, i.e., doubling annual background mortality rate, and ZELIG-TROP predicted highly resilient NPP.However, following a one-time dramatic disturbance event (removing 20 % of basal area) NPP also declined, matching the modeled results seen here.Thus, the ZELIG results are characteristic of the model and not dependent on the particular forest type, soils, or climate of the FASET experiment.
In ED, despite the increase in light availability following disturbance, the remaining undisturbed trees were not able to respond sufficiently to offset NPP loss.This may be due in part to the limited number of plant functional types used here not representing the competition of early and late successional species.Additionally, ED's scaling of individual trees to stand dynamics does not maintain the full level of canopy complexity, which may be required for resilience to a disturbance of this type.
Among the models tested here, nitrogen redistribution and limitation was only possible in Biome-BGC, as ZELIG lacks an N cycle, and ED's integrated N cycle was not parameterized or enabled in this study.Biome-BGC's integrated N cycle encompasses N fixation, its deposition and leaching, plant growth, and microbial decomposition, and should, in theory, constrain C uptake in many circumstances (Thornton et al., 2007).Such an effect was not noticeable here, however, as equal percentages of C and N were removed in the Biome-BGC disturbances (data not shown); this implies leaching/loss, i.e., a lack of N conservation as opposed to what was observed in FASET (Nave et al., 2011).This may also partly be an artifact, as all models used stem biomass removals to simulate the real-world girdling (although in Biome-BGC leaves were transferred to the litter pool, providing some N reallocation).We speculate, however, that excessive N limitation was a factor in the model's inability to match the C stock and flux values of the UMBS forest.
In summary, the biological mechanisms proposed (Gough et al., 2013) to explain the carbon-cycle resilience of a midsuccessional forest to disturbance are ones that most models either do not simulate (integrated C and N cycles, changing light use efficiency) or do so only crudely (canopy structure, heterotrophic respiration).At fine spatial scales, factors such as canopy structure can be simulated, but the computational demands are large and thus impractical for larger-scale models (Caspersen et al., 2011), a consideration that inspired the development of models such as ED (Moorcroft et al., 2001).Similarly, how to translate the N recycling microbial dynam- ics into ecosystem-to global-scale models is an area of intense research (Wieder et al., 2013), as most models (including those tested here) use a few conceptual soil pools following simple first-order kinetics.C-N integration inside such models is increasingly common (Zaehle et al., 2014;Thornton et al., 2007), enabling N redistribution and limitation dynamics, and should improve future simulations of moderate disturbances.

Conclusions
The FASET results were unexpected and intriguing (Nave et al., 2011;Gough et al., 2013;Hardiman et al., 2013).How well can current forest models simulate such moderate, i.e., not stand-replacing, disturbances?Not all disturbances, even of the same severity, equally affect biogeochemical processes that support recovery -for example, slow versus immediate tree death have very different consequences (Franklin et al., 1987).Our results suggest that some ecosystem models, developed to simulate processes following stand-replacing disturbances, may not simulate gradual death scenarios well (McDowell et al., 2013), specifically nonlinear or threshold responses of the carbon cycle in disturbance intensity (Goodrich-Stuart et al., 2015) over short timescales.Their skill over longer (decadal) periods remains an open ques-tion.This is particularly important as the moderate disturbances associated with slow tree death (insect outbreaks, fungal pathogens) are on the rise worldwide (Allen et al., 2010) and in aging US forests.It is thus increasingly important to confront models with non-catastrophic disturbance scenarios.

Figure 1 .
Figure 1.Observed data from FASET treatment and control forests.Panels include (a) aboveground biomass (AGB, in Mg C ha −1 ), (b) ecosystem respiration (ER, Mg C ha −1 ), (c) GPP (Mg C ha −1 ), (d) leaf area index (LAI, unitless), (e) net ecosystem production (NEP, Mg C ha −1 ), and (f) net primary production (NPP, Mg C ha −1 ).Vertical shaded area shows approximate time of the girdling treatment described in the text.Error bars indicate ±1 SD based on eight measurement plots(Gough et al., 2013).Control and treatment sites had near-identical data in 2006 and 2007, and thus the latter (dashed) line is not visible in panels (a), (d), and (f) in those years.

Figure 2 .
Figure 2. Model performance in replicating the FASET experiment.Panels include (a) aboveground biomass (AGB, in Mg C ha −1 ), (b) ecosystem respiration (ER, Mg C ha −1 ), (c) GPP (Mg C ha −1 ), (d) leaf area index (LAI, unitless), (e) net ecosystem production (NEP, Mg C ha −1 ), and (f) net primary production (NPP, Mg C ha −1), all expressed on a common normalized scale (relative change between treatment and control).Vertical shaded area shows approximate time of the girdling treatment described in the text.Vertical lines show May 1 forest harvests imposed in the Biome-BGC, ED, and ZELIG models.

Figure 3 .
Figure 3. Taylor diagram(Taylor, 2001) summarizing model skill at predicting all (AGB, ER, GPP, LAI, NEP, NPP; cf.Fig.1) observed data, normalized relative to the control forest.The standard deviation of the simulated data (colored by model) is gauged by the radial distance from the origin and can be compared to the observed data (circle on horizontal axis); model correlation to observations is found by azimuthal position; the curve contours show root mean square error (%).

Figure 4 .
Figure 4. Species-specific basal area trajectories simulated by ZELIG, before and after the 2008-2010 tree removals mimicking the FASET experiment.Species codes are as in Table1.

Figure 5 .
Figure 5.Effect of disturbance on fraction of photosynthetically active radiation absorbed by the canopy (FPAR).Observed line is based on data from Fig. 4 in Gough et al. (2013).Model lines show implied (i.e., theoretical, based on Beer's law) FPAR based on the observed and modeled leaf area index values and a common extinction coefficient of k = −0.45, the model mean.

Table 1 .
Comparison of the models used in this study.LUE stands for light use efficiency, APAR stands for absorbed photosynthetically active radiation, and PFT stands for plant functional type.

Table 2 .
Selected site-specific parameters used by Biome-BGC.Model inputs differ from observed values because of the optimization procedure used (see Methods section).

Table 3 .
(Urban, 1990)fic allometric and ecological parameters for the eight tree species used in ZELIG, representing species found in the upper Great Lakes.All species were assigned a probability factor of stress mortality of 0.369, a probability factor of natural mortality of 2.408, and a zone of seed influence of 200.Full explanations of all parameters can be found in the original ZELIG paper(Urban, 1990).

Table 4 .
Allometric and ecological parameters used in the ED model.The two plant functional types represent generic cold deciduous hardwood and evergreen needleleaf trees, respectively.Mg C ha −1 yr −1 , respectively), while ED was 8.2 Mg C ha −1 yr −1 .ZELIG was very close (2.1 Mg C ha −1 yr −1 ) to the observed NEP value, with ED and Biome-BGC much smaller (1.4 and 0.3 Mg C ha −1 yr −1 , respectively).In summary, pretreatment carbon stocks and fluxes varied significantly among the models, with Biome-BGC consistently low -a smaller forest producing and sequestering less C. The other two models varied in their fidelity to observations, with only ED able to achieve observed NPP, while ZELIG was closest to overall C balance, but neither could achieve the high observed GPP values.Aboveground biomass declined by 35-36 % between 2006 and 2010 in the FASET experiment.The models tracked this well (Fig.

www.biogeosciences.net/12/513/2015/ Biogeosciences, 12, 513-526, 2015 B. Bond-Lamberty et al.: Moderate forest disturbance as a stringent test and
), while the remaining dominant species subdominant species could not contribute to an increase in NPP and NEP.Based on the current model structure of ZELIG, leaf production and leaf loss are tightly linked with NPP and NEP; therefore, the decline in LAI corresponded to a resulting decline in C fluxes.