© Author(s) 2010. CC Attribution 3.0 License.

Abstract. Terrestrial biosphere models show large differences when simulating carbon and water cycles, and reducing these differences is a priority for developing more accurate estimates of the condition of terrestrial ecosystems and future climate change. To reduce uncertainties and improve the understanding of their carbon budgets, we investigated the utility of the eddy flux datasets to improve model simulations and reduce variabilities among multi-model outputs of terrestrial biosphere models in Japan. Using 9 terrestrial biosphere models (Support Vector Machine – based regressions, TOPS, CASA, VISIT, Biome-BGC, DAYCENT, SEIB, LPJ, and TRIFFID), we conducted two simulations: (1) point simulations at four eddy flux sites in Japan and (2) spatial simulations for Japan with a default model (based on original settings) and a modified model (based on model parameter tuning using eddy flux data). Generally, models using default model settings showed large deviations in model outputs from observation with large model-by-model variability. However, after we calibrated the model parameters using eddy flux data (GPP, RE and NEP), most models successfully simulated seasonal variations in the carbon cycle, with less variability among models. We also found that interannual variations in the carbon cycle are mostly consistent among models and observations. Spatial analysis also showed a large reduction in the variability among model outputs. This study demonstrated that careful validation and calibration of models with available eddy flux data reduced model-by-model differences. Yet, site history, analysis of model structure changes, and more objective procedure of model calibration should be included in the further analysis.


Introduction
The terrestrial biosphere plays important roles in regional and global carbon cycles, as well as in climate, through biogeochemical and biophysical processes.Large uncertainties in terrestrial biosphere models and the potential effects of these uncertainties on the projection of future global environmental changes were pointed out through an intercomparison study of coupled carbon cycles and climate models (Friedlingstein et al., 2006).Assessment and refinement of terrestrial biosphere models are essential to improve estimation of terrestrial carbon budgets and future projections of environmental changes.Several Dynamic Global Vegetation Models (DGVMs) have also been evaluated in an earth system model framework (Sitch et al., 2008), and large 2062 K. Ichii et al.: Multi-model analysis of terrestrial carbon cycles in Japan differences were found in the simulation results among the different models.
To compare the performance of terrestrial biosphere models for simulating carbon and/or water fluxes and evaluate differences among model, terrestrial biosphere models have been compared for global (e.g., Cramer et al., 1999, 2001), regional (e.g., VEMAP members, 1995;Jung et al., 2007), and point scales (e.g., Potter et al., 2001;Gerten et al., 2008).These studies evaluated a number of terrestrial biosphere models, and the models' outputs showed large variability in their estimates of terrestrial carbon cycles.For example, Cramer et al. (1999) found that the range of global net primary productivity was 40-80 GtC yr −1 (with an average of 55±10 GtC yr −1 ), and Cramer et al. (2001) showed that the future terrestrial carbon budget showed considerable scatter.One of the case studies that successfully reduced uncertainties in the estimation of carbon cycles was an intercomparison of atmospheric transport models and estimated atmosphere-ocean and atmosphere-land carbon cycles (Transcom;Gurney et al., 2002).The detailed comparison with multi-model outputs successfully identified a reliable carbon budget, as well as constraints that are needed for more reliable estimations.
However, most intercomparison studies of terrestrial biosphere models have not been performed systematically, and some models have been used without appropriate validation.Recently, multiple terrestrial biosphere models were used to analyze the interannual variability of terrestrial carbon budgets for continental scale monitoring of terrestrial biospheres (Reichstein et al., 2006;Vetter et al., 2007;Piao et al., 2009;Hashimoto et al., 2010).These biosphere models were used to identify key mechanisms behind the recent global changes and the impacts of these changes on terrestrial biospheres; however, large differences in each biosphere models still lack calibration with observations (In this paper, calibration is defined as "Adjustment of the parameters of a mathematical or numerical model in order to optimize the agreement between observed data and the model's prediction."based on American Meteorological Society Glossary of meteorology, http://amsglossary.allenpress.com/glossary/browse?s=m&p=49).These multi-model differences could potentially be reduced by using the observed data to calibrate and validate the models and by analyzing the current status of each model.As stated before, previous model intercomparison projects generally lacked detailed model calibration and evaluation using observations, resulting in errors in the simulations.Owing to the recent increase in the availability of relevant data (Baldocchi, 2008), model improvements have been carried out using eddy flux data, and the impact on the model simulations was evaluated on a global scale (Friend et al., 2007;Stokli et al., 2008), as well as on a regional scale (Ueyama et al., 2010).Since the differences among multi-models should be reduced by the constraints from the observed data, potential differences should be evaluated after each model has been calibrated based on observations.
In this study, we assessed the extent to which differences between the models can be reduced by calibrating the models using eddy flux data from Japan.First, we conducted baseline simulations using default model settings without using prior eddy flux observation information.These simulations were run on both a point-scale (four flux tower sites) and a regional scale.Then, using the eddy flux data as model constraints, we modified the model parameterizations.Finally, based on the modified models, we ran the models at point and regional scales and analyzed the inter-model variability of the outputs.We report the extent to which model calibration using the eddy flux data can reduce the uncertainties and differences in the modeled carbon cycles for both point and regional scales.
Among the eight ultimate goals of the CarboEastAsia A3 Foresight program (http://www.carboeastasia.org),our group is ultimately aiming at one goal: to quantify the strength of carbon sinks/sources and their spatial and temporal variability (dynamic) and uncertainty (prediction).As a first step, we selected a test region in Japan where the flux network was dense and could be used as a case study.This region was used to test the extent to which ground observations can reduce uncertainties in carbon cycle estimations at a regional scale.By comparing terrestrial biosphere models, we aimed to identify the current potential problems in terrestrial ecosystem models to better estimate terrestrial carbon budget in East Asia.

Study area and eddy flux observation sites
We focused our analysis on Japan (Fig. 1).More than 75% of the land is covered by forests, mainly with mixed forests, deciduous broadleaf forests, and evergreen needleleaf forests.Generally, the climate is characterized as warm-temperate and humid (southern half) and cool-temperate and humid (northern half), and the topography is mountainous.
In this analysis, we used data from four eddy flux observation sites for the point experiments.These sites included the Fujiyoshida forest meteorology research site (FJY), the Takayama deciduous broadleaf forest site (TKY), the Teshio CC-LaG experiment site (TSE), and the Tomakomai flux research site (TMK).Details for each site are given in Table 1.The FJY site is located in Fujiyoshida City, which is on the Kenmarubi lava flow on Mt.Fuji (35 • 27 N, 138 • 46 E).The dominant species is the Japanese red pine (Pinus densiflora), and there are also mixed evergreen and deciduous broadleaf trees, such as Japanese holly (Ilex pedunculosa) and azalea (Rhododendron dilatatum).The climate is relatively cool for its latitude, and snow depth reaches up to 0.5 m in winter.The TKY site is located in a mountainous region in the central part of the main island of Japan (36 The site is about 15 km east of Takayama City.The forest is

Terrestrial biosphere models
Terrestrial biosphere models are conventionally separated into four categories: empirical models, diagnostic models,   In this study, we used 9 terrestrial biosphere models (Table 2) covering all four categories described above.The models include one empirical model (Support Vector Machinebased regression (SVM) Yang et al., 2006Yang et al., , 2007)), two diagnostic terrestrial biosphere models (the Carnegie-Ames-Stanford Approach Model -CASA; Potter et al., 1993;and the Terrestrial Observation and Prediction System -TOPS; Nemani et al., 2003), three prognostic terrestrial biosphere models (Biome-BGC; Thornton et al., 2002;DAYCENT;Parton et al., 1998; Vegetation Integrative SImulator for Trace gases -VISIT; Ito et al., 2007), and three dynamic global vegetation models (the Lund-Potsdam-Jena Model -LPJ; Sitch et al., 2003; Spatially-Explicit Individual-Base DGVM -SEIB; Sato et al., 2007;and Top-down Representation of Interactive Foliage and Flora Including Dynamics -TRIFFID; Cox, 2001).The details of each model are described in Appendix A.

Flux tower observation data
We used climate data from four eddy flux observation sites as the inputs for point-scale models, and we used the carbon fluxes data to validate the point-scale models.The four sites (FJY, TKY, TSE, and TMK) are described in Sect.2.1.The temporal coverage of the observation data was 2000-2004for FJY, 2001-2003for TKY, 2002for TSE, and 2001-2003 for TMK.The methods of gap-filling and flux-partitioning (i.e., infer Ecosystem Respiration -RE -and Gross Primary Productivity -GPP -from Net Ecosystem Exchange -NEE) were based on Hirata et al. (2008).Based on the suitable temporal scales for each model, the observed climate data were converted into corresponding temporal averages.Monthly variations in the carbon fluxes (GPP, RE and NEP) were calculated for model validation.Although the GPP and RE were derived from observed NEP, we describe them as observed GPP and RE throughout the paper.

Satellite-based time variable data
We used eight-day or sixteen-day composites of the Moderate Resolution Imaging Spectroradiometer (MODIS)-based Land Surface Temperature (LST) (Wan et al., 2002), the Vegetation Indices (VIs) (the Normalized Difference Vegetation Index -NDVI -and the Enhanced Vegetation Index -EVI) (Huete et al., 2002), and the Leaf Area Index (LAI)/Fraction of Photosynthetically Active Radiation absorbed by a canopy (FPAR) (Myneni et al., 2002) from 2001 to 2006.The original temporal resolutions of LST, LAI/FPAR, and VIs were 8-day, 8-day and 16-day, respectively.Therefore, we converted the temporal resolution to fit each model's temporal resolution.
For point scale analysis, we used MODIS 1-km resolution subset data sets (collection 5; http://daac.ornl.gov/MODIS/),each of which consisted of 7-by-7 km regions centered on the flux towers (Cook, 2004).At each time step, we averaged the MODIS observations by only using high-quality pixels (with the mandatory quality assurance -QA -flag being good in the QA data) based on the method of Yang et al. (2007), and missing data were replaced by a 2001-to-2006 average calculated using high-quality pixels.For the spatial analysis, we created 4-km spatial resolution data using the original 1-km data of MODIS LST, VIs, and LAI/FPAR (collection 4).All gaps in the data were filled using averaged 8-day data calculated from 2001 to 2006 at each grid point if the QA flags were not acceptable.

Climate data
Each terrestrial biosphere model requires climate data as inputs.Necessary climate data include temperature (daily maximum temperature -Tmax, daily minimum temperature -Tmin, daily average temperature -Tave), precipitation (Prec), vapor pressure deficit (VPD), relative humidity (RH), incoming solar radiation (Srad), and wind speed (Wind).For point analysis, we used either observed climate data from each flux site (SVM, CASA, TOPS, Biome-BGC, DAYCENT, LPJ, TRIFFID) or long-term climate reanalysis data  from NCEP/NCAR reanalysis 1 (Kalnay et al., 1996) that was extracted at the corresponding pixel and corrected using site observations (VISIT and SEIB-DGVM).Temporal averaging (e.g., daily, 8 days, and monthly) was applied to fit the model time-step.

Static data
The models needed several ancillary data sets to complete the simulations.All necessary data were derived from the global set of soil data (Global Soil Wetness Project -GSWP -2 Input data; http://www.iges.org/gswp/input.html),MODIS

Experiments
The study consisted of two steps for each model run and evaluation: point and spatial analysis.Point runs were conducted for the four flux sites in Japan, and spatial runs were conducted for the entire country.Each step included two model runs: default and modified model runs.The default model run was conducted as a benchmark test, and the modified model run was conducted to analyze the extent of model improvement when the eddy flux data were used as constraints.For RE and NEP simulations, SVM and TOPS were excluded since these models do not simulate them (see also Appendix A1 and A2).

Point model runs
First, as a benchmark test, we ran all models using the default model settings for each eddy flux site, using input meteorological and satellite data.Then we tuned all models to fit the observed GPP, RE, and NEP data by adjusting the model parameters iteratively.We set the following guideline to modify the model; (1) All procedures for the model parameter tunings and structure changes were done through hand-by-hand approach.Through the sensitivity analysis of model parameters, several key model parameters were selected and tuned to fit the observed carbon cycles.
(2) If model parameter tunings were not sufficient to reproduce observed carbon cycle, model structure changes were also allowed.
(3) We tried to apply minimum changes to fit the observed carbon cycles.As a result, no models needed further algorithm changes, and model modifications were achieved.Details of model modifications are described in Appendix A and Table 3.The model initialization or spin-up processes are also described in Appendix A.
Using the results from default and modified models, we first analyzed the modeled seasonality of GPP, RE, and NEP from all models, focusing on how well the models could simulate the observed carbon cycle seasonality and how much the model-to-model variability could be reduced by the model tuning process using eddy flux data as constraints.For each model run, we obtained the average monthly variation using the whole observation period for GPP, RE, and NEP, and calculated its standard deviation among the models for each month for both the original and modified models.
Second, the interannual variations in GPP, RE, and NEP were analyzed for all models.To test how each model reproduces the anomalies of interannual variability in the carbon cycle, we focused on the years 2001-2003 by taking an overlapping period and calculated the monthly anomalies by subtracting the mean monthly variations for 2001-2003.The TSE site was excluded because its observation period was only 1 year.

Spatial model runs
Using both the original and modified models, we conducted spatial analysis for the grids with forest vegetation cover in Japan (Fig. 1).All spatial information was used to run all the models.Spin-ups were conducted using whole 1990-2006 climate data with the CO 2 concentration fixed at the 1990 level, and we ran the model from 1990 to 2006 with a time-variant CO 2 concentration.Only the results from 2001-2006 were analyzed.As for the modified models, we used the models that had been calibrated in the point simulation.
To evaluate the outputs from multiple models, basic statistics were calculated along with the annual average of GPP, RE, and NEP by taking the average from the 2001-2006 period, and basic statistics (average and standard deviation) were calculated.Then, interannual variability in GPP and NEP were analyzed by calculating the total for all of Japan.We did not use DAYCENT for spatial analysis due to technical reasons (see also Appendix A3).

Default model simulations
Simulated monthly GPP, RE, and NEP data showed large deviations from the observations, and there was large modelby-model variability for all sites (Fig. 2).For GPP, most models underestimate its seasonal amplitude, and seven (SVM, CASA, TOPS, VISIT, LPJ, SEIB, and TRIFFID) out of eight models clearly underestimated the GPP seasonal amplitude.One model (Biome-BGC) overestimated the seasonal amplitude for FJY, three models (CASA, LPJ and TRIFFID) underestimated the seasonal amplitude for TKY and TSE, and most models underestimated the seasonal amplitude for TMK.The timing of the start of the growing season also deviated from the observed start times.Some models estimated a significantly earlier start of the growing season (Biome-BGC for FJY, TOPS for TKY and TMK, and DAYCENT for FJY), while others estimated later start of the growing season (DAYCENT for TKY and TMK).For RE, similar to GPP, most models underestimated the seasonal RE amplitude.The seasonal NEP amplitude was also underestimated, especially at the TKY and TMK sites.
Ensembles of multiple models also underestimated the seasonal amplitude of GPP, RE, and NEP (except at the FJY site) with large model-to-model differences in the simulated seasonal carbon cycle (yellow in Fig. 2).Comparison of the observed and modeled (multi-model ensemble mean) monthly variations in GPP, RE, and NEP clearly indicated that the ensembles from the 9 models underestimated the seasonal amplitude (the slopes of the regression lines were less than 1, namely, 0.64, 0.63, and 0.51 for GPP, RE, and NEP, respectively) (Fig. 3).These differences might be one of the reasons that current terrestrial biosphere models are biased and might be the cause of large differences among the model outputs.
Commonly, the differences in GPP, RE and NEP between the model results and the observations can be explained by two reasons.First, the underestimation of seasonal amplitude of GPP, RE, and NEP is probably the result of inappropriate modeling of photosynthesis activities, which results in underestimation of RE.Since NEP is a difference of GPP and RE, these small biases may affect NEP.Second, the deviation of seasonal timing of the start of the growing season can be explained by empirical (e.g. the use of temperature summation to determine the threshold for the start of the growing season) and inaccurate phenolgy models included in terrestrial biosphere models.Accordingly, all the models were modified to reduce such biases (see the Appendix A for details).

Modified model simulations
All the models were modified by fitting to the magnitude and patterns of the observed fluxes.Then we found model improvement in terms of carbon cycle component for most sites (Fig. 4).Seasonal variations in GPP and RE and the amplitudes of these variations were accurately simulated by most models for all sites.The reductions in model-to-model differences at the FJY site were especially large, and the seasonal variations in GPP, RE and NEP were close to the observations for TKY, TSE, and TMK.Only RE at TKY showed a small overestimation by most models.As a result of changes in simulated GPP and RE, we found that the seasonal NEP amplitude was also much more accurately simulated in the modified model, especially at the TKY and TMK sites.Only SEIB-DGVM showed deviations from the observations, which were probably caused by the difficulty of representing forest dynamics stably.Large disturbances (e.g., death of big trees) might affect carbon dynamics greatly and result in large gaps between the modeled and observation values.
After the tuning, the multi-model ensemble mean of monthly GPP, RE, and NEP values across four sites was also closer to the observations (Fig. 3).We obtained a correlation between the modeled and observed carbon cycle components that was closer to 1:1.For example, the slopes of the regression lines became closer to 1, changing from 0.64→0.88for GPP, 0.63→0.80 for RE, and 0.51→0.75for NEP.The R 2 values were also higher than those from the default simulations.
Model-to-model differences, as measured by the standard deviations of multi-model outputs, were also greatly reduced by the model modification process for GPP simulation (Fig. 5).For example, the standard deviations of the modeled GPP values were reduced by 57, 19, 3, and 10% for FJY, TKY, TSE and TMK, respectively.On the other hand, the standard deviation of RE (except for FJY) and NEP (except for TKY) were not largely changed by the model modifications (Fig. 5).The difficulties with the modeled RE and NEP calibration are potentially the result of several issues.First, small biases in GPP and RE greatly affect NEP; therefore, it is difficult to improve NEP in the model.Second, inclusion of site history is needed to simulate RE and NEP accurately (Friend et al., 2007).Since these forests are not predominantly natural forests, forest regrowth can be one of the important mechanisms for current carbon budgeting.Third, observations sometimes contains biases in the closure of the energy balance, and gap-filling and flux partitioning technique (Massman and Lee, 2002;Foken, 2008).

Interannual variations
Interannual variations in terrestrial carbon cycle processes (GPP and NEP) were generally well reproduced by the models (Fig. 6).Hirata et al. (2007), Saigusa et al. (2008) (Saigusa et al., 2008) at TKY and TMK were captured by most models.In the summer of 2003, negative GPP anomalies for FJY and TKY and a positive anomaly for TMK that were caused by radiation anomalies (Saigusa et al., 2008(Saigusa et al., , 2010) ) were well-captured by the models.NEP anomalies were also generally reproduced by the model.Especially for the TMK site, both GPP and NEP anomalies for the whole period were well reproduced by the models (R 2 =0.74 for GPP and 0.81 for NEP).
Some models showed deviations from observations and from other model outputs, indicating the need for a more detailed model assessment.For example, the year-to-year anomalies in GPP were much higher in the TOPS model simulation, especially for TKY, and some diverse responses in GPP and NEP were found using the DAYCENT model simulation for the spring of 2001 and 2002 for TMK.Other deviations from the observations and from other models were found for FJY and TKY with the SEIB simulation.These deviations were primarily caused by the model responses to anomalies in input parameters such as climate data and satellite data, and these facts need to be analyzed in detail to further constrain the terrestrial biosphere models.

Annual summary
In the default model simulations, we found large differences in the annual average GPP from 2001 to 2006 among the models (Fig. 7).The differences were primarily produced by the site-level differences in the model. of GPP on the regional scale; CASA and TRIFFID estimated much lower GPP values than did BIOME-BGC and VISIT (e.g., about 1200 gC m −2 yr −1 by CASA and TRIF-FID and 2000 gC m −2 yr −1 by BIOME-BGC and VISIT), which resulted in large standard deviations among the models (500-600 gC m −2 yr −1 ).The spatial patterns in GPP based on the modified models were similar in magnitude among the models, reducing the model-to-model GPP differences (Fig. 8).For the annual GPP, most models estimated about 2000 gC m −2 yr −1 for most of Japan, and the standard deviation among the models was significantly reduced (300 gC m −2 yr −1 ).
Annual total statistics in Japan were also greatly changed by the model modifications (Table 4).Both total GPP and RE were increased by the model modifications, and their standard deviations among the different models were reduced by about half (e.g., 106→58 for GPP, and 117→70 for RE; Table 4).Standard deviations in NEP were not significantly changed, which is probably the result of a compensating effect by taking differences in GPP and RE.

Interannual variability
Simulations of the interannual variability also showed yearto-year variability in the model outputs (Fig. 9).For example, during the 2003 summer, we experienced a strong reduction in solar radiation with heavy rain in 2003 in Japan (Saigusa et al., 2010).For that time, the model simulated large reductions in GPP for most of the study region.During the spring of 2002, warmer temperatures led to a significant enhancement in GPP in most ecosystems, causing significant increases in NEP.These anomalous patterns were successfully simulated in most of the terrestrial biosphere models used in this study.In addition, standard deviations of GPP among the models were also reduced by the process of model modifications on the point scale.

Implications of the results of the calibrated biosphere models
We constrained the modeled seasonal variation and its magnitude in annual totals of GPP and RE at both point and regional scales by using the observations.In terms of variability in the modeled seasonal variation at point scale, the modified model resulted in standard deviations that were reduced by 22%, 14%, and 1% in magnitude for GPP, RE, and NEP, respectively (values are the average of the standard deviations of the modeled GPP, RE, and NEP at each site).In the spatial scale, standard deviations among modeled GPP, RE, and NEP were also reduced by 45%, 40%, and −20%, respectively.To reduce differences in the NEP simulation, the site history should be included in further studies.uptake.Therefore, including the site history could potentially improve the net ecosystem exchange.
A multi-model ensemble simulation of the terrestrial carbon cycle is effective in estimating seasonal and interannual variability.With the default model settings, we found large variations in the seasonal carbon cycle; however, an average of all models reasonably estimated terrestrial carbon cycles, though with some underestimations.The modified models simulated seasonal and interannual variations in the carbon cycle, with smaller standard deviations compared to the default model settings.
This model intercomparison analysis has several implications for the ground observations.In the process of making modifications to the models, we commonly adjusted the maximum photosynthesis activity, the Q 10 value of respiration, the stomatal conductance, and phenology-related parameters for most models.The fact that these parameter tunings were required suggests the necessity of further constraints from observations.For further model improvements, the accurate estimation of highly-sensitive site-specific parameters and their spatial distributions should be achieved.Then, mechanisms and universal relationships should be inferred.

Potential limitations and implications on the understanding of the terrestrial carbon budget
Several potential problems are listed in this section.First, in this study, we have one or two observations in each land cover class.It is ideal that we use multiple sites to calibrate the model within the same plant function types, train the model against one site, and evaluate the parameter against other sites.Second, the model calibration conducted in this study can constrain short-term (daily-seasonaly-yearly time scales) processes, but cannot constrain long-term processes such as forest dieback and vegetation transition.Model calibration using longer-term observation should be conducted in the future.Third, carbon cycles of the ecosystems in this study show sensitivities to temperature and radiation, but not to water availability.Model sensitivity to water availabilities is one of the unknown facts for ecosystem modeling (Morales et al., 2005).Fourth, more objective methods of model parameter tuning such as to set a cost-function and apply optimization routine should be applied in further studies.In this study, parameter tuning is done by iteration, and part of model differences may be reduced by applying these methods.Fifth, we mostly focused on model parameter tuning in this study.Most terrestrial biosphere models used in this study were developed outside of Asia, and we found that direct application of these models may cause problems in the simulation of carbon cycles.This is partially due to the unique ranges in annual climate covered by the AsiaFlux network and caused by the Asian monsoon climate.The eddy flux data show that AsiaFlux sites cover a wider range of ecosystems, especially in terms of annual precipitation, than those reported in other FLUXNET observation networks such as AmeriFlux and CarboEurope (precipitation in most observation sites of AmeriFlux and CarboEurope is less than 2000 mm yr −1 ).Therefore, the humid temperate climate in Asia and Japan requires further model parameter calibration and model structure and algorithm changes to accurately simulate the observed carbon cycle.
To improve the models, we need to address two items.First, the site history should be included in any further studies in order to improve the NEP simulation as described above.Second, to estimate the nationwide carbon budget for Japan, we need further observations of cropland, grassland, and evergreen broadleaf ecosystems.

Conclusions
We demonstrated that eddy flux data are useful in constraining the terrestrial biosphere models and in helping to reduce model-to-model variability and difference at both point and regional scales through the comparison of 9 models.After the model parameter tuning, the model-by-model differences in GPP and RE were greatly reduced.For NEP, although the ensemble of modified model was improved, model-bymodel differences did not show large reduction.We discussed the limitations and implications of this study to constrain the simulation of terrestrial carbon cycle.In particular, we need to include site history for improvement of simulated NEP and to assess model algorithm uncertainties.The uncertainties in model algorithm will be fully explored in the future through more rigorous investigations by adding other constraints to the model, such as biomass and soil carbon and LAI, as well as some water cycle estimations.For example, Ito et al. ( 2010) investigated the differences in modeled soil respiration using five models out of nine used in this study, and identified a large discrepancy of contribution of roots to total soil respiration among models.
This study, a case study in Japan, is the first step in reducing the differences and uncertainties among the terrestrial biosphere models.Therefore, it will be extended into the whole Asian monsoon region in the next step as a subproject of the A3 CarboEastAsia Program after solving the potential limitations mentioned above.The use of systematically gap-filled and flux-partitioned data over Asia, covering Siberia, Mongolia, China, Korea, Japan, and southeastern Asia regions, can potentially characterize the similarities or differences between the Asian terrestrial ecosystem and other regions, such as North America and Europe.The Asian monsoon region includes potentially unique terrestrial ecosystems, such as larch forest over the northern region, temperate and humid forest over East Asia, and paddy fields over east and southeast Asia, that should be analyzed in future studies by refining the model using observations.The future responses of these ecosystems to climate change should also be analyzed.

Overview of the terrestrial biosphere models and their modification
We used 9 terrestrial biosphere models from four different categories including empirical, diagnostic, prognostic, and dynamic models.Below are brief descriptions of each ecosystem models in each category and the modifications we made to the models.

A1.1 Support Vector Machine-based regression (Yang et al., 2006, 2007)
Support Vector Machine-based regression (SVM) is an empirical model based on the regression-type support vector machine driven by inputs of satellite-based surface radiation (Rad), land surface temperature (LST), and enhanced vegetation index (EVI) (Yang et al., 2006(Yang et al., , 2007)).Support Vector Machine regression is a machine learning technique that transforms nonlinear regressions into linear regressions by mapping the original low-dimensional input space to a higher-dimensional feature space using kernel functions (e.g., Cristianini and Shawe-Taylor, 2000).The method was assessed for more than 20 Ameriflux sites over the continental United States to estimate spatial distributions both in evapotranspiration (Yang et al., 2006;Ichii et al., 2009) and in gross primary productivity (Yang et al., 2007).The model calculates GPP only as carbon cycle component.The model output has also been used for inverse estimation of key biosphere model parameters, such as maximum light use efficiency (Yang et al., 2007) and the rooting depth (Ichii et al., 2009), and for the analysis of climate and terrestrial carbon cycles in Asia (Saigusa et al., 2010).
As an original model, we used the model tuned for the AmeriFlux observation sites, which were substantially similar to those studied by Yang et al. (2007).In the modified model run, we tuned the model using the eddy flux data and satellite-based observations at four sites: FJY, TKY, TMK, and TSE.The original model significantly underestimated the seasonal amplitude of GPP, but the use of flux observation data from Japan improved the model.

A2 Diagnostic models
A2.1 CASA (Potter et al., 1993) CASA is a diagnostic terrestrial biosphere model driven by climate-and satellite-based data on monthly time-scale (Potter et al., 1993).NPP is calculated as the product of maximum Light Use Efficiency (ε max ), Photosynthetically Active Radiation (PAR), FPAR (Fraction of PAR absorbed by vegetation), and climate-driven regulation factors that are functions of air temperature and soil water content.We changed the model input from NDVI for the original model to the satellite-based LAI and FPAR for the modified model, because the satellite-based LAI/FPAR data were available.We assumed that GPP is 2×NPP in the analysis.The CASA model is widely used for terrestrial carbon and water cycle monitoring from point to global scales (Potter et al., 1993(Potter et al., , 2001;;Hashimoto et al., 2010).
We used needleleaf evergreen tree parameters for FJY, broadleaf deciduous tree parameters for TKY and TSE (same parameter set for the two sites), and high lat deciduous tree parameters for TMK in point simulations.The broadleaf deciduous tree parameters were used for mixed forest pixels in spatial simulations.In the model modification process, we increased ε max because the simulated GPP seasonality is significantly underestimated.In the original model, ε max was biome-independent, and we adjusted the parameter in each vegetation type.The range of ε max was confirmed by studies in the literature (e.g., Ruimy et al., 1994).Without further modifications, the model reproduced observed carbon cycle seasonality.Model spinup was conducted by running the model for about 3000 years using whole climate input repeatedly.

A2.2 TOPS (Nemani et al., 2003)
TOPS is a diagnostic terrestrial biosphere model that simulates the terrestrial water and carbon cycle processes using daily climate and satellite (FPAR/LAI) data (Nemani et al., 2003;White and Nemani, 2004).Simulations of hydrologic states and fluxes were based largely on the Biome-BGC model (Thornton et al., 2002) with the use of the remotely sensed LAI.Calculation of GPP was based on a production efficiency model (PEM) approach with an environmental stress scalar set as the minimum limits for Tmin, VPD, and soil water potential.The snow model was updated using a physically based energy balance model (Ichii et al., 2008).The model calculates GPP only as carbon cycle component.The TOPS model is widely used for terrestrial biosphere status monitoring in North America (e.g., White and Nemani, 2004;Nemani et al., 2009;Ichii et al., 2009).As a default model parameter, we used literature-based parameters (White et al., 2000) with updated maximum light use efficiencies inversely calculated from satellite-based GPP, and a light use efficiency model (Yang et al., 2007).
We used evergreen needleleaf forest parameters for FJY, deciduous broadleaf forest parameters for TKY and TSE (same parameter set for the two sites), deciduous needleleaf forest parameters for TMK for the point simulations, and we used deciduous broadleaf forest parameters for mixed forest in the spatial simulations.In the model modification process, we changed the maximum light use efficiency and the maximum stomatal conductance by comparing the fluxes observed in GPP and ET seasonality.Since the definition of maximum light use efficiency is different from CASA (GPP and NPP per light in TOPS and CASA, respectively), we adjusted the maximum light use efficiency of TOPS for all ecosystems (e.g., 1.02→1.8for ENF, 1.56→1.8for DBF, and 1.56→2.1 for DNF).In addition, to better simulate the seasonal timing of the growing season, we adjusted the minimum temperature multiplier for the light use efficiency (−8.0→0.0 • C and 0.0→8.0• C for the daily minimum temperature at 100% and 0% inhibition of GPP, respectively).These modifications affected seasonal carbon cycle variations.Model spinup was conducted by running the model once using whole climate input since only water cycle is needed to be initialized.

A3.1 Biome-BGC (Thornton et al., 2002)
Biome-BGC is a prognostic biogeochemical model driven by daily climate data for the prescribed land cover.Biome-BGC uses the Farquhar biochemical photosynthesis model (Farquhar et al., 1980) to calculate GPP, and it then estimates NPP as the remainder of GPP subtracted from autotrophic respiration, which is a function of temperature and biomass.Stomatal conductance in Biome-BGC is modeled using a Jarvis-type model (Jarvis, 1976) as the product of the predefined maximum stomatal conductance and climate regulation factors (shortwave radiation, air temperature, soil water potential, and Vapor Pressure Deficit -VPD).The model has been applied widely in regional to global carbon and water cycle studies (e.g., Thornton et al., 2002;Reichstein et al., 2006;Jung et al., 2007;Vetter et al., 2007;Hashimoto et al., 2010).As a default ecophysiological model parameter set, we used literature based one (White et al., 2000).
We used evergreen needleleaf forest parameters for FJY, deciduous broadleaf forest parameters for TKY and TSE (same parameter set for the two sites), and deciduous needleleaf forest parameters for TMK in point simulations, and we used deciduous broadleaf forest parameters for mixed forest in spatial simulations.In the model modification process, we first reduced the maximum stomatal conductance to improve evapotranspiration seasonality.Second, we adjusted the temperature limitation factor for the stomatal conductance and adjusted the phenology model parameter (the growing degree temperature) to fit the timing of observed seasonal timing of GPP increase and decrease.Third, we adjusted the www.biogeosciences.net/7/2061/2010/Biogeosciences, 7, 2061-2080, 2010 Q 10 factor of maintenance respiration to fit the observed RE seasonality, which was nearly equal to zero during the winter.
We also adjusted the mortality rate for vegetation due to the anomalous high biomass in the default model.Model spinup was conducted by running the model using whole climate input repeatedly until soil carbon reached equilibrium.

A3.2 DAYCENT (Parton et al., 1998)
The DAYCENT model is the daily version of the CENTURY ecosystem model that was designed to simulate carbon, nitrogen, and phosphorus cycling and the plant production of ecosystems at a monthly time step (Parton et al., 1996).DAYCENT additionally incorporates more detailed submodels for simulating soil moisture, soil temperature, soil nitrogen, trace gas flux, and soil organic matter on a daily time step, while plant growth is updated weekly (Parton et al., 1998;Del Grosso et al., 2001).DAYCENT has been previously used to simulate long-term responses of grassland production and soil carbon and nitrogen to land use change, climate change, and elevated CO 2 levels (e.g.Del Grosso et al., 2001).DAYCENT has also been included in some model comparison studies (e.g.Gerten et al., 2008).In this study, a modified DAYCENT (Hajima, 2008) was used in which the sub-models of photosynthesis and evapotranspiration processes were replaced by models described in a more detailed physical and physiological manner (Table 2).
In the default run, we used temperate evergreen needleleaf parameters for FJY, temperate summergreen broadleaf forest parameters for TKY and TSE (same parameter set for the two sites), and boreal summergreen needleleaf forest parameters for TMK in point simulations.For the model calibration for modified model run, first, we adjusted maximum LAI and maximum photosynthesis capability related parameter, and stomatal conductance related parameter to reproduce seasonal maximums of ET and GPP, if these were not accurately simulated.Second, to improve seasonal variation of GPP, we adjusted temperature regulation of GPP.For deciduous forests, we also adjusted phenology related temperature parameters.As a result of model modification, physiological parameters, such as the ratio of photosynthetic capacity to leaf nitrogen, the parameter for stomatal conductance, and the parameter for temperature dependency in photosynthetic process, were adjusted.The phenological threshold and the maximum LAI value were also calibrated for each site.Model spinup was conducted by running the model for about 2000 years using whole climate input repeatedly, and model run was executed considering site history (aboveground biomass was removed in the year corresponding to forest age).

A3.3 VISIT (Ito et al., 2007)
The VISIT model is a prognostic terrestrial biosphere model that simulates the budgets of major greenhouse gases (CO 2 , CH 4 , and N 2 O) in atmosphere-ecosystem biogeochemical interactions.The model simulates terrestrial energy, water, carbon, and nitrogen cycles for the atmosphere-ecosystem biogeochemical interactions at a daily time step.The carbon cycle model is based on Sim-CYCLE (Ito and Oikawa, 2002), including separation of the understory and overstory's canopy and the improved respiration model (Ito et al., 2007).Different biomes are characterized by different physiological parameters defining rates of photosynthesis and respiration, allocation and allometry, leaf phenology, and mortality (for typical parameter values; see Ito, 2008).The model and its previous version (i.e.Sim-CYCLE) have also been widely used for point, regional, and global scales (Ito and Sasai, 2006;Ito et al., 2007;Ito, 2008).
In point model run, parameters for evergreen needleleaf forests and deciduous needleleaf forests were used for the FJY and TMK sites, respectively.For the TKY and TSE sites, parameters for deciduous broadleaf forest were used (each site has different parameter settings).The model was mainly tuned by adjusting physiological parameters, such as the maximum photosynthetic rate and the respiratory temperature dependence (Q 10 ), such that the agreement with observational GPP, RE, and NEE was improved in an empirical manner.In spatial model run, parameters for deciduous broadleaf forest calibrated at TKY site were used for mixed forest.Model spinup was conducted by running the model for about 1000 years using whole climate input repeatedly.

A4.1 LPJ (Sitch et al., 2003)
LPJ includes a dynamic biogeography sub-model, which determines the land cover implicitly from climate data, in addition to carbon and water cycle processes.The dynamic submodel simulates area-based competition for light and water availability among 11 plant functional types.This model has been widely used for various temporal and spatial scale models (e.g., Sitch et al., 2003Sitch et al., , 2008;;Reichstein et al., 2006;Vetter et al., 2007;Piao et al., 2009;Hashimoto et al., 2010).
We used temperate evergreen needleleaf parameters for FJY, temperate summergreen broadleaf forest parameters for TKY and TSE (same parameter set for the two sites), and temperate summergreen needleleaf forest parameters for TMK in point simulations (i.e., the dynamic vegetation mode was off).In a spatial model run, the dynamic vegetation module was coupled.In the model modification process, we changed the quantum efficiency (the intrinsic quantum efficiency of CO 2 uptake in C3 plants) for photosynthesis to fit observed GPP seasonality.The parameter is one of the most important parameters to control carbon cycle in LPJ model (Zaehle et al., 2005).This modification significantly affected seasonal maximum GPP.Since GPP also affected the vegetation biomass, RE seasonality was also reasonably simulated by the model.Model spinup was conducted by running the model for about 1000 years using whole climate input repeatedly.

A4.2 SEIB-DGVM (Sato et al., 2007)
SEIB-DGVM is an individual-based dynamic global vegetation model that simulates alteration of vegetation types and the processes that cause such change, namely establishment, competition, and mortality, in addition to simulating terrestrial carbon and water cycles.The model considers several sample forests or grasslands covering a small area (30 m×30 m) placed at each grid box, and it calculates growth and decay of individual trees by explicitly calculating tree height crown diameter, crown depth, and light availability for each tree.By doing so, we expect that the speed of alteration of one vegetation type relative to another will be represented reasonably without introducing any additional parameterizations.The model adopted plant functional types (PFTs) and parameters needed for some important processes used in LPJ-DGVM.The model is used for both point and global scale modeling (Kawamiya et al., 2005;Sato, 2009).In both point and spatial simulations, the dynamic vegetation module was used.
In the model modification process, firstly, we fit phenology and photosynthesis related parameters to obtain reasonable composition of vegetation cover.Second, we adjusted photosynthesis related and settlement related parameters toward a better simulation of LAI, GPP, and NEP magnitude and seasonality.In the process of model modification, we made the following changes in the parameters.Maximum, minimum, and optimal temperatures for photosynthesis were lowered by 3 • C for forest ecosystems and by 2 • C for grassland ecosystems.Maximum LAI was lowered for temperate deciduous broadleaf forest and boreal evergreen needleleaf forest, and it was increased for boreal deciduous needleleaf forest and boreal deciduous broadleaf forest.The specific leaf area was set at 0.6 times that of the original model and the rate of establishment of new plant functional types was halved.For spatial analysis, the maximum photosynthesis rate and light use efficiency were set 5% and 12.5% higher than those of the default setting, respectively.Regulation of the existence of two boreal PFTs (boreal needle-leaved evergreen and boreal broad-leaved summergreen) against air temperatures higher than 23 • C was inactivated.Model spinup was conducted by running the model for about 2000 years using whole climate input repeatedly.

A4.3 TRIFFID (Cox et al., 2001)
TRIFFID is a dynamic global vegetation model that simulates the dynamics of the areal coverage of each plant functional type as well as carbon and water cycles at a sub-daily time scale.Sub-daily variation in climate variables is generated internally based on daily climate inputs.The model includes 5 plant functional types as vegetation classes.The model is coupled with a land surface model (MOSES-2) (Essery and Clark, 2003) that calculates short time scale phenomena such as evapotranspiration and photosynthesis.The dynamic vegetation submodel (changes in the area of the plant functional types) is based on the Lotka-Volterra equation.The TRIFFID model has been used to project global environmental changes coupled with General Circulation Models (GCMs) (Cox et al., 2000;Matthews et al., 2005) and to compare terrestrial models at the continental (e.g.Piao et al., 2009) and global scales (Cramer et al., 2001;Sitch et al., 2008).We used the version included in the University of Victoria Earth System Climate Models (UVic-ESCM) version 2.8 (Cox et al., 2001;Matthews et al., 2005).
We fixed the vegetation type in the point simulations (i.e., the dynamic vegetation model was off) and used needleleaf forest parameters for FJY and TMK (same parameter set for the two sites) and broadleaf forest parameters for TKY and TSE (same parameter set for the two sites).In a spatial model run, the dynamic vegetation module was coupled and used.In the model modification processes, we applied six modifications: (1) the snow sublimation model was modified due to anomalous sublimation during the winter in the original model, (2) the quantum efficiency of photosynthesis and top leaf nitrogen concentration (nitrogen:carbon ratio) was enhanced due to underestimation of the seasonal amplitude of GPP in the original model, (3) the temperature sensitivity of photosynthesis was raised, (4) Q 10 for respiration was decreased due to high respiration during the summer in the original model, and (5) the effective soil depth for energy balance and soil temperature calculations was set shallower than the original model.Model spinup was conducted by running the model for about 1000 years using whole climate input repeatedly.

Fig. 1 .
Fig. 1.Land cover of the study area based on MODIS land cover data (MOD12Q1; Friedl et al., 2002) in year 2001 with the eddy flux observation sites used in this study.Xs show the locations of the eddy flux observation stations used in the study.Abbreviations of land cover classes: Evergreen Needleleaf Forest (ENF), Evergreen Broadleaf Forest (EBF), Deciduous Needleleaf Forest (DNF), Deciduous Broadleaf Forest (DBF), Mixed Forest (MF), Grassland (GR), and Cropland (CR).

Fig. 2 .
Fig. 2.Seasonal variations in GPP, RE, and NEP at four eddy flux sites from nine models (colored lines) and from observations (gray circle).Default models were used for the simulation.The model average (black bold) and the average ± standard deviations for the models (yellow) are also shown.

Fig. 3 .
Figure3 , and Saigusa et al. (2010) identified anomalous carbon budgets for the spring of 2002 and the summer of 2003 through the analysis of climate data and eddy-covariance measurements for 2001-2003.For GPP, the general characteristics of the anomalous pattern in 2002 and 2003 were well reproduced for all sites.In the spring of 2002, positive GPP anomalies caused by warmer temperatures

Fig. 5 .Fig. 6 .
Fig. 5. Average of standard deviations (unit: gC m −2 month −1 ) of modeled (a) GPP, (b) RE, and (c) NEP for all models during growing season (April to September).The results from the default (gray) and modified (black) models are shown with numbers.

Table 1 .
Eddy flux observation sites used in this study.
• 27 N, 138 • 46 E 36 • 08 N, 137 • 25 E 45 • 03 N, 142 • 06 E 42 • 44 N, 141 • 31 E prognostic models, and dynamic vegetation models.Empirical models use observations to construct the model with regression or other empirical algorithms.Diagnostic models use satellite-based time-variable observations such as the vegetation index, Leaf Area Index (LAI) and Fraction of Photosynthetically Active Radiation absorbed by a canopy (FPAR), and climate data to capture accurate spatial and temporal patterns in the status of terrestrial ecosystems.Prognostic models use only climate data as time-variable inputs and are capable of yielding future projections.Dynamic vegetation models usually use only climate data as timevariable model inputs, and can simulate changes in spatial patterns in vegetation cover and vegetation types based on competition among different plant functional types.

Table 2 .
Details for the terrestrial biosphere models used in this study.
Climate input: T , P , RH, VPD, Rad, and Wind denote air temperature, precipitation, relative humidity, vapor pressure deficit, surface shortwave radiation, and wind speed, respectively.Satellite input: LST, EVI, NDVI, LAI, and FPAR denote land surface temperature, enhanced vegetation index, normalized difference vegetation index, leaf area index and fraction of photosynthetically active radiation absorbed by a canopy, respectively.GR and MR denote growth and maintenance respiration, respectively.* 1 The original model uses cloud cover as climate data input instead of radiation.We modified the model to use radiation directly by removing the radiation calculation routine from cloud cover.*2 The original model used NDVI to calculate FPAR and LAI.We modified the model to use satellite-based LAI/FPAR directly.*3 LUE denotes light use efficiency.*4 Farquhar et al. (1980).*5 Jarvis (1976).*6 DePury

Table 3 .
Applied model parameter and structure modifications in this study.