www.biogeosciences-discuss.net/6/8455/2009/ © Author(s) 2009. This work is distributed under the Creative Commons Attribution 3.0 License.

Terrestrial biosphere models show large uncertainties when simulating carbon and water cycles, and reducing these uncertainties is a priority for developing more accurate estimates of both terrestrial ecosystem statuses and future climate changes. To reduce uncertainties and improve the understanding of these carbon budgets, we inves5 tigated the ability of flux datasets to improve model simulations and reduce variabilities among multi-model outputs of terrestrial biosphere models in Japan. Using 9 terrestrial biosphere models (Support Vector Machine-based regressions, TOPS, CASA, VISIT, Biome-BGC, DAYCENT, SEIB, LPJ, and TRIFFID), we conducted two simulations: (1) point simulations at four flux sites in Japan and (2) spatial simulations for Japan with a 10 default model (based on original settings) and an improved model (based on calibration using flux observations). Generally, models using default model settings showed large deviations in model outputs from observation with large model-by-model variability. However, after we calibrated the model parameters using flux observations (GPP, RE and NEP), most models successfully simulated seasonal variations in the carbon 15 cycle, with less variability among models. We also found that interannual variations in the carbon cycle are mostly consistent among models and observations. Spatial analysis also showed a large reduction in the variability among model outputs, and model calibration using flux observations significantly improved the model outputs. These results show that to reduce uncertainties among terrestrial biosphere models, we need 20 to conduct careful validation and calibration with available flux observations. Flux observation data significantly improved terrestrial biosphere models, not only on a point scale but also on spatial scales.


Introduction
The terrestrial biosphere plays important roles in regional and global carbon cycles, as well as in climate, through biogeochemical and biophysical processes.Large uncer-8457 tainties in terrestrial biosphere models and the potential effects of these uncertainties on the projection of future global environmental changes were pointed out through an intercomparison study of coupled carbon cycles and climate models (Friedlingstein et al., 2006).Assessment and refinement of terrestrial biosphere models are essential to improving estimation of terrestrial carbon budgets and future projections of environmental changes.Several Dynamic Global Vegetation Models (DGVMs) have also been evaluated in an earth system model framework (Sitch et al., 2008), and large differences were found in the simulation results among the different models.
To quantify the uncertainties in the terrestrial biosphere models and to determine the causes of these uncertainties, terrestrial biosphere models have been compared for global (e.g., Cramer et al., 1999;Cramer et al., 2001), regional (e.g., VEMAP members, 1995;Jung et al., 2007), and point scales (e.g., Potter et al., 2001;Gerten et al., 2008).These studies evaluated a number of terrestrial biosphere models, and the models' outputs showed large variability in their estimates of terrestrial carbon cycles.For example, Cramer et al. (1999) found that the range of global net primary productivities was 40-80 GtC year −1 (55±10 GtC year −1 ), and Cramer et al. (2001) showed that the future terrestrial carbon budget has considerably scattered.One of the case studies that successfully reduced uncertainties in the estimation of carbon cycles was an intercomparison of atmospheric transport models and estimated atmosphere-ocean and atmosphere-land carbon cycles (Transcom; Gurney et al., 2002).The detailed comparison with multi-model outputs successfully identified a robust carbon budget, as well as constraints that are needed for more reliable estimates more reliable estimations.However, most intercomparison studies of terrestrial biosphere models have not been performed systematically, and some models have been and continue to be used without sufficient validation.Recently, multiple terrestrial biosphere models were used to analyze the interannual variability of terrestrial carbon budgets for continental scale monitoring of terrestrial biospheres (Reichstein et al., 2006;Vetter et al., 2007;Piao et al. 2009;Hashimoto et al., 2009).These biosphere models were able to identify key mechanisms behind the recent global changes and the impacts of these changes on terrestrial biospheres; however, large uncertainties in each biosphere models remain owing to lack of validation with observations.These multi-model uncertainties could potentially be reduced by using the observed data to calibrate and validate the models and by analyzing the current status of each model.As stated before, previous model intercomparison projects generally lacked detailed validations, resulting in errors in the simulations.Owing to the recent increase in the availability of relevant data (Baldocchi, 2008), model improvements have been carried out using flux observations, and the impact on the model simulations was evaluated on a global scale (Friend et al., 2007;Stokli et al., 2008), as well as on a regional scale (Ueyama et al., 2009), for larch forest parameter refinements.Since the uncertainties among multi-models should be reduced by the constraints from the observed data, potential differences should be evaluated after each model has successfully simulated terrestrial carbon cycles.
In this study, we assessed the extent to which differences between the models can be reduced by calibrating the models using flux observations from Japan.First, we conducted baseline simulations using default model settings without using prior flux observation information.These simulations were run on both a point-scale (four flux tower sites) and a spatial scale.Then, using the flux data as model constraints, we improved the models.Finally, based on the improved models, we ran the models at point and spatial scales and analyzed the inter-model variability of the outputs.We report the extent to which model calibration using the flux observations can reduce the uncertainties in the modeled carbon cycles through calibration for seasonal and interannual time-scales and spatial scales.
Among the eight ultimate purposes of the CarboEastAsia A3 Foresight program (Summary of the Special Issue; Hirano et al., 2009), this study had three goals: (1) to quantify the distribution and the strength of carbon sinks/sources and their spatial and temporal variability (dynamic) and uncertainty (prediction), (2) to develop a new generation of carbon cycle models (mechanistic and prognostic) suitable for the east Asian ecosystems, and (3) to quantify the contribution of terrestrial ecosystems in East Asia to the global carbon balance.The ultimate purpose of this study is to estimate 8459 carbon cycles in East Asia.As a first step, we selected a test site in Japan where the flux network was dense and could be used as a case study.This site was used to test the extent to which ground observations can reduce uncertainties in carbon cycle estimations.By comparing terrestrial biosphere models, we aimed to identify the current robust findings and future potential improvements in order to understand terrestrial carbon budget estimation in East Asia.

Study area and flux observation sites
We focused our analysis on Japan (Fig. 1).More than 75% of the land is covered by forests, mainly with mixed forests, deciduous broadleaf forests, and evergreen needleleaf forests.Generally, the climate is characterized as warm-temperate and humid (southern half) and cool-temperate and humid (northern half), and the topography is mountainous.
In this analysis, we used data from four flux observation sites for the point experiments.These sites included the Fujiyoshida forest meteorology research site (FJY), the Takayama deciduous broadleaf forest site (TKY), the Teshio CC-LaG experiment site (TSE), and the Tomakomai flux research site (TMK).Details for each site are given in Table 1.The FJY site is located in Fujiyoshida City, which is on the Kenmarubi lava flow on Mt.Fuji ( 35• 27 N, 138 • 46 E).The dominant species is the Japanese red pine (Pinus densiflora), and there are also mixed evergreen and deciduous broadleaf trees, such as Japanese holly (Ilex pedunculosa) and azalea (Rhododendron dilatatum).The climate is relatively cool for its latitude, and snow depth reaches up to 0.5 m in winter.
The TKY site is located in a mountainous region in the central part of the main island of Japan (36 In this study, we used 9 terrestrial biosphere models (Table 2) covering all four cat-8461 egories described above.The models include one empirical model (Support Vector Machine-based regression (SVM) Yang et al., 2006Yang et al., , 2007)), two diagnostic terrestrial biosphere models (the Carnegie-Ames-Stanford Approach Model (CASA), Potter et al., 1993;and the Terrestrial Observation and Prediction System (TOPS), Nemani et al., 2003), three prognostic terrestrial biosphere models (Biome-BGC, Thornton et al., 2002;DAYCENT, Parton et al., 1998; Vegetation Integrative SImulator for Trace gases (VISIT), Ito et al., 2007), and three dynamic global vegetation models (the Lund-Potsdam-Jena Model (LPJ), Sitch et al., 2003; Spatially-Explicit Individual-Base DGVM (SEIB), Sato et al., 2007;and Top-down Representation of Interactive Foliage and Flora Including Dynamics (TRIFFID), Cox, 2001).The details of each model are described in Appendix A.

Data 2.3.1 Flux tower observation data
We used climate data from four flux observation sites as the inputs for point-scale models, and we used the carbon fluxes data to validate the point-scale models.The four sites (FJY, TKY, TSE, and TMK) are described in Sect.2.1.The temporal coverage of the observation data was 2000-2004for FJY, 2001-2003for TKY, 2002for TSE, and 2001-2003 for TMK.The methods of gap-filling and flux-partitioning (i.e., infer Ecosystem Respiration (RE) and Gross Primary Productivity (GPP) from Net Ecosystem Exchange (NEE)) were based on Hirata et al. (2008).Based on the suitable temporal scales for each model, the observed climate data were converted into corresponding temporal averages.Monthly variations in the carbon fluxes (GPP, RE and NEP) were calculated for model validation.Although the GPP and RE were derived from observed NEP, we describe them as observed GPP and RE throughout the paper.

Satellite-based time variable data
We used eight-day or sixteen-day composites of the Moderate Resolution Imaging Spectroradiometer (MODIS)-based Land Surface Temperature (LST) (Wan et al., 2002), the Vegetation Indices (the Normalized Difference Vegetation Index (NDVI) and the Enhanced Vegetation Index, EVI) (Huete et al., 2002), and the Leaf Area Index (LAI)/Fraction of Photosynthetically Active Radiation absorbed by a canopy (FPAR) (Myneni et al., 2002) from 2001 to 2006.The original temporal resolutions of LST, LAI/FPAR, and VI were 8-day, 8-day and 16-day, respectively.Therefore, we converted the temporal resolution to fit each model's temporal resolution.
For point scale analysis, we used MODIS 1-km resolution subset data sets (http: //daac.ornl.gov/MODIS/),each of which consisted of 7-by-7 km regions centered on the flux towers (Cook, 2004).At each time step, we averaged the MODIS observations by only using high-quality pixels (with the mandatory quality assurance (QA) flag being good in the QA data) based on the method of Yang et al. (2007), and missing data were replaced by a 2001-to-2006 average calculated using high-quality pixels.For the spatial analysis, we created 4-km spatial resolution data using the original 1-km data of MODIS LST, VIs, and LAI/FPAR.All gaps in the data were filled using averaged 8-day data calculated from 2001 to 2006 at each grid point if the QA flags were not good.

Climate data
Each terrestrial biosphere model requires climate data as inputs.Necessary climate data include temperature (daily maximum temperature (Tmax), daily minimum temperature (Tmin), daily average temperature (Tave)), precipitation (Prec), vapor pressure deficit (VPD), relative humidity (RH), incoming solar radiation (Srad), and wind speed (Wind).For point analysis, we used either observed climate data from each flux site or long-term climate reanalysis data  from NCEP/NCAR reanalysis 1 (Kalnay et al., 1996) that was extracted at the corresponding pixel and corrected using site observations.Temporal averaging (e.g., daily, 8 days, and monthly) was applied to fit the 8463 model time-step.
For spatial analysis, we created climate dataset covering 1990 to 2006 at a 4-km spatial resolution using point observations from the Automated Meteorological Data Acquisition System (AMeDAS) climate data network in Japan.The data included hourly measurements of temperature, precipitation, and wind speed.We estimated daily Tmax, Tmin, Prec, and Wind based on nearest neighbor interpolation.Temperature data were interpolated using elevation correction a assuming lapse rate=0.0065K m −1 .
Daily VPD, RH, and Srad were estimated based on the MTCLIM algorithm (Thornton et al., 1999).

Static data
The models needed several ancillary data sets to complete the simulations.All necessary data were derived from the global set of soil data (Global Soil Wetness Project (GSWP) -2 Input data; http://www.iges.org/gswp/input.html),MODIS Land Cover (MOD12Q1; Friedl et al., 2002), and the Global 30 Arc-Second Elevation (GTOPO30) data from the US Geological Survey.Site ancillary information (longitude and latitude) was also included.Different models required different soil information, but all information (soil texture, rooting depth, field capacity, wilting point, saturation point, and albedo) was taken from the GSWP-2 data.

Experiments
The study consisted of two steps for each model run and evaluation: point and spatial analysis.Point runs were conducted for the four flux sites in Japan, and spatial runs were conducted for the entire country.Each step included two model runs: default and improved model runs.The default model run was conducted as a benchmark test, and the improved model run was conducted to analyze the extent of model improvement when the flux observations were used as constraints.For RE and NEP simulations, SVM and TOPS were excluded since these models do not simulate them (see also Appendix A1 and A2).

Point model runs
First, as a benchmark test, we ran all models using the default model settings for each flux site, using input meteorological and satellite data.Then we tuned all models to fit the observed GPP, RE, and NEP data by adjusting the model parameters.All procedures for the adjustments of the model parameters were model-user dependent; however, we tried to apply minimum changes to fit the observed carbon cycles.No models needed further algorithm changes, and drastic model improvements were achieved.Details of model modifications are described in Appendix A. The model initialization or spin-up processes are also described in Appendix A.
Using the results from default and improved models, we first analyzed the modeled seasonality of GPP, RE, and NEP from all models, focusing on how well the models could simulate the observed carbon cycle seasonality and how much the model-tomodel variability could be reduced by the model tuning process using flux observations as constraints.For each model run, we obtained the average monthly variation using the whole observation period for GPP, RE, and NEP, and we calculated its standard deviation among the models for each month for both the original and improved models.
Second, the interannual variations in GPP, RE, and NEP were analyzed for all models.To test how each model reproduces the anomalies of interannual variability in the carbon cycle, we focused on the years 2001-2003 by taking an overlapping period, and we calculated the monthly anomalies by subtracting the mean monthly variations for 2001-2003.The TSE site was excluded because its observation period was only 1 year.8465

Spatial model runs
Using both the original and improved models, we conducted spatial analysis for the grids with forest vegetation cover in Japan (Fig. 1).All spatial information was used to run all the models.Spin-ups were conducted using whole 1990-2006 climate data with the CO 2 concentration fixed at the 1990 level, and we ran the model from 1990 to 2006 with a time-variant CO 2 concentration.Only the results from 2001-2006 were analyzed.As the improved models, we used the models that had been calibrated in the point simulation.To evaluate the outputs from multiple models, we calculated the annual average of GPP, RE, and NEP by taking the average from the 2001-2006 period, and basic statistics (average and standard deviation) were calculated.Then, interannual variability in GPP and NEP were analyzed by calculating the total for all of Japan.We did not use DAYCENT and TRIFFID for spatial analysis due to technical reasons (see also Appendix A3 and A4.).

Default model simulations
Simulated monthly GPP, RE, and NEP data showed large deviations from the observations, and there was large model-by-model variability for all sites (Fig. 2).For GPP, most models underestimate its seasonal amplitude, and seven models (SVM, CASA, TOPS, VISIT, LPJ, SEIB, and TRIFFID) out of eight clearly underestimated the GPP seasonal amplitude.One model (Biome-BGC) overestimated the seasonal amplitude for FJY, three models (CASA, LPJ and TRIFFID) underestimated the seasonal amplitude for TKY and TSE, and most models underestimated the seasonal amplitude for TMK.The timing of the start of the growing season also deviated from the observed start times.Some models estimated a significantly earlier start of the growing season (Biome-BGC for FJY, TOPS for TKY and TMK, and DAYCENT for FJY), while others estimated later start of the growing season (DAYCENT for TKY and TMK).In general, the Biome-BGC model overestimated the GPP seasonal amplitude, the VISIT simulated it well, and other models underestimated it.For RE, similar to GPP, most models underestimated the seasonal RE amplitude.The underestimation of RE was caused by the underestimation of GPP, which led to underestimated biomass and respiration.Since NEP is a difference of GPP and RE, smaller deviations of GPP and RE considerably affect NEP; we detected an underestimation of the seasonal NEP amplitude, especially at the TKY and TMK sites.
Ensembles of multiple models also underestimated the seasonal amplitude of GPP, RE, and NEP (except at the FJY site) with large model-to-model differences in the simulated seasonal carbon cycle (yellow in Fig. 2).Comparison of the observed and modeled (multi-model ensemble mean) monthly variations in GPP, RE, and NEP clearly indicated that the model underestimated the seasonal amplitude (the slopes of the regression lines were less than 1, namely, 0.65, 0.63, and 0.57 for GPP, RE, and NEP, respectively) (Fig. 3), although the correlation coefficients were high.These differences might be one of the reasons that current terrestrial biosphere models are biased and might be the cause of large uncertainties among the model outputs.Both the bias and the large uncertainties indicate that model improvements are necessary.
Commonly, the differences between the model results and the actual observations can be explained by two reasons.First, the seasonal amplitude of GPP, RE, and NEP were underestimated.This underestimation is probably the result of inappropriate modeling of photosynthesis activities, which results in underestimation of RE and NEP.Second, seasonal timing of the start of the growing season shows wide variability among the different models.The phenolgy models included in terrestrial biosphere models are based mainly on empirical modeling, such as the use of temperature summation to determine the threshold for the start of the growing season, and this empirical modeling 8467 may cause some uncertainties in models.Based on these findings for each model, all models were improved so that they more accurately simulate the seasonality of the observed carbon cycle (details in model modification are described in Appendix A).

Improved model simulations
After improving the model using flux observations, we found drastic model improvements in the carbon cycle component for most sites (Fig. 4).Seasonal variations in GPP and RE and the amplitudes of these variations were accurately simulated by most models for all sites.The reductions in model-to-model differences at the FJY site were especially large, and the seasonal variations in GPP, RE and NEP were close to the observations for TKY, TSE, and TMK.Only RE at TKY showed a small overestimation by most models.As a result of GPP and RE refinements, we found that the seasonal NEP amplitude was also much more accurately simulated in the improved model, especially at the TKY and TMK sites.Only SEIB-DGVM showed deviations from the observations, which were probably caused by the difficulty of representing forest dynamics stably.Large disturbances (e.g., death of big trees) might affect carbon dynamics greatly and result in large gaps between the modeled and observation values.
Estimation of the multi-model ensemble mean of monthly GPP, RE, and NEP values across four sites was also greatly improved (Fig. 3).From the model improvement, we obtained a correlation between the modeled and observed carbon cycle components that was closer to 1:1.For example, the slopes of the regression lines became closer to 1, changing from 0.65→0.90for GPP, 0.63→0.86 for RE, and 0.57→0.78for NEP.The R 2 values were also higher than those from the default simulations.These results also show that the improvement processes for each model successfully simulated the terrestrial carbon cycle components in the model on a multi-site scale.
Model-to-model differences, as measured by the standard deviations of multi-model outputs, were also greatly reduced by the model improvement process for GPP and RE simulation (Fig. 5).For example, the standard deviations of the modeled GPP values were reduced by 47%, 3%, 18%, and 19% for FJY, TKY, TMK and TSE, respectively, and the standard deviations of the modeled RE values were reduced by 60%, 25%, 11%, and 30% for FJY, TKY, TMK, and TSE, respectively.On the other hand, the standard deviation of NEP values was not largely changed by the model improvements (Fig. 5).The difficulties with the modeled NEP calibration are potentially the result of several issues.First, small differences in GPP and RE greatly affect NEP; therefore, it is difficult to refine NEP in the model.Second, inclusion of site history is needed to simulate NEP accurately (Friend et al., 2007).Since these forests are not predominantly natural forests, forest regrowth can be one of the important mechanisms for current carbon budgeting.Third, observations sometimes contains biases in the closure of the energy balance, and gap-filling and flux partitioning technique (Massman and Lee, 2002;Foken, 2008).

Interannual variations
Interannual variations in terrestrial carbon cycle processes (GPP and NEP) were generally well reproduced by the models (Fig. 6).Hirata et al. (2007) (Saigusa et al., 2008) at TKY and TMK were captured by most models.In the summer of 2003, negative GPP anomalies for FJY and TKY and a positive anomaly for TMK that were caused by radiation anomalies (Saigusa et al., 2008(Saigusa et al., , 2009) ) were well-captured by the models.NEP anomalies were also generally reproduced by the model.Especially for the TMK site, both GPP and NEP anomalies for the whole period were well reproduced by the models (R 2 =0.86 for GPP and 0.70 for NEP).
Some models showed deviations from observations and from other model outputs, indicating the need for a more detailed model assessment.For example, the year-to-8469 year anomalies in GPP were much higher in the TOPS model simulation, especially for TKY, and some diverse responses in GPP and NEP were found using the DAYCENT model simulation for the spring of 2001 and 2002 for TMK.Some deviations from the observations and from other models were found for FJY and TKY with the SEIB simulation.These deviations were primarily caused by the model responses to anomalies in input parameters such as climate data and satellite data, and these facts need to be analyzed in detail to further constrain the terrestrial biosphere models.

Annual summary
In the default model simulations, we found large differences in the annual average GPP from 2001 to 2006 among the models (Fig. 7).The differences were primarily produced by the site-level differences in the model.For example, underestimation of the GPP seasonal amplitude by CASA and LPJ on the point scale greatly affected the magnitude of GPP on the spatial scale; CASA and LPJ estimated much lower GPP values than did BIOME-BGC and VISIT (e.g., about 1200 gC m −2 year −1 by CASA and LPJ and 2000 gC m −2 year −1 by BIOME-BGC and VISIT), which resulted in large standard deviations among the models (500-600 gC m −2 year −1 ).The spatial patterns in GPP based on the improved models were similar in magnitude among the models, reducing the model-to-model GPP differences (Fig. 8).For the annual GPP, most models estimated about 2000 gC m −2 year −1 for most of Japan, and the standard deviation among the models was significantly reduced (300 gC m −2 year −1 ).The remaining differences in annual GPP are explained by (1) lack of sufficient observation network especially in southern regions, and (2) discrepancy in simulated vegetation type by DGVMs, and should be improved in future analysis.Annual total statistics in Japan were also greatly improved by the model refinements (Table 3).Both total GPP and RE were increased by the model improvements, and their standard deviations among the different models were reduced by about half (e.g., 162→75 for GPP, and 188→95 for RE; Table 3).Standard deviations in NEP were not significantly changed, which is probably the result of a compensating effect by taking differences in GPP and RE.Thus, improvement of the models by using flux observations greatly reduced model-to-model variability and successfully refined the estimation of terrestrial carbon cycles on a spatial scale.

Interannual variability
Simulations of the interannual variability also showed year-to-year variability in the model outputs (Fig. 9).For example, during the 2003 summer, we experienced a strong reduction in solar radiation with heavy rain in 2003 in Japan (Saigusa et al., 2009).For that time, the model simulated large reductions in GPP for most of the study region.
During the spring of 2002, warmer temperatures led to a significant enhancement in GPP in most ecosystems, causing significant increases in NEP.These anomalous patterns were successfully simulated in most of the terrestrial biosphere models used in this study.
Standard deviations of GPP among the models were also reduced by the process of model calibration on the point scale.The reduction in the model-by-model differences in the model experiments largely showed that model testing and calibration using Fluxnet observations was beneficial, and testing the model with Fluxnet observations and evaluating the model improvements is a first step in reducing the variability in the model.

Implications of the improvement of the terrestrial biosphere models
When using flux observations to calibrate model parameters, we found that these observations strongly constrained the modeled seasonal variation and its magnitude in 8471 annual totals.In terms of variability in the modeled seasonal variation, the improved model resulted in standard deviations that were reduced by 22%, 32%, and 2% in magnitude for GPP, RE, and NEP, respectively (values are the average of the standard deviations of the modeled GPP, RE, and NEP at each site).These results show that the flux observations helped to reduce uncertainty in terrestrial biosphere models, and using flux observations as constraints will be one of the important advancements in ecosystem modeling.Previous model intercomparison studies found large variability; however, this variability can potentially be greatly reduced by using flux observations as constraints.Nowadays, many terrestrial observations are available to the terrestrial biosphere modeling communities, and the use of these observation data can reduce variability among models.Previously reported uncertainties in terrestrial biosphere models (Cramer et al., 1999(Cramer et al., , 2001;;Friedlingstein et al., 2006) will be greatly reduced by using up-to-date and well-validated terrestrial biosphere models.
A multi-model ensemble simulation of the terrestrial carbon cycle is effective in estimating seasonal and interannual variability.Using the default model settings, we estimated large variations in the seasonal carbon cycle; however, an average of all models reasonably estimated terrestrial carbon cycles, though with some underestimations.The improved models successfully simulated seasonal and interannual variations in the carbon cycle, with smaller standard deviations compared to the default model settings.These results suggest that the multi-model ensemble approach successfully captured the terrestrial carbon cycle over seasonal and interannual time scales, and the associated uncertainties were greatly reduced by model improvements.
This model intercomparison analysis has several implications for the ground observations.In the process of making improvements to the models, we commonly adjusted the maximum photosynthesis activity, the Q 10 value of respiration, the stomatal conductance, and phenology-related parameters for most models.The fact that these parameter adjustments were required suggests the necessity of further constraints from observations.Owing to further model improvements, the accurate estimation of highlysensitive site-specific parameters and their spatial distributions should be examined in future observations.

Potential limitations and implications on the understanding of the terrestrial carbon budget
Most terrestrial biosphere models used in this study were developed outside of Asia, and we found that direct application of these models may cause problems in the simulation of carbon cycles.This is partially due to the unique ranges in annual climate covered by the AsiaFlux network and caused by the Asian monsoon climate.The flux observations show that AsiaFlux sites cover a wider range of ecosystems, especially in terms of annual precipitation, than other FLUXNET observation networks such as AmeriFlux and CarboEurope (precipitation in most observation sites of AmeriFlux and CarboEurope is less than 2000 mm year −1 ).Therefore, the humid temperate climate in Asia and Japan requires further model calibration studies to accurately simulate the observed carbon cycle.
To improve the models, we need to address several items.First, the site history should be included in any further studies in order to improve the NEP simulation.Many biosphere modeling studies have pointed out that site history can account for most proportions of the current carbon uptake.Including the site history could potentially improve the net ecosystem exchange.Second, the model validation using flux site observation can constrain short-term (daily, seasonally and yearly time scales) processes not long-term processes such as vegetation transition.Model calibration and parameterization using longer-term observation should be conducted in the future.Third, in this study, most of the ecosystems show sensitivities to temperature and radiation variation.Model sensitivity to water availabilities are one of the most unknow facts for ecosystem modeling (e.g.Morales et al., 2005), and should be tested in future studies.Fourth, it is possible to calibrate the model in different ways to get the right answer for the present-day, especially for the seasonal cycle.We need more observations and extract parameters to be additionally observed to avoid the problems.Fifth, in this study, we have one or two observations in each land cover class.It is desirable to use mul-8473 tiple sites to calibrate the model within the same plant function types.Currently, lack of sufficient sites prevents more sufficient validation.Sixth, to estimate the nationwide carbon budget for Japan, we further need observations of cropland, grassland, and evergreen broadleaf ecosystems.In this study, we only analyzed forest cover and did not use any constraints for other ecosystems not covered by the four flux observations.
Including these other ecosystems could potentially increase the terrestrial carbon cycle estimations of each process.

Conclusions
We demonstrated that flux observations related to the terrestrial carbon cycle are useful to improve the terrestrial biosphere models and to help reduce model-to-model variability and uncertainty.Through model intercomparison and model refinement, we found that differences among the models were greatly reduced on both point and spatial scales.We also found that interannual variability in the terrestrial carbon cycle was robust among the models, and the potential problems with the current terrestrial biosphere models were similar among models.This study was the first step in reducing the uncertainty among the terrestrial biosphere models.Through work similar to this study, we can potentially remove the biases included in the biosphere models based on parameter uncertainties.Further analysis is required to constrain the spatial variations and temporal variations.Although there are still uncertainties in the model algorithm, these uncertainties will be fully explored in the future through more rigorous investigations and by adding other constraints to the model, such as biomass carbon and soil carbon, as well as some water cycle estimations.This study will be extended into the whole Asian monsoon region in the next step as a subproject of the A3 CarboEastAsia Program.The use of systematically gapfilled and flux-partitioned data over Asia, covering Siberia, Mongolia, China, Korea, Japan, and southeastern Asia regions, can potentially characterize the similarities or differences between the Asian terrestrial ecosystem and other regions, such as North America and Europe.The Asian monsoon region includes potentially unique terrestrial ecosystems, such as larch forest over the northern region, temperate and humid forest over East Asia, and paddy fields over east and southeast Asia, that should be analyzed in future studies by refining the model using observations.The future responses of these ecosystems to climate change should also be analyzed.

Appendix A Overview of the terrestrial biosphere models and their modification
We used 9 terrestrial biosphere models from four different categories including empirical, diagnostic, prognostic, and dynamic models.Below are brief descriptions of each ecosystem model in each category and the modifications we made to the models.(Yang et al., 2006(Yang et al., , 2007) ) Support Vector Machine-based regression (SVM) is an empirical model based on the regression-type support vector machine driven by inputs of satellite-based surface radiation (Rad), land surface temperature (LST), and enhanced vegetation index (EVI) (Yang et al., 2006(Yang et al., , 2007)).Support Vector Machine regression is a machine learning technique that transforms nonlinear regressions into linear regressions by mapping the original low-dimensional input space to a higher-dimensional feature space using kernel functions (e.g., Cristianini and Shawe-Taylor, 2000).The method was assessed for more than 20 Ameriflux sites over the continental United States to estimate spatial distributions both in evapotranspiration (Yang et al., 2006;Ichii et al., 2009) and in gross primary productivity (Yang et al., 2007).The model calculates GPP only as carbon cycle component.The model output has also been used for inverse estimation of key biosphere model parameters, such as maximum light use efficiency (Yang et al., 2007) and the rooting depth (Ichii et al., 2009), and for the analysis of climate and terrestrial carbon cycles in Asia (Saigusa et al., 2009).

8475
As an original model, we used the model tuned for the AmeriFlux observation sites, which were substantially similar to those studied by Yang et al. (2007).In the improvement model run, we developed the model using the observations from four flux sites: FJY, TKY, TMK, and TSE.The original model significantly underestimated the seasonal amplitude of GPP, and use of flux observation data from Japan significantly improved the model.

A2 Diagnostic models
CASA (Potter et al., 1993) CASA is a diagnostic terrestrial biosphere model driven by climate-and satellite-based data on monthly time-scale (Potter et al., 1993).NPP is calculated as the product of maximum Light Use Efficiency (ε max ), Photosynthetically Active Radiation (PAR), FPAR (Fraction of PAR absorbed by vegetation), and climate-driven regulation factors that are functions of air temperature and soil water content.We changed the model input from NDVI for the original model to the satellite-based LAI and FPAR for the improved model, because the satellite-based LAI/FPAR data were available.We assumed that GPP is 2×NPP in the analysis.The CASA model is widely used for terrestrial carbon and water cycle monitoring from point to global scales (Potter et al., 1993(Potter et al., , 2001;;Hashimoto et al., 2009).
We used needleleaf evergreen tree parameters for FJY, broadleaf deciduous tree parameters for TKY and TSE, and high lat deciduous tree parameters for TMK in point simulations.In the model improvement process, we increased ε max because the simulated GPP seasonality is significantly underestimated.In the original model, ε max was biome-independent, and we adjusted the parameter in each vegetation type.The range of ε max was confirmed by studies in the literature (e.g., Ruimy et al., 1994).Only this adjustment was able to significantly improve the modeled carbon cycle seasonality in this model.
TOPS (Nemani et al., 2003) TOPS is a diagnostic terrestrial biosphere model that simulates the terrestrial water and carbon cycle processes using daily climate and satellite (FPAR/LAI) data (Nemani et al., 2003;White and Nemani, 2004).Simulations of hydrologic states and fluxes were based largely on the Biome-BGC model (Thornton et al., 2002) with the use of the remotely sensed LAI.Calculation of GPP was based on a production efficiency model (PEM) approach with an environmental stress scalar set as the minimum limits for Tmin, VPD, and soil water potential.The snow model was updated using a physically based energy balance model (Ichii et al., 2008).The model calculates GPP only as carbon cycle component.The TOPS model is widely used for terrestrial biosphere status monitoring in North America (e.g., White and Nemani, 2004;Nemani et al., 2009;Ichii et al., 2009).As a default model parameter, we used literature-based parameters (White et al., 2000), with updated maximum light use efficiencies inversely calculated from satellite-based GPP, and a light use efficiency model (Yang et al., 2007).
We used evergreen needleleaf forest parameters for FJY, deciduous broadleaf forest parameters for TKY, TSE, and deciduous needleleaf forest parameters for TMK for the point simulations, and we used deciduous broadleaf forest parameters for mixed forest in the spatial simulations.In the model improvement process, we changed the maximum light use efficiency and the maximum stomatal conductance by comparing the fluxes observed in GPP and ET seasonality.We adjusted the maximum light use efficiency for all ecosystems (e.g., 1.02→1.8for ENF, 1.56→1.8for DBF, and 1.56→2.1 for DNF).In addition, to better simulate the seasonal timing of the growing season, we adjusted the minimum temperature multiplier for the light use efficiency (original model: −8.0→0.0 • C, improved model: 0.0→8.0• C).These modifications drastically affected seasonal carbon cycle variations.

A3 Prognostic models Biome-BGC (Thornton et al., 2002)
Biome-BGC is a prognostic biogeochemical model driven by daily climate data for the prescribed land cover.Biome-BGC uses the Farquhar biochemical photosynthesis model (Farquhar et al., 1980) to calculate GPP, and it then estimates NPP as the remainder of GPP subtracted from autotrophic respiration, which is a function of temperature and biomass.Stomatal conductance in Biome-BGC is modeled using a Jarvis-type model (Jarvis, 1976) as the product of the predefined maximum stomatal conductance and climate regulation factors (shortwave radiation, air temperature, soil water potential, and VPD).The model has been applied widely in regional to global carbon and water cycle studies (e.g., Thornton et al., 2002;Reichstein et al., 2006;Jung et al., 2007;Vetter et al., 2007;Hashimoto et al., 2009).As a default ecophysiological model parameter set, we used literature based one (White et al., 2000).We used evergreen needleleaf forest parameters for FJY, deciduous broadleaf forest parameters for TKY and TSE, and deciduous needleleaf forest parameters for TMK in point simulations, and we used deciduous broadleaf forest parameters for mixed forest in spatial simulations.In the model improvement process, we reduced the maximum stomatal conductance to improve evapotranspiration seasonality.Then, we adjusted the temperature limitation factor for the stomatal conductance and adjusted the phenology model parameter for the growing degree temperature to fit the observed phenology.In addition, we adjusted the Q 10 factor to maintain respiration that fit the observed RE seasonality, which was nearly equal to zero during the winter, and we adjusted the mortality rate for vegetation due to the anomalous high biomass that was calculated.DAYCENT (Parton et al., 1998) The DAYCENT model is the daily version of the CENTURY ecosystem model that was designed to simulate carbon, nitrogen, and phosphorus cycling and the plant produc-tion of ecosystems at a monthly time step (Parton et al., 1996).DAYCENT additionally incorporates more detailed submodels for simulating soil moisture, soil temperature, soil nitrogen, trace gas flux, and soil organic matter on a daily time step, while plant growth is updated weekly (Parton et al., 1998;Del Grosso et al., 2001).DAYCENT has been previously used to simulate long-term responses of grassland production and soil carbon and nitrogen to land use change, climate change, and elevated CO 2 levels (e.g.Del Grosso et al., 2001).DAYCENT has also been included in some model comparison studies (e.g.Gerten et al., 2008).In this study, a modified DAYCENT was used in which the sub-models of photosynthesis and evapotranspiration processes were replaced by models described in a more detailed physical and physiological manner (Table 2).
In the default run, we used temperate evergreen needleleaf parameters for FJY, temperate summergreen broadleaf forest parameters for TKY, temperate mixed forest parameters for TSE, and boreal summergreen needleleaf forest parameters for TMK in point simulations.In the model calibration for improved model run, first, we adjusted maximum LAI and maximum photosynthesis capability related parameter, and stomatal conductance related parameter to reproduce seasonal maximums of ET and GPP, if these were not accurately simulated.Second, to improve seasonal variation of GPP, we adjusted temperature regulation of GPP.For deciduous forests, we also adjusted phenology related temperature parameters.As a result of model improvement, physiological parameters, such as the ratio of photosynthetic capacity to leaf nitrogen, the parameter for stomatal conductance, and the parameter for temperature dependency in photosynthetic process, were adjusted.The phonological threshold and the maximum LAI value were also calibrated for each site.Due to computation time, we did not conduct spatial run.

VISIT (Ito et al., 2007)
The VISIT model is a prognostic terrestrial biosphere model that simulates the budgets of major greenhouse gases (CO 2 , CH 4 , and N 2 O) in atmosphere-ecosystem biogeochemical interactions.The model simulates terrestrial energy, water, carbon, and nitro-8479 gen cycles for the atmosphere-ecosystem biogeochemical interactions at a daily time step.The carbon cycle model is based on Sim-CYCLE (Ito and Oikawa, 2002), including separation of the understory and overstory's canopy and the improved respiration model (Ito et al., 2007).Different biomes are characterized by different physiological parameters defining rates of photosynthesis and respiration, allocation and allometry, leaf phenology, and mortality (for typical parameter values, see Ito, 2008).The model and its previous version (i.e.Sim-CYCLE) have also been widely used for point, regional, and global scales (Ito and Sasai, 2006;Ito et al., 2007;Ito, 2008).
In the default run, parameters for evergreen needleleaf forests and deciduous broadleaf forests were used for the FJY and TKY sites, respectively.For the TSE and TMK sites, parameters for deciduous broadleaf forest (but different from TKY one) and deciduous needleleaf forest were used.The model was mainly calibrated by adjusting physiological parameters, such as the maximum photosynthetic rate and the respiratory temperature dependence (Q 10 ), such that agreement with observational GPP, RE, and NEE was improved in an empirical manner.

A4 Dynamic vegetation models
LPJ (Sitch et al., 2003) LPJ includes a dynamic biogeography sub-model, which determines the land cover implicitly from climate data, in addition to carbon and water cycle processes.The dynamic submodel simulates area-based competition for light and water availability among 11 plant functional types.This model has been widely used for various temporal and spatial scale models (e.g., Sitch et al., 2003Sitch et al., , 2008;;Reichstein et al., 2006;Vetter et al., 2007;Piao et al., 2009;Hashimoto et al., 2009).
We used temperate evergreen needleleaf parameters for FJY, temperate summergreen broadleaf forest parameters for TKY and TSE, and temperate summergreen needleleaf forest parameters for TMK in point simulations (i.e., the dynamic vegetation mode was off).In a spatial model run, the dynamic vegetation module was coupled.
In the model improvement process, we changed the quantum efficiency (the intrinsic quantum efficiency of CO 2 uptake in C3 plants) for photosynthesis to fit observed GPP seasonality.The parameter is one of the most important parameters to control carbon cycle in LPJ model (Zaehle et al., 2005).This modification significantly affected seasonal maximum GPP.Since GPP also affected the vegetation biomass, RE seasonality was also reasonably simulated by the model.

SEIB-DGVM (Sato et al., 2007)
SEIB-DGVM is an individual-based dynamic global vegetation model that simulates alteration of vegetation types and the processes that cause such change, namely establishment, competition, and mortality, in addition to simulating terrestrial carbon and water cycles.The model considers several sample forests or grasslands covering a small area (30 m×30 m) placed at each grid box, and it calculates growth and decay of individual trees by explicitly calculating tree height crown diameter, crown depth, and light availability for each tree.By doing so, we expect that the speed of alteration of one vegetation type relative to another will be represented reasonably without introducing any additional parameterizations.The model adopted plant functional types (PFTs) and parameters needed for some important processes used in LPJ-DGVM.The model is used for both point and global scale modeling (Kawamiya et al., 2005;Sato, 2009).In both point and spatial simulations, the dynamic vegetation module was used.
In the model improvement process, firstly, we fit phenology and photosynthesis related parameters to obtain reasonable composition of vegetation cover.Second, we adjusted photosynthesis related and settlement related parameters toward a better simulation of LAI, GPP, and NEP magnitude and seasonality.In the process of model improvement, we made the following adjustments in the parameters.Maximum, minimum, and optimal temperatures for photosynthesis were lowered by 3 • C for forest ecosystems and by 2 • C for grassland ecosystems.Maximum LAI was lowered for temperate deciduous broadleaf forest and boreal evergreen needleleaf forest, and it was increased for boreal deciduous needleleaf forest and boreal deciduous broadleaf for-8481 est.The specific leaf area was set at 0.6 times that of the original model.The rate of establishment of new plant functional types was halved.For spatial analysis, the maximum photosynthesis rate and light use efficiency were set 5% and 12.5% higher than those of the default setting, respectively.Regulation of the existence of two boreal PFTs (boreal needle-leaved evergreen and boreal broad-leaved summergreen) against air temperatures higher than 23 • C was inactivated.

TRIFFID (Cox et al., 2001)
TRIFFID is a dynamic global vegetation model that simulates the dynamics of the areal coverage of each plant functional type as well as carbon and water cycles at a sub-daily time scale.Sub-daily variation in climate variables is generated internally based on daily climate inputs.The model includes 5 plant functional types as vegetation classes.
The model is coupled with a land surface model (MOSES-2) (Essery and Clark, 2003) that calculates short time scale phenomena such as evapotranspiration and photosynthesis.The dynamic vegetation submodel (changes in the area of the plant functional types) is based on the Lotka-Volterra equation.The TRIFFID model has been used to project global environmental changes coupled with General Circulation Models (GCMs) (Cox et al., 2000;Matthews et al., 2005) and to compare terrestrial models at the continental (e.g.Piao et al., 2009) and global scales (Cramer et al., 2001;Sitch et al., 2008).We used the version included in the University of Victoria Earth System Climate Models (UVic-ESCM) version 2.8 (Cox et al., 2001;Matthews et al., 2005).
We fixed the vegetation type in the point simulations (i.e., the dynamic vegetation model was off) and used needleleaf forest parameters for FJY and TMK and broadleaf forest parameters for TKY and TSE.In a spatial model run, the dynamic vegetation module was coupled and used.In the model improvement processes, we applied six modifications: (1) the snow sublimation model was modified due to anomalous sublimation during the winter in the original model, (2) the efficiency of photosynthesis was enhanced due to underestimation of the seasonal amplitude of GPP in the original model, (3) the temperature sensitivity of photosynthesis was raised, (4) Q 10 for respira-tion was decreased due to high respiration during the summer in the original model, (5) the C/N ratio was changed depending on the vegetation type to closely simulate GPP, and (6) the effective soil depth for energy balance and soil temperature calculations was set shallower.Currently, only the results from the point simulation are available owing to technical problems.
, Saigusa et al. (2008), and Saigusa et al. (2009) identified anomalous carbon budgets for the spring of 2002 and the summer of 2003 through the analysis of climate data and eddy-covariance measurements for 2001-2003.For GPP, the general characteristics of the anomalous pattern in 2002 and 2003 were well reproduced for all sites.In the spring of 2002, positive GPP anomalies caused by warmer temperatures

Figure 1 .
Figure 1.Land cover of the study area based on MODIS land cover data (MOD12Q1; Friedl et al.,4

Fig. 1 .
Fig. 1.Land cover of the study area based on MODIS land cover data (MOD12Q1; Friedl et al., 2002) in year 2001 with the flux observation sites used in this study.Xs show the locations of the flux observation stations used in the study.Abbreviations of land cover classes: Evergreen Needleleaf Forest (ENF), Evergreen Broadleaf Forest (EBF), Deciduous Needleleaf Forest (DNF), Deciduous Broadleaf Forest (DBF), Mixed Forest (MF), Grassland (GR), and Cropland (CR).

Figure 4 .
Figure 4. Same as Fig. 2 but for improved models.

Fig. 7 .
Fig. 7. Average (a) and standard deviations (b) of annual total GPP for seven models and for each model (c-i).GPP from 2001-2006 were used to calculate the annual total.
The site is snow-covered from the end of November to the beginning of April, and the maximum snow depth is from 1.0 to 2.0 m.The TSE site is located in a conifer-broadleaf mixed forest within the Teshio Experimental Forest at Hokkaido University (45• 03 N, 142• 06 E) and is located in northern Japan.In 2002, the maximum snow depth was 1.16 m.The dominant species are deciduous broadleaf trees (Quercus crispula, Betula ermanii, Betula platyphylla var.japonica, and Acer mono) and evergreen needleleaf trees (Abies sachalinensis and Picea jezoensis), and an evergreen dwarf bamboo (Sasa senanensis) forms a dense undergrowth on the forest floor.The TMK site is a planted larch (Larix kaempferi) forest in the Tomakomai National Forest, which is managed by the Hokkaido Regional Office of the Forestry Terrestrial biosphere models are conventionally separated into four categories: empirical models, diagnostic models, prognostic models, and dynamic vegetation models.Empirical models use observations to construct the model with regression or other empirical algorithms.Diagnostic models use satellite-based time-variable observations • 08 N, 137 • 25 E).The site is about 15 km east of Takayama City.The forest is dominated by deciduous broadleaf trees (Quercus crispula, Betula ermanii, and Betula platyphylla var.japonica), and the forest floor is covered by an evergreen dwarf bamboo (Sasa senanensis).such as the vegetation index, Leaf Area Index (LAI) and Fraction of Photosynthetically Active Radiation absorbed by a canopy (FPAR), and climate data to capture accurate spatial and temporal patterns in the status of terrestrial ecosystems.Prognostic models use only climate data as time-variable inputs and are capable of yielding future projections.Dynamic vegetation models usually use only climate data as time-variable model inputs, and can simulate changes in spatial patterns in vegetation cover and vegetation types based on competition among different plant functional types.

Table 1 .
Flux observation sites used in this study.

Table 2 .
Details for the terrestrial biosphere models used in this study.Climate input: T, P, RH, VPD, Rad, and Wind denote air temperature, precipitation, relative humidity, vapor pressure deficit, surface shortwave radiation, and wind speed, respectively.Satellite input: LST, EVI, NDVI, LAI, and FPAR denote land surface temperature, enhanced vegetation index, normalized difference vegetation index, leaf area index and fraction of photosynthetically active radiation absorbed by a canopy, respectively.GR and MR denote growth and maintenance respiration, respectively.*1Theoriginal model uses cloud cover as climate data input instead of radiation.We modified the model to use radiation directly by removing the radiation calculation routine from cloud cover.*2The original model used NDVI to calculate FPAR and LAI.We modified the model to use satellite-based LAI/FPAR directly.* 3 LUE denotes light use efficiency.

Table 3 .
Simulated annual carbon budget in Japan from 2001-2006.Units are TgC year −1 .