A global carbon assimilation system based on a dual optimization method

Ecological models are effective tools to simulate the distribution of global carbon sources and sinks. However, these models often suffer from substantial biases due to inaccurate simulations of complex ecological processes. We introduce a set of scaling factors (parameters) to an ecological model on the basis of plant functional type (PFT) and latitudes. A global carbon assimilation system (GCAS-DOM) is developed by employing a Dual Optimization Method (DOM) to invert 5 the time-dependent ecological model parameter state and the net carbon flux state simultaneously. We use GCAS-DOM to estimate the global distribution of the CO2 flux on 1◦× 1◦ grid cells for the period from 2001 to 2007. Results show that land and ocean absorb −3.63± 0.50Pg C year−1 and −1.82± 0.16Pg C year−1, respectively. North America, Europe and China contribute −0.98± 0.15Pg C year−1,−0.42±0.08Pg C year−1 and−0.20±0.29Pg C year−1, respectively. The uncer10 tainties in the flux after optimization by GCAS-DOM have been remarkably reduced by more than 60 %. Through parameter optimization, GCAS-DOM can provide improved estimates of the carbon flux for each PFT. Coniferous forest (−0.97± 0.27Pg C year−1) is the largest contributor to the global carbon sink. Fluxes of once-dominant deciduous forest generated by BEPS is reduced to −0.78± 0.23Pg C year−1, being the third largest carbon sink. 15


Introduction
The spatiotemporal distribution of carbon sources and sinks has drawn much attention in global carbon cycle research as carbon dioxide is a major greenhouse gas.Techniques used to quantify the spatial pattern of carbon fluxes have evolved during the past decades, among which atmospheric inversion (see, e.g., Enting and Mansbridge, 1989;Law, 1999;Gurney et al., 2002;Rödenbeck et al., 2003;Deng et al., 2007;Deng and Chen, 2011;Jiang et al., 2013;Peylin et al., 2013) is one of the most commonly used techniques.
Atmospheric inversion uses CO 2 observations to infer the distribution of the carbon flux from global (Patra et al., 2005;Rödenbeck, 2005;Rayner et al., 2008;Maki et al., 2010) to regional scales (Gerbig et al., 2003;Peylin et al., 2005;Peters et al., 2007;Schuh et al., 2010).It involves an atmospheric transport model to link the measured CO 2 concentration in the atmosphere to the surface CO 2 flux.However, the measurements from sparsely located observational sites are not sufficient for estimating global carbon sources and sinks in fine grids.Enting (1995Enting ( , 2002) ) suggested using a prior flux to regularize the inverted flux based on the Bayesian theory, which is referred to as Bayesian synthesis inversion method (BSIM).The solution of BSIM usually corresponds to the minimum of a quadratic cost function in the least square sense under the assumption of Gaussian probability distribution functions (PDFs).
In BSIM, prior information is normally precalculated from an ecological model, e.g., the Carnegie-Ames-Stanford ap-proach (CASA) biosphere model (Gurney et al., 2003(Gurney et al., , 2004;;Baker et al., 2006), the Simple Biosphere model (SiB, Sellers et al., 1986) and the Boreal Ecosystems Productivity Simulator (BEPS) model (Deng et al., 2007;Deng and Chen, 2011).These process-based models are constructed to estimate carbon sources and sinks based on the mechanisms of photosynthesis, autotrophic respiration, organic matter decomposition and nutrient cycling.However, their estimates of carbon sources and sinks at regional scales often have substantial biases, and the purpose of atmospheric inversion is to reduce these biases using the additional information of atmospheric CO 2 concentration.Atmospheric inversion methods differ considerably in the inverted carbon flux distribution among large regions of the globe (Peylin et al., 2013), and therefore improvements are still needed in prior flux estimation and in optimization using atmospheric CO 2 data.
In consideration of the possible biases in the prior flux produced by an ecological model, Michalak et al. (2004) used "the model of the mean of the surface flux distribution" with unknown drift coefficients to substitute the prior flux in the BSIM.This geostatistical approach took into account the spatiotemporal correlation of the surface fluxes and hence can recover flux variations on a significantly smaller scale than typical Bayesian inversions.Different from Michalak et al. (2004), the studies of Peters et al. (2007Peters et al. ( , 2010)), Zupanski et al. (2007), Lokupitiya et al. (2008) and Schuh et al. (2010) introduced scaling factors to the prior flux from ecological models (e.g., SiB and CASA) to correct the biases.In these methods, a forecast model for the scaling factors is combined with an atmospheric transport model to realize the flux evolution over time.The choice of the forecast model is usually empirical.Most researchers defined an identity operator as the forecast model for the biases (Zupanski et al., 2007;Lokupitiya et al., 2008), while Peters et al. (2007Peters et al. ( , 2010) ) considered a more complex forecast model which combines the information of biases in two steps before the current time step.An ensemble Kalman filter (Evensen, 2007) is often used for estimating the unknown scaling factors and the posterior flux is the prior flux scaled by the estimated scaling factors.This ensemble-based assimilation method takes a relatively long time to warm up the system to reach a stable estimation of these scaling factors, and the filtering divergence (see, e.g., Houtekamer and Mitchell, 1998) that retards the converge of the estimate towards observations is still a problem.Zheng et al. (2014) proposed a dual optimization method (DOM) to estimate both the scaling factors (hereinafter known as parameters) of an ecological model and gridded carbon fluxes.DOM introduces a scaled ecological model designed by plant functional types (PFTs), and uses CO 2 observations to invert the unknown states of the parameters and net flux simultaneously.This is different from Michalak et al. (2004), which does not need to give prior estimates and hence does not rely on the information of ecological models at all.Moreover, DOM is an objective method which depends just on the information of concentration observations and the structure of the ecological model, but no forecast model is needed.The estimation precision of fluxes can be greatly improved by dual optimization, and the statistical properties of parameters and fluxes also provide useful information about the inversion accuracy.
As DOM inverts the flux for all regions and all times simultaneously using all observations at the same time, it requires substantial computational resources.Therefore, it is inconceivable to use DOM to estimate the global distribution of the carbon flux at high spatial and temporal resolutions.In this study, a moving-window method similar to that of Bruhwiler et al. (2005) is developed.Different from a batch model which uses all observations to invert fluxes for all source regions at all times simultaneously, Bruhwiler et al. (2005) adopted a temporal moving window and used the CO 2 concentration observations at the current time (the end of the window) to estimate carbon fluxes in the entire window.Considering that more observations will provide more information, we propose to use the observations in the entire time window to estimate the fluxes in this window instead of using only the observations at the current time.
Due to the difference in seasonal and meteorological conditions at different latitudes, we redesign the scaling factors by dividing the globe into several latitudinal zones.Each zone shares a set of scaling factors.The number of parameters assigned to each grid equals the number of PFTs in the grid so that one parameter is associated with one PFT.This is different from CarbonTracker (Peters et al., 2007(Peters et al., , 2010) ) in which each grid is assigned to one category based on the dominant vegetation type.On the basis of the above settings, we build a global carbon assimilation system (GCAS-DOM) by combining DOM with an atmospheric transport model (MOZART-4).The forecast of the assimilation system is embodied in updating the background concentration field.At each step, the background CO 2 concentration is updated by running MOZART-4 forced forward with the optimized flux at the last step.Finally we use the GCAS-DOM to estimate the worldwide weekly flux in 1 • ×1 • grid cells for a relatively long period of 7 years.Results show its accuracy in flux estimation and significant effect in uncertainty reduction.
The objectives of this study are (1) to develop a global carbon assimilation system using DOM, i.e., GCAS-DOM, for the purpose of improving the estimation of the global distribution of the carbon flux, (2) to produce with GCAS-DOM a global carbon flux field on 1 • × 1 • grid cells from 2001 to 2007 and analyze the flux in terms of its long-term mean and interannual variations for the globe and selected large regions and (3) to investigate the impacts of atmospheric CO 2 data on the estimation of the carbon flux per PFT for the evaluation of ecosystem models.This paper is organized as follows.Section 2 consists of detailed descriptions on each component of the GCAS-DOM.It begins with the introduction of state variables in Sect.2.1.Then in Sect.2.2, we will show the procedure of building the GCAS-DOM by using a moving-window method.Section 2.3 presents the estimation method of state variables in a window.The calculation of the uncertainties is given in Sect.2.4.In Sect.3, we undertake a process for estimating the global flux in 1 • × 1 • grid cells starting with a detailed introduction to models and data use in GCAS-DOM, followed by estimated quantities and their uncertainties.Finally, we summarize our results and discuss future directions of our work in Sect. 4.

Methodology
GCAS-DOM consists of three major components: an ecological model and an atmospheric transport model, a moving window and the optimization module.The ecological model provides the first guess of the flux before data assimilation.The atmospheric transport model links the flux to the CO 2 mixing concentration ratio.Considering the computational feasibility, we use a temporal moving window in which the flux is optimized using the optimization algorithm DOM.

State variables
The ecosystem model is formed to simulate the variations of carbon sources and sinks based on the mechanism of carbon cycling.As improperly simulated ecological processes could result in biases in the flux, we consider a scaled ecosystem model similar to that of Lokupitiya et al. (2008).But different from Lokupitiya et al. (2008), which adjusts ecosystem respiration (ER) and gross primary productivity (GPP) using separate scaling factors, only the net ecosystem exchange (NEE) defined as the difference between ER and GPP is scaled.This is because both ER and GPP are much larger than the net ecosystem production (NEP) fluxes by approximately 1 order of magnitude; adjusting their separate influence could lead to spurious variations.Moreover, the strong correlation between ER and GPP could result in poor performance in stability.Hence the parametric model can be represented as where x and y denote the spatial coordinates; s is the unknown flux aimed to estimate; s OCE is the first-guess ocean flux computed from an ocean exchange model; s NEE is the first-guess biospheric flux estimated from a terrestrial ecosystem model; s FF and s FIRE are fossil fuel and fire fluxes estimated from inventory-based emissions; λ NEE and λ OCE are scaling factors applied to the land surface flux and the ocean flux, respectively; and ε is the model error.To simplify this expression, we use its vector form, where all the variables are n × 1 vectors and n denotes the number of the grid cells over the globe; the "•" (dot product) represents the element-by-element multiplication of two vectors with the same dimension unless one is a scalar; and ε is the model error with zero mean and covariance matrix Q.
Here, the parameter vectors (λ NEE , λ OCE ) and s are treated as state variables and called parameter states and flux states, respectively.Zheng et al. (2014) suggested specifying the structure of parameters according to PFT to avoid over-adjustment or excessive computation.In consideration of the fact that (1) the seasonal variation in climate in the North Pole is opposite that in the South Pole and (2) the tropical rainforest has high temperature all year around, it is not effective to specify parameter states just according to PFT.In this study, we divide the globe into q zones according to latitude and assume that the vegetation distribution is mapped onto p PFTs.Thus a grid box can contain up to p + 1 different types (p PFTs and one oceanic type) quantified with an areal fraction for each PFT in the grid.
We decompose the flux in each grid box into p +1 components with each denoting the flux generated from one PFT.To facilitate the expression, we use s m,j for the gridded flux in the j th latitude zone computed from land and oceanic models, and it is denoted as follows: where s j OCE is a vector for the oceanic component and s j NEE,i is a vector for the terrestrial component for the ith PFT.Gridded fluxes at the same latitude zone share the same set of parameters and thus the corresponding parameter for the s m,j is where each element is a scalar used to scale the corresponding column vector of s m,j .Then model Eq. ( 2) can be rewritten as where s m , referred to the prior flux, is the reshaped form of the flux computed from the ecosystem model in the order of latitude T is a set of scaling factors with (p + 1)q unknown components; ε is the model error with zero mean and covariance matrix Q.In Eq. (4), as s FF and s Fire are imposed without optimization, their contributions to concentration can be subtracted from the observation concentrations directly.Then model Eq. ( 4) can be written in a simplified expression: (5)

Time-stepping
In the application of GCAS-DOM, one of the major difficulties in estimating the carbon flux is the computational cost at high resolution.For the estimation of weekly fluxes on 1 • × 1 • gird cells, the dimension n in Eq. (2) will be 64 800 (180 × 360) for each week.That is about 3 130 400 (64 800 × 48) unknowns per year, and the relevant cost of matrix operations will be at least 3 130 400 2 which is an immense computational burden.To overcome this difficulty, we adopt a method similar to that of Bruhwiler et al. (2005).At each time t, we use the observations of CO 2 concentration and the carbon flux in the time window between t and t + l − 1, where l is window length which could be in days, weeks or months.This is different from Bruhwiler et al. (2005), where only the observations at time t +l −1 are used.We therefore have a (t, l)-window, which uses the CO 2 concentration observations {c t+k , 0 ≤ k ≤ l − 1} and the carbon flux {s t+k , 0 ≤ k ≤ l − 1} at each time point t, where the column vector c t+k represents the observed CO 2 mixing ratios of a given site at t + k, and the column vector s t+k is the global carbon flux in the time period from t + k − 1 to t + k.
The time-stepping in the assimilation scheme is illustrated in Fig. 1.The light shaded boxes represent the prior flux at each step computed by the ecosystem model.The dark shaded boxes stand for the optimized flux.We now describe one cycle of GCAS-DOM.The first step is to use the background CO 2 concentration C(t −1) as the initial value, which is a 3-D matrix for the spatial distribution of CO 2 concentrations at each latitude, longitude and elevation.Then we run l steps of the transport model forward starting from C(t − 1) to get the spatial distribution of CO 2 concentration in the (t, l)-window.We keep the spatial carbon concentration patterns at all times in this window, which gives {C(t), . .., C(t + l − 1)}, and extract CO 2 mixing ratios at observation sites as {c b t , . .., c b t+l−1 }.The second step is to estimate the optimized parameters { λt+k , 0 ≤ k ≤ l − 1} and fluxes {ŝ t+k , 0 ≤ k ≤ l − 1} using the resulting mixing ratios at sites {c b t , . .., c b t+l−1 }, the observations of CO 2 concentrations in the window {c t , . .., c t+l−1 }, and the prior flux in the window {s m t+k , 0 ≤ k ≤ l − 1}.The estimation method is introduced in the next section.The optimized parameter λt and flux ŝt do not need to be estimated in the next cycle and are therefore used as estimates of the parameter and flux at time t.In the third step, we run the transport model one step forward starting from C(t −1) forced with the optimized flux ŝt to get the updated spatial distribution of concentration C (t).Then we use observed CO 2 concentration to assimilate the C (t) instead of directly using it as the background concentration at time t for the next cycle in previous studies.We extract updated CO 2 concentration at locations of CO 2 ob-servation sites from the C (t) and compare it to the observed concentration c t at time t.A constant adjustment, which is computed from the site-averaged difference between the above two vectors, is imposed on C (t) to get an optimized spatial pattern C(t) at time t. 1 In the fourth step, we move the window one step forward so a new flux s m t+l and a new concentration observation c t+l are read in to the system for the next computational cycle, which begins from background CO 2 concentration C(t).

Adaptive DOM
In this section, we introduce the method for estimating parameters and the carbon flux in a window.Zheng et al. (2014) proposed a DOM to improve the accuracy of the optimized flux and successfully applied it to the inversion of the flux for the globe divided into 50 regions.In this study, we expect to use DOM in each (t, l)-window.As the fluxes computed for different PFTs are often correlated, direct application of the DOM to flux inversion at a high resolution will result in many abnormal estimators of parameters and large uncertainties of both parameters and fluxes.Therefore, we propose an adaptive version of DOM by adding additional regularization of scale factors which is referred to as a stochastically constrained equation (Theil and Goldberger, 1961): where 1 is a vector with all elements equaling 1 and ζ is the random error of the regularization with E(ζ ) = 0, and the dispersion matrix var(ζ ) = W. Then we will present the adaptive DOM in a (t, l)-window.To facilitate the discussion, we first introduce two denotations: (1) the observations of CO 2 concentration in (t, l)window is denoted by a vector and named as the (t, l)-window observation concentration, (2) the flux is denoted as and named as the (t, l)-window flux.The (t, l)-window observation concentration c (t,l) contains information from two sources: the (t, l)-window flux s (t,l)  and concentration transported from the previous time step C(t − 1).We let c w(t,l) be the CO 2 concentration determined by s (t,l) , and refer it as (t, l)-window flux concentration.In fact, c w(t,l) is the difference between window observation concentration c (t,l) and {c b t , . .., c b t+l−1 } (mentioned in Sect.2.2).Then the c w(t,l) follows that 1 The correction is based on the idea that the optimized concentration should match the actually observed concentration.
where ε (t,l) is the error of window concentration observation, and is the (t, l)-window atmospheric transport matrix.It describes the contribution of the window flux to the observation sites.Each submatrix G m,n represents the influence of the flux (normalized to 1 g C) at time n on the concentration at observation sites at time m.
In a (t, l)-window, we minimize the following objective function ( 11) to obtain the optimized (t, l)-window flux.This function is similar to that of DOM but with an extra penalty term, so it is called the adaptive DOM.To simplify the expression, all subscripts (t, l) are omitted here.
where s m = diag s m t , . .., s m t+l−1 is the prior fluxes for the (t, l) window, Q is the error covariance matrix of the corresponding prior fluxes, R is the covariance matrix of the window concentration observation error η and W is the variance of constrained error.
Solving for the minimum of cost function Eq. ( 11) with respect to s and λ is similar to the process in DOM.The solutions are given by the following two equations (see Appendix A for details): where As the estimation of λt and ŝt uses the largest number of observations, it has the highest accuracy.We therefore use λt and ŝt as the optimized parameter and carbon flux at time t.

Calculation of uncertainty
The estimators given by Eq. ( 12) have the following uncertainties (see Appendix A for details): Note that the uncertainty of the parameter estimator is incorporated into the variance of estimated fluxes.
In this section, we use the GCAS-DOM to estimate the weekly carbon flux from 2001 to 2007 on 1 • × 1 • global grid cells.The assimilation system usually needs a spin-up period.Therefore the assimilation is conducted from 2000 to 2007 with the first year as a spin-up period, and the results from 2001 to 2007 are used for analysis.The initial concentration is set as a globally uniform 3-D CO 2 field of site-averaged concentration in the last week of 1999.

Ecological model
We divide the globe into 1 • ×1 • grid cells, 64 800 (360×180) grids in total.GCAS-DOM uses the BEPS (Liu et al., 1997) as the terrestrial ecosystem model.BEPS simulates photosynthetic and carbon cycle processes (Chen et al., 1999;Ju et al., 2006) based on remote sensing, meteorological and soil data with a set of physical and physiological parameters related to PFT.This model is initially developed for North America and then expanded for global applications.The terrestrial prior fluxes are modeled by BEPS at the resolution 1 • × 1 • and the weekly average values are used to avoid the problem of the diurnal cycle.The weekly oceanic flux at 1 • × 1 • spatial resolution is obtained from CarbonTracker 2010 (CT2010 2 ) results (available via http://www.esrl.noaa.gov/gmd/ccgg/carbontracker/download.html).
In BEPS, vegetation is mapped onto six PFTs including coniferous forest, deciduous forest, evergreen forest, shrubland, C4 vegetation and "other vegetation".A grid cell can contain up to seven different cover types (six PFTs and one ocean type) with their corresponding coverage fraction.We divide equally the globe excluding China into 30 zones by latitude and each spreads a range of 6 • .China is separately split into six zones and each spreads a range of 6 • as well.Thus we yield a total of 30 + 6 = 36 zones (see Fig. 2).In each latitude zone, there are six PFTs and one ocean type.As PFTs vary slowly in a short amount of time, we assume that they are time independent within a window.Thus, we have 7 × 36 = 252 parameters (1 parameter corresponds to one PFT in a zone) to be estimated at each time step.The model error covariance matrix Q for the prior flux is treated using the same principle in Zhang (2013) based on the theory of statistics.
The constrained matrix W (Eq. 11) for the scaling factor is defined as a diagonal matrix with each item W ii defining the degree of deviation from 1.The smaller the value is, the closer the parameter and 1 are.Conversely, the parameter can be more influenced by other information such as CO 2 measurements.We set an initial interval of [0.7, 1.3] as the range of the scaling factor λ, as the preferences of BEPS are basically reasonable.According to the 3σ principle, the standard deviation (SD) of parameters is set to be 0.1 (i.e., variance of 0.01).However, the results of regions excluding China (e.g., 2 CT2010 is a earlier version of CarbonTracker released in 2011.

Background fluxes
In the process of making inferences about flux from ecosystems, we need to exclude the contribution of other CO 2 fluxes such as fire and fossil fuel emissions to observed concentrations.They are not perfectly known and but also not the target of this study.Their information is included in the observation data we use.As mentioned in Sect.2.1, we do not include any parameters concerning fossil fuel and fire fluxes in the optimization.So the contribution of fossil fuel and fire emissions needs to be extracted from the window flux concentration.
Then the window flux concentration excluding the influence of fire and fossil fuel is used in the process of ecosystem flux optimization.Although the fire and fossil fluxes are excluded from our optimization, their uncertainties should be considered in the observational error.Therefore, we included an extra contribution of (0.175 ppm) 2 in the observational error (see Eq. 14).
The fossil fuel and fire fluxes are from the CT2010 results on 1 • × 1 • resolution.The annual summary of fossil fuel and fire emissions is listed in Table 1.

Atmospheric transport model
The carbon fluxes of the Earth's surface at a certain time affect the CO 2 concentration observed in a subsequent time period in the atmosphere.Therefore, we can use the atmospheric CO 2 concentration to invert the historical distribution of carbon fluxes.Atmospheric transport models are generally used to describe the process of surface fluxes spreading into the atmosphere.The commonly employed transport mod- els include MUGCM (Law, 1993), NCAR (Erickson et al., 1996), TM5 (Krol et al., 2005) and MOZART-4 (Emmons et al., 2010).We will use MOZART-4 in our study as its implementation is flexible.MOZART-4 divides the space from the Earth surface to a height of 2 hPa into 28 vertical sigmapressure layers, and its horizontal resolution can be adjusted according to the capacity of computers.The highest resolution by far has been 0.7 • × 0.7 • .We use the meteorological data from the National Centers for Environmental Prediction (NCEP) reanalysis data (http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html).
This model here is used in two forms.In its full form, the assimilation is done by running forward with the optimized flux state at the previous time step to update the historical space concentration at the current time.In its simplified form, the model is slightly reduced by leaving out the influence of window flux on the site concentrations, and is shown as a transport matrix (see Eq. 10).

Concentration data
Weekly average observations of CO 2 concentration are from GLOBALVIEW-2011 data set (http://www.esrl.noaa.gov/gmd/ccgg/obspack/data.php).These data consist of pseudo-weekly interpolation CO 2 concentration data measured at 312 global sites.The map of stations is shown in Fig. 3.It should be noted that we used 312 sites in our assimilation system while CT2010 only used about 100 sites (available from http://www.esrl.noaa.gov/gmd/ccgg/carbontracker/CT2010/documentation_obs.html#ct_doc).So nearly two-thirds of observational data are independent from the ocean fluxes we use as an input (mentioned in Sect.3.1).
As the residual standard deviation (RSD) of the CO 2 concentration data given by the var files in GLOBALVIEW-2011 data set is in months, we convert it into weekly values by linear interpolation, and impose a floor of 0.175 ppm on the data uncertainty using the equation (Deng et al., 2007) where 0.175 ppm is the system error at each site.

Window length
The choice of the window length is an important issue in assimilation systems.A longer window size means more overlapping of transport integrations and larger calculation demand.However, a small window size will cause significant errors.Peters et al. (2005Peters et al. ( , 2007Peters et al. ( , 2010) used a 5-week smoothing window.Here, we choose a 6-week smoothing window, which is sufficiently long for the fluxes to transmit around the world.
As scale factors vary much more slowly than the fluxes themselves (Zupanski et al., 2007), it is reasonable that the scale factor is time-independent within a 6-week window but varies among different windows.Therefore, the unknown states aimed to estimate involve 252 parameters and 388 800 (64 800 grids × 6 weeks) fluxes for each 6-week step.

Results
In this section, we will firstly show the variations of estimated scaling factors over time.Then the total optimized flux and its uncertainties will be summarized to compare with those of the prior flux and results from previous studies.We focus on the result of three large regions of North America, Europe and China.Moreover, we further study the quantities and seasonal variations of fluxes for six PFTs.The spatial distribution of the optimized flux is shown on a map of 1 • × 1 • grid cells.We also show the fit of the optimized concentrations to the observation concentrations to evaluate the system.

Optimized parameters
Figure 4 shows the results of the scaling factors for six PFTs and an oceanic type in the latitude zone spread from 24 • N to 30 • N excluding China.The estimators fluctuate around 1 with small volatility.If the value is larger than 1, it means that the absolute value of the prior flux is underestimated and therefore needs to be multiplied by a factor of more than 1 to increase its value.On the contrary, an estimator smaller than 1 indicates a decrease of the absolute value of the flux.From the time series of weekly estimates, most of the parameters show annual periodicity and the scaling factors of coniferous type indicate opposite "swings" in contrast to other PFTs.The scaling factors of deciduous and evergreen types have lower amplitudes than those of the remaining types.

Global Carbon Budget
We compare the optimized total flux (excluding fire and fossil fuel emissions, default hereinafter unless otherwise specified) with the prior flux and the results of CT2011_oi which is a newer version of CarbonTracker released on 28 June 2013 (Fig. 5).The terrestrial fluxes make a major contribution for the years 2001 to 2007 before and after optimization.Before optimization, the annual average terrestrial and oceanic fluxes are −3.10 and −1.62 Pg C yr −1 , respectively.GCAS-DOM increases the uptake in land and ocean by a mean value 0.53 and 0.20 Pg C yr −1 , respectively, over 2001-2007.Therefore the total annual ecosystem sinks show a significant increase mainly due to the increase in the terrestrial sink during the 7 years.As the oceanic prior flux is derived from the optimized results of CT2010, the oceanic fluxes before and after optimization are very similar.Even so, the optimized oceanic flux is still closer to the results of CT2011_oi compared to the prior flux.
The optimized result indicates that the terrestrial ecosystems and oceans respectively absorb an average of 3.63  (2011), which are on average 3.63 and 1.94 Pg C yr −1 for terrestrial and oceanic sink, respectively, for the years 2002-2007.We then further compare the net land sink and oceanic sink in our study to that of the Global Carbon Project (GCP, Table 2).The Global Carbon Budget 2013v2.3 (Le Quéré et al., 2014) is the newest version released on April 2014 by the GCP.The net land sink of GCP is calculated by the difference of land sink and land-use change emissions in Global Carbon Budget 2013 v2.3, while that of the GCAS-DOM is computed by the difference between the terrestrial sink (Fig. 5) and fire emission in Table 1.The GCP generates larger oceanic sinks than GCAS-DOM, with the smallest gap of 0.25 Pg C yr −1 in 2001 and largest difference of 0.92 Pg C yr −1 in 2002.For the net land sink, the largest difference occurs in 2002 when the GCP releases 0.52 Pg C yr −1 from land while the GCAS-DOM maintains a land uptake of 1.06 Pg C yr −1 .The 6-year mean of the net land sink excluding the year 2002 in our study is 1.38 Pg C yr −1 , which is close to 1.29 Pg C yr −1 in GCP. Figure 5 shows that the total sink in land and ocean varies considerably between years, and the variation is mostly due to the land sink.GCAS-DOM sink results are usually larger than the prior value, indicating that the prior flux underestimates the land sink.The multiyear mean values of GCAS-DOM and CT2011_oi are about the same, but they differ to some extent in individual years, suggesting that different data assimilation methods can result in considerable difference in the optimized carbon flux.
From the point of interannual variabilities, the ocean flux shows much smaller variability than land flux, revealing that the ocean sink pattern is stable.The interannual variation of the land sink suggests a notable correlation with climate change.The optimized annual flux of GCAS-DOM detects an anomaly in 2005 which shows the smallest sink.This could be mainly attributed to a continuing drought from July to September in the Amazon that affected plant growth, and high temperatures in 2005, the existence of which intensifies ecosystem respiratory activities (Deng and Chen, 2011).The relatively weak sinks in 2002 and 2007 may be related to the El Niño-Southern Oscillation events in 2002-2003 and 2006-2007, respectively, that cause anomalies in precipitation causing droughts in some regions.
Before optimization, we use an uncertainty of 1.98 Pg C yr −1 for the land flux, and an uncertainty of 0.93 Pg C yr −1 for the oceanic flux, resulting in a total uncertainty of 2.18 Pg C yr −1 for the globe.Table 3   see different levels of uncertainty reductions for land and ocean.The uncertainty of the globe is significantly reduced by about 75-80 % and ocean uncertainty has a slightly larger reduction than the global value.It is mostly due to the stronger constraint by the elongated clustered observation sites over the Pacific Ocean (see Fig. 3).The uncertainty reductions of ocean and land respectively stabilize at around 82 and 75 % for the years 2001-2007.

Regional Carbon Budget
We further analyze three large regions: Europe, North America and China.As shown in Table 4, GCAS-DOM respectively increases the sink by 0.14 Pg C yr −1 for Europe and 0.31 Pg C yr −1 for North America compared to the prior flux for the 7-year mean.The uncertainties before optimization (0.44 and 0.86 Pg C yr −1 for Europe and North America, respectively) are reduced to 0.08 and 0.15 Pg C yr −1 , respectively.The uncertainty reductions for these two re-gions are remarkably large at about 80 %, possibly because the atmospheric CO 2 is densely observed in these two regions.In Europe, the carbon sink from our study (−0.42±0.08Pg C yr −1 ) is higher than CT2011_oi (−0.Although the change in the sink in China before and after optimization is small, the uncertainty reduction is about %, which is smaller than of Europe and North America because of relatively limited atmospheric data observed within and around China.The variations of fluxes before and after optimization are shown in Fig. 6.With minor fluctuations, the carbon uptake of Europe has an increasing trend before 2004, and then decreases after 2005.Similar temporal trends are also found in North America.In the first 4 years, the carbon sink in China is stable around −0.22 Pg C yr −1 and slightly decreases from 2005 to 2007.The uncertainties of optimized fluxes for three regions vary slightly from year to year and are remarkably reduced from those of the prior fluxes.Fluxes for each PFT Our gridded inversion system at 1 • resolution affords us the possibility to analyze the impacts of atmospheric CO 2 data on the estimation of the carbon sink by PFT. Figure 7 shows the annual mean terrestrial flux for six PFTs.Prior stands for fluxes simulated by BEPS consisting of six PFT components with corresponding coverage fractions in each grid, while GCAS-DOM represents fluxes optimized by GCAS-DOM and the statistics are based on the principle that each 1 • × 1 • grid box is assigned to a single category according to the locally dominant PFT.As shown in Fig. 7, the or-der of the sink magnitudes of different PFTs is altered after optimization.The carbon flux of once-dominant deciduous forests is reduced from −0.93 to −0.78 Pg C yr −1 .After optimization, largest net uptake is shown in regions dominated by coniferous forest (−0.97 ± 0.27 Pg C yr −1 ) and is increased by 118.20 %.As the coniferous forest is mainly distributed in North America, Europe and part of Russia, this results in the notable increase of the sinks in North American and Europe (Table 4).This large increase in the sink magnitude for conifer from the prior estimate suggests that the ecosystem model considerably underestimates the sink for this PFT.Other vegetation (−0.86 ± 0.20) and deciduous forest (−0.78 ± 0.23 Pg C yr −1 ) are respectively the second and third PFTs in terms of their total sink magnitude.Evergreen forests, mostly located in the Southern Hemisphere, absorb −0.72 ± 0.22 Pg C yr −1 on average.Relatively speaking, shrubland (−0.16 ± 0.12 Pg C yr −1 ) and C4 vegetation (−0.25 ± 0.13 Pg C yr −1 ) make the least contribution to the total global carbon sink.The slight changes in the sink magnitudes of shrubland and C4 vegetation before and after optimization suggest that BEPS provides nearly unbiased sink estimates for these two PFTs.The sink magnitude of the other vegetation is modified greatly by optimization, suggesting BEPS does not work well for all other land cover types  lumped into this PFT.One way to improve BEPS would be to introduce more PFTs.Through this analysis, we show that GCAS-DOM has provided a useful model framework to evaluate an ecosystem model by PFT, and it can poten-tially provide directions for further development of ecosystem models.
To further investigate the seasonal variation of the carbon flux, we compare the optimized weekly fluxes to the prior  fluxes by PFT.For this purpose, we select the results of coniferous forest and other vegetation (Figs. 8 and 9) as fluxes by these two types present the largest change after optimization among all PFTs.All the time series exhibit pronounced seasonality, and the Northern Hemisphere and Southern Hemisphere show the opposite seasonal patterns.In the Northern Hemisphere, the optimized flux of coniferous forest shows a general shift towards larger sinks in all seasons than those of the prior flux.After optimization, greater net uptake is found in the growing season and a smaller net source in au-tumn and winter.In the Southern Hemisphere, the optimized flux shows a smaller seasonal amplitude than the prior flux with departures from the prior occurring in winter and summer.Note that the sink magnitude is much smaller than that of the Northern Hemisphere, and therefore the optimization of the conifer flux in the Southern Hemisphere does not make much difference in the overall sink estimate.For other vegetation, similar deviations of the optimized flux from the prior flux in June through September are observed, but fluxes in other months show good agreements.In the Southern Hemi- sphere, the optimized flux presents larger amplitudes than the prior flux, and this is opposite the case of coniferous forest.

Spatial distribution of fluxes
Figures 10 and 11 show the long-term mean spatial pattern of the flux on 1 • × 1 • grid cells before and after optimization.This flux does not include the carbon emission due to fires, and the net land sink is those shown in Figs. 10 and 11 minus fire emission.The uptakes over boreal Asia, Europe and southeastern Canada have been greatly increased by GCAS-DOM, while the sink in tropical South America is slightly reduced after optimization.For the oceanic flux, a slight decrease of the source is found in equatorial areas of the Pacific, the Atlantic and the Indian oceans.The results of this study show that relatively large sinks are located in the northern hemispheric continents, and tropical continental areas.The northern continental areas from 30-90 • N contribute the largest sink of −2.07 Pg C yr −1 .Next, the continental areas in the range of 30 • S-30 • N contribute a sink of −1.68 Pg C yr −1 .Intense sinks mainly appear in the eastern US, Europe, tropical South America, tropical Asia and central Africa.Southern continental areas (30-90 • S) show an approximately neutral flux.For the ocean, carbon uptake is distributed relatively evenly between north (30-90 • N) and south (30-90 • S), while the region 30 • S-30 • N generates a weak source of 0.33 Pg C yr −1 .

Fit to CO 2 concentrations
The fit of the simulated CO 2 concentration by GCAS-DOM to the observed concentration is an important aspect for overall evaluation of optimization.To evaluate the performance of GCAS-DOM, we run MOZART-4 forced forward by the prior flux and optimized flux and compare the simulated time series of CO 2 concentrations to the observed concentrations.We integrate the concentration data of all the 312 sites for 7 years to a series with 104 832 (312 × 48 weeks × 7 years) elements and draw the simulated vs. observed scatterplot (Fig. 12).The blue points show an upward departure from the one-to-one line, indicating that the simulated concentrations with the prior flux are overestimated.The RMSE between the simulated and observed concentrations of the 119 808 weekly data points items is significantly reduced from 5.58 ppm to 2.76 ppm after optimization.The correlation between the simulated and the observed concentration is also improved after optimization with R 2 increasing from 0.64 to 0.80.This suggests that the optimized flux is a significant improvement over the prior flux.
Generally speaking, the simulated concentration at sites in the Northern Hemisphere shows better agreement with the observed concentration than the sites in the Southern Hemisphere.We present the seasonal cycles fitted to the simulated and observed concentration time series of two sites in Fig. 13.At Dahlen, the simulated concentrations based on the opti-mized flux follow closely the observed values.However, the simulated concentrations based on the prior flux show an upward drift from the observed concentrations especially in the last few years.This indicates that the prior flux is biased and the cumulative effect of this bias will get progressively larger over time.This result is consistent with the viewpoint that the prior sink value is underestimated.Moreover, the green points present a seasonal cycle with smaller amplitudes.This may be due to the shortcoming in the terrestrial biosphere model which may not well describe the seasonal cycle of ecosystem processes.
At Mace Head, the simulated concentrations with the optimized flux deviate less from the observations in winter than in summer.This inability of the optimization procedure to capture the depth of summer carbon drawdown by photosynthesis was also found in CarbonTracker North America and Europe (Peters et al., 2007(Peters et al., , 2010) ) and in a carbon cycle assimilation system based on the Biosphere Energy Transfer Hydrology model (CCDAS, Rayner et al., 2005).One common problem would be that biospheric models tend to underestimate the carbon sink in summer and this bias is not fully rectified in the optimization process because of insufficient atmospheric CO 2 data and the significant model-data mismatch errors in the CO 2 observation.Nevertheless, the optimized concentration is still a large improvement over the case of the prior flux.In addition, it should be noted that some discontinued high anomalies in the simulated concentration with the prior flux have been remarkably ameliorated after optimization.
We also investigate, by week, the overall quality of 312 sites used in our system.In Fig. 14, week-by-week residuals (simulated minus observed) are made to assess the bias of the optimized CO 2 field against the observations.The errors averaged by 312 sites can be controlled to within about ±0.51 ppm, indicating a satisfactory performance of our assimilation system.However, an obvious seasonal cycle is identified in the residual series.This is mainly caused by the generally worse fit to observed concentrations at the sites in the Southern Hemisphere.Although the residual error is small, the clear seasonal pattern of the residual error indicates that there is still some useful information in the CO 2 data that are not fully utilized.The inability of BEPS to simulate the large summer sinks may be part of the reason because the bias in summer is not fully corrected through optimization (as shown in Fig. 13).Our study therefore suggests that efforts should be made to improve the prior flux estimation not only in terms of the annual sink magnitude but also the seasonal sink pattern.years 2001 to 2007.This newly developed system combines the ecological model BEPS, atmospheric transport model MOZART-4 and observations of CO 2 concentration to optimize the optimize the parameter and carbon flux simultaneously.In consideration of errors in climate data and the structure of BEPS, we design a set of inflation parameters for optimization according to latitude and plant function type in BEPS, resulting in 252 parameters at each time step.The 1 • × 1 • for flux estimation at the global scale in our study is higher than those in previous studies and therefore it significantly advances our understanding of regional carbon cycles.To reduce the computational demand, a movingwindow method is used in the system so as to obtain timevarying parameters and fluxes.
Our optimized results show that the mean terrestrial and oceanic carbon fluxes over the period of 2001-2007 are −3.63 ± 0.50 and −1.82 ± 0.16 Pg C yr −1 , respectively.North America, Europe and China contribute −0.98 ± 0.15, −0.42±0.08 and −0.20±0.29 Pg C yr −1 , respectively.Large sinks are mainly located in the Northern Hemisphere and tropical continental areas.Moreover, the uncertainties of carbon fluxes are notably reduced by more than 60 % after optimization for the globe and aforementioned three regions.
Coniferous forest, deciduous forest, evergreen forest, shrubland, C4 plants and other vegetation contribute to the global carbon flux at −0.97 ± 0.27, −0.78 ± 0.23, −0.72 ± 0.22, −0.16 ± 0.12, −0.25 ± 0.13 and −0.86 ± 0.20 Pg C yr −1 , respectively.The optimized flux of conifer differs most from its prior, indicating that the biospheric model BEPS might have the largest error for this PFT.Shrubland and C4 vegetation show only slight changes from the prior after optimization.In terms of seasonal variation, the optimized flux shows larger uptake in growing season than the priors for coniferous forest and other vegetation in the Northern Hemisphere.In the Southern Hemisphere, the optimized flux of coniferous forest shows reduced amplitude from its prior, while the opposite occurs for other vegetation.
After the flux optimization by GCAS-DOM, the agreement between the simulated and observed CO 2 concentrations is greatly improved (R 2 increased from 0.64 to 0.80, and RMSE reduced from 5.58 to 2.76 ppm).However the residual differences between simulated and observed concentrations show some seasonal structure, indicating that some deficiency in the prior flux that has not been rectified in the optimization process.Since atmospheric CO 2 data are sparse, errors in the biospheric model used to produce the prior flux can propagate to the final optimization results.Further efforts are needed to improve photosynthesis and respiration calculation in BEPS in order to reduce the biases in the flux estimation in both summer and winter.

Figure 1 .
Figure 1.Illustration of three cycles in GCAS-DOM in which a state vector composes of the flux at l steps.

Figure 2 .
Figure 2. The partition the globe into zones.

Figure 3 .
Figure 3.The distribution of 312 stations used in this study.The x axis and y axis stand for longitude and latitude respectively.The asterisk symbol (*) represents the location of sites.

Figure 4 .
Figure 4.The results of optimized weekly scaling factors in the 20th latitude zone, where Coni stands for coniferous forest, Deci for deciduous forest, Evgn for evergreen forest, Shrub for shrubland, C4 for C4 vegetation and Other for other vegetation.Blue lines are estimated parameters, while red lines are constants which equal 1.Note that different scales are used.

Figure 5 .Figure 6 .Figure 7 .
Figure 5. Annual fluxes for the land and ocean from 2001 to 2007 in comparison with results of CT2011_oi."Prior" and "GCAS-DOM" are the fluxes before and after optimization.Black horizontal lines denote the uncertainties of global flux.The uncertainties of global flux from CT2011_oi are on average about 6 Pg C yr −1 which is not shown in this figure.All units are in Pg C yr −1 .Fossil fuel and fire emissions are excluded here.

Figure 8 .
Figure 8. Weekly fluxes for coniferous forest.(a) Northern Hemisphere (b) Southern Hemisphere.All units are in g C yr −1 .Note that different scales are used.

Figure 9 .
Figure 9. Weekly fluxes for other vegetation.(a) Northern Hemisphere (b) Southern Hemisphere.All units are in g C yr −1 .Note that different scales are used.

Figure 10 .
Figure 10.The average annual pattern of the prior flux for the years 2001-2007 excluding fossil fuel and fire emissions (in g C m −2 yr −1 ).

Figure 11 .
Figure 11.The average annual pattern of the optimized flux by GCAS-DOM for the years 2001-2007 (in g C m −2 yr −1 ).

Figure 12 .
Figure 12.Comparison between observed concentration and simulated concentration.

Table 1 .
Annual fossil fuel and fire emissions across 2001-2007 (in Pg C yr −1 ).Europe and North America) under this circumstance are irrational compared to previous studies.This may be caused by the larger error in soil carbon estimate of China in BEPS.Then we try to reduce the SD for the other regions and test the values of 0.0707 (i.e., variance of 0.005) and 0.0316 (i.e., variance of 0.001).The results indicate that the setting of 0.0316 for regions outside China and 0.1 for China can get a more reasonable pattern of flux.Therefore, we use the variance of 0.01 for the scaling factors corresponding to grids in China and 0.001 for the rest of the globe.

Table 2 .
Comparison of the optimized carbon sinks in this study with the "Global Carbon Budget 2013 v2.3" (in Pg C yr −1 ).

Table 3 .
The uncertainties of optimized fluxes for the globe, land and ocean by GCAS-DOM (in Pg C yr −1 ).

Table 4 .
Comparison of the long-term mean optimized carbon fluxes by GCAS-DOM with previous studies during 2001-2007.