Drivers of interannual variability in Net Ecosystem Exchange in a semi-arid savanna ecosystem , South Africa

Drivers of interannual variability in Net Ecosystem Exchange in a semi-arid savanna ecosystem, South Africa S. Archibald, A. Kirton, M. van der Merwe, R. J. Scholes, C. A. Williams, and N. Hanan CSIR Natural Resources and Environment, P.O. Box 395 Pretoria 0001 South Africa School of Geography, Clark University, USA Natural Resources Ecology Lab, Colorado State University, USA Received: 8 July 2008 – Accepted: 16 July 2008 – Published: 22 August 2008 Correspondence to: S. Archibald (sarchibald@csir.co.za) Published by Copernicus Publications on behalf of the European Geosciences Union.


Introduction
Carbon dioxide flux measurements using the eddy covariance technique generate a raw dataset with a very high temporal resolution (generally 10-20 Hz).The first step in the analysis of these data is to screen them for spurious values, perform various corrections, and then integrate the fluxes over periods of about 30 min.The half-hourly data provide important insights into many short-term physiological processes, but most ecological and management-relevant questions are framed over even longer timeframes -from days to years.A matter of particular interest to both ecologists and ecosystem managers is the inter-annual variability of primary production and carbon storage (Lauenroth et al., 2006).Semi-arid savannas are characterised by high inter-annual variability, in response to highly variable rainfall.This underlies many features of their ecology, including the likelihood and intensity of fires, the growth and migration of animal populations, and the stability of the tree-grass mixture (Higgins et al., 2000;Tyson, 1986;Reed et al., 1994;Ma et al., 2007;Serneels et al., 2007), and makes savanna systems particularly hard to manage.
Accumulating 30 min flux measurements to longer time periods is not a simple matter of adding them up, for two main reasons.The first is that even the best-run eddy covariance datasets have gaps, due to instrument failure or weather conditions that cause the eddy covariance flux assumptions to be violated.The second is that the eddy covariance measurement, net ecosystem exchange (NEE), is often not what is needed by ecologists who are often more interested in its components, gross primary production (GPP) and ecosystem respiration (R eco ): S. A. Archibald et al.: Inter-annual variability in NEE NEE=GPP+R eco (observing the convention that fluxes from the atmosphere to the ground are given a negative sign).
A model is used to bridge the data gaps in what is intended to be an unbiased fashion.The same or different models can be used to deconvolve the NEE signal into its components.A wide range of standard procedures have been developed for this process, largely for application in temperate ecosystems (Falge et al., 2001;Papale et al., 2006;Moffat et al., 2007).These are not always appropriate for tropical wet-dry systems.They use phenomenological models, neural networks or process-based models to achieve their objectives.The readily-available ones do not work well for data from semiarid sites in Southern Africa.This is because they assume the major controls on flux processes to be solar radiation and temperature, whereas temperatures in the semi-arid tropics are almost always warm enough to permit physiological activity, and insolation is sufficient, at least during non-cloudy days, for light saturation of part or all of the typically-sparse canopy.In arid and semi-arid systems, the main control on the rate and duration of many ecosystem processes is soil moisture.
As a further complication, in low-rain, high-evaporation ecosystems, where the soils dry out between successive rainfall events (so-called pulse-driven systems), the various terms in the carbon budget are highly dependent on the recent history of the system (Huxman et al., 2004).For example, following a rainfall event, respiration increases rapidly whereas it takes several days for the ecosystem to reach maximum photosynthesis (Huxman et al., 2004;Xu et al., 2004).Similarly, the magnitude of the system response depends not only on the size of the current rainfall event, but on the amount and timing of preceding events: after a long drought the response to a rain event is larger than to a similarsized event during the middle of the rainy season, but the time taken to reach the peak response is longer (Veenendaal et al., 2004).Therefore, it is not possible to use instantaneous measures such as the soil moisture content as a sole proxy for the state of the system.Gap-filling requires consideration of indices that have "memory": for instance, accumulators of water deficit.
Moreover, "phenomenological" models will only be appropriate when they truly represent the underlying responses (Falge et al., 2001).Most current respiration models define the relationship between respiration and temperature using an exponential-or logistic-shaped function; i.e. functions that either continually increase, or level off at a maximum value (Moffat et al., 2007).These models were developed in systems where temperature ranges are generally below 30 • C (Fang and Moncrieff, 2001;Lloyd and Taylor, 1994).Physiologically, respiration is expected to decrease once temperature exceeds the optimum for microbial activity (Yamano and Takahashi, 1983), and photosynthesis shows a similar reduction at high temperatures (Hamerlynck and Huxman, 2000).In tropical dry systems, the soil temperature in the top centimetres often exceeds 40 • C. Thus more appropriate functional forms need to be developed before current gapfilling methodologies can be applied globally.
Improving functional relationships to include extreme conditions would also be valuable in the context of climate change.In coming decades, many ecosystems around the world are likely to be exposed to higher temperatures and reduced moisture availability.Information on ecosystem responses to high temperatures and intermittent droughts will be valuable in predicting responses to these changes.
We present a statistical approach to estimating annual NEE for a semi-arid savanna system in Southern Africa.We tested the importance of six environmental drivers of daily photosynthesis (GPP) and respiration (R eco ) at the Skukuza flux tower in the Kruger Park (25.02 • S, 31.50 • E).Predictors commonly used in temperate systems were included, together with a range of environmental predictors chosen to reflect the effect of pulsed rainfall events.Predictive models were then used to interpolate annual fluxes over a 25 year time period, and to investigate the degree and possible causes of inter-annual variation in CO 2 exchange.
Our approach was motivated by the fact that there was a limited amount and duration of flux data (spanning six years with many gaps, which is too short for a reliable estimate of variance), but that a full time series of daily meteorological and phenological data was available for a 25 year period.Working at a daily time-step allowed us to bridge the gap between the half-hourly flux data and the crucial annual timescale, and to use the long-term meteorological data to estimate inter-annual variability.Process-based modelling would be ideal for these systems where previous conditions affect the response of the system to perturbation, but we chose to limit ourselves to a statistical analysis, given our imperfect understanding of the processes driving NEE in these systems.Results from this research will be used to develop more process-based models.
This paper aims to: -Document new procedures for eddy covariance gapfilling that are appropriate for dry, hot ecosystems; -Explore the factors associated with short-term (daily) variation in NPP, GPP and R eco ; -Calculate annual estimates of NEE and explore the main factors driving inter-annual variation in savanna carbon exchange at the Skukuza flux site in South Africa.

Study site
A flux tower situated in a semi-arid savanna near Skukuza, in the Kruger National Park has been collecting data since February 2000.The site is 370 m above sea level with strongly seasonal rainfall occurring between November and April.Mean annual rainfall is 550 ±160 mm.The landscape is gently undulating, consisting of broad-leaved Combretum apiculatum-dominated savanna on the coarse sand crests and fine-leaved Acacia nigrescens savanna on sandy clay loam in the valleys (Scholes et al., 2001).The soils are about 0.6 m deep.The eddy covariance flux tower is situated at the ecotone between the two vegetation types.
The woody vegetation reaches 8-10 m in height and the flux sensors are at 17 m, giving the tower a footprint of about 500 m.The vertically projected tree canopy cover in this area is about 30% and woody basal area is 7 m 2 ha −1 .The grass layer is dominated by Panicum maximum, Digitaria eriantha, Eragrostis rigidor, and Pogonarthria squarrosa.
The tower is instrumented with a sonic anemometer (WindMaster, Gill Instruments Ltd., Lymington, UK) measuring wind velocity in three dimensions and a closed-path infrared gas analyzer (LI-6262, LI-COR Inc., Lincoln, NE, USA) measuring water vapour and CO 2 concentration.The raw high frequency (10 Hz) data was processed following (Lee et al., 2004) to produce half-hourly measures of abovecanopy turbulent fluxes of sensible heat, water vapour, and carbon dioxide.Heat and mass fluxes were calculated based on conventional equations and corrections (see e.g.Moncrieff et al., 1997;Aubinet et al., 2000) and all fluxes are reported as positive upward from the land to the atmosphere.Canopy storage flux was estimated from the half-hourly time derivative of a 16 m column integral based on CO 2 concentrations measured at 0.75, 2.0, 3.5, 5.25, and 16 m, and added to the above-canopy turbulent flux for data analysis.Incoming and outgoing long-and shortwave radiation was measured with a net radiometer, (NR Lite, Zipp & Zonen B.V., Delft, The Netherlands), with the incoming and outgoing shortwave radiation measured with pyranometers (CM 21, Zipp & Zonen B.V., Delft, The Netherlands) mounted at 22 m.
Average half-hourly volumetric soil (θ) water content was estimated with 15 cm long frequency domain reflectometry sensors (CS616, Campbell Scientific Inc., Logan, UT, USA) installed horizontally at soil depths of 3, 7, 16, 30, and 50 cm in the clayey Acacia -dominated soils downhill of the tower, and 5, 13, 29, and 61 cm in the sandier Combretum -dominated soils uphill.Rainfall per half hour was measured with a tipping bucket rain gauge (TE525, Campbell Scientific Inc., Logan, UT, USA) located on the tower top, along with other standard meteorological variables such as air temperature and humidity, wind speed and direction.

The effect of the ecotone
The differences in soil properties and species composition above and below the seepline were expected to be apparent in the flux data from the tower.To test this we separated the half-hourly fluxes into predominantly broad-leafed and predominantly fine-leafed (based on the wind direction (Fig. 1).The only observable difference between the two vegetation  types was a slightly higher night time carbon flux in the broad-leafed savanna during the dry months -this was not considered significant enough to justify separating the fluxes into two datasets with the consequent reduction in sample size.Kutsch et al. (2008) similarly notes that the data "show no significant differences between the savanna types in terms of fluxes".Whether this was due to a lack of ability to differentiate between fluxes from the two sites, or because at the landscape level the differences are not significant, we chose to complete the rest of the analysis using all flux data as one unit.We used a model to create an integrated site-level soil moisture record (For soil moisture modelling methods see Archibald and Scholes, 2007).See Appendix A for a calibration of measured vs modelled soil moisture.

Data processing and gap filling
Flux data were available from February 2000 to December 2005 (the site continues to operate, but with an open-path IRGA).Of the half-hourly data, 41% was missing, which is slightly more than the average among flux sites of 35% (Falge et al., 2001).As rainfall occurs during summer months of November to April the flux data were summarised by rainfall years (July to June) which provided five full years of flux data -with data coverage ranging from 30-74 % annually.Most of the data gaps were for a single half hour interval, but instrument failure due to lightning strikes resulted in six gaps of over two months duration, usually occurring during summer periods.These large, non-random gaps limit the types of gap filling approaches that can be used.When a u * filter of 0.1 m/s was applied to eliminate periods of low turbulence during which eddy covariance measurements are unreliable (Goulden, 1996), the missing flux data increased to 49%.Linear interpolation was used to fill gaps <2 h in duration, which reduced the data gaps to 44%.These half-hourly data were then summed to calculate daily NEE values for all days with unbroken 30-min time series.The result was 698 days of NEE data.These days were not randomly distributed through the year, with the rainy months (particularly December and January) having many fewer data points than the dry months of June through September (Fig. 2).Dry, winter conditions are therefore over-represented in the sample.In addition, one of the periods of most continuous and cleanest observations spans an intense drought, 2002-2003 growing season, further biasing results.
Simple gap-filling techniques using mean daily averages are inadequate for filling gaps in the Skukuza data because the stochastic and variable NEE response over the course of a wetting event would not be well represented by a summary value and because gaps in the data often span several weeks.Non-linear regression methods work well when there is just one main driver of carbon uptake or release (in temperate systems, temperature is normally used to drive respiration, and APAR to drive photosynthesis (Moffat et al., 2007).However, the presence of multiple drivers at the Skukuza site means that single-parameter non-linear methods are unlikely to be sufficient.
Similarly, Marginal Distribution Sampling (MDS: Reichstein et al., 2005) uses a look-up table approach which fills gaps by taking the average value for data collected under similar meteorological conditions within a certain window of the missing data.In this way it accounts for temporal auto-correlation as well as co-variation with meteorological drivers.However, when there are long gaps this method breaks down, and the choice of "similar" meteorological conditions requires that the appropriate hydrological indices are considered in the model -current implementations use only radiation, temperature, and vapour pressure deficit (http://gaia.agraria.unitus.it/database/eddyproc/).
We used artificial neural networks (ANN) as our gapfilling approach, as this method accommodates non-linear relationships between variables but requires few a priori assumptions on the relative importance of different variables or their functional relationships.The usefulness of ANNs depends entirely on the appropriate selection of input variables -and we hoped to improve on standard methods available by choosing variables which would reflect the pulsed response to soil moisture in arid systems.We also ran standard multiple linear regression models on the data to explore interactive effects between the variables.This approach allowed us to investigate the important drivers of NEE, as well as develop models which could be used for prediction using long-term meteorological data.

NEE, photosynthesis, respiration
Half-hourly night-time fluxes were used to estimate the daytime respiration.A stricter u * threshold of 0.25 ms −1 (Kutsch et al., 2008) was used for this analysis, as it was more important to have reliable data than large sample sizes.Respiration is controlled by temperature, which generally varies quite predictably over the course of a day, as well as variables such as soil water content and the amount of actively photosynthesising leaf material, which are relatively constant over a single day, but vary over longer time scales.We therefore took a two-scale approach to determining day-time ecosystem respiration: we derived a temperature response curve by fitting it to "optimum" respiration conditions -i.e. the maximum values measured at a range of temperatures (all valid halfhourly night-time fluxes were used for this).This curve was used to estimate the maximum potential respiration rate for each daylight interval, using the daytime temperature trend as input (see Appendix B for more details on this method).The actual respiration during any particular day was then estimated as the temperature-driven "potential" scaled by the ratio of observed night-time respiration to the potential nighttime respiration for that day.By multiplying the temperature response function of R eco by the scaling parameter, the estimated respiration values are shrunk towards the mean respiration value for that day, thereby resulting in a day-specific temperature response function for R eco .This scaling factor was assumed to account for the effects of soil moisture and physiological activity.Unlike the flux-partitioning method of Reichstein et al. (2005) this method does not require a separate temperature response function to be derived for each day.
Conventional exponential (e.g.Arrhenius, Lloyd-Taylor) temperature functions were not considered appropriate representations of the response functions, as day-time temperatures at the site often exceed that which is optimum for microbial activity (Yamano and Takahashi, 1983).An analysis of independently-collected respiration data from the site, collected using soil chambers, indicated that a generalised Poisson temperature relationship produced the best fit to measurements of soil respiration (Kirton et al., unpublished data).
We therefore used the following equation to describe the optimal temperature response: where values in brackets represent the standard error of the estimate.Only days when there were more than three valid night time flux values with which to estimate the scaling parameter were used to interpolate day-time fluxes.See Appendix B for details on this method and a comparison with other methods.
Negative night time fluxes were excluded from the model fitting, as there was no theoretical justification for negative respiration.Interpolated respiration values that dropped below zero (which can occur at very high or low temperatures, using the parabolic curve) were given a value of zero.This method produces predicted respiration values with similar distributions to those recorded for all conditions of soil moisture and f APAR (Fig. 3).
Daily respiration (R eco ) values were obtained by calculating a half-hourly value (multiplying the per second value by 60 × 30) and summing this over the 48 half-hours.All other daily values were calculated in the same way.Daily Gross Primary Production (GPP) was calculated by subtracting the interpolated day-time respiration values from the recorded daytime NEE values, and summing over the daylight hours.This resulted in a dataset with 372 valid daily records for R eco and 529 for GPP.

Drivers of NEE
In temperate systems incoming solar radiation (PAR) and temperature are the main drivers used to predict photosynthesis and respiration.In some models these are modified by measures of LAI and soil moisture (Moffat et al., 2007).We chose to test six input variables as predictors of GPP and R eco (see Table 1).
Only data that could be derived from standard daily South African Weather Services (SAWS) climate records or longterm low-resolution satellite vegetation indices were used as input predictors, in order that the models could be used in conjunction with the long-term records to estimate NEE over periods much longer than the eddy covariance data would permit.The daily time-course of temperature variables was estimated from daily maximum and minimum air temperature.Soil water content was modelled using a simple bucket model and Penman-Monteith evapo-transpiration functions (Archibald and Scholes, 2007).The half-hourly meteorological data available at the flux tower was used to validate these models (see Appendix A).
Three different measures were used to indicate the hydrological state and history of the ecosystem: Relative Available Water Content (RAWC); water deficit (a function which accumulates the deficit for all days of water stress θ <θ crit until rewetting occurs); and time since wetting (the time since the last big wetting event -i.e.time since θ increased above θ crit ).Equations for these indices can be found in Table 1.Mean air temperature -which correlates well with soil temperature (Appendix A) -was used as the predictor of R eco , whereas mean daytime temperature was used at the predictor for GPP.The European Joint Research Centre 10-day f APAR product (Pinty et al., 2002)  derived NDVI (the "GIMMS data", Tucker et al., 2005) and f APAR was used to define the daily f APAR input for the period before 1998 which was when the Joint Research Centre (JRC) dataset started (see Appendix A).

Modelling approach
Two different artificial neural network (ANN) methods were tested: Generalised Regression Neural Network (GRNN) and Multi-Layer Feed Forward Neural Network (MLF).The GRNN is based on a kernel smoothing approach and has the advantage of using non-parametric regression procedures (which makes no assumptions about the underlying data) and can be trained quickly as only the smoothing parameter needs to be estimated and optimised.As has been found in other studies (Cigizoglu, 2005;Currit, 2002;Kisi, 2006) this method is efficient for modelling non-linear systems and worked as well as the more traditional MLF, which required excessive fine-tuning to optimise the system architecture.Three separate models were developed for predicting R eco , GPP, as well as daily NEE.Models were developed using 80% of the data for training and 20% for testing (proportions of 70-30% were also tried, without substantially changing the results).
Multiple linear regression (MLR) equations with up to three-way interactions were examined for both photosynthesis and respiration.A combination of backward selection and stepwise selection was used to obtain significant predictors in the model.The ability of the MLR to explore the importance of different variables separately and in combination added value to the results of the ANN.However, there are strong theoretical reasons against using ordinary least squares (OLS) regression for data-filling (Richardson and Hollinger, 2005), which is why we restricted their use to exploring the relationships between variables.Many of the meteorological variables, at least over a certain range, are expected to have a near-linear relationship with respiration and photosynthesis.Temperature is an exception: therefore quadratic terms of temperature were also included during the model selection process.

Error estimation
The random error component of the total error in the daily carbon fluxes was considered in an attempt to obtain a confidence interval for the annual estimates of NEE.The systematic component of the error was not assessed for this paper, but this analysis will be carried out at a later stage.To estimate the random error, the method described by Richardson et al. (2008) was used, where the model error was used as a surrogate for the random error.The error of the daily ANN model prediction (difference between the observed and modelled daily fluxes) was calculated for all cases where the observed daily fluxes were available.The distribution of these errors fitted a Laplace distribution better than a normal distribution (Chi-squared tests for goodness of fit were χ 2 =37.37 compared with χ 2 =111.01 for the normal distribution).Richardson et al. (2008) also found the errors in halfhourly flux data to be distributed according to the Laplace distribution.
We assumed the daily errors were independent and identically distributed.This allowed us to use the Central Limit Theorem to assume normality for the annual sum of the errors in the fluxes.The expected value of the errors was assumed to be zero and the variance of the errors was estimated by the sample variance.The approximate standard error for the annual estimates was then calculated to be 12.9 g Cm −2 year −1 , leading to coefficients of variation from Biogeosciences, 6, 251-266, 2009 www.biogeosciences.net/6/251/2009/Maximum CO 2 sequestration occurs when soil moisture is low but green leaves are still present.Wet conditions were defined as periods when the soil moisture was greater than 9% volumetric water content, dry conditions, less than 6%.Periods with green leaves were defined as periods when the f APAR value was greater than 0.2.The average number of days each year for each combination of physiological and soil moisture conditions are shown, together with the average daily sum of NEE (g C/m 2 /day) for these conditions.
8-30%, and therefore the error in the annual NEE estimates is 25.3 g Cm −2 year −1 with 95% confidence.This agrees with the estimate of random error obtained by Richardson and Hollinger (2005), where they used the Monte Carlo simulation to estimate the error in the model parameters and model estimates.Goulden et al. (1996) and Oren et al. (2006) both reported instrument error of approximately 5% for closed path eddy covariance systems.If the same instrument error can be assumed for the Skukuza data, this increases the error value by between 4.1 and 15.5 g Cm −2 year −1 .

Carbon balance
The diurnal time-course of NEE is highly responsive to soil moisture and the presence of green leaves (Fig. 4).Interestingly, maximum CO 2 uptake occurs during periods of low soil moisture when green leaves are still present, because under these circumstances the contribution of soil respiration is low, but a substantial amount of photosynthesis is still occurring using water stored in the plant, or accessed from deeper soil layers that do not contribute much to ecosystem respiration.predict than photosynthesis, and the linear models performed badly in predicting R eco (r 2 of 0.41, MAE of 0.85 g/m 2 /day).
The ANN identified available green leaf material (indexed by f APAR ) to be the most important predictor of both R eco and GPP, but f APAR was relatively more important for predicting GPP than for predicting R eco , as would be expected (Table 3).We interpret the role of f APAR in driving R eco as reflecting the availability of readily-respired sub-strate.For GPP the time since wetting event was the next most important predictor, which corroborates findings of Wohlfahrt ( 2008) and Xu et al. (2004) that there is a delay in the pulse of photosynthetic activity after a rainfall event.In terms of water relations, soil moisture content was the best predictor for R eco , but water deficit and time since wetting were also identified as important.Interestingly, temperature did not prove to be important in predicting either respiration or photosynthesis.This could reflect the daily time-step at which we did the analysis -in this sub-tropical system temperature variation between days and over the growth season is much less important than variation in leaf dynamics and soil moisture in driving NEE.
For respiration models using MLR, f APAR and time since wetting were the most significant single predictors (Table 4).Interactions between various soil moisture parameters and f APAR also significantly improved the fit of the respiration model.As can be seen in Fig. 4, the effect of a parameter like soil moisture greatly depends on the amount of photosynthesising green leaf material, so it is unsurprising that these interaction terms are important.
In the photosynthesis model soil moisture was very significant, and three-way interactions between f APAR , soil moisture, PAR, and time since wetting were important in improving model fit.The importance of the interactive terms perhaps goes some way to representing the delayed photosynthetic response to wetting events identified by Xu et al. (2004).It usually takes 5-7 days in this system before photosynthesis reaches its maximum after a wetting event, and this response depends on how much leaf material is present.Temperature was included in both the GPP and R eco models as it produced significant interactions with other variables, but as a main effect it was not significant.The ANN net ecosystem exchange model had the lowest error (Table 2), so this model was used to gap-fill the six year dataset .

Inter-annual variability
Annually-integrated net ecosystem exchange varied from −138 to +155 gC/m 2 /y over the 5 year period for which there were flux data (Table 5).In drought years limited carbon uptake occurs even during the height of summer, but in years with above average rainfall the site can be a sink of carbon for several months of the year (Fig. 5).Only two of the five years had negative NEE (in other words, were net carbon sinks at the annual timescale).It is possible that our gap filling methods over-estimate the amount of respiration occurring at this site: there was very little data available during the summer months (Fig. 2), so the model was probably not well trained to identify days of maximum GPP in this system.To test this we will need to acquire a more extensive summer dataset for this site.Estimates of random error suggest that years where predicted annual NEE was within ±25 gC/m 2 /y might actually have been close to carbon-neutral.If systematic error is included, this error estimate increases further to up to approximately 40 gC/m 2 /y.When the 25 year NEE sequence is predicted the pattern becomes more obvious (Fig. 6).The site was predicted to be a net sink for carbon in only 6 of the 25 years, but three other years (1989, 1996, and 2000) may have been near-sinks.The data give a long-term mean annual NEE of 75(±105) g C/m 2 /y.Loss of a cohort of aging Acacia nilotica trees at the site, and increased stem damage with increasing elephant populations over the last 20 years might both contribute to making this site appear as a net source in this analysis.Recent field data at the site record high rates of tree turnover: 8%(±3%) per annum -with damage by elephants and senescence of old Acacia nilotica trees being the main cause (Archibald, unpublished data).These turnover rates are high, but not exceptional for southern African savannas (Shackleton, 1997), and it is perfectly feasible that tree growth could match these losses.Therefore, it would be precipitous to speculate further on the implications of the long-term predictions until there is better information on tree productivity, and more peak-growing season flux data with which to calibrate the models.
The most useful information provided by the long-term prediction are estimates of the inter-annual variation for this site.Figure 7a indicates that there is a strong relationship between predicted annual NEE and absorbed photosynthetically active radiation (APAR, which is PAR * f APAR ).This analysis suggests that once annually accumulated APAR exceeds about 675 MJ/m 2 , the system becomes a sink for carbon (Fig. 7a).
It might seem surprising that soil moisture, which was so important at a daily time scale, does not show a stronger relationship with annual NEE.Even when photosynthesis and respiration are considered separately (Fig. 7b, c), by far the best relationship is found with APAR.This result makes sense when one considers that both the ANN and the MLR analyses showed strong interactive effects of soil moisture with f APAR -i.e. the effect of available soil moisture in driving Pn and R eco depends heavily on the amount of photosynthetically active green leaf material.Similarly, soil moisture has been shown to be an important driver of seasonal patterns of leaf display at the site (Archibald and Scholes, 2007).Therefore f APAR can be seen as an integrated measure of Table 5. Summary of NEE over the 5 year period for which there was flux data.Negative values represent an overall sink of carbon.Data gaps were filled using an ANN and predictors f APAR , water deficit, relative soil moisture content, mean day time temperature, time since wetting, and mean soil temperature, in that order of importance.Also reported are annual summaries of rainfall, available photosynthetically active radiation, length of the growing season, and number of growth days (days when soil moisture content is greater than θ crit , 7% by volume).

Rainfall year
Annual hydrological conditions at the site, which is better at predicting annual-scale carbon exchange than any measure derived from short-term measurements of daily soil moisture.For example in the 2003-2004 rainfall year the total annual rainfall was above average (618 mm) but it was heavily skewed towards the last part of the growing season, and the start of grass growth was delayed by several weeks.In this instance integrated values of APAR would represent the growing conditions for a season better than total rainfall, or even number of growing season days.

Other pathways of carbon loss from the system
A savanna carbon budget would be incomplete without a consideration of fire and herbivory.The fluxes of CO 2 to the atmosphere via these two pathways have not been directly measured at the Skukuza site, but can be inferred and constrained from other data.The abundant large mammalian herbivore (>5 kg body mass) community in this landscape consists of 14 species, mostly Bovidae.The combined herbivore biomass is 3155 kg km −2 (Scholes et al., 2004).Taking into account the effect of body mass on metabolic requirements and digestability, this translates to a herbivore respiratory flux of 4.5 g Cm −2 y −1 and a flux from the decomposition of dung of 5.0 g Cm −2 y −1 .The uncertainty range associated with these estimates is unknown, but thought to be around 20%, related mostly to errors in game census.The inter-annual variability is thought to be relatively low.The herbivore respiration and dung decomposition fluxes are subsumed in the ecosystem respiration measured by the eddy covariance system (Table 6).
The mean fire return time in this landscape in the KNP is 4.2 years (Van Wilgen et al., 2000).The most comprehensive set of fuel measurements for this landscape was taken in August 1992 at 10 locations within 30 km of the Skukuza site (Shea et al., 1996).The combusted material was predominantly dry grass (1442±975 SD kg ha −1 ), tree litter (1452±636 kg ha −1 ) and a contribution from dead wood (226±194 kg ha −1 ) giving a total of 3120±1795 kg ha −1 .A multi-site, multi-year mean grass fuel load for the KNP is 3359 kg ha −1 , with a range of 1152-6728 (Trollope and Potgieter, 1985).The emission factor for CO 2 , measured for the same fires as the above fuel loads (Ward et al., 1996) is 1699±33 gCO 2 kg DM −1 .Therefore, the long-term annualised emission of CO 2 through fire is around 136±58 gCO 2 m −2 y −1 .An additional 6.4±3.9 gCO m −2 y −1 and 0.2±0.2g CH 4 m −2 y −1 are also emitted from fires, so the total pyrogenic carbon loses are around 40.0±17.5 g Cm −2 y −1 (Table 6).The flux site has burned five times since 2000 1 , which suggests that the pyrogenic emissions during this period are probably about twice the long-term, landscape-scale averages calculated above.The pyrogenic fluxes are in principle part of ecosystem respiration, but in practice are not measured by the eddy covariance system because they occur briefly, and during that period exceed the measurement range of the infra-red gas analyser.The inter-annual variability is high because a given site does not burn at all in most years, and the fuel load varies greatly in the years when it does burn, in response to the variability of rainfall in the preceding season.The total carbon budget -including fire, herbivory, and plant growth and decomposition -for the skukuza site is therefore 115.0±122.5 g Cm −2 y −1 .S. A.

Conclusions
Inter-annual variability in carbon exchange at the Skukuza flux site is on the same scale as an oak savanna in California.In a seven year study Ma et al. (2007) measured values from −155 to −56 g Cm −2 y −1 in the savanna and from −88 to 141 g Cm −2 y −1 in an adjacent grassland.This compares to −138 to 155 g Cm −2 y −1 from the six year Skukuza dataset.The variability at Skukuza seems to be largely controlled by variations in the length of time that green leaves are displayed by the trees and grasses, and by changes in seasonal patterns of water availability (Fig. 7) -both ultimately driven by variations in rainfall between years.
The flux-partitioning and gap-filling procedures developed in this paper are a distinct improvement on standard methodologies largely because they use more appropriate temperature-response functions and explicitly include a soil moisture control, including indices of the wetting history.Estimates of annual CO 2 flux obtained through gap-filling using an ANN may be slight over-estimates (i.e., slightly biased toward the sink side), because of the paucity of peak growing season flux data.However, it is also possible that this particular savanna site has been a carbon source in recent years due to high tree turnover.Results of the ANN gap-filling procedures and MLR models indicate a large degree of interaction between driver variables and lend support for the development of a process-driven model for this system.Such a model would need to include explicit measures of leaf mass, soil moisture and temperature.
The generalised Poisson function used here to fit an optimum temperature response curve is an effective method for extrapolating day-time respiration in systems where temperatures often exceed 30 • C -provided a scaling factor is used to control for the co-limiting factors of LAI and soil moisture.At a daily to seasonal level, however, temperature was shown to be less important than other factors in influencing NEE.

Comparison of meteorological data
Correlation between the flux tower variables and corresponding variables from other sources appears in Table A1.Strong linear relationships exist between the flux tower daily measurements for the mean soil temperature and the mean daytime temperature and the corresponding temperature variables derived from the minimum and maximum daily temperatures of the South African Weather Services (SAWS) data.There is also a strong linear relationship between the measured mean soil moisture and the modelled soil moisture using the SAWS data, as well as a fairly strong linear relationship between PAR derived from the shortwave radiation from the flux tower and the modelled PAR.The correlation between the flux tower rainfall and the SAWS rainfall is significant, but not as strong as that of the previous comparisons to SAWS derived variables.The peaks of the environmental data are usually slightly higher than recorded from the flux tower, although there are few days when the flux tower recorded higher values.This could be due to localised rainfall events.Peaks in the data do not always correspond and this could be due to the measurements from the SAWS data being taken daily from a rain gauge, whereas the flux tower took instantaneous measurements of rainfall.Therefore daily rainfall events may not always correspond exactly.The pattern of rainfall over time appears to match for the two data sets.The annual sum of rainfall for the environmental data is always more than that for the flux tower data (Table A1).This is due to missing data from the flux tower.
There is a strong linear relationship between Gimms NDVI and f APAR (Table A2).Therefore a linear regression equation was derived to describe this relationship.The linear regression obtained a r 2 -value of 0.71 and an MAE of 0.05.
The standard error for the intercept is 0.004 and the standard error for the slope is 0.009.

Appendix B Interpolating day-time respiration
Fitting an optimal temperature function to the mass of nighttime flux measurements involved making several assumptions about a) the shape of the temperature-respiration curve, and b) the values to use to fit the curve.

B1 Shape of the temperature-response curve
Field data indicate that a generalised Poisson function is the best descriptor of the effect of temperature on respiration, as it describes both the exponential increase of respiration with temperature and the sudden decrease once the temperature optimum has been reached (Kirton et al., unpublished data).However, for this analysis we also tried a simple parabolic function.

B2 Values used to fit the curve
This interpolation method relies on deriving a curve that represents the temperature response under a certain set of environmental conditions.Any deviation from this line by an observed point is then assumed to be due to different environmental conditions.The curve can be pulled up and down to match this point, and thereby adjust for these varying environmental conditions, by the use of a scaling parameter.Missing respiration values (day time points) can then be interpolated on this day (because the environmental conditions other than temperature are going to remain stable at a daily time step) by using the temperature at each point and the adjusted temp/resp equation.
With this in mind, extracting the points to be used could be done in a number of different ways.The easiest way to identify points where all factors other than temperature are constant would be to identify the maximum points for each temperature value (which would represent respiration under completely optimal conditions of soil moisture and LAI).We tried three different methods for extracting these values: manually picking the maximum respiration values, calculating the maximum respiration value for each degree temperature change, and calculating the 95th quantile for each degree temperature change (Fig. B1).We also tried manually picking values at the top of the thickest part of the cloud of respiration points.This approach would exclude any extreme outliers but could also be assumed to represent the same set of other environmental conditions.Because the curve is adjusted up and down based on the respiration values on the day in question, the position of the curve on the y-axis is unimportant.It is the shape of the curve that will affect the interpolation.
Using the 95th quantile was not satisfactory as some temperature categories had orders of magnitude more respiration measurements than others.We therefore abandoned that method and tested six different respiration interpolation methods (Table B1): manually selected maximum points (fitting a parabolic and generalised Poisson), manually selected points at edge of data cloud (parabolic and GDP), and calculated maximum points (parabolic and GDP).

B3 Results
Results indicate that the interpolated values are very resilient to the method used to fit the temperature response curve.The distribution of interpolated points was similar for all six methods (Fig. B2), and linear regression models show similar fits to the observed respiration data (Table B1).A visual assessment of the interpolated points (Fig. B3 Table B1.The six different methods used to fit a temperature response curve to the measured night-time (respiration) fluxes.Two different fitting functions were used, and three different methods for identifying points to fit the curve to.The distributions of the data interpolated with each method were very similar to each other (Fig. B2), and fell well within the bounds of the observed respiration data (Fig. B3).   that the generalised Poisson interpolations fell more clearly within the main data cloud.We therefore chose to use the calculated maximum value method fitted to the generalised Poisson distribution.

Fig. 1 .
Fig. 1.Showing monthly CO 2 (Fc) and H 2 O (LE) flux at the site for the two main vegetation types (calculated from the wind directions).Fc fluxes are separated into daytime and night-time fluxes.The wind rose shows the dominant wind direction, as well as the area of the footprint which is Acacia savanna (dark shaded area), Combretum savanna (open area) or intermediate (light shaded area).

Figure 2 Fig. 2 .
Figure 2 Fig. 2. Seasonal distribution of valid NEE data points from a sixyear long (2000 to 2005) dataset at the Skukuza flux tower.

Fig. 3 .
Figure 3 722 723 Fig. 3. Distribution of observed (black) and interpolated (red) halfhourly respiration values over temperature.Data are presented for all conditions, for periods of low soil moisture, for periods with little leaf material (low f APAR ), and for conditions of low soil moisture and f APAR .Interpolated values lie well within the distribution of observed values for all conditions.It is also clear that respiration drops off at high temperatures, and that temperature-response functions need to include this reduction at high temperatures if they are to be appropriate for this site.

Fig. 4 .
Fig. 4.Daily time-course of NEE averaged over 5 years of measurements and for six combinations of environmental conditions at the Skukuza flux site.Maximum CO 2 sequestration occurs when soil moisture is low but green leaves are still present.Wet conditions were defined as periods when the soil moisture was greater than 9% volumetric water content, dry conditions, less than 6%.Periods with green leaves were defined as periods when the f APAR value was greater than 0.2.The average number of days each year for each combination of physiological and soil moisture conditions are shown, together with the average daily sum of NEE (g C/m 2 /day) for these conditions.

Fig. 5 .Fig. 6 .
Figure 5 732 733 Fig. 5. Annual time course of NEE for two consecutive years (a dry year and a near average year) at the Skukuza flux tower.Red line represents measured daily NEE, blue is modelled using an artificial neural network and inputs of f APAR , soil moisture, temperature, time since wetting, and water deficit.
Relationship between annual NEE (a) R eco (b) and GPP (c) and four potential drivers of inter-annual variability in carbon uptake: annual rainfall, available photosynthetically active radiation, length of the growing season, and number of growth days.Annual rainfall seems to be the least significant, compared with parameters that include seasonal variation in leaf display (APAR and length of growing season), and the seasonal distribution of rainfall.Solid circles represent years 2000-2005 for which flux data were available to constrain the model.

Fig. B1 .
Figure B1: Showing the six temperature response functions fitted to the half-hourly night 896 time fluxes (respiration).Plot A shows the parabolic functions fitted over the manually 897 selected maximum points (top function), the automatically selected maximum points (middle 898 function) and the manually selected top of the data mass (bottom function).Plot B shows the 899 Generalised Poisson function fitted over the same three selections of points.900 901 902 Figure B2: Showing the distribution of the respiration data interpolated using six different 905 methods (solid points: median values, box: ± 25 % quantiles, bar: data range).The median 906 and ± 25 % quantiles are very similar for each method, but the method that calculates the 907 fitted values had slightly lower maxima than the other two methods.All data are well within 908 the range of measured Reco values (u*-corrected half-hourly night-time fluxes).909 910 911 912 913 914 915 ) indicates www.biogeosciences.net/6/251/2009/Biogeosciences, 6, 251-266, 2009

Figure B3 :
Figure B3: The distribution of measured half-hourly night-time fluxes (black circles) and interpolated half-hourly respiration (red crosses) along a temperature axis.Interpolated fluxes represent all half-hour values which had soil temperature data and at least three night-time fluxes to estimate the scaling parameter.

Fig. B3 .
Fig. B3.The distribution of measured half-hourly night-time fluxes (black circles) and interpolated half-hourly respiration (red crosses) along a temperature axis.Interpolated fluxes represent all half-hour values which had soil temperature data and at least three night-time fluxes to estimate the scaling parameter.

Table 1 .
Defining the six input variables used in the models to predict GPP and R eco .All input variables were derived from data available at a daily level from the SA Weather Services, so they could be used to produce long-term predictions.Relative available water content (RAWC) is calculated as in the formula below where θ = volumetric soil moisture content, WP = wilting point and FC = field capacity.Accumulated water deficit (wdef) is the accumulated difference between θ crit and θ while θ is less than θ crit (plants under water stress) and is 0 whenever θ is greater than or equal to θ crit (plants not under water stress).Time since wetting is the number of days that water deficit has been 0 (plants have not been water stressed).

Table 2 .
Comparison of model performance.Artificial neural networks (ANN) generally performed better than multiple linear regressions (MLR), but MLR's still managed to explain a large proportion of the variance in photosynthesis.The MLR estimates for NEE were obtained by summing the MLR estimates for R eco and GPP, and the model fit statistics were then calculated from these values.

Table 3 .
Relative importance (percentage) of the different variables used to predict ecosystem respiration, gross primary productivity, and net ecosystem exchange using an ANN.Variables are defined in Table1.

Table 6 .
Annualised summary of the different contributions to the carbon balance at the Skukuza flux site.

Table A1 .
Summary of comparisons between flux tower derived variables and corresponding variables derived from other sources.

Table A2 .
Annual rainfall over time.