Interactive comment on “ Sensitivity analysis of the GEMS soil organic carbon model to land cover land use classification uncertainties under different climate scenarios in Senegal ” by A

The paper of Dieye et al. addresses an important and relevant issue for the biogeoscience modeling community, which makes heavy use of satellite data as input to process-based models. The paper is well written and clear, although the numerical experiments are rather complex. I have a few comments that I hope the authors will address in a resubmitted paper: 1. There is a complete lack of “in situ” validation besides a broad assessment of general values from the literature. I understand that the goal was to assess the GEMS sensitivity, but I would have been more confident had some more evidence from measured SOC and NPP data corroborated the model outputs at different


Introduction
Africa is experiencing rapid and substantial social, economic, climatic and environmental change (Brooks, 2004; Challinor et al., 2007; IPCC, 2007; Nkonya et al., 2011). Soil carbon is important in West African drylands for soil fertility and agricultural sustainability, and the influence of land management under changing climate on soil carbon is of particular interest (Batjes, 2001; Lal, 2004; Tieszen et al., 2004). Biogeochemical model simulations of carbon dynamics in vegetation and soil in response to changes in land cover and land use (LCLU), land management and climate increasingly use spatially explicit LCLU data derived from satellite remote sensing (Turner et al., 2000; Liu et al., 2004; Kennedy et al., 2006; Liu et al., 2008; Tan et al., 2009). There is a recognition, however, that errors in satellite derived LCLU data, both in terms of classification errors and the degree of generalization of the landscape into the different LCLU classes, and differences between LCLU data sources and land cover classification approaches, may propagate into model outputs (DeFries et al., 1999; Reich et al., 1999; Turner et al., 2000; Quaife et al., 2008).
Remotely sensed satellite data have been used extensively to map land cover (Tucker et al., 1985; Pickup et al., 1993; Lambin and Strahler, 1994), although human influences are difficult to discern reliably except when using high spatial resolution data (Townshend and Justice, 1988). Consequently, high spatial resolution data, in particular from the Landsat satellite series, have been used for mapping land cover change over decadal periods (Skole and Tucker, 1993; Gutman et al., 2008). Satellite classification by visual photo interpretation is not suited to mapping large areas on the consistent and repeated basis required for long term monitoring. Automated techniques that use digital computer processing and statistical classification approaches largely overcome this issue, but they do not provide error free classifications either. Furthermore, it is not usually possible to reliably map land use, i.e. the land's social, economic, and cultural utility, using automated techniques (Turner et al., 1997). In semi-arid areas, such as the West African Sahel, satellite land cover classification is particularly challenging because the vegetation types may be sparsely distributed across variable soil backgrounds and because they frequently transition and mix across the landscape at scales finer than the satellite pixel dimension (Frederiksen and Lawesson, 1992; Prince et al., 1990; Lambin and Ehrlich, 1997). Further, semi-arid vegetation often exhibits a marked seasonality in photosynthetic activity and leaf area in response primarily to seasonal precipitation, making the selection of appropriate satellite acquisitions important (Hiernaux and Justice, 1986).
The General Ensemble biogeochemical Modeling System (GEMS) is a well-established biogeochemical model developed for spatially and temporally explicit simulation of biogeochemical cycles (Liu et al., 2004; Tan et al., 2009). In this paper the sensitivity of GEMS modelled soil organic carbon to satellite LCLU mapping uncertainties is quantified for a semi-arid Sahelian region of Senegal. Supervised decision tree classification approaches are used to map LCLU from multi-temporal Landsat satellite data, and the LCLU maps are then used to drive spatially explicit GEMS simulations of soil organic carbon under different climate change scenarios. The study area (Sect. 2), the Landsat data and pre-processing (Sect. 3), and the GEMS input data and parameterization (Sect. 4) are described first. This is followed by a description of the LCLU classification (Sect. 5) and of the carbon modeling and sensitivity analysis methodologies (Sect. 6). The results are presented and discussed (Sect. 7), preceding the concluding remarks (Sect. 8).

Study area
The study area is located in the north of Senegal, bordered by the Senegal River to the north and the Atlantic Ocean to the west, with the southern edge 100 km north of Dakar (Fig. 1). It covers 1560 km² lying between 15°24′ to 17°00′ W and 15°00′ to 16°42′ N. The area has a semi-arid climate with a single rainy season from June-July through September-October. Average rainfall decreased from 400-600 mm in the 1960s to 200-400 mm in the 1990s, and mean monthly temperature varies from 24.5 °C in January to 31.9 °C in May (Fall et al., 2006).
The study area includes a wide range of land covers and land uses, and consequently of soil organic carbon, making it appropriate for the sensitivity analysis described in this paper. Most agricultural activities in the study area are undertaken during the rainy season: planting occurs in June, followed by harvesting in late October through November. Flood recession farming is practiced in the Senegal River valley, and irrigated crop production, largely dominated by vegetable production, is practiced elsewhere where groundwater is available. The dominant natural vegetation species are, for trees: Acacia raddiana, Balanites aegyptiaca, Sclerocarya birrea, Combretum glutinosum, Adansonia digitata (baobab tree); for shrubs: Guiera senegalensis, Boscia senegalensis, Calotropis procera; and for grasses, primarily Cenchrus biflorus, Schoenefeldia gracilis and Dactyloctenium aegyptium. In order to summarize the region succinctly we refer to the Senegalese agro-ecological zones (also known as ecoregions) defined by Tappan et al. (2004). The study area encompasses four zones, which are illustrated in Fig. 1 and described below.
The smallest ecoregion (2 % of the study area) is a narrow strip of land (10 to 30 km wide) running 120 km along the Atlantic coast from Saint-Louis to Dakar. The predominant soils are ferruginous tropical sandy soils, deep and well drained, low in organic matter and mineral content (Tappan et al., 2004). The ecoregion is characterized by geomorphological features composed of active littoral and stabilized continental sand dunes that alternate with longitudinal depressions. The sand dunes support shrub savanna used by pastoralists as grazing land. The longitudinal depressions, locally called niayes, have given their name to the region as a whole, and are used for irrigated agriculture owing to the shallow water table accessed by artisanal wells. The main irrigated agricultural land use is market gardening, primarily carrots, onions, and cabbages, for sale in Dakar. Beginning in the early 1980s, coastal sand dune stabilization projects planted drought-tolerant Whispering Pine (Casuarina equisetifolia), which covers much of the coastal zone from Dakar to Saint-Louis (Tappan et al., 2004; CSE, 2005).
A second ecoregion, lying east of the smallest ecoregion and covering 45 % of the study area, includes much of the peanut basin, an area dedicated since the 1880s to groundnut cultivation. The predominant soils are slightly leached ferruginous tropical sandy soils lying in the plateau of the continental sedimentary basin. The main crops are millet, groundnuts, and sorghum in acacia tree parkland, which have replaced all vestiges of the pre-colonial woodland savanna landscape (Tappan et al., 2004).
A third ecoregion, lying in the north east (east of Lake Guiers, Fig. 1) and covering 43 % of the study area, is the sandy ferlo. It constitutes Senegal's main sylvopastoral zone, an area that is generally too dry for crop production, with mean annual precipitation less than 200 mm. The vegetation is composed of open grasslands with scattered shrubs and predominantly acacia trees on red-brown sandy and ferruginous tropical sandy soils.
The last ecoregion (11 % of the study area) is the Senegal River Valley, a floodplain previously covered by riverine woodland and today used for irrigated agricultural projects that pump water from the Senegal River onto extensive rice and sugarcane fields. The predominant soils are hydromorphic and vertic, with sandy, clay loam, and clay textures. The natural vegetation is open steppe, shrub steppe, and riparian acacia woodland.

Landsat data
Landsat Enhanced Thematic Mapper Plus (ETM+) satellite data were used in this study. All six 28.5 m reflective, the two 57 m thermal (low and high gain), and the single 15 m panchromatic bands were used. Each ETM+ scene is approximately 180 × 180 km and is defined in the UTM coordinate system and referenced by a unique Landsat Worldwide Reference System (WRS-2) path and row coordinate (Arvidson et al., 2001).
Multi-temporal satellite data provide improved land cover classification accuracies over single-date classifications if the acquisitions capture seasonal and agricultural differences (Lo et al., 1986; Schriever and Congalton, 1993). Consequently, in this study two Landsat ETM+ scenes, acquired in 2002 in the early wet season (21 June) and the dry season (30 December) over the study area, WRS-2 scene path 205 row 49, were used. These acquisitions were selected because they were the only available scenes with very low (<1 %) cloud cover. They are considered to be representative of the year 2000 in the subsequent GEMS modeling.

Landsat data pre-processing
Landsat data are affected by several factors that need to be corrected before multi-date data can be compared reliably (Coppin et al., 2004). In this study, corrections for radiometric, atmospheric and geometric effects were undertaken. The ETM+ reflective bands were converted from digital numbers to at-satellite reflectance using the best available ETM+ calibration coefficients and standard correction formulae taking into account the solar constant (Markham and Barker, 1986). The thermal bands were similarly converted from digital numbers to effective at-satellite temperature using standard coefficients and Planck function formulae (USGS, 2001). The impact of the atmosphere is variable in space and time and is usually considered to require correction for quantitative and change detection applications (Ouaidrari and Vermote, 1999; Coppin et al., 2004). Several Landsat atmospheric correction methods have been proposed, with the dark-object subtraction (DOS) method widely used due to its methodological simplicity (Chavez, 1996). In the DOS approach, atmospheric path radiance is assumed to be equal to the radiance sensed over dark objects, such as dense vegetation or water, and is subtracted from each band. In this study, each Landsat acquisition was normalized using a dark object subtraction method to reduce scene-to-scene and within-scene radiometric variations associated with atmospheric, phenological, and sun-sensor-target geometric variations. Surface reflectances were computed independently using inland water bodies and a small number of cloud shadows as dark objects. Clouds and cloud shadows were screen digitized manually and not considered in the subsequent analysis, as they preclude optical wavelength remote sensing of the surface and contaminate surface reflectance (Roy et al., 2010).
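The radiometric steps described above can be illustrated with a minimal sketch. It assumes the commonly used top-of-atmosphere formula (reflectance = π · L · d² / (ESUN · cos θs)) and a simplified dark-object offset applied directly to reflectance; the gain, bias and ESUN values are placeholders and not ETM+ calibration coefficients, and the function names are hypothetical rather than taken from the study.

```python
import numpy as np

def dn_to_toa_reflectance(dn, gain, bias, esun, sun_elev_deg, earth_sun_dist_au):
    """Convert digital numbers to at-satellite (top-of-atmosphere) reflectance."""
    radiance = gain * dn.astype(float) + bias        # DN -> spectral radiance
    sun_zenith = np.deg2rad(90.0 - sun_elev_deg)     # zenith = 90 deg - elevation
    return np.pi * radiance * earth_sun_dist_au**2 / (esun * np.cos(sun_zenith))

def dark_object_subtract(band, dark_mask):
    """Subtract the mean value observed over dark objects, clipping at zero."""
    dark_value = band[dark_mask].mean()
    return np.clip(band - dark_value, 0.0, None)

# Hypothetical 2x2 band; all coefficients below are illustrative placeholders.
dn = np.array([[10, 50], [120, 200]])
rho = dn_to_toa_reflectance(dn, gain=0.8, bias=-6.0, esun=1500.0,
                            sun_elev_deg=60.0, earth_sun_dist_au=1.0)
dark = np.array([[True, False], [False, False]])     # e.g. an inland water pixel
rho_dos = dark_object_subtract(rho, dark)
```

In practice the dark objects would be the digitized inland water bodies and cloud shadows, and the offset would be estimated per band and per acquisition.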
The two ETM+ acquisitions had already been orthorectified following established procedures (Tucker et al., 2004). However, to ensure precise sub-pixel co-registration, an image-to-image registration was performed using 25 ground control points identified in both scenes, and the December image was nearest neighbor resampled into reference with the June acquisition using a first-order polynomial warping transformation. The two 57 m at-satellite temperature bands and the six 28.5 m at-satellite reflectance bands were resampled in this way to 28.5 m to provide the same image spatial dimensions needed for the subsequent image classification.
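A first-order polynomial warp is an affine mapping whose six coefficients can be estimated from control points by least squares. The sketch below is illustrative only: the control points are synthetic, the function names are hypothetical, and it is not the software used in the study.

```python
import numpy as np

def fit_first_order_polynomial(src_xy, dst_xy):
    """Least-squares fit of x' = a0 + a1*x + a2*y and y' = b0 + b1*x + b2*y."""
    src_xy = np.asarray(src_xy, dtype=float)
    dst_xy = np.asarray(dst_xy, dtype=float)
    design = np.column_stack([np.ones(len(src_xy)), src_xy])   # columns: 1, x, y
    coeffs, *_ = np.linalg.lstsq(design, dst_xy, rcond=None)   # shape (3, 2)
    return coeffs

def apply_transform(coeffs, xy):
    """Map (x, y) coordinates through the fitted first-order polynomial."""
    xy = np.asarray(xy, dtype=float)
    return np.column_stack([np.ones(len(xy)), xy]) @ coeffs

# Synthetic example: 25 control points in the reference (June) image and their
# positions in the December image, offset by (2.0, -1.5) pixels with slight scaling.
rng = np.random.default_rng(0)
ref_points = rng.uniform(0, 6000, size=(25, 2))
true_coeffs = np.array([[2.0, -1.5], [1.001, 0.0], [0.0, 0.999]])
dec_points = apply_transform(true_coeffs, ref_points)

coeffs = fit_first_order_polynomial(ref_points, dec_points)
residuals = apply_transform(coeffs, ref_points) - dec_points
rmse = float(np.sqrt((residuals**2).sum(axis=1).mean()))       # control-point RMSE in pixels
```

For nearest neighbor resampling, each output pixel location in the reference grid is mapped through the fitted transform, and the value of the closest December pixel is copied.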

GEMS model overview
The General Ensemble biogeochemical Modeling System (GEMS) was developed from the CENTURY model (Metherell et al., 1993) to enable integration of spatially explicit GIS data, including land cover, soils, climate, and land management practice information (Liu et al., 2008). CENTURY has been applied in various ecosystems including grassland, forest, savanna, and crop systems (Metherell et al., 1993; Parton et al., 2004). The input parameters comprise site specific biophysical data, plant characteristics, and management data, including monthly precipitation, monthly maximum and minimum air temperature, soil texture, bulk density, drainage, water holding capacity, cropping systems, fertilization, cultivation, harvesting, grazing, tree removal, and natural disturbances such as fire (Parton et al., 2004; Liu et al., 2004). GEMS couples CENTURY with various spatial databases to simulate biogeochemical cycles over large areas (Liu et al., 2004; Liu et al., 2008).
GEMS consists of three major components: an encapsulated ecosystem biogeochemical model (i.e., CENTURY), a data assimilation system (DAS), and an input/output processor (IOP). GEMS uses a Monte-Carlo based ensemble approach to incorporate the variability of the state and driving variables of the underlying biogeochemical model into simulations. Geographic information system software (ESRI, 2007) is used to group pixels that have the same combination of spatially explicit input data values. Each combination is described by a joint frequency distribution (JFD) that is used by the DAS to relate the spatially explicit data and model input parameters using look-up-tables (Liu et al., 2004). The IOP passes the assimilated data to the modeling processes and writes the selected output variables to a set of output files after each model run. The main output variable of interest for this study is the total soil organic carbon (SOC) (g C m⁻²) in the top 0-20 cm soil layer. Soil organic matter is a key indicator of soil quality and is most usually determined by application of conversion factors to estimates of the soil organic carbon to some prescribed depth (Lal, 2004). The GEMS model includes three soil organic matter (SOM) pools with different potential turnover rates: fast turnover (active SOM), intermediate turnover (slow SOM) and slow turnover (passive SOM) (Metherell et al., 1993).
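The grouping of pixels by unique combinations of input values can be sketched with NumPy. This is an illustration of the idea only, using two tiny hypothetical layers; it is not the GIS workflow or the GEMS code itself.

```python
import numpy as np

def joint_frequency_distribution(*layers):
    """Group pixels that share the same combination of input layer values.

    Returns the unique combinations, the pixel count of each combination,
    and a map giving the combination index of every pixel.
    """
    shape = np.asarray(layers[0]).shape
    stacked = np.stack([np.asarray(layer).ravel() for layer in layers], axis=1)
    combos, inverse, counts = np.unique(
        stacked, axis=0, return_inverse=True, return_counts=True)
    return combos, counts, inverse.reshape(shape)

# Hypothetical 2x2 layers: LCLU class codes and soil unit codes.
lclu = np.array([[1, 1], [2, 2]])
soil = np.array([[5, 5], [5, 6]])
combos, counts, jfd_id = joint_frequency_distribution(lclu, soil)
```

The model then only needs to be run once per combination (here three) rather than once per pixel, and the results are mapped back through `jfd_id`.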
In this study, 20 repeat GEMS model runs for each of 1081 JFDs were computed to incorporate the uncertainty of the input data and to provide stable spatially explicit soil organic carbon (SOC) estimates (Liu et al., 2004; Liu et al., 2008). Similarly, above ground net primary production (NPP) (g C m⁻² yr⁻¹) estimates were derived to check that the SOC and NPP values were plausible and spatially coherent. The GEMS model inputs are described below for the spatially explicit input data and the GEMS look-up-table parameterizations. In this study only the sensitivity of GEMS modeled SOC to land cover land use (LCLU) classification uncertainties is examined. Errors in the other input data and model parameterizations are not explicitly examined. Although errors in the vegetation biomass and land management parameterizations are likely to be correlated with LCLU errors, other errors may change in space and time in ways that are only weakly correlated with LCLU.

Land Cover Land Use (LCLU) data
Spatially explicit 28.5 m LCLU maps representing the year 2000 were derived by multiple classifications of the Landsat ETM+ satellite data using a number of approaches described in detail in Sect. 5.

Climate data
Spatially and temporally explicit climate data were defined using 37 years of monthly average precipitation and minimum and maximum air temperature data defined in 0.05 degree grid cells (Hutchinson et al., 1996), nearest neighbor resampled to the 28.5 m Landsat pixel dimensions. These monthly data were available for the period 1960-1996 and were used to "spin-up" the GEMS model to 1900 equilibrium, then to run the GEMS model from 1900 to 2000, and finally to run the GEMS model for three future climate scenarios from 2000 to 2052. The future climate scenarios (no change, low and high change) were developed following the approach of Hulme et al. (2001), who assessed possible future (2000-2100) changes in temperature and rainfall for Africa using seven global climate models. The Hulme et al. (2001) approach and results are considered (Tan et al., 2009) to be compliant and comparable with those from the IPCC Fourth Assessment Report (Christensen et al., 2007). Monthly climatologies of the 1960-1996 precipitation and minimum and maximum air temperature data were derived (i.e. 12 monthly values per 28.5 m Landsat pixel). The no climate change scenario (NCCS) simply used the same monthly values of these data for each month of 2000 to 2052. The low climate change (LCCS) and high climate change (HCCS) scenarios were defined by weighting the monthly climatology values using the following equations derived from Hulme et al. (2001) for the study area: Low Climate Change Scenario (LCCS): (1) High Climate Change Scenario (HCCS): where year is set from 2000 to 2052. The additive constants in the above equations ensure that the LCCS and HCCS values are equal to the NCCS values in year 2000. In this way, under the low climate change scenario by 2052 the temperature is 0.69 °C warmer with 13 % less precipitation, and under the high climate change scenario by 2052 the temperature is 3.12 °C warmer with 28.6 % less precipitation. We note that these scenarios do not model inter-annual variability in precipitation and minimum and maximum air temperature data, which is a limitation but not a concern for the purposes of this sensitivity study, and is the same approach used by Liu et al. (2004) and Tan et al. (2009) to prescribe climate scenarios in studies in Ghana and Senegal.
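The scenario equations themselves are not reproduced here, so the following is only a sketch that matches the stated end points. It assumes, purely for illustration, that temperature and precipitation change linearly with year between 2000 and 2052; the function and dictionary names are hypothetical.

```python
# (temperature rise in deg C by 2052, fractional precipitation loss by 2052),
# taken from the end-point values stated in the text.
SCENARIOS = {"NCCS": (0.0, 0.0), "LCCS": (0.69, 0.13), "HCCS": (3.12, 0.286)}

def scenario_climate(temp_clim, precip_clim, year, scenario,
                     start_year=2000, end_year=2052):
    """Scale a monthly climatology value for a given year and scenario.

    ASSUMPTION: linear change in time; the published equations may differ.
    """
    temp_rise, precip_loss = SCENARIOS[scenario]
    frac = (year - start_year) / (end_year - start_year)
    temp = temp_clim + temp_rise * frac
    precip = precip_clim * (1.0 - precip_loss * frac)
    return temp, precip

# Hypothetical August climatology for one pixel: 29.0 deg C and 100 mm.
t2000, p2000 = scenario_climate(29.0, 100.0, 2000, "HCCS")
t2052, p2052 = scenario_climate(29.0, 100.0, 2052, "HCCS")
```

By construction all three scenarios coincide in 2000, which mirrors the role of the additive constants described in the text.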

Soil, drainage and water holding capacity data
A map of static soil information was extracted from a Senegalese 1:500 000 vector soil atlas defined with 168 soil units (Stancioff et al., 1986). Soil characteristics were defined for the 45 soil units falling in the study area using a look-up-table with respect to texture (i.e., fractions of sand, silt, and clay), drainage state, and water holding capacity. Sand fractions varied from 51 % to 87 %, silt fractions from 11 % to 38 %, and clay fractions from 5 % to 15 %. The drainage state varied from poorly drained (= 0) to overly well drained (= 5), and the water holding capacity varied from high (clay = 5) to low (sand = 1).

Potential Natural Vegetation data
A static potential natural vegetation (PNV) map for 1900 was needed to run the GEMS model to equilibrium. In the absence of a PNV map for Senegal, the earliest available vegetation map (Stancioff et al., 1986), developed by visual interpretation of 1985 Landsat data supplemented by intensive field survey, was used. The map was nearest neighbor resampled to the 28.5 m Landsat pixel dimensions, assigning to each output 28.5 m pixel the value in the input data set nearest its centre. This map is considered the most authoritative in its domain for Senegal for the 1980s (Tappan et al., 2004).
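Nearest neighbor resampling as described above can be sketched for two regular grids. The example assumes, for simplicity, that both grids share the same origin and orientation; the function name and the toy 2 × 2 map are hypothetical.

```python
import numpy as np

def nearest_neighbour_resample(coarse, coarse_size, fine_size, out_shape):
    """Give each fine pixel the coarse-grid value nearest to its centre.

    ASSUMPTION: both grids share the same origin and are axis-aligned, so the
    nearest coarse cell is the one that contains the fine pixel centre.
    """
    row_centres = (np.arange(out_shape[0]) + 0.5) * fine_size
    col_centres = (np.arange(out_shape[1]) + 0.5) * fine_size
    r_idx = np.clip((row_centres // coarse_size).astype(int), 0, coarse.shape[0] - 1)
    c_idx = np.clip((col_centres // coarse_size).astype(int), 0, coarse.shape[1] - 1)
    return coarse[np.ix_(r_idx, c_idx)]

# Toy vegetation map with 2-unit cells resampled to 1-unit pixels.
veg = np.array([[1, 2], [3, 4]])
veg_fine = nearest_neighbour_resample(veg, coarse_size=2.0, fine_size=1.0, out_shape=(4, 4))
```

Because values are copied rather than averaged, categorical codes such as vegetation classes are preserved.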

GEMS look-up-table parameterization
Vegetation biomass and land management practices were parameterized using look-up-tables related to the derived Landsat land cover land use (LCLU) classification data. Joint frequency distributions of the look-up-table variable values for each of the Landsat LCLU classes were developed following established GEMS conventions (Liu et al., 2004).

Vegetation biomass parameterization
Vegetation attributes required for the model parameterization were synthesized from an inventory of soil and biomass samplings conducted in Senegal during the last 20 years (CSE, 2004; Woomer et al., 2004b; Tschakert et al., 2004). Aboveground biomass (trees, herbs, and litter) and the associated carbon stocks were calculated using allometric formulae (Woomer et al., 2004a; Brown, 1997). The root biomass of trees and herbs was estimated as 0.35 and 0.15 of the aboveground biomass, respectively, based on field observations (Woomer et al., 2004a). The proportion of carbon in all biomass pools was set as 0.47 (Woomer et al., 2004a).
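As a worked illustration of these ratios, the sketch below converts an aboveground biomass value into total (aboveground plus root) biomass carbon. The function name and the input value are hypothetical; only the ratios 0.35, 0.15 and 0.47 come from the text.

```python
ROOT_TO_ABOVEGROUND = {"tree": 0.35, "herb": 0.15}   # root biomass as a fraction of aboveground biomass
CARBON_FRACTION = 0.47                               # proportion of carbon in all biomass pools

def biomass_carbon(aboveground_biomass, kind):
    """Return carbon in aboveground plus root biomass, in the units of the input."""
    root_biomass = ROOT_TO_ABOVEGROUND[kind] * aboveground_biomass
    return CARBON_FRACTION * (aboveground_biomass + root_biomass)

# Hypothetical plot with 1000 g m^-2 of aboveground tree biomass.
tree_carbon = biomass_carbon(1000.0, "tree")   # 0.47 * (1000 + 350)
herb_carbon = biomass_carbon(1000.0, "herb")   # 0.47 * (1000 + 150)
```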

Management practices
The following management practices that affect carbon dynamics were considered: crop composition, crop rotation probability, temporal changes of harvest practices, cropping practices (including plowing and selective cutting), fertilizer use, fallow probability and fallow length, fire frequency, and frequency and intensity of grazing. These practices were compiled from annual agricultural acreage and yield statistics and livestock census data defined by Senegalese administrative units (départements) (CSE, 2002), and from information collated in previous studies (Touré et al., 2003; Manlay et al., 2002; Tschakert et al., 2004a). The management practices are summarized in Table 1 and were considered in terms of non-arable (including pastoralism) and arable land uses defined by the Landsat classified LCLU class. The main crops grown are millet, sorghum, and groundnuts. Fallow lengths were set as 1-5 years with successive 5-10 years of cropping. Non-subsistence agriculture was assumed to have started in 1920, with current mineral fertilizer use varying from 0 to 300 kg ha⁻¹ (Tschakert et al., 2004). Before this date, the study area was assumed to be savanna with low to moderate grazing (little influence on plant production) that rose to current high grazing rates of 12 to 30 tropical livestock units per km² (CSE, 2002), with an assumed linear effect on plant production (Woomer et al., 2004a).

Landsat satellite data classification
The six 28.5 m reflective bands and the two 57 m thermal (low and high gain) bands, nearest neighbor resampled to 28.5 m, were classified together as described below. Clouds and cloud shadows were visually identified (<1 % of the image) and masked from both Landsat acquisitions and were not classified. The dry and wet season Landsat data were classified together, rather than independently.

Landsat LCLU classification scheme and training data
The state of the practice for automated satellite classification is to adopt a supervised classification approach where samples of locations of known land cover classes (training data) are collected. The optical and thermal wavelength values sensed at the locations of the training pixels are used to develop statistical classification rules, which are then used to map the land cover class of every pixel (Breiman et al., 1984; Foody et al., 2006). Supervised classification results depend on the appropriateness of the LCLU class nomenclature and on the quality of the training data used. Table 2 summarizes the nine LCLU classes and the number of Landsat training pixels for each class. These nine classes were selected by examination of pre-existing land cover maps, including a land cover map of the north of Senegal generated by the Centre de Suivi Ecologique (CSE, 2002), and were chosen to ensure that the classes were mutually exclusive and that every part of the study area could be classified into one and only one class (Anderson et al., 1976).
The CSE land cover map used the Yangambi vegetation classification scheme, which contains 25 vegetation classes defined according to their physiognomy (i.e. structure and form of vegetation groups) (Monod, 1956; Trochain, 1957). The Yangambi scheme predates by two decades the availability of satellite data, and the different Yangambi vegetation classes were not always spectrally unambiguous from one another in the multi-date Landsat data. For these reasons several of the Yangambi classes were combined and three vegetation classes, savanna grassland, mangrove and wetlands, were considered. In addition, the study area includes non-vegetated surfaces not considered in the Yangambi scheme, and the classes water, bare soil, rainfed agriculture, mud flats, and irrigated agriculture were identified based on our expert knowledge of the study area and multi-annual field visits.
Training pixels for each class were selected by visual analysis of the co-registered dry and wet season 2002 ETM+ imagery, augmented by our expert knowledge of the study area including information gathered during multi-annual field visits. Only training pixels that could be unambiguously identified were collected. A total of 11 717 Landsat 28.5 m training pixels were selected (Table 2). Ideally, the training data should be representative of the area classified and of the classes in the classification scheme, although there is no statistical procedure to define a suitable number and spatial distribution without a priori information concerning the area (Stehman, 1997; Foody et al., 2006). Great care was taken in the training data collection. The land use-related classes (irrigated agriculture, rainfed agriculture, plantation forest) were the classes for which reliable training data were most difficult to collect. Irrigated agriculture is a unique characteristic of the Senegal River Valley and was interpretable on the Landsat data owing to the patterns of irrigation channels within and adjacent to agricultural fields. The peanut basin is the foremost rainfed agriculture area of Senegal, and polygonal rainfed agricultural fields were distinguishable by differences between the wet and dry season Landsat acquisitions. Plantation forest in the Niayes ecoregion forms a distinctive strip observable on the Landsat data.
Settlements contain different LCLU classes and consequently are difficult to classify reliably (Barnsley and Barr, 1997; Sun et al., 2003). This was particularly true for the rural villages occurring across the study area, which tended to be small and heterogeneous relative to the Landsat 28.5 m pixel size. Consequently, all of the settlements were screen digitized manually and were not considered subsequently in the carbon modeling.

Classification approaches
The Landsat ETM+ data were classified using bagged decision tree approaches. Decision trees are hierarchical classifiers that predict class membership by recursively partitioning data into more homogeneous subsets (Breiman et al., 1984). Trees can handle either categorical data, when performing classifications (classification trees), or continuous data (regression trees). They accommodate abrupt and non-monotonic relationships between the independent and dependent variables and make no assumptions concerning the statistical distribution of the data. Currently, bagged decision tree classifiers are the state of the practice approach for supervised satellite data classification (Doan and Foody, 2007; Hansen et al., 2008). Bagging tree approaches use a statistical bootstrapping methodology to improve the predictive ability of the tree model and reduce over-fitting, whereby a large number of trees are grown, each time using a different random subset of the training data and keeping a certain percentage of data aside (Breiman, 1996). In this study, both hard and soft supervised classification approaches were undertaken. Classifications are described as "hard" when each pixel is classified into a single class category, i.e., full membership of a single class is assumed, and as "soft" when each pixel may have multiple partial class memberships (Foody, 2000).
Thirty bagged classification trees were generated. Each time, 25 % of the training data were used to generate a tree, and the remaining 75 % were used to assess the classification accuracy. The 25 % proportions were sampled at random with replacement. To limit overfitting, each tree was terminated using a deviance threshold: additional splits in the tree had to exceed 1 % of the root node deviance or the tree growth was terminated. For each of the 30 trees, a soft classification result was generated defining for each 28.5 m Landsat pixel the probability of it belonging to each of the nine LCLU classes.
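The sampling scheme for the 30 trees can be sketched as follows. Only the drawing of in-bag and out-of-bag training pixel indices is shown; tree growing and the deviance stopping rule are omitted, and the function name and random seed are hypothetical.

```python
import numpy as np

def bag_indices(n_samples, fraction, rng):
    """Draw an in-bag sample with replacement and return it with the out-of-bag indices."""
    n_draw = int(round(fraction * n_samples))
    in_bag = rng.integers(0, n_samples, size=n_draw)              # sampled with replacement
    out_of_bag = np.setdiff1d(np.arange(n_samples), in_bag)       # never drawn for this tree
    return in_bag, out_of_bag

rng = np.random.default_rng(42)        # arbitrary seed, for reproducibility only
n_training_pixels = 11717              # total number of training pixels (Table 2)
bags = [bag_indices(n_training_pixels, 0.25, rng) for _ in range(30)]
```

Because the draw is with replacement, an in-bag sample may contain duplicates, so in this sketch the out-of-bag share of each tree is at least 75 % of the training pixels.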
A hard decision tree classification was generated from the 30 soft classifications. Each soft classification was converted to a hard classification by assigning to each pixel the class with the highest probability, and then assigning the single most frequently occurring class category over the 30 classifications (Breiman, 1996; Bauer and Kohavi, 1999). When the maximum probability corresponded to more than one class, one of the classes was selected randomly. The number of unique classes into which a pixel was independently classified in this way over the 30 trees was also recorded.
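The soft-to-hard conversion and the vote over trees can be sketched with NumPy. The array layout (trees × pixels × classes) and the function name are assumptions made for illustration, and ties between vote counts are resolved here to the lowest class index, a choice the text does not specify.

```python
import numpy as np

def hard_from_soft(probabilities, rng):
    """Derive a hard classification from soft classifications.

    probabilities: array of shape (n_trees, n_pixels, n_classes).
    Returns the majority class per pixel and the number of unique classes
    assigned to each pixel across the trees.
    """
    n_classes = probabilities.shape[2]
    # Per-tree label: class of maximum probability, ties broken at random.
    is_max = probabilities == probabilities.max(axis=2, keepdims=True)
    tie_break = np.where(is_max, rng.random(probabilities.shape), -1.0)
    per_tree = tie_break.argmax(axis=2)                       # (n_trees, n_pixels)
    # Count votes for each class over the trees.
    votes = (per_tree[..., None] == np.arange(n_classes)).sum(axis=0)
    hard = votes.argmax(axis=1)            # ASSUMPTION: vote ties go to the lowest index
    n_unique = (votes > 0).sum(axis=1)
    return hard, n_unique

# Toy example: 3 trees, 2 pixels, 3 classes.
probs = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]],
    [[0.6, 0.3, 0.1], [0.1, 0.1, 0.8]],
    [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]],
])
hard, n_unique = hard_from_soft(probs, np.random.default_rng(0))
```

In the toy example the first pixel is labelled class 0 by two trees and class 1 by one tree, so its hard class is 0 with two unique classes recorded.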

Classification accuracy assessment
The ensemble classification accuracy of the 30 soft decision tree classifications was quantified using a confusion matrix based statistical method. The confusion matrix is a two dimensional matrix composed of n columns and n rows, where n is the number of classes, each column represents the number of instances of a predicted (i.e. classified) class, and each row represents the number of instances of an actual true class (Congalton et al., 1983). The diagonal of the confusion matrix records the agreement between the "classified" and the corresponding "truth". The off-diagonal records the disagreement. Conventional confusion matrix accuracy assessment approaches are inappropriate for application to soft classification results (Foody, 2000). Consequently a "soft-to-hard" confusion matrix generation methodology was developed following the method of Doan and Foody (2007).
Recall that each of the 30 classification trees was generated from 25 % of the training data sampled at random with replacement. In the accuracy assessment, first each classification tree was used to classify the remaining ("out-of-bag") 75 % of the training data, deriving a vector of class probabilities for each out-of-bag pixel (Breiman, 1996). Then a single confusion matrix was generated from the 30 vectors of class probabilities. Throughout the 30 vectors of probabilities, each pixel was assigned to the LCLU class with the maximum probability. If several classes had the same probabilities then one class was selected at random.
Conventional accuracy statistics were then derived from the "soft-to-hard" confusion matrix. The percent correct was calculated by dividing the total number of pixels correctly classified by the total number of pixels in the training data. The Kappa coefficient was also calculated, as it provides another measure of overall classification accuracy that uses all the elements of the confusion matrix to compensate for chance agreement, although kappa values may be biased in areas with uneven proportions of the different classes (Stehman, 1997, 2004; Foody, 2004). The producer's and the user's accuracies were computed to assess the accuracies of each class (Foody, 2002). The user's accuracy was calculated by dividing the number of all correctly classified pixels of a class by the sum of all pixels which had been assigned to that class; it indicates the probability that a pixel classified to a given class actually represents the reality on the ground (Congalton, 1991). The producer's accuracy was calculated by dividing the number of all correctly classified training pixels of a class by the sum of training data pixels for that class; it indicates the probability of a training pixel being correctly classified (Congalton, 1991).
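These statistics can be computed from a confusion matrix as in the sketch below, which follows the convention stated earlier (rows are actual classes, columns are predicted classes). The function name and the 2 × 2 matrix are hypothetical and are not results from the study.

```python
import numpy as np

def accuracy_statistics(confusion):
    """Overall accuracy, kappa, producer's and user's accuracies.

    confusion[r, c] = number of pixels of actual class r classified as class c.
    """
    cm = np.asarray(confusion, dtype=float)
    total = cm.sum()
    correct = np.diag(cm)
    actual_totals = cm.sum(axis=1)        # row sums: pixels of each true class
    predicted_totals = cm.sum(axis=0)     # column sums: pixels assigned to each class
    overall = correct.sum() / total
    chance = (actual_totals * predicted_totals).sum() / total**2
    kappa = (overall - chance) / (1.0 - chance)
    producers = correct / actual_totals
    users = correct / predicted_totals
    return overall, kappa, producers, users

# Hypothetical two-class example with 100 pixels.
overall, kappa, producers, users = accuracy_statistics([[45, 5], [10, 40]])
```

For this toy matrix the percent correct is 85 %, chance agreement is 0.5, and kappa is therefore 0.7.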
6 Carbon modelling and sensitivity analysis methodology

Carbon modelling
The GEMS model was used to estimate soil organic carbon (SOC) (g C m⁻²) in the top 0-20 cm soil layer and also above ground net primary productivity (NPP) (g C m⁻² yr⁻¹).
In this study we assumed that human disturbances in the study area were negligible before 1900 and that consequently carbon stocks and fluxes were at near equilibrium conditions in 1900. This is primarily justified because colonial impacts on Senegalese land use practices in the early colonial period were limited to small urban settlements, and non-subsistence arable practices had largely not been developed (Gellar, 1976; Tschakert et al., 2004). Estimates of carbon stocks and fluxes in the study area in 1900 were obtained by running the model for 1500 years to a 1900 equilibrium (Liu et al., 2004; Tan et al., 2009) using the potential vegetation map, the 1960-1996 climate data, and the contemporary soil and drainage data described in Sect. 4. The model was run from 1900 to 2000 using the 1900 carbon estimates to initialise the post-1900 model runs. The land cover of the study area was characterized in 1900 by the potential natural vegetation map and in 2000 by the Landsat classifications. The historical trajectory of land cover and land management between 1900 and 2000 is unknown, and so we assumed a linear change as a best estimate, following the approach used by other researchers (Liu et al., 2004, 2008; Tan et al., 2009).
The GEMS model was run from 2000 to 2052 for the three climate change scenarios described in Sect. 4.2.2. GEMS was run independently for each of the 30 Landsat soft classifications and for the single hard Landsat classification derived from them, with each classification defining the 2000 land cover land use and the associated land management parameterization (Table 1). These 31 runs were each repeated for the no, low, and high climate change scenarios.
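The run design described above amounts to 31 classifications crossed with 3 climate scenarios, i.e. 93 GEMS runs for the 2000 to 2052 period. A minimal sketch of enumerating such a run matrix follows; the run labels are hypothetical and no GEMS interface is implied.

```python
from itertools import product

# Hypothetical labels for the 30 soft classifications and the single hard one.
classifications = [f"soft_{k:02d}" for k in range(1, 31)] + ["hard"]
scenarios = ["no", "low", "high"]

# One entry per (classification, climate scenario) combination.
runs = list(product(classifications, scenarios))
print(len(runs))
```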
We assumed there was no LCLU change after 2000 in order to assess only the sensitivity of the GEMS model outputs to the LCLU classification uncertainties under the different climate scenarios. Moreover, prediction of future LCLU is difficult, not least because even if appropriate statistical LCLU change trend data existed, they may not capture future changes in LCLU driving forces, such as economic and policy modifications, acting at varying scales (Moss et al., 2010). Further, as LCLU in the study region is extensively soil moisture limited, future LCLU scenarios can only be meaningfully developed when coupled with future climate scenarios. This will be examined in future research that is not described here.

Soil Organic Carbon assessment & sensitivity analysis
Soil organic carbon (SOC) assessment and sensitivity analyses were performed to explore the variability imposed by the different land cover classification approaches for the three different climate scenarios. For the hard Landsat classification, where each 28.5 m Landsat pixel is assigned to only one LCLU class, the SOC for each pixel, simulation year and climate scenario was defined as:

$$\mathrm{SOC}_{\mathrm{year,scenario}}(i,j) = C_{\mathrm{year,scenario,class}}(i,j) \tag{5}$$

where $\mathrm{SOC}_{\mathrm{year,scenario}}(i,j)$ is the SOC estimated at pixel column and row $(i,j)$ and $C_{\mathrm{year,scenario,class}}(i,j)$ is the GEMS modeled SOC at that pixel assuming that the pixel is entirely LCLU class $\mathrm{class}$. The net primary productivity (NPP) was similarly derived for each hard classification pixel so that the GEMS NPP could be compared to the SOC data to ensure the estimates were plausible and spatially coherent.
For each soft classification, where the probability of class membership is stored at each pixel, the SOC for each pixel was defined as:

$$\mathrm{SOC}_{\mathrm{year,scenario}}(i,j) = \sum_{\mathrm{class}} P_{2000,\mathrm{class}}(i,j)\, C_{\mathrm{year,scenario,class}}(i,j) \tag{6}$$

where $\mathrm{SOC}_{\mathrm{year,scenario}}(i,j)$ is the SOC estimated at pixel column and row $(i,j)$, $C_{\mathrm{year,scenario,class}}(i,j)$ is the GEMS modeled SOC for that pixel assuming the pixel is entirely class $\mathrm{class}$, and $P_{2000,\mathrm{class}}(i,j)$ is the soft classification probability of the pixel belonging to class $\mathrm{class}$.
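The difference between the hard and soft carbon assignment can be made concrete with a small sketch. It assumes, as the symbol definitions suggest, that the soft-classification SOC at a pixel is the sum of the per-class GEMS SOC values weighted by the class membership probabilities. The function names and all numerical values are hypothetical.

```python
def soc_hard(class_soc, hard_class):
    """Hard assignment: the pixel takes the SOC modeled for its single class."""
    return class_soc[hard_class]


def soc_soft(class_soc, probabilities):
    """Soft assignment: probability-weighted sum of the per-class SOC values."""
    if abs(sum(probabilities.values()) - 1.0) > 1e-6:
        raise ValueError("class membership probabilities must sum to 1")
    return sum(p * class_soc[c] for c, p in probabilities.items())


# Hypothetical per-class SOC (g C m^-2) modeled at one pixel.
class_soc = {"savanna grassland": 1200.0, "rainfed agriculture": 1300.0, "bare soil": 500.0}
# Hypothetical class membership probabilities from one soft classification.
probabilities = {"savanna grassland": 0.7, "rainfed agriculture": 0.2, "bare soil": 0.1}

print(soc_hard(class_soc, "savanna grassland"))
print(soc_soft(class_soc, probabilities))
```

When one class has probability 1 the soft estimate reduces to the hard estimate, so the two approaches differ only at pixels with uncertain class membership.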

LCLU classification scheme and classification accuracy assessment
Table 3 shows the "soft-to-hard" confusion matrix results for the 9 LCLU classes. The classification accuracies tabulated in Table 3 provide an assessment of the ensemble classification accuracy of the 30 soft decision tree classifications and so also indicate the hard classification accuracy, as it is derived from the 30 soft classifications. The percent correct and Kappa were 97.79 % and 0.98 respectively. The producer's and user's classification accuracies were greater than 90 % for all the classes except for the wetlands, irrigated agriculture and mangrove classes. No class was misclassified as another by a significant amount; the greatest misclassification was 0.19 % between the rainfed agriculture and savanna grassland classes. These classification accuracies are high and reflect what we expect is the best classification typically achievable for the study area.
The hard classification was defined from the 30 soft classifications, assigning at each pixel the single most frequently occurring class category over the 30 classifications using a voting procedure. Pixels where all 30 soft classifications agreed are more likely to be reliable than those where there was disagreement. Figure 2 shows the number of unique classes (maximum 9) that a pixel was independently classified as over the 30 decision tree classifications. Approximately 82 % of the pixels were classified into no more than 2 classes, with 55 % classified as one class and 27 % as two classes. The least reliable areas, classified into 3 classes or more, occurred predominantly in areas classified as wetlands, mud flats, bare soil, irrigated agriculture, and mangroves; these classes also had the lowest producer's and user's accuracies (Table 3). Varying water levels present in all of these cover types may confound their discrimination, which is not unexpected when passive optical wavelength satellite data are used (Ozesmi and Bauer, 2002). In addition, the peanut basin agricultural expansion zone in the south west of the study area, composed of a mix of savanna and rainfed agriculture, was less reliably classified. This is most likely because of the presence of abandoned rainfed agricultural fields in this region that are used for intermittent grazing and can physically resemble grassland (Tappan et al., 2004; Tschakert et al., 2004).
www.biogeosciences.net/9/631/2012/ Biogeosciences, 9, 631-648, 2012
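The voting procedure and the per-pixel "reliability" count can be sketched as follows. The sketch assumes that each of the 30 soft classifications contributes one class label per pixel (for example its most probable class) and that ties are broken in favour of the class encountered first; neither detail is stated in the text, and the function name and labels are hypothetical.

```python
from collections import Counter


def hard_class_and_reliability(labels):
    """Return (majority class, number of unique classes) for one pixel.

    labels holds one class label per soft classification run.
    Ties are resolved in favour of the label seen first (an assumption).
    """
    counts = Counter(labels)
    winner, _ = counts.most_common(1)[0]
    return winner, len(counts)


# Hypothetical labels for one pixel over 30 classification runs.
labels = ["savanna grassland"] * 18 + ["rainfed agriculture"] * 10 + ["bare soil"] * 2
print(hard_class_and_reliability(labels))
```

A pixel returning a unique class count of 1 corresponds to the most reliable case, in which all runs agree.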

Hard decision tree classification SOC and NPP model results
Figures 3 and 4 illustrate the year 2000 GEMS SOC in the top 0-20 cm soil layer and the above ground NPP, respectively. The data were estimated using Eq. (5) with the 9 LCLU class hard Landsat classification illustrated in Fig. 1 and the corresponding spatially explicit GEMS model inputs for the 9 classes under the no climate change scenario. Some spatial discontinuities are evident and are due to changes in certain GEMS input data, including the soil and climate data, that are defined at coarser spatial resolutions than the 28.5 m Landsat pixel dimensions. Table 4 summarizes the mean SOC and NPP for the 9 LCLU classes defined by the hard decision tree classification. The mean class SOC values range from 480.2 g C m⁻² (Bare soil) to 1487.5 g C m⁻² (Irrigated agriculture), with a mean study area SOC of 1219.3 g C m⁻² or 12.193 Mg C ha⁻¹, which is in general agreement with other workers' Senegalese estimates (Touré, 2002; Manlay et al., 2002; Touré et al., 2003; CSE, 2004). Owing to the spatial differences in GEMS input data, SOC values vary considerably within a given LCLU class. For Bare soil, SOC values range from a minimum of 358 to a maximum of 1491 g C m⁻², while for Irrigated agriculture they range from 417 to 4138 g C m⁻². In general, higher SOC values (Fig. 3) occur where NPP is higher (Fig. 4). The mean study area NPP is 185.1 g C m⁻² yr⁻¹, which is in agreement with the results of Parton et al. (2004), who estimated NPP values up to 200 g C m⁻² yr⁻¹ in this region using the CENTURY model and coarser 10 km resolution input data. Similar differences in NPP values are also noted within LCLU classes.

Table 5 summarizes the LCLU minimum, mean and maximum SOC defined by the hard classification, and the LCLU class percentage area, for each agro-ecological zone (Fig. 1). Comparison with the corresponding Table 4 study area LCLU class SOC statistics reinforces that geographic differences in the GEMS input data introduce SOC variability for any given LCLU class. For example, the savanna grassland class is highly prevalent in all four zones (varying from 41 % to 87 %), and although the mean savanna SOC for the entire study area is 1212 g C m⁻² (Table 4), the zonal mean savanna SOC varies from 1127 g C m⁻² (Senegal River Valley) to 1259 g C m⁻² (Peanut Basin) (Table 5). The agro-ecological zone with the highest mean SOC is the Peanut Basin (1344 g C m⁻²), followed by the Sandy Ferlo (1214 g C m⁻²) and Niayes (1124 g C m⁻²); the lowest is the Senegal River Valley (1046 g C m⁻²). This pattern reflects the SOC of the predominant LCLU classes. For example, the Peanut Basin is predominantly rainfed agriculture (57 %) and savanna (41 %), which have high mean study area SOC (Table 4), and the Senegal River Valley zone includes the greatest proportion of mud flats (22 %), which has nearly the lowest mean study area SOC (Table 4).
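For comparison with literature values reported per hectare, SOC in g C m⁻² can be converted to Mg C ha⁻¹ using 1 Mg = 10⁶ g and 1 ha = 10⁴ m², so that 1 g C m⁻² equals 0.01 Mg C ha⁻¹. A minimal sketch of the conversion, with a hypothetical function name, applied to the mean study area SOC:

```python
GRAMS_PER_MEGAGRAM = 1_000_000
SQUARE_METRES_PER_HECTARE = 10_000


def g_per_m2_to_mg_per_ha(value):
    """Convert a carbon density from g C m^-2 to Mg C ha^-1."""
    return value * SQUARE_METRES_PER_HECTARE / GRAMS_PER_MEGAGRAM


# Mean study area SOC from the hard classification, in g C m^-2.
print(g_per_m2_to_mg_per_ha(1219.3))
```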

Soft decision tree classification SOC results
There is insufficient space to illustrate the GEMS SOC derived using Eq. (6) for each of the 30 soft decision tree classifications for the year 2000. The mean of the 30 soft decision tree SOC estimates has a similar spatial pattern to the hard decision tree SOC illustrated in Fig. 3. Table 6 tabulates summary statistics of the 30 soft decision tree SOC estimates. Over the study area the mean SOC is 1217.4 g C m⁻², which is very similar to the 1219.3 g C m⁻² value estimated using the hard classification (Table 4). For each class there is considerable variation between the minimum and maximum mean SOC statistics. For example, the irrigated agriculture class has the most variable mean SOC of all the classes, from a minimum mean SOC of 457.9 g C m⁻² to a maximum mean SOC of 4138.0 g C m⁻². This is explained in Sect. 7.2.3. The class mean SOC values in Table 6 are similar to the hard classification SOC equivalents tabulated in Table 4. For all classes the difference in the mean SOC between the 30 soft and the hard classification SOC results is less than 4 %, except for mud flats (31 %), bare soil (22 %) and irrigated agriculture (8 %), which were the most inconsistently classified over the 30 soft classification trees (Fig. 2).

SOC Sensitivity to Land Cover Classification
The SOC derived from the hard classification (Fig. 3) for a given LCLU class varies spatially due to spatial variation in the GEMS model inputs (soil, climate, land management, etc.). The SOC also varies between the 30 SOC soft decision tree classification estimates due both to differences in the LCLU classifications and to spatial differences in the GEMS model inputs. The 30 soft LCLU classifications differ because of differences in the training data sampling, which cause differences in the LCLU class membership probabilities for each soft decision tree classification. For these reasons the sensitivity of the GEMS SOC model is dependent not only on the LCLU classification errors and the degree of generalization of the landscape into the LCLU classes, but also on where the classes occur relative to the other GEMS model inputs.
To examine this sensitivity in more detail, Fig. 5 shows a map of the coefficient of variation (the standard deviation divided by the mean) of the 30 SOC soft decision tree classification estimates. The coefficient of variation, instead of the standard deviation, is used as it enables meaningful comparison between pixels that have markedly different mean SOC values. The SOC coefficient of variation varies from less than 0.15, for the majority of the study area, to more than 0.60. The highest SOC coefficient of variation values occur for the less accurately classified classes described in Sect. 7.1 and summarized in Table 3, i.e., for the bare soil, mud flats, wetland and rainfed agriculture classes situated along the coast and in the northwest. In addition, higher SOC coefficient of variation values occur in the peanut basin agricultural expansion zone in the south west, where the hard classification "reliability" results illustrated in Fig. 2 show several classes per pixel. This is most likely because abandoned rainfed agricultural fields in this region are used for intermittent grazing and can physically resemble other LCLU classes such as savanna grassland (Tappan et al., 2004). Figure 6 shows histograms of the SOC coefficient of variation values for each land cover land use class defined by the hard decision tree classification (Fig. 1). The less accurately classified classes, i.e., bare soil, mud flats, wetland and rainfed agriculture, have more widely distributed SOC coefficient of variation values, with more than 20 % of their pixels having values greater than 0.1. The results shown in Figs. 5 and 6 illustrate that satellite classification uncertainties have a non-negligible impact on the GEMS model results. Similar SOC coefficient of variation histograms were observed for the SOC modeled under the low and high climate change scenarios.

Table 4. Comparison of the minimum, mean and maximum SOC (Fig. 3) and NPP (Fig. 4) simulated for the 9 LCLU classes using the year 2000 hard classification (Fig. 1). Only pixels where SOC and NPP were modeled are considered (i.e., not water bodies, clouds, cloud shadows, settlement areas, or where there was no Landsat data).

The declining SOC trend from 1900 to 2000 (Fig. 7) shows some perturbations due to the growth and decay of the modelled vegetation. Figures 8a-c show the mean SOC computed over all the classified pixels in the study area for the no, low, and high climate change scenarios plotted from 2000 to 2052. The SOC is estimated to decline from 2000 to 2052 under all climate change scenarios, by approximately 11 %, 14 %, and 24 % for the no (Fig. 8a), low (Fig. 8b), and high (Fig. 8c) climate change scenarios respectively. This trend has been observed elsewhere in West African drylands when temperature increases and precipitation decreases (Tan et al., 2009; Liu et al., 2004; Touré, 2002; Batjes, 2001). Summary statistics of the mean study area SOC results illustrated in these figures are tabulated in Table 7. These results reflect the spatial variability and uncertainty imposed by the different 2000 Landsat classifications and the spatio-temporal sensitivity of the GEMS model to that variability.
For all three climate scenarios, and for each simulation year, the mean study area SOC obtained by running GEMS with the hard decision tree classification (green filled circles) is similar (within 4 g C m⁻²) to the mean of the 30 soft decision tree classification model results (orange filled circles) (Figs. 7 and 8). This is not unexpected, as the hard decision tree classification is generated by applying a voting procedure to the 30 soft classification trees, and demonstrates that the hard decision tree classification approach does provide a representative single mean study area SOC estimate.
The mean study area SOC for individual soft classifications varies for each simulation due to their different training data sampling, which causes differences in the LCLU class membership probabilities, and due to spatial differences in the GEMS model inputs, as discussed in Sect. 7.2.3. In 2000, for the no climate change scenario, the mean study area SOC values vary over the 30 soft decision tree classifications from 1196.6 to 1228.8 g C m⁻² (Fig. 8a, Table 7). This 32.2 g C m⁻² SOC range corresponds to a variation of 2.6 % of the mean study area hard decision tree classification SOC. This variation decreases in time to 26.7 g C m⁻² in 2052, equivalent to 2.5 % of the mean study area hard classification SOC, and similarly it decreases to 31.4 g C m⁻² (3 %) and 29.9 g C m⁻² (3.2 %) for the low (Fig. 8b, Table 7) and high (Fig. 8c, Table 7) climate change scenarios. These results imply that using a state-of-the-practice hard decision tree classification approach with a 9 class LCLU classification scheme imposes a variability of at most 3.2 % of the mean study area SOC.
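The relative variation quoted above is the range of the soft-classification means expressed as a percentage of the hard-classification mean. A minimal sketch of that arithmetic, using the year 2000 values for the no climate change scenario given in the text (the function name is hypothetical):

```python
def relative_range_percent(soft_means, hard_mean):
    """Return (range, range as a percentage of the hard-classification mean)."""
    spread = max(soft_means) - min(soft_means)
    return spread, 100.0 * spread / hard_mean


# Minimum and maximum of the 30 soft means and the hard mean (g C m^-2), year 2000.
spread, percent = relative_range_percent([1196.6, 1228.8], 1219.3)
print(round(spread, 1), round(percent, 1))
```

Only the minimum and maximum of the 30 soft means are needed for the range, which is why two values suffice in the example.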

Conclusions
Research has attested to the significance of land cover and land use (LCLU) change on carbon dynamics (Scholes and Hall, 1996; Houghton et al., 1999; Lal, 2004; Tieszen, 2004) and to the utility of biogeochemical models to simulate soil and carbon biomass under different land management (Metherell et al., 1993; Batjes, 2001; Liu et al., 2004; Tschakert et al., 2004). However, differences between LCLU data sources and classification approaches, and errors in the LCLU data both in terms of classification errors and the degree of generalization of the landscape into the LCLU classes, may influence model outputs. Despite this, relatively few studies have examined this issue. In this study, state-of-the-practice bagged decision tree approaches for LCLU classification of dry and wet season Landsat satellite data were used to assess the sensitivity of SOC estimated using the spatially explicit Global Ensemble Biogeochemical Modeling System (GEMS) under different climate scenarios. The approach could be utilized by other biogeochemical models that use spatially explicit LCLU parameterizations. This study was undertaken in northern Senegal, where satellite LCLU classification is particularly challenging because of the semi-arid landscape, and where the coupling between future LCLU and climate change is poorly understood.

This research provides a new method to estimate the variability of SOC due to satellite LCLU classification errors. The single hard decision tree Landsat classification results, generated by applying a voting procedure to the 30 soft decision tree results, typically provided mean study area SOC values within about 4 g C m⁻² of the mean of the 30 soft decision tree classification results. This is not unexpected, and demonstrates that hard decision tree classification provides a suitable approach to define a single classification for GEMS modeling. The 30 SOC maps estimated independently using the 30 different soft classifications provide data that were used to quantify the variability of SOC imposed by satellite classification errors.
At the study area scale, considering the mean study area SOC, the variability of SOC imposed by satellite classification errors was not high. In 2000 the mean study area SOC values varied over the 30 soft decision tree classifications by 32.2 g C m⁻², corresponding to 2.6 % of the mean study area hard decision tree classification SOC. In 2052 this relative SOC variation was 2.5 %, 3 % and 3.2 % for the no, low and high climate change scenarios respectively. These variations are much less than the corresponding 11 %, 14 % and 24 % declines from 2000 to 2052 in mean study area SOC modeled for the no, low and high climate change scenarios respectively.
At the local (pixel) scale the impacts of satellite classification errors can be very apparent. The per-pixel coefficient of variation (the standard deviation divided by the mean) of the 30 SOC soft decision tree estimates was used to quantify the pixel-level spatial variability of SOC imposed by satellite classification errors. The highest coefficients of variation occurred for the least accurately classified classes and were not negligible. In this study, more than 20 % of the bare soil, mud flat, wetland and rainfed agriculture pixels had SOC coefficient of variation values greater than 0.1, with some as great as nearly 0.6. These high local-scale SOC variations are due to differences in the satellite classification training data sampling, which cause differences in the mapped LCLU class membership probabilities, and due to the interaction of these differences with spatial differences in the other GEMS model inputs.
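The per-pixel coefficient of variation can be sketched as follows. The text does not state whether the population or the sample standard deviation was used; the sketch assumes the population form, and the function name and SOC values are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (dimensionless)."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(values) / m


# Hypothetical SOC estimates (g C m^-2) at one pixel from three soft classifications.
print(coefficient_of_variation([1000, 1100, 900]))
```

Dividing by the mean is what allows pixels with very different SOC levels to be compared on the same scale.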

A. M. Dieye et al.: Sensitivity analysis of the GEMS soil organic carbon model
The findings of this study indicate that the high local variability of SOC due to satellite classification errors should be taken into consideration, for example, using the method described here. This is particularly important as local-scale SOC variations imposed by satellite classification errors may obscure modeled temporal changes in SOC due to climate influences that may be highly land cover specific. There are a number of recent and planned spaceborne sensors with very high (<10 m) spatial resolution (Norris, 2011) and, in conjunction with next generation freely available Landsat and similar high spatial resolution systems designed for land cover monitoring (Wulder et al., 2008, 2011), they provide opportunities for high resolution LCLU biogeochemical model parameterization and LCLU mapping uncertainty assessment.
This research has demonstrated a method to estimate the variability of GEMS modeled SOC due to satellite classification errors. The method can be applied to other biogeochemical models that use spatially explicit land cover land use (LCLU) parameterizations by running the model with a single hard and multiple soft LCLU classification inputs to infer model sensitivity. The Senegalese findings described in this paper can only be generalized to other process based models by repeating the described method with the new model. This is because of the non-linear dependency of the GEMS SOC estimates on LCLU and because, as we have demonstrated for specific LCLU classes at the study area scale and for four agro-ecological zones, the SOC uncertainty due to satellite classification errors is dependent not only on the LCLU classification errors but also on where the LCLU classes occur relative to the other biogeochemical model inputs.
As the goal of this study was to examine the sensitivity of GEMS modeled SOC to land cover land use (LCLU) classification uncertainties, the impacts of errors associated with the other GEMS spatially explicit input data and model parameterizations were not considered explicitly. The best available data sets and parameterizations were used. However, the degree to which all input data and model parameterization errors are captured by the GEMS simulations and by the LCLU bagged decision tree classification approach requires further research.

Fig. 1. Landsat 28.5 m hard decision tree classification of the study area in north-western Senegal, covering 1560 km² lying 15°24′-17°00′ W and 15°00′-16°42′ N. Dry and wet season 2002 Landsat data were classified using a bagged decision tree classification procedure into 9 land cover land use classes (plantation forest, water, bare soil, rainfed agriculture, wetlands, mangrove, mud flats, irrigated agriculture, and savanna grassland). The study area is shown bounded by a black vector. White shows unclassified (clouds, cloud shadows, settlement areas, or no Landsat data). The boundaries of the four main agro-ecological zones (I: Niayes; II: Peanut Basin; III: Sandy Ferlo; and IV: Senegal River Valley) are shown as red vectors.

Fig. 2. The "reliability" of the hard decision tree classification results shown in Fig. 1. For each pixel the number of unique classes (maximum 9) that it could be independently classified as over the 30 decision tree classification runs is shown. Pixels reporting a value of 1 were always classified as one particular LCLU type, whereas pixels reporting values of 5-7 were variously classified into between 5 and 7 LCLU types. White shows unclassified (water bodies, clouds, cloud shadows, settlement areas, or no Landsat data).

Fig. 3. GEMS soil organic carbon (SOC) model output for 2000 using the 9 class 28.5 m Landsat hard decision tree classification illustrated in Fig. 1 and the corresponding spatially explicit model inputs for the 9 LCLU classes.White shows areas where no SOC was modeled (water bodies, clouds, cloud shadows, settlement areas, or no Landsat data).

Fig. 4. GEMS net primary productivity (NPP) model output for 2000 using the 9 class 28.5 m Landsat hard decision tree classification illustrated in Fig. 1 and the corresponding spatially explicit model inputs for the 9 LCLU classes.White shows areas where no NPP was modeled (water bodies, clouds, cloud shadows, settlement areas, or no Landsat data).
Fig. 7 shows the mean SOC averaged over all the classified pixels in the study area for the no climate change scenario plotted every 4 years from 1900 to 2052. The open circles show the mean SOC from simulations using the 30 independent decision tree soft classifications; the orange filled circles show the mean of the 30 simulations. The green filled circles show the mean SOC derived from the hard decision tree classification carbon assignment approach. From 1900 to 2000 the SOC generally decreases, by about 32 % from approximately 1800 g C m⁻² to approximately 1220 g C m⁻²; this is due to human land cover land use.

Fig. 5. The soil organic carbon (SOC) coefficient of variation derived from the 30 soft decision tree classification model runs. The coefficient of variation (standard deviation divided by mean) is dimensionless. The 2000 Landsat data were classified 30 times into one or more of the 9 LCLU classes and the SOC modeled for the corresponding spatially explicit model inputs for those classes. White shows areas where no SOC was modeled (water bodies, clouds, cloud shadows, settlement areas, or no Landsat data).

Fig. 6. Histograms of the year 2000 SOC coefficient of variation (Fig. 5) for each land cover land use class defined by the hard classification (Fig. 1).

Fig. 7. Mean GEMS modeled soil organic carbon (SOC) computed for the entire study area under the no climate change scenario, from 1900 to 2052 at 4 yearly intervals, using the 9 land cover land use classes and different Landsat classification approaches. The open circles show the mean SOC for each of the 30 independent bagged decision trees computed using the soft classification carbon assignment approach; the orange filled circles show the mean across the 30 soft classification simulations; the green filled circles show the mean SOC derived from simulations using the hard decision tree classification.

Fig. 8. Mean GEMS modeled soil organic carbon (SOC) computed for all the study area for the period 2000 to 2052, under the (a) no, (b) low, and (c) high climate change scenarios. See Fig. 7 caption for details.

Table 1. Summary of management practices used for the GEMS model parameterization. The crop rotation probabilities should be read horizontally from time 1 to time 2; each row sums to 1.

Table 2. Description of the 9 land cover land use (LCLU) classes and the number of training pixels used for the classification.

Table 3. Soft-to-hard confusion matrix results for the 9 land cover land use classes. The cell values report percentages of the total area; a total of 305 428 pixels were considered. The percent correct is 97.79 % and the Kappa coefficient is 0.98. Grey fields, along the diagonal, represent for each class the percentage correctly classified. The classes are: 1. Plantation; 2. Water; 3. Bare soil; 4. Rainfed agriculture; 5. Wetlands; 6. Mangrove; 7. Mud flats; 8. Irrigated agriculture; 9. Savanna grassland (Table 2).

Table 5. Comparison by agro-ecological zone of the minimum, mean and maximum SOC (g C m⁻²) (Fig. 3) for the 9 LCLU classes using the year 2000 hard classification (Fig. 1). The LCLU percentage area in each zone is shown in parentheses. Only pixels where SOC was modeled are considered (i.e., not water bodies, clouds, cloud shadows, settlement areas, or where there was no Landsat data).

Table 6. Summary statistics of the mean of the 30 soft decision tree SOC estimates for year 2000. The statistics are summarized with respect to the 9 LCLU classes defined by the hard decision tree classification (Fig. 1). The study area mean SOC is 1217.4 g C m⁻². Only pixels where SOC was modeled are considered (i.e., not water bodies, clouds, cloud shadows, settlement areas, or where there was no Landsat data).

Table 7. Summary statistics of the mean study area hard and soft decision tree (DT) soil organic carbon (SOC) (g C m⁻²) model estimates illustrated in Figs. 7 and 8, for the no, low and high climate change scenarios, for selected years.