Integrating field sampling, geostatistics and remote sensing to map wetland vegetation in the Pantanal, Brazil

Development of efficient methodologies for mapping wetland vegetation is of key importance to wetland conservation. Here we propose the integration of a number of statistical techniques, in particular cluster analysis, universal kriging and error propagation modelling, to integrate observations from remote sensing and field sampling for mapping vegetation communities and estimating uncertainty. The approach results in seven vegetation communities with a known floral composition that can be mapped over large areas using remotely sensed data. The relationship between remotely sensed data and vegetation patterns, captured in four factorial axes, were described using multiple linear regression models. There were then used in a universal kriging procedure to reduce the mapping uncertainty. Cross-validation procedures and Monte Carlo simulations were used to quantify the uncertainty in the resulting map. Cross-validation showed that accuracy in classification varies according with the community type, as a result of sampling density and configuration. A map of uncertainty derived from Monte Carlo simulations revealed significant spatial variation in classification, but this had little impact on the proportion and arrangement of the communities observed. These results suggested that mapping improvement could be achieved by increasing the number of field observations of those communities with a scattered and small patch size distribution; or by including a larger number of digital images as explanatory variables in the model. Comparison of the resulting plant community map with a flood duration map, revealed that flooding duraCorrespondence to: J. Arieira (juarieira@ufmt.br) tion is an important driver of vegetation zonation. This mapping approach is able to integrate field point data and highresolution remote-sensing images, providing a new basis to map wetland vegetation and allow its future application in habitat management, conservation assessment and long-term ecological monitoring in wetland landscapes.


Introduction
Wetlands are vulnerable habitats threatened by climatic change, due to their high sensitivity to the hydrological regime (Junk, 2002).They form transitional habitats between aquatic and terrestrial systems and embody different kinds of habitats such as mangroves, peatlands, freshwater swamps and marshes (Mitsch et al., 2009).The ecological importance of these habitats has been recognized worldwide as well as the urgent need to preserve them, as stressed in the Cuiabá Declaration on Wetland elaborated during the 8th International Wetlands Conference of INTECOL, Brazil.However, lack of policy guidance to regulate the sustainable use of wetlands may lead to arbitrary management decisions (Junk et al., 2006).To improve the protection of wetlands, it is imperative to have a thorough understanding of the structuring elements and of the identification of efficient methods to describe and monitor them.
Vegetation communities have distinct spatial and temporal patterns.Understanding the mechanisms that determine these patterns has been an important issue in ecology for decades (e.g., Connell and Slatyer, 1977;Svenning et al., 2004).Two aspects play a key role: spatial interactions in ecological processes (e.g.competition), and environmental Published by Copernicus Publications on behalf of the European Geosciences Union.
factors (e.g.flooding duration) (Tilman, 1988).Ecological processes include interactions between individuals, which may cause particular spatial patterns in the distribution of plants.Spatial variation in environmental factors causes spatial patterns in vegetation communities due to the differences of species requirements.These two factors do not usually operate independently but act together at different spatiotemporal scales (Turner, 1989;Svenning et al., 2004).This multi-scale interaction may lead to complex spatial patterns that are continuously changing (Wagner and Fortin, 2005).Consequently, the ability to distinguish plant communities that arise from multi-scale ecological processes requires an understanding of the processes and parameters causing the heterogeneity (Turner, 1989).
Classical methods describing vegetation distribution patterns along environmental gradients (e.g.altitude, temperature, water, nutrients) are based on sampling field plots, often along transects (McIntosh, 1958;Whittaker, 1967).Such an approach yields detailed insights into the vegetation occurrence and vegetation assemblages but does not provide spatially continuous information required to study mechanistic processes and spatial patterns of the landscape (Austin and Smith, 1989).To retrieve such spatially continuous information requires techniques that consider space explicitly (Gardner and Engelhardt, 2008).One of these techniques is remote sensing.By using the spectral signature of different vegetation states, remote sensing enables description of spatial and temporal patterns of vegetation in a spatially continuous way (Jensen, 2007).A restriction of this approach is the limited level of detail in attribute information that can be mapped by remote sensing, hindering the detection and identification of many ecologically important properties of vegetation communities, such as floral composition (Chambers et al., 2007).
Whereas field plots and remotely sensed data each have their limitations as a basis for continuous vegetation maps, is it possible to combine them through a statistical approach (Guisan and Zimmerman, 2000;Ferrier et al., 2002;Pfeffer et al., 2003;Miller et al., 2007).Point-data or other field data and spatially continuous information from remote sensing are here incorporated by means of statistical methods, such as ordination analysis (Jongman et al., 1995) and spatial interpolation techniques such as kriging.In this way, we can make maps representing the spatial distribution of vegetation across large areas that incorporate detailed information on floral composition (Pfeffer et al., 2003).This approach has become increasingly important in ecological studies as it recognizes the influence of spatial correlation in vegetation patterns (Bascompte and Solé, 1996;Turner et al., 2001).In addition, these techniques allow quantification of the uncertainty in mapped vegetation, which is valuable when vegetation maps are used for further quantitative analysis or for calibration and evaluation of mechanistic vegetation models (e.g., Brzeziecki et al., 1993;Guisan and Zimmerman, 2000;Chong et al., 2001).Here, we will use mapped vegetation (and its uncertainty) to understand the complexity of spatial patterns of vegetation distribution and to study the effect of flood duration on plant community patterns.
In this study, we integrate field data and remotely sensed data through geostatistical methods for a case study in the Pantanal, a 150 000 km 2 floodplain of the upper Paraguay basin in the center-west part of Brazil.The variability in water depth and flood duration are considered to be the preponderant causes of the high diversity of biological communities and plant zonation patterns found in the area (Junk et al., 1989;Wantzen et al., 2005).In this extensive and pristine wetland floodplain, long-term conservation depends on habitat diversity maintenance (Junk et al., 2006).
The aims of this paper are: (1) to indentify and map plant communities of the Pantanal by combining field data and remotely sensed data using advanced statistical techniques; (2) to evaluate the uncertainties in vegetation classification of this novel statistical approach; and (3) to investigate the relation between flood duration and vegetation zonation.

Study area
Our study site covers 60 km 2 and is located within a nature reserve in North Pantanal (16 • 30 -16 • 44 S and 56 • 20 -56 • 30 W) (Fig. 1a).The site is representative of a large part of the Pantanal regarding vegetation and environmental conditions.
The Pantanal contains a large variety of alluvial ecosystems with different drainage patterns, flooding characteristics, geomorphologic aspects and vegetation types (Fig. 1a) (Assine and Soares, 2004).The climate of this region is tropical humid with marked seasonality between winter and summer periods (Köppen, 1948).The summer from November to April is characterized by high temperatures (average day temperature 34 • C) and it is the season with the largest amount of precipitation (Fig. 1b).The precipitation decreases in winter, causing this season to be very dry (De Musis et al., 1997).The water level in the rivers of the Pantanal follows the seasonal trend in precipitation.Due to the poor surface and subsurface drainage and the smooth, low elevation relative to the river level (Alvarenga et al., 1984;Assine and Soares, 2004), large areas of the Pantanal are flooded every summer.Climate oscillations have been shown to be the main cause of the observed multi-year period of cyclic variation in flooding (Junk et al., 2006).The fluctuation in water level of the river Cuiabá, which crosses the north part of the studied area, is the main cause of the flooding of the studied floodplain.
The Pantanal vegetation presents floristic elements of three important morphoclimatic and phytogeographic domains i.e.Cerrado (Brazilian savanna), Amazonia and Chaco (Ab 'Saber, 1988).Savanna vegetation types are dominant physiognomies in the Pantanal (67%).Semideciduous forest, gallery forest, swamp, Chaco, pioneer formations such as monodominant forest of Vochysia divergens Pohl (Silva et  al., 2000) form the remaining components of the vegetation mosaic.The variability in water depth and flooding duration and the temporal connections and disconnection established between different elements of the landscape by means of the flood pulse are considered the preponderant causes of the high diversity of biological communities in the Pantanal (Junk et al., 1989;Wantzen et al., 2005), dictating where and when plant species with different life strategies and flooding tolerance will appear (Junk et al., 2006).

Outline of the approach
Figure 2 shows a diagram with the procedural steps followed to identify vegetation communities, to determine their spatial distribution, and to study their relationship with flooding duration.The first step of the procedure was the extraction of vegetation communities from field sampling using factor analyses and clustering (Fig. 2, top-right).Spatially continuous variables were obtained from remotely sensed imagery and a digital elevation model providing spatial information necessary for vegetation mapping (Fig.  data using regression analysis (Fig. 2, centre).After fitting variograms describing the spatial correlation in the residuals of the regression analysis, universal kriging was performed to combine extracted factor scores and spatially continuous information derived from remote sensing to map vegetation communities (Fig. 2, centre).The second step of the procedure included an extensive uncertainty analysis on this mapping procedure by cross-validation and random simulations to quantify the quality of the vegetation community maps (Fig. 2, bottom-left).Finally, the vegetation map was used to study the vegetation-environment relations by comparing spatial patterns of plant community distribution with spatial patterns of observed flooding duration (Fig. 2, bottom right).

Vegetation data
We have sampled vegetation data based on field sampling of key structural and compositional attributes of the five following plant life forms as defined by Michin (1989): herbaceous species (including gramineous plants), vines, shrubs, and two h Fig. 3. Sampling scheme modified from the RAPELD method (see Magnusson et al., 2005).size classes of trees.Here, we use the term life form to indicate functional groups based on "group of plants that are similar in a set of traits and their association to certain variables" (Pillar and Sosinski, 2003).Shrubs were considered individuals with the trunk bifurcated at the ground level and maximum canopy height of three meters.Palm species were considered either as a shrub life form or as a tree, depending on species morphology.Due to possible phenotypic plasticity found in species living under different micro-environmental conditions, life form of a species was defined according to the predominant morphologic form found in our sampling.Two reasons motivated us to focus on life forms instead of individual species when describing vegetation.First, this approach reduces the data dimensionality (Colosanti et al., 2007), and second, life form and ecology of plants are associated (Grime, 1979), which guarantees that each life form is an ecological unit.Dominant species within each plot, that is, the woody species with the most biomass or the vine and herbaceous species with the most coverage, were identified and included in vegetation observations and analyses to ensure discrimination between structurally similar but floristically distinct communities.Field sampling of vegetation was done in 2006 and 2007.A sampling scheme modified from the RAPELD method (cf., Magnusson et al., 2005) was used (Fig. 3).The adjusted RAPELD method comprised the establishment of 23 transects each of 250 m length distributed over the study site (Fig. 3a).In order to study the effect of flooding duration on vegetation composition and structure, each transect was positioned at a different topographical elevation.Each transect was thus placed along an elevation contour, defined using a tripod-mounted telescope.Each transect was divided into sampling transects of 50 m length in order to capture variation in vegetation over short distances.This partitioning resulted in a regular number of samples with similar sample length suitable for sampling the different life forms (Fig. 3b), producing a total number of 115 sampling units.Data collection and sample dimensions of a sampling unit varied according with plant life form (Table 1, Fig. 3c).Herbaceous and vine species were sampled with the point quadrat method (Bullock, 1996).This method consists of counting the number of times that plant individuals touch on a vertical rod, placed at 2 m intervals along the sampling transect.The coverage value for a sampling transect was calculated as the proportion of 25 points being intercepted by the plant (Table 1).For woody life forms, plots were positioned along the transect (Fig. 3c).The plots have a length equal to the length of the sampling unit along the transect (50 m), and a width depending on the life form size as suggested by the RAPELD method (Fig. 3c).Shrub measurements were taken in plots of 200 m 2 (50×4 m); medium-sized tree measurements in plots of 1000 m 2 (50 × 20 m); and large-sized tree measurements in plots of 2000 m 2 (50×40 m).All species found in the plot were identified and trunk diameter and species height were measured for trees and shrubs.For shrub species, diameters were measured for each individual at 5 cm above the soil www.biogeosciences.net/8/667/2011/Biogeosciences, 8, 667-686, 2011 Table 1.Vegetation data obtained from field sampling and the available remote-sensing and ancillary imagery.Sample size varies with the life form of plant.Biomass of trees and shrubs are calculated using the allometric equation developed by Chave et al. (2005) and by Barbosa and Ferreira (2004) surface and for tree species at breast height.These data were also used to calculate some variables describing the vegetation structure.Aboveground biomass of woody individuals was estimated by two different allometric equations for shrubs and trees, respectively.Aboveground woody biomass (B s , kg) of shrubs was calculated using the allometric model developed by Barbosa and Ferreira (2004): with, Cb is circumference at the ground height (cm) and H the species height (m).
Biomass (B t , Kg) of a tree species was estimated following Chave et al. (2005): with, ρ (g cm −3 ) the wood specific density, H (m) the species height, and d (cm) the species diameter at breast height.Information on species densities was obtained from Schöngart et al. (2008).Canopy height (CH) was considered the average height of the eight highest individuals in a plot.

Remote-sensing and ancillary data
Remotely sensed imagery and ancillary data are frequently used in spatial vegetation modeling due to their capability of providing accurate environmental information related to vegetation patterns (Guisan and Zimmerman, 2000;Pfeffer et al., 2003;Miller et al., 2007).An IKONOS-2 image and a Digital Elevation Model (DEM) (Table 1, Fig. 1c, d) were used in this study to derive variables related to vegetation patterns.The acquisition date of the IKONOS-2 image is 1 October 2003 corresponding to the dry season in the Pantanal and representing an optimal time for detecting spectral signatures of terrestrial vegetation on the floodplain, due to the availability of cloud free images and nonflooded soil conditions.The IKONOS-2 image consists of four spectral bands: three bands in the visible part of the spectrum located at blue (450-520 nm), green (520-600 nm) and red (630-690 nm) and one band in Near Infrared (760-900 nm) (Table 1).The pixel size is approximately 4 × 4 m.The registered radiance values by the IKONOS-2 sensor were converted to reflectance values using the calibration information provided by Bowen (2002).A Normalized Difference Vegetation Index (NDVI) was computed from the spectral bands by taking a ratio of the difference of the near infrared and red spectral bands and the sum of the near infrared and red band (Tucker, 1979).Such an NDVI image shows stronger contrast between vegetation and soil and water surfaces while reducing noise in the image.Furthermore, we applied a principal component (PC) transformation to the IKONOS-2 image to reduce inter-band correlation and extract new spectral information that arises from this transformation.The four original bands, the NDVI image and the four principal component images were used for further data analysis as described in the next section.The 90-m resolution DEM of the study area was obtained from the SRTM (NASA Shuttle Radar Topography Mapping Mission) and used to provide continuous information of canopy height rather than soil surface (Jacobsen, 2006) (Fig. 1d).
The original geodata with cell sizes of 4 m (IKONOS-2) and 90 m (SRTM DEM) were re-sampled to support the field data, i.e., to the plot size used to take measurements of large tree species ( 50 (2) calculating average remote-sensing and elevation values for exactly these digitized plots; and (3) extracting variables from the IKONOS-2 derived images and SRTM DEM for the 115 plots to be used in the analysis.The two last steps were done in the PCRaster interactive raster GIS environment, which is oriented towards spatio-temporal modelling (PCRaster, 2002;Wesseling et al., 1996).
5 Identifying plant communities Velloso et al. (1991) developed a classification system of Brazilian vegetation, which was adapted by Nunes da Cunha et al. (2006), providing a detailed description of the plant communities in the Pantanal.Here, we have used the communities described by Nunes da Cunha et al. (2006), because these can be identified by means of structure and composition (only dominant species) data of different vegetation layers.Communities are represented at a broad level as vegetation formation types rather than plant associations.We used factor analysis (Bray and Curtis, 1957) where the factor scores summarize the structural and compositional characteristics of different vegetation samples.These factors were found in a principal component analysis of the correlation matrix, generating a small number of orthogonal factors explaining the correlation among the vegetation variables (Legendre and Legendre, 1998).The different factor scores are plotted against each other in Fig. 4, and the proximity among point-samples and our field background about the vegetation classes found in these points were used to classify them in vegetation classes/clusters.

Ecological interpretation of ordination space
The first four factor axes explain 46% of the total variance (Table 2).We assumed that the strongest correlations with each axis reflect the main vegetation gradients captured by it.Factor 1 explains a relatively large proportion (22%) of the total variance.It mainly distinguishes communities dominated by a tall and rich tree layer (negative loadings) and those dominated by vine, shrub or herbaceous life forms (positive loadings).Although explaining considerably smaller proportions of the total variance, the remaining factors are still useful for identifying the vegetation classes.Factor 2 separates plant communities by their degree of coverage and richness of herbaceous species.Factor 3 mainly represents variation in biomass of two trees, Brosimum latescens and Mouriri guianensis, and one shrub, Psychotria capitata.Factor 4 mainly represents variation in the biomass of shrubs and of two species, the medium-sized tree Sapium obovatum and the shrub Ruprechtia brachycepala.

Defining plant communities through ecological interpretation of clusters
Plant communities of the study area are identified based on the proximity of point scores on the ordination space, and the general expert knowledge (including existing vegetation community classifications) of vegetation communities in the area.Seven clusters are indicated in a scatter plot of the different factors (Fig. 4  The number of samples in each cluster and their distribution over the ordination space express the structural and floristic variability found within the community and which communities have dominated the floodplain landscape.Table 3 provides a statistical summary of structural and floristic characteristics of communities. A number of communities show overlapping ranges of scores on some of the factor axes, while other factor axes provide clear boundaries between these communities.For instance, the transitions between Alluvial forest and Monodominant forest are smooth (Fig. 4a-c).These two communities are mainly separated through the tree biomass and coverage of herbaceous species in Monodominant forest (Fig. 4a) and the dominance of Brosimum latescens and Mouriri guianensis in Alluvial fores (Fig. 4b).Dense savanna lies between Open savanna and Monodominant forest.Tree biomass distinguishes these communities (Fig. 4a).Dense savanna, Grassland, Open savanna and Monodominant forest have We first examined the relationship between IKONOS images and SRTM DEM and the vegetation patterns captured in the four factorial axes.Pearson correlation was applied to analyze the existence of linear relationships, observed in an exploratory analysis of the data, between image derived variables and field data (scores on factor axes).This facilitates our understanding of the spectral nature of the field data and the ecological interpretation of the variables (James and Mc-Culloch, 1990).Next, the relationship between the image derived variables and the factor axes was found using the following multiple linear regression model: where Y i is the score value, a 0 ,a 1 ,...,a p are the model parameters, x 1i, x 2i , . . ., x pi are the values of the image derived variables and ε i are uncorrelated residuals.The analyses were done with log transformed reflectance values to ensure that the statistical distribution of the data is close to Gaussian (Draper and Smith, 1998).
In the multiple regression analysis, image derived variables were selected to be included in the multiple regression models using the best-subset regression method (Hofmann et al., 2007;James and McCulloch, 1990).In this analysis all combinations of explanatory variables in regressions are tested, and Mallow's C-p statistic (Mallows, 1973) and determination coefficients (R 2 ) are used as eliminatory criterions of variables (Draper and Smith, 1998).We consider the best regression equation for each factor the one with the lowest C-p value, highest R 2 , and lowest number of explanatory variables.

Vegetation patterns captured by digital images
Table 4 shows correlations between explanatory variables and the factor axes.Except for NDVI, all image derived data present significant correlation with Factor 1.The strongest correlations with this first axis are found with blue, green and red bands, PC1 and canopy topography.Lower reflectance values in the three spectral bands and lower score values in the PC1-3 images are linked to areas occupied by communities with high stored tree biomass such as Monodominant forest and Alluvial forest (Table 4).Lower score values in the PC4 reflects communities with lower tree biomass values even though this axis explains the noise from the spectral band transformation.In spite of its weak correlation with Factor 1, NDVI shows an expected spectral behaviour: the values decrease toward areas with lower tree biomass, www.biogeosciences.net/8/667/2011/Biogeosciences, 8, 667-686, 2011  such as those areas covered by Grassland, Open savanna, Shrubland and Dense savanna.The strong negative correlation between canopy topography and Factor 1 shows that the boundaries between communities dominated by trees and those dominated by shrubs, lianas and herbs are detected by differences in canopy height.
The variability in cover degree and richness of herbaceous life forms expressed by the second axis is best described by the PC2 image (61%; Table 4).Communities with higher and richer coverage of herbaceous species such as Grassland, Open savanna and Dense savanna are associated with higher reflectance values in blue, green and red bands and higher score values in the PC2 image.The negative correlations between Factor 2 and infra-red band and NDVI show that communities dominated by herbaceous species present weaker spectral response to these two images.
As observed earlier, Factor 3 mostly justifies the spatial distribution pattern of three tree species that dominate in Alluvial forest.Relatively to Monodominant forest, the lower biomass content and canopy height of Alluvial forest might be the cause for the negative correlations between Factor 3 and NDVI and Factor 3 and canopy topography.
The spatial variability of Factor 4 represents vegetation patterns that are mainly explained by canopy topography (i.e.DEM) showing that areas with higher biomass of shrubs are associated with lower canopy height.
The equations found in the multiple regression analysis are shown in Table 5.The regression models significantly explain 70.4%, 66.3%, 31.3% and 25.6% of the variance in factors 1 to 4, respectively.The use of an automatic variable selection technique to choose regression models resulted in inclusion of autocorrelated predictor variables in the mod-els, such as the principal component images and IKONOS bands.Because the main purpose of using multiple regression analysis in this study is to predict values accurately, and not to test hypotheses about the model parameters, colinearity of explanatory variables was not a matter of great concern (Legendre and Legendre, 1998).

Variogram analysis
We applied variogram analysis to the residuals of the multiple linear regression to derive information on their spatial structure (Wagner and Fortin, 2005).This information was used for two reasons: (1) to investigate the spatial autocorrelation associated with the observed vegetation patterns; (2) to use this information when making spatial predictions (Miller et al., 2007).Sample variograms were estimated and variogram models fit using the function autofitVariogram from the library automap (Hiemstra et al., 2008)  The results indicate that the vegetation gradients represented by the residuals of each factor (Factor 1-4) vary on different spatial scales (Fig. 5).Variograms of the Matérn family, a family of semivariogram models where the degree of smoothness of the random field is controlled through a shape parameter (kappa) (Pardo-Iguzquiza and Chica-Olmo, 2008), were fit for Factors 1, 2 and 4 (Fig. 5a, b, d), whereas an exponential variogram (special case of the Matérn family) was fit for Factor 3 (Fig. 5c).The first and third factors show large-scale patterns as revealed by their ranges of spatial dependence.The variogram of Factor 1 has a range of 3380 m, whereas the variogram of Factor 3 is monotonically increasing within the extent of the sample variogram and consequently has a larger range.The variograms of Factors 2 and 4 show short ranges of spatial dependence (close to a pure nugget effect) suggesting that processes governing their spatial patterns show small scale variability.

Universal kriging
Universal kriging is a spatial interpolation technique that can incorporate environmental data and spatial dependence in the modeled error to predict at locations without observations and generate accurate vegetation distribution maps (Pfeffer et al., 2003;Pebesma and Wesseling, 1998).Universal kriging was done on the regression residuals and the interpolated residuals were added to a trend surface to predict factor scores at unobserved locations.This trend surface was based on the regression equation in Eq. ( 3) (Pfeffer et al., 2003).The predicted scores were used to create four factor score www.biogeosciences.net/8/667/2011/Biogeosciences, 8, 667-686, 2011 Table 5.Multiple linear regression models relating factor axes scores (F1-4) to imagery derived variables: four spectral bands (blue, green, red and infra-red), Normalized Difference Vegetation Index (NDVI), Principal Component transformation to the IKONOS-2 image (PC), and canopy topography derived from DEM-SRTM (DEM).R 2 is the coefficient of determination showing the strength of these relationships.maps.In addition, the universal kriging approach was used to estimate the prediction error (standard deviation), which is typically increasing as a function of the distance to observation locations (Stein and Corsten, 1991).
The score maps in Fig. 5 show the vegetation spatial patterns predicted by universal kriging.The score maps of the first and third factor axes (Fig. 5a and c) show mainly largescale variability.These axes, as mentioned earlier, mostly represent spatial variation of tree life forms.Contrarily, the score maps of the second and fourth axes (Fig. 5b and d) representing the occurrence of herbaceous and shrub layers, respectively, show small-scale spatial variability.
Examining the pattern of the prediction errors of the scores for each factor axis (Fig. 6), one can infer to which extent sample data and image data contribute to predictions.When the range of the semivariogram is large, as seen in the semivariograms of Factor 1 and 3 (Fig. 5a, c), the prediction errors increase slowly with the distance away from samples.On the other hand, a short range in the variogram results in prediction errors increasing rapidly with distance away from samples, as is the case with Factor 2 and 4. Image data will in this case have greater impact on predictions.Nevertheless, the quality of the factor score maps is not only related to differences between small-scale and large-scale spatial variation but rather reflects the explanatory strength of the relationship between factor axes and image derived variables as shown by the mean error in the score maps.We use these mean errors as indicative of the overall quality of the maps.According to these averages, Factor 1 represents the most accurate map (mean standard deviation of 0.51) followed by Factor 2 (mean s.d.= 0.64), Factor 3 (mean s.d.= 0.69) and Factor 4 (mean s.d.= 0.87).

Spatial distribution of plant communities across the floodplain
In the final part of this procedure, we combined score point data and the four kriged maps generated as described in the former sections to create the final map of plant communities.First, the cluster center of each community was calculated as the average of score values of each factor axis.Then, the resulting seven cluster centers were used to assign each location on the map to a community class.This was done by calculating Euclidean distances between centers and predicted scores values.Each location was then assigned to the community whose center was nearest to the predicted score values at that location.The map of plant communities (Fig. 7a) resulting from this classification method shows the predicted spatial distribution of the seven identified plant communities on the floodplain.Grassland (16% of coverage), Shrubland (30% of coverage) and Monodominant forest (32% of coverage) sum up to 78% of the coverage at the studied site.These communities mostly appear as large and contiguous patches across the site.Alluvial forest and Alluvial low forest, as expected, appear as strips covering exclusively places close to water www.biogeosciences.net/8/667/2011/Biogeosciences, 8, 667-686, 2011 bodies: along rivers, channels and surrounding baías, i.e. temporary or permanent lakes seasonally connected to the river.These two communities cover just 4% (2% each) of the studied floodplain.The greatest portion of the 10% of Open savanna that covers the study area is located towards the Northern boundary.The 8% of Dense savanna is found as small patches generally surrounded by Open savanna and as a big patch beside Monodominant forest.

Evaluating uncertainty
Vegetation mapping using statistical approaches carries different sources of uncertainties related to sampling scheme, interpolation errors, sampling support, data quality, lack of data and others, which may compromise the model's capability of accurately predicting vegetation patterns (Guisan and Zimmerman, 2000;Pfeffer et al., 2003;Miller et al., 2007).The predictive success of our mapping approach was evaluated using cross-validation (Efron and Tibshirani, 1986) and random-simulations (Bourennane et al., 2007), both performed in R.

Cross-validation
We have used cross-validation to investigate the sensitivity of vegetation predictions performed by universal kriging as a result of sampling variability (Pfeffer et al., 2003).Two resampling techniques were applied: leaveone-out cross-validation (LOOCV) and leave-five-out crossvalidation (LFOCV).The first technique is the standard procedure (Efron and Tibshirani, 1986) which consists of omitting one sample at a time from the data set and based on the remaining observed values make predictions at this location using the interpolation technique, i.e., universal kriging.
Because samples/plots within the same transect are considerably closer to other observations than the typical distance between prediction locations and observations locations (Miller et al., 2007), LFOCV was used to test the prediction quality of the model when the whole transect, that is, five plots, is left out to make predictions.Vegetation classes were assigned from the predicted scores and compared with the observed vegetation classes at the 115 sample plots.Overall agreement between predicted and observed classes does not differ substantially between the two resampling techniques: leave-one-out results in 52.2% agreement and leave-five-out in 48.7% agreement.Both techniques show that accuracy in classification varies according to the community type (Fig. 7b and c).Communities which have been observed on a large number of plots and occupy large portions of the vegetation map, such as Monodominant forest and Shrubland, are less sensitive to sampling density than those communities which occur in smaller and few patches, such as Alluvial forest and Alluvial low forest.Consequently, communities observed in few of the plots are wrongly classified also for LOOCV (Fig. 7b).Other possible causes of uncertainty in classification from our mapping approach derives from the similarity between community types having a small distance between cluster centers in the ordination space (Fig. 4).Communities such as Alluvial forest and Dense savanna are frequently predicted to be their neighboring communities, namely, Monodominant forest; and Alluvial low forest are frequently predicted to be Shrubland (Fig. 7b and c).

Simulation
A Monte Carlo approach was applied to examine the uncertainty of our method (Legendre and Legendre, 1998).In this approach, we used the same universal kriging equations, however creating equally likely random realizations (i.e., possible random outcomes or scenarios) of score maps instead of predicted values as was done in the original procedure.This was done by simulating 1000 realizations of score maps for each factor, based on the scores at the observation locations and the fit variograms.These realizations reflect the prediction uncertainty at the prediction locations; all realizations are equally probable.For each realization, we calculated the vegetation pattern, using the same Euclidean distance algorithm applied in the original mapping procedure.This was repeated for all 1000 realizations, resulting in 1000 realizations of vegetation community maps.Two realizations are shown in Fig. 8b, c.From these 1000 realizations, we created a map showing the probability, from 0 to 1, that a certain community is found in a 40 m grid cell (Fig. 8).On this map, a value 1 indicates zero prediction uncertainty.
The result shows that the quality of classification varies spatially, even though the proportion and arrangement of communities observed in the original map is preserved to a great extent.The central zone of a community patch is more likely to be classified than border areas, as shown by the increasing probabilities towards the center of patches of communities (Fig. 8a).This might reflect intrinsic uncertainties in classification of natural ecotones reflected in the overlapping of score values of very close communities in the factor space.The quality of classification also varied between communities.Classification of Dense savanna and Open savanna, for instance, exhibit lower probabilities of being in the correct class as indicated by their more random distribution across the landscape (Fig. 8b and c).Here, sampling configuration and distance between clusters in factor space are an important source of errors.

Flood duration-vegetation relationship
The relationship between vegetation distribution and flooding was assessed by comparing the plant community map with a flood duration map as in direct gradient analysis.The flood duration map (Fig. 9) was created from a digital elevation map and 38 yr of daily recordings of the water level in the River Cuiabá (Fig. 1b) provided by the Brazilian National Water Agency (ANA; http://hidroweb.ana.gov.br).The 40-m resolution digital elevation map was created with universal kriging from 81 GPS elevation measurements at the site and using SRTM DEM as an auxiliary variable (Valeriano and Abdon, 2007).A base station was installed for increased precision of the GPS measurements.Flood duration and flood depth data were also monitored by direct reading of staff gauges for two years (2007)(2008)  them (r > 70%; P ≤ 0.05) indicating the possibility of calculating flooding duration values over the floodplain through the indirect relationship between river water depth and elevation.Flood duration of a cell was calculated by comparing the water level in the river and the topographical elevation of the cell for each day as follows: if the elevation value at a cell was lower than the water level in the river on a certain day, the cell was considered flooded that day.This approach ignores spatial variation in water level associated with downstream gradients in water level, local depressions containing water that is only partially connected with the main river, and surface water fed by groundwater.However, the effect of these processes is relatively small as indicated by additional field sampling with the staff gauges.
The flood duration map (Fig. 9) shows the number of flooded days per year in the study area.Flood duration data extracted from this map were classified into monthly intervals and the distribution of the plant communities found in the vegetation map along this flooding gradient was plotted in Fig. 10. Figure 10 shows that the zonation of plant communities along the floodplain is clearly related to the duration of inundation.Alluvial forest and Dense savanna occur in areas with a flooding duration of less than two months.Monodominant forest, although occupying a high proportion of the highest areas, has the highest occurrence at intermediary flooding conditions, with a flooding duration between two and four months.Open savanna is mostly found where flooding lasts for four to six months per year.Grassland is found under almost the whole range of flooding durations, however with peaks of occurrence in areas with a flooding duration below two months and between four and six months of inundation.Alluvial low forest is mostly situated at locations with a flooding duration between 6-8 months.Shrubland dominates the areas with the highest flooding duration.Above eight months of flood duration, there is no suitable condition for tree species establishment and the landscape is occupied mostly by Shrubland, Open savanna and Grassland.The occurrence of monodominant forest in the last flood duration class might be associated with the coarse representation of spatial variation in flood duration, illustrated in Fig. 9.

Discussion
Wetland mapping at scales suitable for conservation can be limited by the lack of temporally and spatially consistent datasets.The present study identifies an efficient methodological approaches to map and understand spatio-temporal patterns of wetland vegetation of the Pantanal.We provide a framework to map wetland habitats based on the integration of field data and remotely sensed data through geostatistic methods and identify possible causes of vegetation zonation.
The vegetation in our study area shows complex spatial patterns.The use of universal kriging for mapping is valuable because it allows combining remote-sensing information and spatially distributed field observations, taking into account the spatial dependence of the variation not explained by the remote-sensing data.Most other classification methods (e.g., maximum likelihood) do not allow this level of sophistication in using composite spatial information.
The plant communities described in an existing classification (Nunes da Cunha et al., 2006) could be identified as clusters in the ordination space, thanks to the floristic properties included in the analyses that differentiated structurally similar but floristically different plant communities.However, sometimes clusters showed overlap on a number of factor axes and boundaries between clusters were not always accurate.Such overlap probably indicates the existence of gradual changes in vegetation (Brzeziecki et al., 1993), which is not represented in our model with sharp boundaries between vegetation communities.Thus, the vegetation community studied deviates slightly from our crisp plant community model.This had two implications for our analysis.One is the subjective determination of cluster boundaries in the ordination space, particularly in cases where boundaries were not crisp.The other is related to the interpretation of the uncertainty analysis.One of the causes of uncertainty of the mapped vegetation is the uncertainty in the assignment of an interpolated point to a cluster in the ordination space.Overlap of clusters in the ordination space may actually represent transition zones between plant communities, and are related to intrinsic uncertainty in classification (see also, Fortin et al., 2000;Hernandez-Stefanoni and Dupuy, 2007).Potential misclassification in these zones appears as uncertainty on the interpolated crisp map, particular in areas close to mapped boundaries between plant communities.However, even though we recognize that plant community is a spatial concept rather than a well-delimited entity (Austin and Smith, 1989), we believe that the abstract definition of crisp boundaries is needed to interpret space-time vegetation patterns over large areas.
Our analysis of the causes of vegetation zonation on the floodplain indicated that flood duration is an important determinant of plant community distribution in space, influencing spatial transitions between different plant communities.Different mechanisms of tolerance to prolonged flooding evolved by species of a community (Parolin, 2009) might be related to vegetation zonation as it controls the expansion of different sets of plants (Damasceno-Junior et al., 2005).On the other hand, non-linear response of communities to the flood duration gradient, as was the case for Grassland, indicates that interaction with neighboring communities might have an influence on the vegetation distribution of these communities (Fig. 10) (Austin, 2002).
The different techniques used here to evaluate the behavior of our statistical model for mapping vegetation, e.g.crossvalidation and Monte Carlo simulation, allowed us to identify possible causes of misclassification and determine spatial prediction uncertainty (Congalton and Green, 1999;Guisan and Zimmerman, 2000;Pfeffer et al., 2003).
The accuracy levels of the vegetation map derived from the mapping procedure described here and assessed by crossvalidation (accuracy between 49% and 52%) were of the same magnitude as to those found by Pfeffer et al. (2003) in their maps of Alpine vegetation (between 50% to 65%).The uncertainties in vegetation classification that resulted from the sampling density and configuration suggest that the map quality may be improved when samples are collected at a higher density (Guisan and Zimmerman, 2000;Pfeffer et al., 2003;Miller et al., 2007).In geostatistical approaches for vegetation mapping as used in this paper, large distances between the observations directly affect the estimated accuracy of the predictions (Miller et al., 2007).Our study showed that the gap of vegetation information due to large-distance separated sampling points can be filled with information contained in the remote-sensing images.As a result of the sampling scheme used here, the systematic sampling produced an uneven number of samples per community type and possible omission of communities was limited to small and scattered patches over the floodplain, as was the case of Savanna forest (personal observation).Consequently, the level of uncertainty in predictions varied among communities and in space.
Monte Carlo simulation is a relatively standard method to evaluate mapping uncertainty (Steele et al., 1998;Hunsaker et al., 2001;Papadopoulos and Yeung, 2001;Cripps et al., 2008).Cripps et al. (2008) have used an approach similar to the one applied here to quantify uncertainty associated with land cover maps.Those authors provided maps of simulated posterior means and standard deviation of vegetation classes and concluded that similarity in vegetation structure may cause larger uncertainty in the assigned class in the map.This is consistent with our findings.
Generally, DEM is included in predictive vegetation models as an environmental layer (Pfeffer et al., 2003).The SRTM DEM used here provides measurement of the top of the trees with limited information about the level of the topographical surface.The use of SRTM DEM in studies interested in the causal relationship between elevation above sea level and vegetation distribution can be constrained by the lack of precision in elevation data.This was not a matter of concern in our study, because remote-sensing imagery and SRTM DEM were included in the model because of their abilities in detecting spatial patterns of vegetation derived from field sampling.
Uncertainty assessment and its cartographic representation is an important tool for management and research, indicating zones of high and low classification confidence and helping to find strategies for mapping improvement (Chong et al., 2001;Guisan and Zimmerman, 2000;Pfeffer et al., 2003;Scheller and Mladenoff, 2007).Some strategies are available in such a statistical approach to improve vegetation map quality.Increasing the number of samples and sample configuration are the most obvious ways to improve classification accuracy (Guisan and Zimmermann, 2000;Pfeffer et al., 2003;Rempel and Kushneriuk, 2003).There are also other image derived predictors that could have been included in this study, such as digital maps of soil properties (e.g.soil texture) or flooding attributes.In fact, the inclusion of flooding duration as an explanatory variable in the regression models could have helped in predictions, because of the relationship between flooding duration and vegetation zonation.However, the gain in quality of the map would be limited because flooding duration is only known with a relatively high uncertainty (as it depends on complicated flow dynamics on the floodplain).Besides, the comparison between inundation patterns and vegetation distribution using continuous data provided by both the inundation and the vegetation maps, as made here, would be compromised if the vegetation map had been created using inundation data.
We assume in this work that plant communities arise from the combination of patterns of distribution of different life forms and that these patterns repeat themselves in other parts of the Pantanal and present a temporal stability.Because the vegetation patterns captured in our field sampling should repeat themselves in time and in other areas, one should be able to monitor vegetation patterns by just providing image data (RS images and the DEM) to create vegetation maps for different years.However, the assumption of temporal stability in vegetation patterns is constrained by long-term dynamics associated with events (e.g.gradual increase in flooding durations due to climate change or land use change in upstream areas) that can change plant associations and interactions.To deal with this, we would suggest that field sampling is redone at a regular time interval of approximately ten years.

Conclusions
This study provided a framework to map wetland habitats based on the integration of field data and remotely sensed data through geostatistical methods and to identify possible causes of uncertainties in the developed map.We showed that it is possible to classify vegetation at locations in the studied floodplain by measuring structural and floristic attributes of different vegetation layers (herbaceous, tree, shrub and vines), and combining these data with remotely sensed imagery and DEM data.
The study shows that the vegetation community studied deviates slightly from a crisp plant community model resulting in a somewhat subjective determination of cluster boundaries in the ordination space and increasing map uncertainty.www.biogeosciences.net/8/667/2011/Biogeosciences, 8, 667-686, 2011 We conclude that spatial patterns of vegetation distribution in the studied floodplain is to a great extent determined by the spatial patterns in inundation, as stressed by a number of other studies in the Pantanal (Junk, 1989;Nunes da Cunha andJunk, 1999, 2000;Zeilhofer and Schessl, 2000).In addition, patterns in vegetation are determined by spatial dependence in biological processes, such as competition between neighbors and dispersal strategies (Tilman, 1994).
The cartographic representation of classification uncertainty gave useful information on the spatial distribution and sources of uncertainty.The simulation and the crossvalidation results showed that uncertainty in classification varied in space and among communities, partly due to the sampling configuration.To avoid bias in sampling and resulting problems of omission of communities in the mapped area, we suggest the use of stratified random sampling method or stratified systematic unaligned sampling in future studies (Lo and Watson, 1998).The inclusion of other remotely sensed images in the model as a strategy for map improvement needs to be taken into account in future studies also.
The significant advantage of the mapping approach described in this paper is that detailed biological information from field observations can be integrated with high spatial resolution remotely sensed data producing accurate maps.Unlike "classical" approaches to vegetation class mapping, our modeling carries quantitative information on vegetation variability and can be used to map vegetation over large areas.We believe that mapping of plant communities by integrating field observations and high-resolution imagery using geostatistics is a promising approach for conservation assessment and long-term ecological monitoring in extensive wetland areas.

Fig. 1 .
Fig. 1.Study site.(A) Natural Reserve SESC Pantanal located at the Pantanal Mato-grossense, Mato Grosso; Brazil; (B) mean annual water depth fluctuation of the River Cuiabá (1963-2000) and mean precipitation near Cuiabá, northern Pantanal.Rainfall data from INMET (National Institute of Meteorology of Brazil), river level data from DNAEE (National Department of Waters and Electric Energy of Brazil); (C) four meter resolution multispectral IKONOS-2 image, of the study site acquired in October 2003, true color.White circles are the sampling locations; (D) 90-m Resolution SRTM (NASA Shuttle Radar Topographic Mission, http://www2.jpl.nasa.gov/srtm/)Digital Elevation Model of the study area.

Fig. 2 .
Fig. 2. Flow diagram describing the procedural steps in the analysis of the data.
Fig.3.Sampling scheme modified from the RAPELD method (seeMagnusson et al., 2005).(A) 23 transects (sampling trail) are regularly spaced over the site; (B) each transect is placed at the same elevation level and divided in five sampling transect each of 50 m length.(C) Sampling along the transect.Herbaceous species: point samples at regular interval along the transect centre line.Shrubs, large-sized trees and medium-sized trees: exhaustive sampling in the indicated zone.The tree size category is based on diameter of the trunk at breast height (DBH).
× 40 m).The disaggregation of the 90 m cells into 40 × 50 m cells was done by overlaying the desired 40 × 50 m cells over the 90 m cells and creating intersects.From these a weighted average was calculated to determine the value for the 40 × 50 m cells.A similar procedure was used to aggregate the 4 m cells of IKONOS-2 images into 40 × 50 m cells.The procedure for extraction of the averaged values from the images consists of: (1) delineating the irregular plot boundaries in the IKONOS-2 image using ARC/INFO GIS software (version 9.0; ESRI, 2006);

Fig. 4 .
Fig. 4.Factor analysis biplots of the axes 1 to 4 on vegetation variables obtained from 115 sampling locations.Seven clusters (symbols) represent the vegetation communities found in the studied floodplain.Factor 1 describes the gradient of tree biomass found in the study site; Factor 2 of herb cover and richness; Factor 3 of tree dominant species; and Factor 4 of shrub biomass.
in the statistical environment R-2.7.2 (R Development Core Team, 2009).The function autofitVariogram automatically selects the variogram model and parameters that best match the observed sample variogram.The function iterates over the variogram models (spherical, Gaussian, Màtern, and exponential) and selects the model and model parameters that result in the smallest residual sum of squares with the sample variogram.

Equation R 2 F1Fig. 5 .
Fig. 5. Maps of the kriged estimates of factor scores and the semi-variograms of the residuals of the regression between factor axes and remotely sensed and ancillary data; fitted variogram models: mat: matheron family, exp: exponential.Values between brackets are nugget effect, structured variance and variogram range, respectively.(A) Factor 1; (B) Factor 2; (C) Factor 3; (D) Factor 4.

Fig. 7 .
Fig. 7. (A) Predicted distribution of the plant communities identified at the study site.(B) Results of leave-one-sample out crossvalidation.Percentage of predicted classes at sampling locations.Each bar shows the results for sampling locations with a certain observed class (indicated by the colors at the bottom of the C panel).(C) Idem, leave-five-out cross-validation.N = 115.

Fig. 8 .
Fig. 8. Results of Monte Carlo simulation, using 1000 random simulations.(A) The colors on the map indicate the vegetation class with the highest probability of occurrence at a cell.A color gradient is used to show the value of this highest probability; (B) maps of two single random realizations, color scale is identical to Fig. 8.
Fig. 9. (A) Map with average number of days flooded per year at the study site, calculated over the period 1969-2007; (B) relationship between flood duration (days yr −1 ) and elevation of the soil surface (m a.s.l.) observed at the 23 study transects; (C) water level fluctuation in the River Cuiabá between 1969 and 2007.Vertical dotted lines indicate the occurrence of drier and wetter years.

Fig. 10 .
Fig. 10.Fraction of occupied sites by the seven identified communities along the flood duration gradient.Flooding gradient is divided in five flood classes representing number of flooded months.
, respectively.ρ is woody density, d is diameter at breast height, H is species height, Cb is circumference at ground height (cm), n is the number of points touched by a species, and N t is the total number of points in a sample (25 points).

Table 2 .
Summary statistics for the factor analysis.Numbers in bold highlight the highest correlation with the factor axis. Sapium obovatum and Ruprechtia brachycepala generates the lowest scores on Factor 4.
may be a small variation in coverage of herbaceous species between these communities.The low tree biomass in Shrubland is responsible for its positive scores on the first factor.Alluvial low forest is distinguished from other forests based on Factor 4. Its high shrub coverage compared to Shrub-land and the dominance of

Table 3 .
Structural and floristic characteristics of plant communities, given as mean and standard deviation.

Table 4 .
Pearson's correlation coefficients between factor axes and image variables: four spectral bands, Normalized Difference Vegetation Index (NDVI), Principal Component transformation to the IKONOS-2 image (PC), and canopy topography derived from DEM-SRTM (DEM).