Technical Note: Calibration and Validation of Geophysical Observation Models

We present a method to calibrate and validate observational models that interrelate remotely sensed energy fluxes to geophysical variables of land and water surfaces. Coincident sets of remote sensing observation of visible and microwave radiations and geophysical data are assembled and subdivided into calibration (Cal) and validation (Val) data sets. Each Cal/Val pair is used to derive the coefficients (from the Cal set) and the accuracy (from the Val set) of the observation model. Combining the results from all Cal/Val pairs provides probability distributions of the model coefficients and model errors. The method is generic and demonstrated using comprehensive matchup sets from two very different disciplines: soil moisture and water quality. The results demonstrate that the method provides robust model coefficients and quantitative measure of the model uncertainty. This approach can be adopted for the calibration/validation of satellite products of land and water surfaces, and the resulting uncertainty can be used as input to data assimilation schemes.


Introduction
Observation models are widely used for estimating geophysical variables of land and water surfaces from remote sensing data.The simplest form is the empirical linear model, whereby coefficients are derived from regressing measured geophysical variables with observed radiation.In most cases, these empirical models have some physical meaning and are often used because of their simplicity.Examples of land remote sensing applications are available from active/passive microwave remote sensing of soil moisture (e.g.Njoku et al., 2002).Similarly, water quality applications make use of the Lambert-Beer law to model the spectral absorption of light by suspended and dissolved materials as a linear function of their concentrations (D'Sa and Miller, 2005;Robinson, 2004;Salama et al., 2004).Currently, such strategies are proposed for NASA's Soil Moisture Active Passive (SMAP) mission combined radar/radiometer soil moisture product (Entekhabi et al., 2010), the Netherlands' automated monitoring network (IN PLACE: Integrated Network for Production and Loss Assessment in the Coastal Environment), and the NASA Moderate Resolution Imaging Spectroradiometer (MODIS) mission ocean colour products (McClain et al., 2004).This type of model is developed from comprehensive sets of concurrent remote sensing observations and field measurements, hereafter referred to as matchups.Ideally, the validity of any model is tested against an independent data set.Therefore, the available matchups are often subdivided into independent sets used for derivation of the model coefficients (calibration) and for accuracy assessment (validation).Most studies subdivide matchups into so-called calibration/validation (Cal/Val) sets based on a statistical or regional resemblance (Devlin et al., 2008), but without a clear directive on its effect for model accuracy.This is most likely the case because there has been until now no objective approach for subdividing Cal/Val sets.Many combinations of matchups can be used, specifically when using a large number of points.Each Cal/Val pair has the same probability of occurrence, but provides different results.As such, the selection procedure not only impacts the model's accuracy, but also the accuracy assessment.On the other hand, the selection of Cal/Val pairs can also be thought of as a stochastic sampling from a known probability distribution (e.g.Wang et al., 2005;Salama and Stein, 2009).Such stochastic treatment of matchups within the Cal/Val context has not yet been investigated in the field of Earth observation, but has the advantage of providing a quantitative uncertainty measure for both the model coefficients and derived geophysical variables.
In this paper we follow a stochastic approach for selecting Cal/Val sets and demonstrate its use for quantifying uncertainty.The proposed approach combines the bootstrapping method of Efron and Tibshirani (1993) with the Jackknife technique (which leaves out one, or more, observation) and adapts the sample size at each iteration.Bootstrapping and Jackknife methods are usually used to provide the standard error of the derived "plug in" estimates (Efron and Tibshirani, 1993) and have been employed for validating observation models (e.g.Petus et al., 2010;Melin, 2010;Salama and Su, 2010).However, the combination of bootstrapping without replacement with Jackknife sampling and changing the sample size at each iteration is novel and provides not only the accuracy of regressed estimates, but also the underlying probability distribution of regressed estimates and their errors.
The developed method samples from a complete matchup set to populate many sets of Cal/Val pairs.Each pair is used to derive the model coefficients and their associated errors, from which the probability distributions of the calibration and validation result is determined.In this paper the method is demonstrated for two data sets: (i) L-band (1.6 GHz) backscatter (σ • ) -soil moisture matchups collected during the 2002 OPE 3 (Optimizing Production Inputs for Economic and Environmental Enhancement) campaign (Joseph et al., 2010a,b), and (ii) matchups of chlorophyll a concentrations and derived absorption coefficients obtained from the NASA bio-Optical Marine Algorithm Data (NO-MAD, version 2a.) (Werdell and Bailey, 2005).

Land application -soil moisture
The 2002 OPE 3 campaign focused on the active and passive microwave remote sensing of soil moisture throughout the corn growth cycle.Part of the field activities consisted of weekly C-(4.75 GHz) and L-(1.6 GHz) band σ • measurements with the NASA/George Washington University (GWU) truck-mounted scatterometer.Further in support of these remote sensing observations, an extensive ground sampling was conducted that included soil moisture.Full details on the data sets collected during the field campaign can be found in Joseph et al. (2010a,b).Here, we only make use of the 75 matchups between the L-band HH polarized σ • observed from a 35 • view angle and the measured soil moisture, hereafter referred to as the OPE 3 matchups.The σ • observations are corrected for vegetation effects through application of method described in Joseph et al. (2008), which results in the σ • representative for a bare soil surface.Many studies (e.g.Ulaby et al., 1984;Champaign and Faivre, 1997;Njoku et al., 2002) have demonstrated the following linear relationship between soil moisture and σ • observed under the same land cover conditions: where sm is the soil moisture content (m 3 m −3 ), a is the slope (m 3 m −3 dB −1 ) representing the σ • sensitivity to soil moisture, and b is the offset (m 3 m −3 ) accounting for the baseline effects, such as surface roughness, topography, and land cover.Both the σ • sensitivity to soil moisture and the baseline effects depend on the sensing configuration (e.g.wavelength, polarization, view angle) as well as the land surface (e.g.surface roughness, land cover, topography).

Water application -chlorophyll a absorption
The NASA bio-Optical Marine Algorithm Data (NO-MAD, version 2a.) set includes measurements of spectral remote-sensing reflectances, spectral marine absorption and backscattering coefficients, and concentrations of water constituents (Werdell and Bailey, 2005).Here, we use only chlorophyll a (chl a) measurements derived from high performance liquid chromatography (HPLC).The observed radiance spectra and matching HPLC-derived chl a concentration consist of 424 matches, hereafter referred to as the NOMAD matchups.The general practice is to derive the absorption coefficients from the observed radiance spectra using semi-analytical inversion models (e.g.Van Der Woerd and Pasterkamp, 2008;Maritorena et al., 2002).Lambert-Beer law is then employed to estimate the absorption per unit mass from derived absorption coefficients and measured concentrations.
The chl a absorption coefficients at the blue band (λ 0 = 440 nm) are derived from the observed radiances using the cross entropy method as reported in Salama and Shen (2010).Following the Lambert-Beer law, the absorption coefficient of chl a is described as a linear function of the concentration (D'Sa and Miller, 2005, Eq. 10): where a chl a (λ 0 ) is the absorption coefficient of chl a (m −1 ) at the wavelength λ 0 (nm); a * chl a (λ 0 ) is the specific absorption coefficient that describes the absorption per unit weight (m 2 mg −1 ); C chl a is the concentration (mg m −3 ); δ(λ 0 ) is an offset related to sensor noise, retrieval error of a chl a (λ 0 ), (m −1 ) and the ratio of accessory pigments that are produced in different conditions of growth (nutrients and irradiance), e.g."xanthine" that acts as sun protection.
The two unknowns in Eq. ( 2), a * chl a (λ 0 ) and δ(λ 0 ), are estimated from regressing a chl a (λ 0 ) versus C chl a using linearregression model.In practice, Eq. ( 2) could deviate from linearity depending on the packaging effect, cell sizes, physiology and species composition of the phytoplankton community (Bricaud et al., 1995).For example, the effect of packaging on the variability of a * chl a (λ 0 ) is smaller in open oligotrophic oceans than in upwelling regions or coastal areas where larger phytoplankton cells are abundant.Hence, the deviation of Eq. ( 2) from linearity can then be understood based on the water body investigated.The linearity of Eq. ( 2) for the used data sets is justified in Sect.4, Fig. 1b.

Method
The method randomly subdivides the data into many sets (or Jackknife samples) of Cal/Val pairs.The Cal set is used to derive the coefficients of the observation model, whereas the Val set is employed to check the accuracy of the model.The results are probability distributions of model coefficients and their prediction uncertainties.
The Cal/Val sets are derived from the n available matchups following two rules: (i) both Cal and Val sets must contain at least 7 samples (k min = 7), and (ii) each sample is used once, either for calibration or for validation (i.e.sampling without replacement).The minimum sample size, (k min = 7), is selected according to the method of Cohen et al. (2003), to achieve about 35 % error in the derived slope at 95 % of confidence.This value, 35 %, corresponds to the desired level of accuracy for satellite-derived Chl a products (McClain et al., 2006;Bailey and Werdell, 2006).
The number of Cal/Val pairs is computed as nr = n − 2k min + 1.Now, for each i = [k min , n − k min ], the method forms a Cal/Val pair by increasing/decreasing the number of data points in the sets (forming the Jackknife sample).The number of possible combinations, npc i , for the i-th Cal/Val pair is where n is the total number of data points, k i is the number of samples in the Cal or Val set during the i-th iteration.For data sets with n > 20 (holds for both the OPE 3 and NOMAD matchups), the number of possible combinations (npc i ) is large (e.g.1.9848 E9 for 75 over 7 in OPE 3 ), and therefore npc i is reduced to the number of used combinations, nuc i , by bootstrapping nuc i = 10 log npc i combinations from npc i .In principle, each combination nuc i has the same probability of occurrence; therefore, the uniform distribution is used to select nuc i unique combinations from npc i (bootstrap method of Efron and Tibshirani, 1993).Each formed Cal/Val set is used for the calibration and the subsequent validation of the empirical model.The validation is always performed using type-II model (Bevington and Robinson, 2003), while the calibration depends on the model, e.g. for linear model we use the type-I regression.The accuracy of the empirical model is assessed using two statistical measures: (i) the mean absolute error between derived and measured values (MAE) and (ii) the determination coefficient (R 2 ).The algorithm produces three probability distributions (PDs): two for the calibration coefficients, PD c , and one for the accuracy measure, PD v .The above method is implemented in the following model, called GeoCalVal: 1. Take k i samples for the Cal set and n−k i for the Val set; 2. Compute npc i from Eq. 3 and nuc i = 10 log npc i ; 3. Use the uniform distribution to generate nuc i unique combinations of Cal sets and their complements for Val sets; 4. Compute model coefficients from the calibration set and store them in PD c ; 5. Use the new model coefficient to estimate the geophysical variables from the Val set; 6. Compute the uncertainty of step 5 and store them in PD v ; 7. Increase k i by one and repeat steps 1 to 7.

Optimal Cal/Val pairs
The determination coefficients, R 2 , of the Cal set are plotted against those of the Val set in Fig. 1 for all possible combinations.The data point position with respect to the x-axis is an indication for the ability of the model to fit the matchups of the Cal set, whereas its position with respect to the y-axis represents the model's performance in deriving the geophysical variables, here soil moisture and Chl a absorption coefficient (a chl a (λ 0 )).Obviously, both the Cal and Val R 2 depend on the number of data points, reaching their maxima when all data points are included, which suggests for the Cal sets that the used observation models in Eqs. ( 1) and ( 2) are indeed linear.
Both OPE 3 and NOMAD matchups produce a narrow region of Cal/Val pairs, for which the calibration R 2 is similar to validation R 2 , about 0.75-0.85(light-grey coloured data points in Fig. 1).In other words, within these Cal/Val subdivisions the model validity and the accuracy assessment are balanced.This region defines the optimal setups for subdividing matchups into Cal/Val sets.The underlying mechanisms of the data points in this region are investigated further.We found that the optimal Cal/Val sets are obtained when the arithmetic mean, µ, and dispersion, σ , of each set are equal to those of the original data set.As such, the optimal Cal/Val pair satisfies the following condition: where the subscripts "cal", "val" and "data" are for the calibration, validation and original data sets, respectively.Equation ( 4) and the equal R 2 for both Cal and Val sets (light-grey coloured points in Fig. 1) are the criteria that should be used to determine the sample size of the optimal Cal/Val set.

The underlying distribution
Figure 2 shows the derived probability distributions (PDs) of model coefficients, PD c , and the associated uncertainties, PD v , for the two matchup sets, OPE 3 and NOMAD.The resulting PDs of model coefficients have high kurtosis (acute peak around the mean) values and flat tails, i.e. more prone to outliers.Different values of k min were tested (not shown), and the results show that all derived PDs from both data sets (OPE 3 and NOMAD matchups) can be described by the t-location-scale probability distribution (the black lines in Fig. 2) of the form (Evans et al., 1993) where µ, σ and ν are the mean, standard deviation and shape factor (or the degree of freedom), respectively.The gamma function is equivalent to the factorial function n! extended to non-integral arguments.The distribution in Eq. ( 5) means that the standard variates of the data points follow the Student t distribution.The function in Eq. ( 5) is fitted to the distributions of derived model coefficients and MAEs by varying the parameters µ, σ and ν, which are listed in Table 1 with their standard errors.(a, b, d, e) and associated uncertainties (c, f) for the OPE 3 data (upper panels) and NOMAD matchups (lower panels).The solid lines are the fits by Eq. ( 5) with coefficients given in Table 1.
The reason for having flat tails in the PDs of Fig. 2 is due to the fact that the accuracy of model coefficients depends on the size of the Cal set.In other words, for a large Cal set we expect to have higher accuracy as most data points are used; however, this makes them also sensitive to outliers in the Val set, because most of the data points have been used to create the Cal set.For a linear observation model, the t-probability density function should, thus, be employed to describe the distributions PD c and PD v , regardless of the original distribution of geophysical measurements or remote sensing observations.For example, the NOMAD matchups set has a log-normal distribution, while OPE 3 is close to uniform distribution (not shown here) for measurements, residuals and observations.Yet, the distribution of derived coefficients follows, for both data sets, Eq. ( 5).This is basically a confirmation of previous statistical studies, for example Singh (1988) showed that the normality distribution is not always a valid assumption for linear models, and the t-distribution is broader and therefore better suited.In this regard, having the result of our sampling scheme reproducing the t-distribution is another validation of the correctness of the GeoCalVal.The proposed method reveals the shape of the underlying probability distribution without any a priori assumption on its parameters (e.g.degree of freedom).For non-linear models there is no straightforward theoretical approximation of the expected probability distribution.If we would follow the theory, we would have no means to justify our assumption on the underlying probability distribution and its parameters.The only objective approach is by evaluating all possible combination sets as is proposed through the GeoCalVal method.

Effect of sample size
Fixing the number of sampling points will result in PD with lower kurtosis, i.e. the PD will be less peaked.That means adapting the sample size will increase the accuracy of the derived parameters (slope and intercept in this case), as the dispersion will also be reduced.The importance of adapting the size of the sample is related to the common practice in calibration and validation of Earth observation products.Here, we search for the optimal division (thus, sample size) of the Cal/Val sets, such that the Cal set produces EO-model coefficients that enable generating EO products (estimated from the Val set) with an accuracy satisfying the mission requirements.Hence, one of the statistical questions addressed within our manuscript is what are the criteria to define the optimal sample size needed for calibrating observation models so that it produces EO products within the designed mission accuracy and within the accuracy of the calibration itself?For example, we can condition the iterative scheme in Geo-CalVal to stop when the criteria defined in Sect.4.1 are met (these are, Eq. 4 and coloured points in Fig. 1).This will however be at the cost of losing information on the probability distributions of regression coefficients, their errors and the shape of the underlying distributions.This information can only be derived if we change the sample size and study its effect on the accuracy of calibration and validation (as shown in Fig. 1).Thus, GeoCalVal makes use of the proposed sampling scheme, because only through this approach we can, in an objective manner, identify the probability distributions of coefficients and associated uncertainties of observation models through optimal divisions of the data set into Cal and Val pairs.

Application of the GeoCalVal model
The detailed knowledge on the PDs of uncertainties and uncertainty sources embedded within the remote sensed geophysical variable (shown in Fig. 2) can be used as input for data assimilation schemes (Reichle, 2008).On the other hand, these PDs can also be employed to derive the probability distribution of uncertainty within the remote sensing observations itself, i.e. one PD per observation.The relationship between measurements and observations is described by a model of the form, Y = f ( , X), in which is the set of n model coefficients, = [φ 1 , φ 2 ...φ n ], X is the set of m geophysical measurements (with m > n) and Y is the corresponding remote sensing observations.Assuming that the fluctuations in the measured quantity, X, and derived model coefficients, , are uncorrelated, we approximate the second moment using the truncated Taylor series expansion: where w is the partial derivative of Y with respect to the measurements X and each model coefficient, φ i .The terms, σ 2 , are the corresponding variances.For the linear model Y = a × X + b, the uncertainty in Eq. ( 6) becomes σ 2 y = a 2 σ 2 x + x 2 σ 2 a + σ 2 b .The coefficient a and the uncertainties terms σ 2 a and σ 2 b are quantified from the derived probability distributions of model coefficients, PD c .Measurement uncertainty, σ 2 x , is either assumed (e.g.NOMAD matchups) or estimated from available measurements (e.g.OPE 3 ).In the NOMAD data set, the concentrations of Chl a were estimated using high-performance liquid chromatography (HPLC) method.Many studies (Claustre et al., 2004;Hooker et al., 2005) found that the error in HPLC estimation of Chl a, on average, varies between 7 % and 25 %.On the other hand, each observation site in the OPE 3 data set contains 21 soil moisture measurements.The standard deviation of these measurements, per observation, can be used as a proxy for σ x .Estimated values of σ x , a, σ a and σ b form the inputs to Eq. ( 6) to produce the PD, quantifying the uncertainty of each remote sensing observation.This results in a PD of uncertainty per data point that has nr i nuc i number of samples, i.e. number of all used combinations.It should, however, be noted that this uncertainty should not be confused with observation errors associated with remote sensing retrievals, which included also other components (e.g.model goodness-of-fit and inversion uncertainty).

Conclusions
In this paper we present the GeoCalVal model for an objective selection of calibration/validation data sets to assess the performance of observation model for geophysical variables.GeoCalVal combines two traditional re-sampling methods (bootstrapping and Jackknife) and adapts the sample size at each iteration.This combination of bootstrapping with Jackknife sampling and changing the sample size at each test iteration is novel and provides not only the accuracy of regressed estimates and associated errors but also their underlaying distributions.The GeoCalVal tests all probable combinations of Cal/Val setup and considers the effect of changing the sample size on the accuracy of regressed estimates.The end results are probability distributions of model coefficients (calibration) and uncertainties in the estimates (validation).
GeoCalVal is applied to two matchups sets, which shows that -GeoCalVal provides an optimal setup for subdividing matchups into Cal/Val sets; -the coefficients and associated uncertainties of linear observation models follow the t-location scale distribution, i.e. the distribution of their standard variate follows the Student t distribution; -the derived PDs provide complete information on the variations of model coefficients, their uncertainties and the accuracy of observations, which can be employed in time series analyses and data assimilation schemes; -the optimal Cal/Val sets are obtained when the arithmetic mean and dispersion of the Cal/Val sets are equal to those of the original data set; -the presented method is applicable to any data set and can be adjusted to any observation model regardless of the application area, e.g.water quality or surface hydrology.

Fig. 1 .
Fig. 1.Determination coefficient, R 2 , between measured and observed values of (a) soil moisture and (b) chl a absorption coefficient.The solid line is the 1 : 1 reference line.Light-grey coloured points represent the optimal Cal/Val pairs.

Fig. 2 .
Fig. 2. Derived probability distributions of model coefficients(a, b, d, e) and associated uncertainties (c, f) for the OPE 3 data (upper panels) and NOMAD matchups (lower panels).The solid lines are the fits by Eq. (5) with coefficients given in Table1.

Table 1 .
Estimated parameters of the best fit t-location-scale distribution to model coefficients and MAE uncertainties.The degree of fit is expressed in standard error.