Assimilation of passive microwave vegetation optical depth in LDAS-Monde: a case study over the continental US

The land data assimilation system LDAS-Monde, developed by the Research Department of the French Meteorological Service (Centre National de Recherches Météorologiques, CNRM), represents land surface variables (LSVs) well from regional to global scales. It jointly assimilates satellite-derived observations of leaf area index (LAI) and surface soil moisture (SSM) into the Interactions between Soil Biosphere and Atmosphere (ISBA) land surface model (LSM), increasing the accuracy of the model simulations and forecasts of the LSVs. The assimilation of vegetation variables directly impacts root zone soil moisture (RZSM) through seven control variables consisting of the soil moisture of seven soil layers from the soil surface to 1 m depth. This capability is particularly useful in dry conditions, where SSM and RZSM are decoupled to a large extent. However, this positive impact does not reach its full potential because optical-based LAI observations are available, at best, every ten days, and can be missing for months over regions and seasons with heavy cloud cover, such as winter or monsoon conditions. In that context, this study investigates the assimilation of low-frequency passive microwave vegetation optical depth (VOD), available in almost all weather conditions, as a proxy of LAI. The Vegetation Optical Depth Climate Archive (VODCA) dataset provides near-daily observations of vegetation conditions, far more frequently than optical-based products such as LAI. This study's goal is to convert the more frequent X-band VOD observations into proxy-LAI observations through linear re-scaling and to assimilate them in place of direct LAI observations. Seven assimilation experiments were run from 2003 to 2018 over the contiguous United States (CONUS), with (1) no assimilation, and the assimilation of (2) SSM, (3) LAI, (4) re-scaled VODX, (5) re-scaled VODX only when LAI observations are available, (6) LAI + SSM, and (7) re-scaled VODX + SSM.
This study analyzes these assimilation experiments by comparing them to satellite-derived observations and in situ measurements, and focuses on the variables LAI, SSM, gross primary production (GPP), and evapotranspiration (ET). Each experiment is driven by atmospheric forcing from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis. Results showed improved representation of GPP and ET by assimilating re-scaled VOD in place of LAI. Additionally, the joint assimilation of vegetation-related variables (i.e. LAI or re-scaled VOD) and SSM demonstrates a small improvement in the representation of soil moisture over the assimilation of any dataset by itself.
Consistent with the time series, as well as with what was shown in Albergel et al. (2018a), LAI and VOD observations are correlated, and this relationship enables us to match the VOD to LAI observations and to assimilate the resulting product in place of LAI in the model. As a final step, a 90-day rolling average is applied after the re-scaling to smooth the results and allow for better performance of the assimilated data. This study uses the X-band of the newly created Vegetation Optical Depth Climate Archive (VODCA) (Moesinger et al., 2020). VODCA is a synthesis of various satellite sensors since 1987, and uses the Land Parameter Retrieval Model (LPRM) V6, which simultaneously retrieves soil moisture and VOD from horizontally and vertically polarized microwave observations (Mo et al., 1982; Meesters et al., et al., van der Schalie et al., 2017). The dataset combines observations from the AMSR-E, AMSR-2, SSM/I, TMI, and WindSat sensors, and separates the syntheses into C-, X-, and Ku-band VOD retrievals. Ku-band VOD from VODCA did not encompass the entire period of interest in this study, stopping in 2017.
While both the C- and X-band VOD may suitably represent vegetation over a wider array of land cover, X-band VOD was ultimately chosen to be assimilated in this study, as Kumar et al. (2020) found large improvements in ET estimations from X-band VOD assimilation relative to C-band VOD. Because the TMI sensor aboard the Tropical Rainfall Measuring Mission (TRMM) is in a 35° inclination orbit, and thus does not encompass the entirety of the CONUS domain, and because in the X-band of VODCA, TMI is the only sensor between 1998 and late 2002, the year 2003 was chosen as a starting point in order to have full domain observations.
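The conversion of VOD into a proxy of LAI (linear re-scaling followed by a 90-day rolling average) can be sketched as below for a single grid cell. This is only an illustration under stated assumptions, not the LDAS-Monde implementation: the re-scaling is assumed to match the mean and standard deviation of VOD to those of LAI within a seasonal window of roughly three months, the rolling average is assumed to be trailing, and the function names, the `window_days` parameter, and the dictionary layout are all hypothetical.

```python
from datetime import date, timedelta
from statistics import mean, pstdev


def doy_distance(a, b, year_len=366):
    """Circular distance in days between two day-of-year values."""
    d = abs(a - b)
    return min(d, year_len - d)


def rescale_vod_to_lai(vod, lai, window_days=91):
    """Linearly re-scale VOD to LAI units using a seasonal window.

    vod, lai: dicts mapping datetime.date -> value for one grid cell.
    ASSUMPTION: slope and intercept are chosen so that the mean and standard
    deviation of VOD match those of LAI among all observations (all years
    pooled) within +/- window_days // 2 of the day of year being re-scaled.
    """
    half = window_days // 2
    coeffs = {}  # day of year -> (slope, intercept), or None if not computable
    proxy = {}
    for day, value in vod.items():
        doy = day.timetuple().tm_yday
        if doy not in coeffs:
            v = [x for d, x in vod.items()
                 if doy_distance(d.timetuple().tm_yday, doy) <= half]
            l = [x for d, x in lai.items()
                 if doy_distance(d.timetuple().tm_yday, doy) <= half]
            if len(v) < 2 or len(l) < 2 or pstdev(v) == 0:
                coeffs[doy] = None
            else:
                slope = pstdev(l) / pstdev(v)
                coeffs[doy] = (slope, mean(l) - slope * mean(v))
        if coeffs[doy] is not None:
            slope, intercept = coeffs[doy]
            proxy[day] = slope * value + intercept
    return proxy


def rolling_mean(series, days=90):
    """Trailing rolling mean over the last `days` days (ASSUMED trailing)."""
    dates = sorted(series)
    out = {}
    start = 0
    running = 0.0
    for i, day in enumerate(dates):
        running += series[day]
        # Drop values that fall outside the trailing window.
        while (day - dates[start]).days >= days:
            running -= series[dates[start]]
            start += 1
        out[day] = running / (i - start + 1)
    return out
```

A call such as `rolling_mean(rescale_vod_to_lai(vod, lai))` then yields a smoothed proxy-LAI series on the VOD dates. Pooling all years within the seasonal window keeps the slope and intercept stable even where individual years have few LAI observations.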

Because VOD is a long-wavelength microwave product, observations of it are nearly all-weather, as the signal passes through cloud cover almost unaffected. This allows for far more frequent VOD observations compared to optical LAI observations, as demonstrated in Figure 1. For the same 2003-2018 period, LAI observations from the Copernicus Global Land Service (CGLS) are outnumbered by VOD observations from VODCA by approximately a factor of 6. VOD comprises attenuation from several components, primarily standing vegetation (which itself is composed of green leaf biomass, green structural biomass such as stems, and woody biomass such as trunks and branches), necromass (primarily litter), and water interception from rain or dew. Recently, VOD has been more closely examined with regard to the interacting effects of vegetation dynamics. A deeper look into L-band VOD by Konings et al. (2016) revealed that it is proportional to total vegetation water content, and the authors adjusted their retrieval algorithm to account more accurately for vegetation effects on soil moisture observations. Figure 2 shows the time series of CGLS LAI (green, solid), VODCA VODX (red, dashed), and VODCA VODC (blue, dotted) near Lincoln, Nebraska from 2003 to 2018. This pixel is composed primarily of C3 and C4 crops. LAI observations have a far more predictable and seasonal pattern. X-band VOD is also a stronger signal than C-band VOD. The peaks are relatively close in timing in this case, but can also be offset due to the difference in peak vegetation water content. While this figure demonstrates that there is a correlation between LAI and VOD, it also shows that one cannot be substituted for the other.
As was done in Kumar et al. (2020), VOD observations are linearly re-scaled to match observed LAI over the same period, in this case from the CGLS LAI dataset. A 3-month linear re-scaling window was selected to best match the VODCA VOD to CGLS LAI over seasonal timescales. Re-scaling is required because the ISBA LSM cannot simulate VOD directly, and thus we cannot assimilate VOD data directly into the model.
As shown in the Figure 2 (Diamond et al., 2013; Bell et al., 2013). The network contains 114 stations in the contiguous U.S. and provides high-quality, long-term temperature, precipitation, solar radiation, wind speed, humidity, soil moisture, and soil temperature observations. This study uses the soil temperature and soil moisture observations, which are provided sub-hourly.

At each site, USCRN places three plots of probe units at five different depths: 5, 10, 20, 50, and 100 cm. The soil moisture probe measures the dielectric permittivity of the soil by observing reflected EM waves at 50 MHz, which is then converted to volumetric soil moisture (m³ m⁻³) via a calibration equation. Sensor calibration is also performed annually. A thermistor is placed alongside the soil moisture sensor at all plots and depths. An average at each depth is calculated from the three plots every 5 minutes, and output data are typically publicly available within an hour of the reading. Figure 3 shows the locations of the USCRN in situ observations. Four sensor depths are selected and matched to ISBA soil layers: 5 cm (WG3), 20 cm (WG_20), 50 cm (WG6), and 100 cm (WG8). This comparison uses the ISBA soil layers to compare directly against the point measurements of the USCRN, but it is important to keep in mind that WG3, WG_20, WG6, and WG8 are layers of soil. WG3 is from 5 cm to 10 cm, WG_20 is a weighted average of WG4 and WG5 (as performed in Mucia et al. (2020)) representing 10 cm to 40 cm, WG6 is a layer from 40 cm to 60 cm, and WG8 is the 80 cm to 100 cm layer.
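The matching of probe depths to model layers can be written down as a small lookup, sketched below. The WG3, WG6, and WG8 bounds are those stated above; the individual WG4 and WG5 bounds and the thickness weighting used for WG_20 are assumptions made for illustration (the text only says that WG_20 is a weighted average representing 10 cm to 40 cm), and all names are hypothetical.

```python
# ISBA layer bounds in cm. WG3, WG6 and WG8 follow the text; the WG4 and WG5
# bounds are ASSUMED so that together they span the stated 10-40 cm range.
LAYER_BOUNDS_CM = {
    "WG3": (5, 10),
    "WG4": (10, 20),
    "WG5": (20, 40),
    "WG6": (40, 60),
    "WG8": (80, 100),
}

# USCRN probe depth (cm) -> ISBA layers compared against that probe.
DEPTH_TO_LAYERS = {5: ["WG3"], 20: ["WG4", "WG5"], 50: ["WG6"], 100: ["WG8"]}


def model_soil_moisture_at(depth_cm, layer_values):
    """Return model soil moisture (m3 m-3) to compare with a probe depth.

    layer_values: dict mapping layer name -> volumetric soil moisture.
    Where several layers are matched (the WG_20 case), a thickness-weighted
    mean is used; this weighting is an ASSUMPTION, not taken from the text.
    """
    layers = DEPTH_TO_LAYERS[depth_cm]
    thickness = [LAYER_BOUNDS_CM[n][1] - LAYER_BOUNDS_CM[n][0] for n in layers]
    weighted = sum(t * layer_values[n] for t, n in zip(thickness, layers))
    return weighted / sum(thickness)
```

Keeping the bounds and the depth mapping in data rather than in code makes it easy to swap in different weights if the weighting in Mucia et al. (2020) differs from the one assumed here.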
This study compares USCRN data to LDAS-Monde soil moisture between 2011 and 2018. While the network was operational as early as 2005, 2011 was selected as the start of the comparison in order to maximize the number of stations and homogenize the results of comparisons between stations. The station observations are processed and filtered, with the most notable filter being the removal of any data with corresponding soil temperature observations at or below 4 °C. Stations with fewer than 100 days of observations are also removed, as the scores proved too variable. Finally, only correlation scores with associated p-values less than or equal to 0.05 are retained.

Experimental Setup and Assessment
The experiments performed and reported in this study occur over the contiguous United States (CONUS). Table 2 provides the experiment names used throughout to reference the assimilation setups, and briefly describes what data are assimilated for each one. Besides the OL, SEKF LAI, SEKF VODX, SEKF SSM, SEKF LAI SSM, and SEKF VODX SSM experiments, several additional experiments were run using modified observations of VODX. SEKF VODX10 uses VODX observations from VODCA as before, but filters those observations so that they are assimilated only where and when LAI observations from CGLS exist. This is used to test whether the changes produced between SEKF LAI and SEKF VODX truly come from the more frequent assimilation, or from the quantifiable differences between matched VODX and LAI.
If the SEKF VODX10 results closely resemble the SEKF LAI results while the SEKF VODX results differ substantially, this indicates that the frequency of assimilated observations is the primary cause of those differences.
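The filtering behind SEKF VODX10 amounts to masking the re-scaled VODX observations by the availability of LAI. A minimal sketch follows, assuming observations are stored in dictionaries keyed by a hypothetical `(gridcell_id, date)` pair; the function name and data layout are illustrative only.

```python
def filter_to_lai_availability(vodx, lai):
    """Keep re-scaled VODX only where and when an LAI observation exists.

    vodx, lai: dicts keyed by (gridcell_id, date). A missing key or a value
    of None means "no observation". This layout is hypothetical.
    """
    return {
        key: value
        for key, value in vodx.items()
        if value is not None and lai.get(key) is not None
    }
```

With this mask applied, both experiments assimilate the same number of observations at the same places and times, so any remaining difference from SEKF LAI reflects the VODX values themselves and not their frequency.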

The primary statistical score used in this study is Pearson's correlation coefficient (R). The average correlation and the distribution of correlations allow a quick assessment of improvement or degradation, and are consistent with previous studies of LSMs and LSVs. In addition to the correlation, a normalized information contribution (NIC) is calculated for R as shown in Eq. (1). This NICR, following Kumar et al. (2009), is normalized and thus allows for inter-comparison while accounting for differences between variables and regions.
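Equation 1 is not reproduced in this text. A form of the NIC for correlation that is commonly used following Kumar et al. (2009), and that is assumed here, expresses the gain of an analysis over the open loop (OL) as a percentage of the remaining room for improvement: NICR = 100 × (R_analysis − R_OL) / (1 − R_OL). The sketch below implements that assumed form; the ±3 threshold used to label stations is a hypothetical choice for illustration and is not taken from this text.

```python
def nic_r(r_analysis, r_open_loop):
    """Normalized information contribution for correlation, in percent.

    ASSUMED form: 100 * (R_analysis - R_OL) / (1 - R_OL).
    Positive values mean the analysis correlates better than the open loop.
    """
    if r_open_loop >= 1.0:
        raise ValueError("an open-loop correlation of 1 cannot be improved")
    return 100.0 * (r_analysis - r_open_loop) / (1.0 - r_open_loop)


def classify(nic, threshold=3.0):
    """Label a station as improved, degraded or neutral.

    The default threshold is a HYPOTHETICAL value chosen for illustration.
    """
    if nic > threshold:
        return "improved"
    if nic < -threshold:
        return "degraded"
    return "neutral"
```

Because the denominator shrinks as the open-loop correlation approaches 1, the same absolute gain in R counts for more at a station that was already well simulated, which is what makes the score comparable across variables and regions.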

When analyzing the statistical scores against the USCRN, several conditions are applied. First, frozen soil conditions are avoided by calculating scores only from observations with soil temperature measurements above 4 °C. Because ISBA calculates frozen and liquid soil moisture separately, significant errors can arise when conditions are close to freezing. Second, only stations with more than 100 observations (at the respective depths) are scored, to ensure a sufficient number of data points. Finally, p-values are calculated alongside the correlations, and stations whose correlations are not significant (p-value > 0.05) are screened out.
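The three screening conditions above can be sketched for one station and depth as follows. This is an illustration only: the function names and argument layout are hypothetical, the handling of a station with exactly 100 observations is an assumption, and the p-value is obtained from the Fisher z-transform with a normal approximation (a swapped-in stand-in for whichever significance test was actually used).

```python
from math import atanh, sqrt
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (None if undefined)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value via the Fisher z-transform (normal APPROXIMATION)."""
    if abs(r) >= 1.0:
        return 0.0
    z = atanh(r) * sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def station_score(obs_sm, model_sm, soil_temp_c,
                  min_temp_c=4.0, min_count=100, alpha=0.05):
    """Return (R, p) for one station and depth, or None if screened out."""
    # 1) Keep only pairs observed with soil temperature above the threshold.
    pairs = [(o, m) for o, m, t in zip(obs_sm, model_sm, soil_temp_c)
             if o is not None and m is not None
             and t is not None and t > min_temp_c]
    # 2) Require enough observations (boundary handling is an ASSUMPTION).
    if len(pairs) < min_count:
        return None
    obs, mod = zip(*pairs)
    r = pearson_r(obs, mod)
    if r is None:
        return None
    # 3) Discard correlations that are not significant.
    p = approx_p_value(r, len(pairs))
    if p > alpha:
        return None
    return r, p
```

Returning `None` for screened-out stations keeps them out of any later averages or station counts without needing a separate mask.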
Bootstrapping is also used to calculate confidence intervals and thus determine the statistical significance of differences between experiments. Essentially, bootstrapping is the repeated random resampling of the dataset and recalculation of the desired score or variable. This study uses a constant 10,000 repeats to calculate the confidence intervals in order to generate a sufficiently large number of samples. In this study, bootstrapping is applied to the statistics versus the USCRN network. For several analyses, probability density functions (PDFs) are estimated from the distribution of correlation scores of individual gridcells. These PDFs are derived using a Gaussian kernel density estimation, with "Scott's Rule" determining the smoothing bandwidth. They give a far smoother and more readable estimate of the distribution of correlations than simple histograms, and are used to better visualize and compare the distributions of scores from several experiments at once, specifically to highlight smaller differences between experiments that may not be easily seen in histograms. In the supplement, Fig. S2a provides an example of a typical histogram and Fig. S2b the accompanying PDF for the distribution of LAI correlations for different experiments over CONUS, demonstrating that they do represent the same distribution.
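Both tools described above can be sketched with the Python standard library. The bootstrap below resamples with replacement and reports a percentile interval, which is one common variant and is assumed here; the kernel density estimate takes its bandwidth as the sample standard deviation times n^(-1/5), the one-dimensional convention that SciPy's `gaussian_kde` applies under the name "scott". Function names and defaults are hypothetical.

```python
import random
from math import exp, pi, sqrt
from statistics import mean, stdev


def bootstrap_ci(sample, statistic=mean, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `statistic`.

    Each repeat draws len(sample) items with replacement and recomputes the
    statistic; the interval is read off the sorted results. The percentile
    method is an ASSUMPTION about how the intervals are formed.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(sample)
    stats = sorted(statistic(rng.choices(sample, k=n)) for _ in range(n_boot))
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def gaussian_kde_scott(sample, grid):
    """Gaussian kernel density estimate evaluated at each point of `grid`.

    Bandwidth: h = stdev(sample) * n ** (-1/5), i.e. Scott's rule in 1-D.
    """
    n = len(sample)
    h = stdev(sample) * n ** (-1 / 5)
    norm = 1.0 / (n * h * sqrt(2 * pi))
    return [norm * sum(exp(-0.5 * ((x - s) / h) ** 2) for s in sample)
            for x in grid]
```

Two experiments can then be compared by checking whether their bootstrap intervals overlap, and their score distributions can be drawn on a shared grid by calling `gaussian_kde_scott` once per experiment.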

Before assimilating VOD observations, the X-band VOD (referred to as VODX from now on) data were compared against LAI observations, as well as LAI from the ISBA OL, to determine their respective relationships. Additionally, VODX and LAI observations were analyzed over individual patch types, where more than 50% of the patch represents a single vegetation type.
These analyses provide more information regarding the strength of the VODX-LAI relationship over different vegetation types.
Table 4 displays, for each experiment and depth, the number of stations that were degraded (red), neutral (black), and improved (green). This approach avoids averaging scores, while still providing a performance overview of the whole domain.
In a similar manner, Fig. 9 displays the PDF of the distribution of differences in correlation relative to the OL for each of the four depths, and looks at the responses of the SEKF SSM, SEKF LAI, SEKF VODX, SEKF LAI SSM, and SEKF VODX SSM experiments.
Regarding the impact of assimilating re-scaled VOD in lieu of LAI, the NICR scores against the in situ observations provide evidence that SEKF VODX and SEKF LAI are similar and comparable. SEKF VODX increases the number of improved stations over SEKF LAI at all depths, while keeping constant or slightly increasing the number of degraded stations.
The improved stations do outnumber the degraded ones at all depths. Additionally, the SEKF VODX10 experiment shows stronger similarities to SEKF LAI than to SEKF VODX, indicating that the differences are due to the more frequent assimilation of VODX. These results demonstrate that re-scaled VODX can indeed be a suitable substitute for LAI in LDAS-Monde.

Impact of individual and joint assimilation of vegetation variables and SSM
While the topic was touched on above when assessing correlations and NICR for the USCRN, this section goes into more detail regarding the effects of jointly assimilating vegetation variables (LAI or matched VODX) and SSM in LDAS-Monde. We have already seen that the joint assimilation provides a noticeable increase in improved USCRN stations relative to the OL, compared with the single assimilation of vegetation variables or SSM. Figure 12 presents the same type of plot as before, showing the monthly scores of four LSVs of interest over CONUS, but including the joint assimilation experiments SEKF LAI SSM and SEKF VODX SSM. The main concern in this figure and the following one is determining the improvement, if any, between the dashed (single assimilation) and solid (joint assimilation) red and green lines. As seen in panel a, there is no discernible difference between the single and joint assimilation for LAI.
But as panels b and c show, jointly assimilating vegetation and SSM produces slightly improved monthly correlations over the whole CONUS domain for GPP and ET, respectively. These improvements are primarily seen in the months of June through August, and while the changes are small, they are consistent improvements over the single assimilation of LAI or matched VODX.
Regarding SSM, the joint assimilation strongly improves the monthly correlations from LAI to LAI SSM and from VODX to VODX SSM. To reiterate, because the assimilated SSM observations are the same as those used for the comparison in panel d, this is merely an indication that the data assimilation is truly shifting model soil moisture values closer to the observations. When assessing against in situ observations, in both the NICR table and the PDFs, the changes at 5 cm are the strongest seen, with SEKF VODX SSM providing the highest number of improved stations while having some of the lowest numbers of degraded stations.
SEKF LAI SSM and SEKF VODX10 SSM perform similarly, but with a reduction in improved stations. Assimilating just LAI, VODX, VODX10, or VODX_Int consistently underperforms the matching joint assimilation experiment, increasing the number of degraded stations while having fewer improved stations. Finally, SEKF SSM, while showing the fewest degraded stations, also has the fewest improved. At greater depths, the numbers become closer. It is generally still seen that the joint assimilation of vegetation and soil moisture improves more stations than the individual assimilation, while the number of degraded stations stays similar. These trends show that the joint assimilation has distinct added value in soil moisture monitoring, which will be discussed in more detail in the next section.
To assess any geographic patterns in the USCRN correlations, maps of the NICR of each station are plotted for the four depths for the SEKF SSM, SEKF LAI SSM, and SEKF VODX SSM experiments. Figure 10 displays WG3 maps in panels a through c and WG_20 maps in panels d through f. Figure 11 provides WG6 maps in panels a through c and WG8 maps in panels d through f. It is clear from these figures that stations are strongly improved by using either LAI SSM or VODX SSM assimilation, compared to just SSM. Additionally, a geographic pattern does emerge, primarily at 5 and 20 cm: much of the Great Plains and Midwest show strong improvement due to the joint assimilation, while the south and east coasts show little change. Stations in the western US are more sporadic, with improved, degraded, or neutral stations spread throughout. This western region is, however, where most of the stations experiencing degradation are located. At 50 and 100 cm, SEKF SSM shows minimal concrete changes compared to the OL, with near-even numbers of improved and degraded sites. The joint assimilation improves upon these values, and while the degraded stations are typically still degraded in SEKF LAI SSM and SEKF VODX SSM, more stations move from neutral to improved.
Overall, the comparison with the USCRN in situ observations agrees with the hypothesis that the assimilation of vegetation has stronger effects on soil moisture than the assimilation of SSM. It also shows that the assimilation of VODX is on par with, or even an improvement on, the assimilation of LAI.
In the comparison of VOD and LAI before linear re-scaling, it is immediately apparent that vegetation type plays a large role in their relationship. The values seen and described in the results seem to indicate that heavily forested regions have only weak correlations between VOD and LAI observations. Saatchi et al. (2011) demonstrate that L-band satellite radar estimations of above ground biomass (AGB) are strongly impacted by forest structure, and Mialon et al. (2020) show poor correlations between L-band VOD and estimated AGB over heavily forested areas of the Northern Hemisphere. This assessment, having isolated certain woodlands, lends credence to that same conclusion, that radar retrievals over forests have a more delicate and variable response.