A convolutional neural network for spatial downscaling of satellite-based solar-induced chlorophyll fluorescence (SIFnet)

Gross primary productivity (GPP) is the sum of leaf photosynthesis and represents a crucial component of the global carbon cycle. Space-borne estimates of GPP typically rely on observable quantities that co-vary with GPP, such as vegetation indices derived from reflectance measurements (e.g., normalized difference vegetation index, NDVI; near-infrared reflectance of terrestrial vegetation, NIRv; and kernel normalized difference vegetation index, kNDVI). Recent work has also utilized measurements of solar-induced chlorophyll fluorescence (SIF) as a proxy for GPP. However, these SIF measurements typically have a coarse resolution, while many processes influencing GPP occur at fine spatial scales. Here, we develop a convolutional neural network (CNN), named SIFnet, that increases the resolution of SIF from the TROPOspheric Monitoring Instrument (TROPOMI) on board the Sentinel-5P satellite by a factor of 10 to a spatial resolution of 500 m. SIFnet utilizes coarse SIF observations together with high-resolution auxiliary data. The auxiliary data used here may carry information related to GPP and SIF. We use training data from non-US regions from April 2018 to March 2021 and evaluate our CNN over the conterminous United States (CONUS). We show that SIFnet is able to increase the resolution of TROPOMI SIF by a factor of 10 with an r² and RMSE of 0.92 and 0.17 mW m⁻² sr⁻¹ nm⁻¹, respectively. We further compare SIFnet against a recently developed downscaling approach and evaluate both methods against independent SIF measurements from the Orbiting Carbon Observatory 2 and 3 (together OCO-2/3). SIFnet performs systematically better than the downscaling approach (r = 0.78 for SIFnet, r = 0.72 for downscaling), indicating that it is picking up on key features related to SIF and GPP. Examination of the feature importance in the neural network indicates a few key parameters and the spatial regions in which these parameters matter. Namely, the CNN finds low-resolution SIF data to be the most significant parameter, with the NIRv vegetation index as the second most important parameter. NIRv consistently outperforms the recently proposed kNDVI vegetation index. SIFnet results are presented through a series of case studies across the United States. SIFnet represents a robust method to infer continuous, high-spatial-resolution SIF data.


Introduction
Photosynthesis represents the single largest CO2 flux between the atmosphere and the biosphere. At the canopy level, the sum of all leaf photosynthesis is termed gross primary productivity (GPP), and its accurate characterization remains a major source of uncertainty in the carbon cycle (Friedlingstein et al., 2019). Directly measuring GPP from remote sensing systems (e.g., satellites) is not presently possible. Instead, previous work has utilized stationary measurements of net ecosystem exchange (NEE) from flux towers that can be decomposed into GPP and respiration (e.g., Reichstein et al., 2005). Observable quantities from satellites (e.g., vegetation indices computed from reflectance data) are then related to GPP inferred from flux towers (e.g., Huete et al., 2006; Jung et al., 2019; Sims et al., 2006; Zeng et al., 2020) in light use efficiency models (Mahadevan et al., 2008) or machine learning models (Jung et al., 2019) to derive global estimates of GPP.
Vegetation indices such as NDVI and NIRv combine two (or more) spectral bands with different absorption characteristics (Hanes, 2013) to infer quantities related to plant physiology and canopy structure. The MODIS instrument was launched on the Terra and Aqua satellites in 1999 and 2002, respectively. This instrument has proved particularly useful due, in part, to its long operational lifetime. More recently launched satellites, like Sentinel-5P, carry instruments with the necessary signal-to-noise ratio and spectral resolution to retrieve solar-induced chlorophyll fluorescence (SIF), which is a measure of the photons re-emitted by chlorophyll during photosynthesis (Köhler et al., 2018). Vegetation indices (also termed greenness) can be regarded as a measure of photosynthetic capacity, whereas SIF indicates photosynthetic activity (Sellers, 1985). SIF has been shown to be a powerful proxy for estimating GPP (Magney et al., 2019; Turner et al., 2021), to capture the impact of drought on photosynthetic activity across different vegetation types (Shekhar et al., 2020a; Castro et al., 2020), and to assess the regional source of carbon emissions (Shekhar et al., 2020b).
value that falls into the coarse resolution grid cell. Data with a resolution coarser than 0.05° are resampled to the common grid.
Quality control flags and cloud filtering are applied when necessary.
MODIS measures the reflected radiance from the Earth's surface in seven different spectral bands covering the visible and infrared spectral region. Vegetation indices are computed by combining the near-infrared band (where chlorophyll is non-absorbing) and the red band (where chlorophyll is highly absorbing) (Hanes, 2013). Specifically, the normalized difference vegetation index (NDVI) (Tucker, 1979), near-infrared vegetation index (NIRv) (Badgley et al., 2017), the kernel NDVI (kNDVI) (Camps-Valls et al., 2021), and the enhanced vegetation index (EVI) (Huete et al., 2002) are computed as:

NDVI = \frac{\rho_{NIR} - \rho_{RED}}{\rho_{NIR} + \rho_{RED}}    (1)

NIR_v = NDVI \cdot \rho_{NIR}    (2)

kNDVI = \tanh(NDVI^2)    (3)

EVI = G \cdot \frac{\rho_{NIR} - \rho_{RED}}{\rho_{NIR} + C_1 \rho_{RED} - C_2 \rho_{BLUE} + L}    (4)

The EVI coefficients for MODIS are L = 1, C_1 = 6, C_2 = 7.5, and G = 2.5 (Huete et al., 2002). \rho_{NIR} is the near-infrared band, \rho_{RED} is the red band, and \rho_{BLUE} is the blue band from the MODIS satellites.
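As an illustration of the index definitions above, the following is a minimal NumPy sketch. The function name `vegetation_indices` and its argument names are hypothetical, and the EVI coefficients are exposed as keyword arguments (defaulting to the standard MODIS values from Huete et al., 2002) so that the values used in a given analysis can be passed explicitly.

```python
import numpy as np

def vegetation_indices(nir, red, blue, L=1.0, C1=6.0, C2=7.5, G=2.5):
    """Compute NDVI, NIRv, kNDVI and EVI from surface reflectances.

    nir, red, blue: scalars or arrays of reflectance in the NIR, red and
    blue bands. L, C1, C2, G: EVI coefficients (defaults are the standard
    MODIS values from Huete et al., 2002).
    """
    nir, red, blue = (np.asarray(b, dtype=float) for b in (nir, red, blue))
    ndvi = (nir - red) / (nir + red)      # normalized difference
    nirv = ndvi * nir                     # NDVI scaled by NIR reflectance
    kndvi = np.tanh(ndvi ** 2)            # kernel NDVI
    evi = G * (nir - red) / (nir + C1 * red - C2 * blue + L)
    return {"NDVI": ndvi, "NIRv": nirv, "kNDVI": kndvi, "EVI": evi}

# Example with illustrative (made-up) reflectance values for one pixel
indices = vegetation_indices(nir=0.4, red=0.1, blue=0.05)
```

Returning a dictionary keeps the four indices together so they can be stacked as separate input channels later.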
Temperature and precipitation are taken from ERA5-Land data at the time step of interest and with a delay of one time step.
Soil moisture has been shown to be a strong driver of global photosynthesis due, in part, to its impact on vapor-pressure deficit (Humphrey et al., 2021). Here we use the coarse resolution NASA USDA SMAP soil moisture (Entekhabi et al., 2010) as a model input and explore its correlation with TROPOMI SIF. The cosine of the solar zenith angle (SZA) is a proxy for photosynthetically active radiation (PAR) under cloud-free conditions (Turner et al., 2021).
Time-invariant data sets consist of elevation data, fractional land cover classification, and forest fragmentation data. The land cover classification (Buchhorn et al., 2020) is resampled to 11 fractional classes. The forest fragmentation data consists of two channels and has a native resolution of 30 m. One channel describes the share of forest within the grid cell (forest share) and the other how much of that forest is edge forest (defined as a maximum distance to an edge or other land cover type of 30 m).
OCO-2 and OCO-3 have high spatial resolution (2.25 km × 1.29 km) but small swaths (10 km) and a 16-day revisit time.

We are interested in understanding what these different data sets are telling us about SIF and also how they co-vary with each other. We compare all collected time-variant data against TROPOMI SIF in the spatial and temporal domain. As a quantitative measure, we compute the Pearson correlation coefficient (r) (Benesty et al., 2009). SIF against the auxiliary data at the lowest resolution of the two corresponding sets. Negative SIF values (on the x-axis in Figure 1) are due to relatively high retrieval errors which scale with radiance levels (Köhler et al., 2018).
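The correlation measure used in this comparison can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `pearson_r` is hypothetical, and the masking of missing values (NaN) before computing r is an assumption about how gaps such as clouds or water would be handled.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two fields.

    Both inputs are flattened, and pairs where either value is missing
    (NaN or inf) are dropped before computing r (an assumed convention).
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    valid = np.isfinite(x) & np.isfinite(y)
    x, y = x[valid], y[valid]
    xc, yc = x - x.mean(), y - y.mean()   # center both variables
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Example with made-up values: the NaN pair is ignored
r = pearson_r([0.1, 0.4, 0.8, np.nan], [0.2, 0.5, 0.9, 0.3])
```

Applying the same function per pixel along the time axis would give a spatial map of r, while applying it to all pixels of a single time step gives a spatial correlation for that date.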
Figure 2 shows spatial patterns of the Pearson correlation coefficients of both NIRv and kNDVI with SIF. In both our spatial (Fig. 2) and temporal (Fig. 1) analyses, we find NIRv is a better predictor for SIF than kNDVI, which

linear regression methods (De Veaux and Ungar, 1994). We therefore keep all input features in our investigation and expect a longer convergence time during training of the CNN.

Training and Optimization of the Neural Network
Convolutional neural networks (CNNs) are supervised machine learning methods that need matching feature and ground truth data pairs to compute the loss that is backpropagated. As such, we begin by coarsening SIF data to 0.5° and use it together with auxiliary data at 0.05° as input to SIFnet, allowing us to estimate SIF at 0.05°. The model output is compared against the measured TROPOMI SIF at 0.05°. After optimization, the model can resolve a scaling factor of ten between coarse resolution input SIF and model output SIF. Figure 3 visualizes this method. In the following step of estimating high resolution SIF, the feature SIF data has a resolution of 0.05° and the auxiliary data a resolution of 0.005°, resulting in a model output of SIF at 0.005°.

in Supplemental Section S2. There are five folds used as training data: two folds over Asia, one over Europe, one over the southern part of Africa, and one over South America. Our validation region is North America (Figure S4). The hyperparameter tuning is done by training the model on the five folds and computing the loss of the validation data. The parameters are optimized to minimize the loss of the validation data set. For computational reasons and because of the size of the data set, we do not apply cross-validation in the optimization process. The final product consists of high resolution SIF at 0.005° and is validated against independent SIF measurements from the instruments OCO-2 and OCO-3.
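The coarsening step used to build training pairs can be sketched as a block average over 10 × 10 pixels. This is a minimal sketch under stated assumptions: the text does not specify the aggregation operator, so the plain mean over valid pixels is an assumption, and both helper names (`coarsen`, `to_fine_grid`) are hypothetical. The nearest-neighbour replication in `to_fine_grid` is likewise only one possible way to place the coarse SIF on the grid of the auxiliary data.

```python
import numpy as np

def coarsen(field, factor=10):
    """Block-average a 2D field by `factor`, ignoring missing values (NaN)."""
    field = np.asarray(field, dtype=float)
    ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("field dimensions must be multiples of factor")
    # Split each axis into (number of blocks, pixels per block)
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    valid = np.isfinite(blocks)
    total = np.where(valid, blocks, 0.0).sum(axis=(1, 3))
    count = valid.sum(axis=(1, 3))
    # Blocks without any valid pixel stay missing
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)

def to_fine_grid(coarse, factor=10):
    """Replicate each coarse pixel so it aligns with the fine grid (assumed)."""
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)

# Example: a synthetic 0.05-degree field coarsened to 0.5 degrees
fine_sif = np.arange(400, dtype=float).reshape(20, 20)
coarse_sif = coarsen(fine_sif, factor=10)   # shape (2, 2)
sif_lr = to_fine_grid(coarse_sif, 10)       # shape (20, 20), input channel
```

In this setup `sif_lr` would be the low-resolution input feature and `fine_sif` the ground truth for the loss.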
We center and scale each feature individually by subtracting the mean and normalizing by the standard deviation. For data augmentation of the training data, we use random crops and random flips. Each day of one fold has a matrix size of 1200 × 900 pixels. We analyze 69 days in 16-day steps over 3 years. For each input during the training process, we randomly crop a matrix with a size of 100 × 100 pixels. As some areas have a large fraction of missing values (e.g., due to water or clouds), we only use cropped matrices that consist of >80% valid pixels in the SIF product. Further, we randomly flip vertically and horizontally, both with a probability of 0.5. These data augmentation methods provide a large effective training set, which should help avoid overfitting the network parameters. During training, water bodies and all missing values are set to zero.

(Brunet et al., 2011). Therefore, we are not only optimizing the overall deviation of the estimated SIF from the measured SIF but also structural patterns. Supplemental Section S4.4 shows the benefit of including both MSE and SSIM terms in the loss function. Equation 5 shows our loss function: where n is the number of datapoints; y_i is datapoint i in the measured (ground truth) SIF; ŷ_i is datapoint i in the estimated SIF; Y are all datapoints of the measured (ground truth) SIF; Ŷ are all datapoints of the estimated SIF; μ_Y is the mean of Y; μ_Ŷ is the

this is expected as this metric depends on the magnitude of the signal.
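The standardization and augmentation steps described in this section can be sketched as follows. This is an illustrative NumPy sketch rather than the training pipeline itself: the function names, the array layout (channels, height, width), and the retry limit `max_tries` are assumptions made for the example.

```python
import numpy as np

def standardize(feature):
    """Center and scale one feature, ignoring missing values (NaN)."""
    return (feature - np.nanmean(feature)) / np.nanstd(feature)

def random_crop_flip(features, sif, rng, size=100, min_valid=0.8, max_tries=100):
    """Draw one augmented training sample.

    features: array (C, H, W) of input channels; sif: array (H, W) of
    target SIF with NaN marking missing pixels (water, clouds).
    """
    _, h, w = features.shape
    for _ in range(max_tries):
        top = rng.integers(0, h - size + 1)
        left = rng.integers(0, w - size + 1)
        y = sif[top:top + size, left:left + size]
        # Keep only crops with more than 80 % valid SIF pixels
        if np.isfinite(y).mean() > min_valid:
            x = features[:, top:top + size, left:left + size]
            if rng.random() < 0.5:            # vertical flip
                x, y = x[:, ::-1, :], y[::-1, :]
            if rng.random() < 0.5:            # horizontal flip
                x, y = x[:, :, ::-1], y[:, ::-1]
            # Missing values are set to zero for training
            return np.nan_to_num(x, nan=0.0), np.nan_to_num(y, nan=0.0)
    raise RuntimeError("no crop with enough valid pixels found")

# Example on synthetic data with the 1200 x 900 fold size from the text
rng = np.random.default_rng(0)
features = rng.normal(size=(3, 900, 1200))
sif = rng.normal(size=(900, 1200))
x, y = random_crop_flip(features, sif, rng)
```

Applying the same flips to inputs and target keeps the feature and ground truth pairs spatially aligned.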

Which features drive SIFnet?
We are particularly interested in understanding which features drive our neural net. Here we evaluate the feature importance using the permutation feature importance method (Breiman, 2001; Fisher et al., 2019; Gregorutti et al., 2015, 2017) with our North American validation data at a target resolution of 0.05°. The method first computes the RMSE including all input features (RMSE_orig). We then apply the following three steps:

1. Shuffle all pixels of one input feature randomly in time and space.

Figure 5b shows the spatial feature importance over the validation set in North America for the four most important features. We observe that SIF_LR has the biggest impact in the Eastern US, which corresponds strongly to the high vs. low productivity regions in the US. NIRv is a strong predictor in the Southeastern US and in shrub regions in the Western US.
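The permutation approach introduced in this section can be sketched as below. Only the shuffling step is spelled out in the text here, so the remaining steps in this sketch follow the common variant of the method (recompute the RMSE with the shuffled feature and report its increase over RMSE_orig) and should be read as an assumption. The names `predict`, `permutation_importance`, and `n_repeats` are hypothetical, and `predict` stands in for any trained model.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error over all pixels and time steps."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def permutation_importance(predict, features, target, rng, n_repeats=5):
    """RMSE increase per input channel after shuffling that channel.

    predict: callable mapping features (N, C, H, W) to output (N, H, W).
    features: validation inputs; target: validation ground truth.
    """
    rmse_orig = rmse(target, predict(features))
    importance = np.zeros(features.shape[1])
    for c in range(features.shape[1]):
        scores = []
        for _ in range(n_repeats):
            shuffled = features.copy()
            channel = shuffled[:, c]
            # Shuffle all pixels of this feature in time and space
            shuffled[:, c] = rng.permutation(channel.ravel()).reshape(channel.shape)
            scores.append(rmse(target, predict(shuffled)))
        importance[c] = np.mean(scores) - rmse_orig   # assumed definition
    return importance

# Toy example: the "model" only uses channel 0, so channel 1 scores zero
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 2, 8, 8))
target = 2.0 * features[:, 0]

def predict(f):
    return 2.0 * f[:, 0]

importance = permutation_importance(predict, features, target, rng)
```

Averaging over several shuffles reduces the noise of a single random permutation.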
The contribution of cos(SZA) is highest at high latitudes and weakens at lower latitudes. NIRv is found to be less predictive of SIF at high latitudes. The land mask is the fourth most important input feature and contributes most in shrub regions. These four features consistently stand out as the strongest predictors. Other inputs such as fragmentation and soil moisture were not found to be strong predictors here. In Supplemental Section S4.6 we test higher scaling factors between low and high resolution SIF. Even with scaling factors of 20 and 50, low resolution SIF stays the most and second most important input feature, respectively. We also examined different combinations of inputs such as directly including the MODIS bands, as opposed to vegetation indices derived from MODIS bands (see Supplemental Figure S11). Low resolution SIF remains the most important feature, followed by the NIR band ρ_NIR. The land cover products increase in relevance. Interestingly, when low resolution SIF is omitted as input for the model, we observe contrasting results to Figure 5, where NIRv is no longer a leading predictor. We find that cos(SZA), ρ_NIR, kNDVI, and NDVI are the four most important features in this case (see Supplemental Figure S12). This may result from the collinearity between input features. This finding was robust to multiple optimizations and permutations.

Figure 6 shows the 0.005° SIF estimated by SIFnet and the downscaled SIF from Turner et al. (2020). The difference between the two SIF estimates can be seen in Fig. 6c. SIFnet predicts lower SIF in the western US drylands and higher SIF over forested regions in the eastern US. This prediction of lower SIF in drylands is interesting because Turner et al. (2020) resorted to an ad hoc bias correction in these regions due to a low signal-to-noise ratio. Recent work from Wang et al. (2021) found SIF to perform poorly in the western US drylands. We also observe systematic differences in the predicted SIF in urban areas.

Comparison of SIFnet to Downscaled SIF
These regions are further evaluated in Supplemental Figure S17. Notably, the SIFnet estimate is systematically lower than the

where Daily SIF(x, y) is the daily integrated SIF estimate, SIF(τ_s, x, y) is the instantaneous SIF at the individual measurement time, SZA is the solar zenith angle, τ_s is the time of the satellite measurement, τ_o is the time of sunrise, and τ_f is the time of sunset. This implicitly assumes that both PAR and SIF scale with cos(SZA) under cloud-free conditions, and we neglect Rayleigh scattering as well as gas absorption. Although this approach neglects several water or light conditions, it provides our best estimate of daily SIF and enables comparison between multiple SIF products with different measurement times (Köhler et al., 2018; Frankenberg et al., 2011). The method is equivalent to the daily correction scheme for OCO-2, OCO-3, and TROPOMI (OCO-2 Science Team/Michael Gunson, 2020; OCO-3 Science Team/Michael Gunson, 2020; Frankenberg et al., 2011). Additionally, we performed a sensitivity study where we trained SIFnet using daily-corrected SIF and found the results to be generally insensitive to the use of instantaneous vs. daily-corrected SIF (see Supplemental Figure S18). Following this, we chose to apply the daily correction after deriving the high-resolution SIF.

weaker correlations in the Western drylands due, in part, to a lower signal-to-noise ratio. Overall, we find SIFnet to perform systematically better than the downscaled SIF, as shown in the difference plot. Fig. 7b summarizes these spatial patterns in a scatterplot comparison. SIFnet again shows better performance than the downscaled SIF against OCO-2, OCO-3, and OCO-2/3.
The Pearson correlation coefficient r is 0.78 and 0.72 for the SIFnet and downscaled estimate, respectively, when comparing to all OCO data (right column in Figure 7b). The generally high RMSE indicates different scales and variability in the data sets. These also appear in the comparison of TROPOMI SIF at 0.05° resolution against OCO-2/3 (Supplemental Figure S19).
However, the errors generally increase at 0.005° compared to the 0.05° resolution.

OCO-3 data on a 0.02° grid multiplied with the L1-norm between the SIFnet and the downscaled SIF. The column on the right highlights regions where the differences in predicted SIF are large and indicates which product is performing better. As such, the right column will show white in areas where the difference in predicted SIF is small or the correlation with OCO is similar.

(b) Validation of SIFnet and downscaled SIF to OCO-2 and OCO
While we observe large differences in predicted SIF for the urban areas (column 3), we do not find one product to perform systematically better in urban areas. This likely indicates the complexity in the SIF signal arising from urban areas. Additionally,

available. Figure 8 also shows that the SIFnet estimate correlates better with the OCO-X data than the downscaled SIF for the region of the Chicago airport.

Conclusions
Here, we develop a convolutional neural network (CNN) model named SIFnet to increase the resolution of TROPOMI SIF by a factor of 10. The novelty of our method consists of using coarse resolution SIF measurements together with high resolution auxiliary data as model input to estimate high resolution SIF. After optimization and hyperparameter tuning of SIFnet, the estimated SIF at 500 m resolution yields an r² and RMSE of 0.92 and 0.17 mW m⁻² sr⁻¹ nm⁻¹, respectively, when compared against validation data (Figure 4). We further compare the output of SIFnet against a recently developed downscaling method to estimate high-resolution SIF (Turner et al., 2020) and evaluate both methods against independent observations from the Orbiting Carbon Observatory 2 and 3 (OCO-2/3). SIFnet is found to perform systematically better than the downscaling approach when compared against independent measurements. Through interpretable machine learning methods, we identify the key features that SIFnet utilizes to accurately predict high-resolution spatial patterns of SIF. We find that SIFnet relies heavily on the low resolution SIF feature (SIF_LR) and the vegetation index NIRv (Figure 5).
SIFnet is a multi-layer CNN that increases the spatial resolution of the TROPOMI SIF by a factor of 10. Our model uses auxiliary datasets related to gross primary productivity and SIF as inputs and yields a high-resolution SIF estimate. The model is trained using three years of data from Asia, Europe, Africa, and South America. North America is used as the validation data set. Our loss function comprises two terms: the mean squared error and the structural dissimilarity index. The combination of these two terms improved the performance of our model.
SIFnet was further compared to the recent downscaled SIF product developed by Turner et al. (2020). The two high-resolution estimates showed pronounced differences across the western US drylands. This difference is particularly interesting because these drylands tend to be low-productivity regions and traditionally have been difficult for SIF to accurately capture due to the low signal-to-noise ratio. Both high-resolution SIF estimates were compared to independent observations from OCO-2/3. SIFnet performed systematically better than the downscaled SIF (r = 0.78 for SIFnet, r = 0.72 for downscaling).
SIFnet and the downscaling method also yielded differences in urban regions. However, there was substantial heterogeneity in the performance of SIFnet and downscaling in urban areas. One product did not perform systematically better than the other within urban areas. The mixed results in urban areas likely relate to both the complexity of the photosynthetic activity in urban areas and the lack of training data, as urban areas represent a small fraction of the total landmass.
We adapted techniques from the area of interpretable machine learning to assess the key features driving SIFnet. Specifically, we applied random permutations to the input datasets and assessed the impact on the resulting RMSE. From this, we found that SIFnet relies most heavily on the low-resolution SIF feature (SIF_LR). The second most important factor is the MODIS vegetation index NIRv. NIRv is also found to outperform the recently proposed kNDVI vegetation index, in contrast to Camps-Valls et al. (2021). The interpretable machine learning approach also allowed us to identify spatial regions of importance for the different parameters. Interestingly, SIFnet relies more heavily on NIRv in the western drylands where the SIF signal-to-noise ratio is low. This implies that SIFnet is picking up on key physics that leads to the improved performance relative to the downscaling method. Urban areas represent a region where SIFnet and the downscaling method performed comparably well.

Urban areas, while important in the carbon cycle, represent a small percentage of the total land mass. This means there is relatively little training data for SIFnet in urban areas. Further investigation into the processes controlling high-resolution GPP and SIF in urban areas is warranted. Overall, SIFnet represents a robust method to infer continuous high-spatial-resolution information about processes related to gross primary productivity.