Controlling factors on the global distribution of a representative marine heterotrophic diazotroph phylotype ( Gamma A )

Heterotrophic diazotrophs emerge as a potentially important contributor to the global marine N2 fixation, while the factors controlling their distribution are unclear. Here, we explored what controls the distribution of the most sampled heterotrophic diazotroph phylotype, Gamma A, in the global ocean. First, we analyzed the relationship between nifH-based Gamma A abundance and climatological biological and environmental conditions. The carrying capacity of Gamma A 10 abundance increased with net primary production (NPP) and saturated when NPP reached ~400 mg C m-2 d-1. The reduction in Gamma A abundance from its carrying capacity was mostly related to low temperature, which possibly slowed the decomposition of organic matter, and high concentration of dissolved iron, to which the explanation was elusive but could result from the competition with autotrophic diazotrophs. Using a generalized additive model, these climatological factors together explained 41% of the variance in the Gamma A abundance. Second, in additional to the climatological background, 15 we found that mesoscale cyclonic eddies can substantially elevate Gamma A abundance, implying that Gamma A can respond to short-term features and benefit from stimulated primary production by nutrient inputs. Overall, our results suggest that the distribution of Gamma A is most likely determined by the supply of organic matters, not by those factors controlling autotrophic diazotrophs, and therefore insight a niche differentiation between the heterotrophic and autotrophic N2 fixation. More samplings on Gamma A and other heterotrophic diazotroph phylotypes are needed to better reveal the controlling 20 mechanisms of heterotrophic N2 fixation in the ocean.


Introduction
Dinitrogen (N2) fixation, mostly conducted by prokaryotic bacteria (termed "diazotrophs"), is an important bioavailable nitrogen (N) source to the ocean (Moore et al., 2018;Karl et al., 2002). Although autotrophic cyanobacteria have been recognized as important diazotrophs in the ocean , heterotrophic non-cyanobacteria diazotrophs (NCDs) have 25 been widely detected (Moisander et al., 2017;Riemann et al., 2010) and sometimes even found to dominate the diazotrophic gene pools in surface oceans (Farnelid et al., 2011). For example, NCD nifH (a gene encoding subunit of nitrogenase enzyme) amplicons were far superior in number to autotrophic diazotrophs at some sampling sites in the South Pacific Ocean (Halm et al., 2012;Moisander et al., 2014), Indian Ocean (Shiozaki et al., 2014;Wu et al., 2019) and South China Sea (Ding et al., https://doi.org/10.5194/bg-2021-349 Preprint. Discussion started: 7 January 2022 c Author(s) 2022. CC BY 4.0 License. 1/(2n), where n was the number of data. Even though they can be reliable, we excluded them in the analyses to avoid possible biases.
In the following analyses, we represented Gamma A abundance using its nifH copies, although we noted that variations in nifH copies in different diazotroph cells were reported (White et al., 2018;Sargent et al., 2016). 95

Environmental and ecological parameters
Monthly climatological environmental and ecological parameters were used as predictors for Gamma A abundance (Table 2).
Temperature and concentrations of nitrate, phosphate and silicate were the products of the World Ocean Atlas (WOA) 2018 100 (www.nodc.noaa.gov) (Locarnini et al., 2018;Garcia et al., 2019), and excess phosphate (P * ) was derived from concentrations of phosphate and nitrate based on the Redfield ratio (P * = [phosphate] -[nitrate]/16). Dissolved iron (Fe) concentrations were obtained from the Community Earth System Model -Biogeochemistry module (Misumi et al., 2014). Dissolved organic carbon concentration used a product estimated by artificial neural network (Roshan and Devries, 2017). Mixed layer depth (MLD) was downloaded from Ifremer (http://www.ifremer.fr/) using the criterion that the potential density of water parcels at the 105 depth was 0.03 kg m -3 higher than that at the surface (De Boyer Montégut et al., 2004 (Lee et al., 2005), and used it to estimate the attenuation coefficient: The PAR at depth can be calculated while we assumed organisms in the mixed layer were exposed to PAR homogenously: where A is the surface PAR. 115 To identify if the Gamma A abundance is sampled in cyclonic or anticyclonic eddies, we used daily sea level anomaly data (SLA) provided by AVISO program (www.aviso.altimetry.fr). Only those cyclonic (negative SLA) and anticyclonic (positive SLA) eddies with clear shapes were recorded.

Statistical analyses
For Gamma A data points sampled in the same months and the same depth bins (defined in WOA), they were binned to 2° ´ 2° grids to help eliminate possible biases caused by concentrated samplings in specific regions, resulting in 893 binned means of log-10 based Gamma A nifH abundance. The corresponding environmental and ecological parameters were also averaged to the same bins when necessary. Univariate Pearson correlation was used to evaluate linear relationship between Gamma A 130 abundance and each environmental or ecological variable.
We also used the generalized additive model (GAM) using R package 'mgcv' (Wood, 2017) to demonstrate non-linear relationships between the multiple predictors and the Gamma A abundance: where y is response variable (Gamma A abundance), N is the ith predictor (i.e., the environmental or ecological variable), 135 is the intercept, ( N ) is a linear combination of smooth functions of predictor N , n is number of predictors and is standard error. To avoid over-fitting to noise, the Restricted Maximum Likelihood (REML) method was selected for the GAM smoothing parameters of every predictor with the basis function number (k) set to 9 (Wood et al., 2016). In the model selection of GAM, a double penalization approach was used to identify and remove those insignificant predictors with large smoothing parameters and set them to zero functions (Marra and Wood, 2011). 140 The scientific color maps are used in several figures to prevent visual distortion of the data and exclusion of readers with colorvision deficiencies (Crameri et al., 2020). https://doi.org/10.5194/bg-2021-349 Preprint. Discussion started: 7 January 2022 c Author(s) 2022. CC BY 4.0 License.

Global distribution of Gamma A nifH abundance
The nifH gene abundance ranges from 1 to 10 7 copies L -1 in the global ocean and shows an approximately log-normal 145 distribution ( Fig. S1). High abundance of Gamma A nifH abundance over 10 5 copies L -1 is prevalent in subpolar North Pacific,

Primary production determines the carrying capacity of Gamma A abundance
The logarithm of Gamma A nifH abundance positively correlated to the logarithm of net primary production (NPP) (correlation = 0.21, p < 0.01) (Fig. 2). More importantly, the upper bound of Gamma A abundance increased with the NPP [log10(Gamma 160 A) = 5.1 Log10NPP -6.3] when NPP < 10 2.6 (≈400) mg C m -2 d -1 , above which the upper bound of Gamma A abundance saturated at ~10 7 nifH copies L -1 (Fig. 2).

Figure 2. The relationship between Gamma A abundance and net primary production. Both Gamma A abundance and 165
net primary production (NPP) are log10-transformed. The NPP-determined Gamma A carrying capacity (red line) in the "low" NPP range [< 10 2.6 (≈400) mg C m -2 d -1 ] is estimated by linearly fitting the highest Gamma A abundance values (red dots) in NPP intervals of 10 0.1 mg C m -2 d -1 . The Gamma A carrying capacity saturates at 10 7.0 nifH copies L -1 for NPP > 10 2.6 mg C m -These results indicated that local NPP could largely determine the carrying capacity of Gamma A abundance, which was 170 expected because Gamma A was heterotrophic bacteria and needed sufficient supply of organic matter from primary producers, particularly for their energetically intensive N2 fixation. This conclusion can also be partly supported by previous experimental studies in which the addition of organic carbon enhanced heterotrophic nitrogen fixation and NCD abundance in oligotrophic seas (Benavides et al., 2015;Rahav et al., 2016;Moisander et al., 2012;Dekaezemacker et al., 2013). Our finding contradicted the hypothesis mentioned above that Gamma A preferred oligotrophic waters based on samples mainly in tropical and 175 subtropical Pacific and Atlantic Oceans, in which Gamma A reached 8 × 10 4 nifH copies L -1 (Shiozaki et al., 2018a;Langlois et al., 2015). However, the new dataset (Cheung et al., 2020) included in the present study showed even higher (over 10 5 nifH copies L -1 ) Gamma A abundance in the subarctic North Pacific (Fig.1) where nutrient concentrations and NPP are generally high.

Univariate linear relationships between environmental factors and Gamma A abundance 180
We then analyzed what environmental factors may limit Gamma A abundance from reaching the carrying capacity. We first defined this disparity for each data point, DGamma-A, as the observed Gamma A abundance minus corresponding carrying capacity in logarithmic space. That is, DGamma-A can be treated as the 'residual' of data to the carrying capacity line in Fig. 2.
Therefore, a positive correlation between DGamma-A and an environmental factor can indicate that Gamma A prefers the increase of this factor, and vice versa. 185 As the Gamma A carrying capacity saturated at an NPP of 10 2.6 (≈400) mg C m -2 d -1 , we then divided our dataset into a lowand a high-NPP groups at this threshold ( Fig. 2) in further analyses to address possibly different controlling factors and mechanisms on Gamma A abundance.
In the univariate linear analyses (Fig. 3), the most correlated variable to DGamma-A was the dissolved Fe concentration in both groups, but surprisingly the relationship was negative. Temperature and DOC were positively correlated with DGamma-A in the 190 low-NPP group, while the relationships turned negative when NPP was high. In terms of inorganic nutrients, DGamma-A correlated negatively to nitrate and correlated positively to the excess P (P * ) in the low-NPP group, while both relationships became insignificant in the high-NPP group. Phosphate had no significant relationship with DGamma-A in either group. Silicate was positively correlated to Gamma A, and the correlation became stronger in high NPP area. The positive correlation between PAR and DGamma-A, particularly when NPP is low, can be a result of a decreasing trend of Gamma A abundance with water 195 depth. Lastly, DGamma-A and the mixed layer depth were negatively correlated only in the high-NPP group.

Multivariate nonlinear relationships between environmental factors and Gamma A abundance using GAM
Although the linear univariate correlations can provide basic information in analyzing relationships between each environmental factor and the Gamma A abundance, false relationships can be generated by intercorrelations existing among 205 the environmental variables. Additionally, the relationships can also be nonlinear. We then used a GAM multivariate analysis to partly avoid these possible problems and to obtain a more reliable relationship (Fig. 4). Note that phosphate was not included in the GAM because it was not correlated to DGamma-A in both the low-and the high-NPP groups (Fig. 3) and its variance can nevertheless be partly represented by P * .

DOC
Although DOC was supposably one of the major carbon sources for Gamma A, it did not impact DGamma-A in the low-NPP group (Fig. 4a) and even showed a negative linear relationship with DGamma-A under high NPP (Fig. 4i). First, DGamma-A was the 220 residual to the NPP-determined carrying capacity and therefore was a collective indicator in which the effects of organic carbon production had been largely removed. Additionally, low DOC concentrations in high-NPP regions may even indicate that the DOC pool is more labile and can be more easily used (Jiao et al., 2014). Lastly, particulate organic matter (POM) can also fuel Gamma A and can even create favorable oxygen-deplete microenvironments for Gamma A (Farnelid et al., 2019), but it was not included in this study because of insufficient data. 225

Temperature
Temperature had a generally positive relationship with DGamma-A (Figs. 4b and 4j). This is consistent with several regional studies in which a strong positive correlation between temperature and Gamma A abundance was also found (Shiozaki et al., 2018a;Moisander et al., 2014). The relationship is expected considering the widely recognized increase in heterotrophic bacterial production with temperature in the ocean because of stimulated bacterial metabolism (Kirchman and Rich, 1997; 230 Pomeroy and Wiebe, 2001). In addition, DGamma-A started to rise at a lower temperature (~15°C) in the high-NPP group (Fig.   4j) than that (~20°C) in the low-NPP group (Fig. 4b). The contribution of temperature to DGamma-A is larger in the low-NPP group (Fig. 4b) than that in the high-NPP group (Fig. 4j). A possible reason is that the consumption rate of less labile DOC produced in less productive regions is more sensitive to temperature (Lønborg et al., 2018;Brewer and Peltzer, 2017;Carlson et al., 2004). This result implied that high temperature could be more important for Gamma A to activate accumulated semi-235 labile DOC in low-productive oligotrophic oceans.

Nitrate and P *
Neither nitrate nor P * had a substantial effect on DGamma-A (Figs. 4c-d and 4k-l). This is consistent with a previous review showing that nitrate did not show an immediate inhibition of Gamma A (Moisander et al., 2017). How and to what extent NCDs are inhibited by nitrate remains unknown (Bombar et al., 2016). Abundant Gamma A was found in oceans with high 240 nitrate concentrations (Bird and Wyman, 2013) or shallow nitracline (Shiozaki et al., 2014). The hypothesis that low-nitrate and high-P * environments favor autotrophic diazotrophs is based on the competition of inorganic nutrients between the diazotrophs and other phytoplankton (Karl and Letelier, 2008;Deutsch et al., 2007), while this competition should not occur directly between NCDs and phytoplankton because the former is heterotrophic while the latter is autotrophic. Nevertheless, high inorganic nutrients may still play a role in Gamma A distribution by indirectly impacting the carrying capacity of Gamma 245

Iron
In both the low-and the high-NPP groups, DGamma-A generally showed a decreasing trend with the increasing dissolved Fe, except for a slight increase in DGamma-A when the dissolved Fe increased in the range of 0.01-0.1 nM (Figs. 4e and 4m). Our dataset showed that high abundance of Gamma A was prevalently observed in the North Pacific Ocean (Fig. 1a), where Fe 250 was considered as the dominant limiting factor for N2 fixation (Sohm et al., 2011). Other Gammaproteobacterial phylotypes such as Gamma 3 and Gamma ETSP2 were also found to dominate the diazotrophic community in the eastern South Pacific (Turk-Kubo et al., 2014;Halm et al., 2012) where Fe heavily limited primary production (Knapp et al., 2016;Bonnet et al., 2008). It has also been suggested that Gamma A and unicellular cyanobacterial diazotroph UCYN-B share niches in Fedepleted western and southern Pacific Oceans (Moisander et al., 2014;Chen et al., 2019a), possibly to avoid competing with 255 other Fe-demanding diazotrophs. Gammaproteobacterial diazotrophs may be equipped with siderophore releasing genes, such as that already reported in another versatile phylotype Gamma 4 (Cheung et al., 2021), and the released siderophores are an efficient tool in scavenging low-level Fe in the ocean (Boyd and Ellwood, 2010). Although more studies are certainly needed to further explore the ecological and physiological mechanisms and evolutionary reasons, the good survival of Gamma A in a low-Fe environment is an intriguing finding that may expand our recognized space of active N2 fixation in the ocean. 260

Silicate
Our GAM results also revealed a positive relationship between silicate and DGamma-A in both the low-and the high-NPP groups (Figs. 4f and 4n), indicating a possible association between Gamma A and diatoms. NCDs have been found on the surface of diatoms (Martínez et al., 1983) or on the diatom mats as discussed above. Diatom-dominant ecosystems tend to produce abundant large particles either from dead diatoms and their aggregates or the fecal pellets generated by zooplankton (Tréguer 265 et al., 2018). The large particles can be a good habitat for NCDs as already discussed. A metagenomic study also indicated the swimming motility gene expression of NCDs and their potential particle-attached lifestyle (Delmont et al., 2018). Our results then provide indirect evidence for this hypothesis.

Light
PAR did not show a substantial contribution to the variance of DGamma-A in our multivariate GAM analysis (Figs. 4g and 4o). 270 The high correlation between PAR and DGamma-A (Fig. 3) was therefore more likely resulted from multicollinearity between PAR and other environmental factors. The decrease in Gamma A abundance with depth found in the Southwestern Pacific ( Fig. S2a) and in other regions by previous studies (Moisander et al., 2008;Langlois et al., 2015;Chen et al., 2019b;Shiozaki et al., 2014;Wu et al., 2019) may therefore be attributed to higher productivity, more released photosynthetic products and higher temperature in the surface ocean, instead of the direct effects of light such as the hypothesized photoheterotrophy of 275 Gamma A (Moisander et al., 2014). The nearly constant Gamma A abundance with depth in the Tropical Atlantic Ocean and the Indian Ocean and (Fig. 1 and Figs. S2b-c) can be the results of active transport of organic matter from the surface that fuels heterotrophic N2 fixation in the dark deeper ocean.

Predictions based on GAM
Overall, the multivariate GAM model explained 45% and 39% of the variance in DGamma-A in the low-and high-NPP groups, 280 respectively (Figs. 5a-b). The predicted DGamma-A generally followed the observed values, although it tended to underestimate the observed high DGamma-A (> -1) or overestimate the low DGamma-A (< -5) (Figs. 5a-b). The moderate explained variance indicated that although the tested environmental factors can substantially influence Gamma A abundance, there must be other untested factors and unknown mechanisms that can also substantially impact the Gamma A distribution. The GAM models for DGamma-A (Figs. 5a-b) were added to the estimated NPP-based carrying capacity (Fig. 2) to form a prediction model for the Gamma A abundance (Fig. 5c). Although a substantial fraction of variance in Gamma A abundance was still unexplained (R 2 = 0.41), the predicted and the observed Gamma A abundances were generally consistent (Fig. 5c).
The predicted Gamma A abundance ranged mostly on the order of 10 1 -10 6 nifH copies L -1 , slightly narrower than that of the 295 observations (10 0 -10 7 nifH copies L -1 ), which was mainly attributed to the performance of the GAM models for DGamma-A as discussed above.
Although the overall R 2 was at a moderate level of 41%, we applied this model to give a first-order estimate of Gamma A abundance in the surface ocean (Fig. 6a) from climatological NPP and environmental factors (Fig. S3), admitting that this demonstration did not fully cover the observed spatial variance in Gamma A abundance. The results suggested that the Gamma areas. The predicted high abundance in the Southern Ocean was mostly caused by its high nitrate concentration (Figs. S3g-h).
However, the largest uncertainties for the predictions also exist in the Southern Ocean (Fig. 6b) as there were no Gamma A samples in this high-nitrate area (Fig. 1). Future sampling in the Southern Ocean can then test our predictions and reduce the uncertainties. The prediction also showed the lowest Gamma A in the South Pacific Subtropical Gyre (Fig. 6a) where however 305 the uncertainty was among the highest in the tropical and subtropical regions (Fig. 6b) similarly because it was under sampled (Fig. 1).

Impact of mesoscale eddies on Gamma A
The root-mean-square error (RMSE) of 0.86 and an R 2 of 41% in the prediction model (Fig. 5c) indicated that there was still substantial unexplained variance in Gamma A abundance. One possible reason can be attributed to the climatological monthly 315 means we used in the environmental factors. We then explored whether the occurrence of short-term phenomena, such as the mesoscale eddies, can impact Gamma A abundance.
In the low-and the high-NPP groups, we identified 39 and 20 data points of Gamma A abundance that were sampled in cyclonic eddies, respectively, while more (204 and 55, respectively) were sampled in anticyclonic eddies. This is consistent with the fact that eddies are more likely anticyclonic in the Northern Hemisphere, where our most (74%) sampling points were 320 located (Chelton et al., 2011).
The results showed that in the high-NPP group, the average Gamma A abundance within cyclonic eddies was one order of magnitude higher than that in anticyclonic eddies or outside eddies (Fig. 7b), while the difference in the low-NPP group was much smaller and statistically insignificant (t-test, p > 0.05) (Fig. 7a). The systematically higher Gamma A abundance is unlikely to be caused by the locations of cyclonic eddies because most of the climatological factors were not significantly 325 different across types of eddies, except for a slightly lower dissolved Fe and DOC in cyclonic eddies in the high-NPP group (t-test, p < 0.05) ( Fig. S4a and S4c). We then further checked the residuals of the predicted Gamma A abundance using climatological factors (i.e., Fig. 5c), still finding that the Gamma A abundance in cyclonic eddies in the high-NPP group was significantly higher (one-tail t-test, p < 0.05) than the climatology-based predictions by on average a half order of magnitude, while this was not the case for samples in anticyclonic eddies or outside eddies (Fig. 7d). Note that the residuals of predicted 330 Gamma A abundance in anticyclonic eddies in the low-NPP group were also significantly but only slightly higher than 0 (onetail t-test, p < 0.05) (Fig. 7c).  These results indicated that cyclonic eddies could stimulate Gamma A abundance, but only in the high productivity oceans (> 400 mg C m -2 d -1 in this study). This finding is opposite to a previous hypothesis on autotrophic diazotrophs that anticyclonic eddies form a nitrate-depleted and well-lit environment favorable to N2 fixation (Davis and Mcgillicuddy, 2006;Fong et al., 2008;Church et al., 2009;Liu et al., 2020). However, a sufficient supply of organic matter can play a prominent role in However, the moderate explanatory power of our prediction model indicates that there must be other unknown factors and 375 mechanisms also impacting heterotrophic diazotrophs. For instance, heterotrophic diazotrophs found in the guts of copepods (Scavotto et al., 2015) imply that they are subject to top-down controls, which was also suggested for marine autotrophic diazotrophs (Landolfi et al., 2021;Wang et al., 2019;Wang and Luo, in press). The uneven spatial samplings of Gamma A may also introduce biases into our analyses. Lastly, a universal primer is still lacking for detecting Gamma A and other NCDs such as Alphaproteobacteria and Cluster III phylotype, which can also be important diazotrophs particularly in previously 380 unrecognized regions for marine N2 fixation (Wu et al., 2019;Langlois et al., 2008;Martínez-Pérez et al., 2018;Chen et al., 2019b). The combination of PCR amplification and metagenomic data can identify a broader NCD community (Delmont et al., 2018) and may help us design a better universal primer targeting major NCDs. More studies are needed in the future to improve our understandings of controlling factors, niches and distributions for heterotrophic diazotrophs, so that their contribution to global marine N2 fixation can be better evaluated. 385

Author contributions
Y.-W.L. conceived and supervised the study. Y.-W.L. and Z.S. designed the study. Z.S. collected and analyzed the data and 390 drafted the first version of the manuscript. Y.-W.L. and Z.S. contributed to the discussion of the results and revised the manuscript.

Competing interests
The authors declare that they have no conflict of interest.