Microbial diversity-informed modelling of polar marine ecosystem functions

Heterotrophic marine bacteria utilize organic carbon for growth and biomass synthesis. Thus, their variability is key to the balance between the production and consumption of organic matter and ultimately particle export in the ocean. Here we investigate a potential link between bacterial traits and ecosystem functions in a rapidly changing polar marine ecosystem 15 based on a bacteria-oriented ecosystem model. Using a data-assimilation scheme we utilize the observations of bacterial groups with different physiological states to constrain the group-specific bacterial ecosystem functions. We also investigate the association of the modelled bacterial and other ecosystem functions with eight recurrent modes representative of different bacterial taxonomic traits. High nucleic acid (HNA) bacteria show relatively high cell-specific bacterial production, respiration, and utilization of the semi-labile dissolved organic carbon pool compared to low nucleic acid (LNA) bacteria. Both 20 taxonomy and physiological states of the bacteria are strong predictors of bacterial carbon demand, net primary production, and particle export. Numerical experiments under perturbed climate conditions show overall increased bacterial activity and a potential shift from LNAto HNA-dominated bacterial communities in a warming ocean. Microbial diversity via different taxonomic and physiological traits informs our ecosystem model, providing insights into key bacterial and ecosystem functions in a changing environment. 25


Introduction
Microbes regulate many key ecosystem functions in the marine food web. Unicellular primary producers fix organic carbon (i.e., an ecosystem function termed primary production), while heterotrophic marine bacteria and archaea (hereafter bacteria) utilize the fixed organic carbon for growth and biomass synthesis (i.e., an ecosystem function termed bacterial production) (Azam et al. 1983). Thus, the variability in the abundance and activity of bacteria is central to understanding the balance 30 between production and consumption of organic matter and ultimately particle export. In flow cytometric analyses, bacteria words, molecular-level changes may not directly translate into a clear picture of changes in community structure or resulting changes to bulk ecosystem functions.
In this study, we explore a potential link between bacterial traits and ecosystem functions in the warming coastal WAP, using a bacteria-oriented ecosystem model modified from the WAP data-assimilative model (Kim et al., In Review).
The bacterial traits here include both physiological and taxonomic traits. For bacterial physiological traits, our model explicitly 70 simulates the dynamics of the two ubiquitous bacterial groups of differing nucleic acid content, a HNA group and a LNA group, by directly assimilating the group-specific biomass observations. For bacterial taxonomic traits, taxonomic modes (calculated by Bowman et al. 2017) are compared to model output values at the corresponding time point, with the assumption that taxonomy provides information about microbial ecosystem process and structure. We note that in contrast to genomescale, metabolic flux, or gene-centric models (Coles et al. 2017;Feist et al. 2009;Reed et al. 2014), our study combines 75 statistical products from genomic analyses (mode derivation from bacterial 16S rRNA gene sequence data) with numerical ecosystem modelling to incorporate molecular information into ecosystem-level dynamics.

Bacteria-oriented ecosystem model
Our bacteria-oriented ecosystem model simulates 12 state variables, including diatoms, cryptophytes, HNA bacteria, LNA 80 bacteria, microzooplankton, krill, labile dissolved organic matter (LDOM), semi-labile DOM (SDOM), ammonium, nitrate, phosphate, and detritus ( Figure 1). Refractory DOM (RDOM) and higher trophic levels are implicitly represented as model closure terms. Distinct to our model compared to the original WAP ecosystem model is the inclusion of HNA and LNA bacterial compartments (as biomass) and the partitioning of the bulk bacterial productivity by each bacterial compartment. The model is forced by mixed layer depth (MLD), photosynthetically active radiation (PAR) at the ocean surface, sea-ice 85 concentration, water-column temperature, and eddy diffusivity, and simulates the stocks and flows of C through the model state variables using a constant time step of 1 hour and a second-order Runge-Kutta scheme. An overview of the ecosystem model (Text S1) and full bacterial model schemes are found in the Supplementary Material (Text S2). In essence, the time rate of change of the group-specific biomass (constrained by observations, see Material and Methods 2.4) is calculated as follows: Text S2) of the HNA group (the same form applies to LNA group below). 95 By contrast, the group-specific production (BPHNA and BPLNA, mmol C m -3 d -1 ) is determined during optimization, given that it is the sum of the group-specific production (i.e., bulk bacterial production = BPHNA + BPLNA) that is constrained by observations: (4) 100

Modelling framework
The modelling framework in this study consists of a dynamic (mechanistic) part and a data-driven part ( Figure 2). The dynamic part represents the processes associated with the data-assimilative model ( Figure 1) that makes predictions of the microbial ecosystem processes based on prognostic, time-evolving ordinary differential equations (Text S2). The data-driven part represents how bacterial modes (Bowman et al. 2017) are compared to optimized model outputs. Our analysis relies on the 105 two types of modes described in Bowman et al. (2017): taxonomic nodes (modes hereafter) were determined from 16S rRNA gene sequence abundance, while functional modes (fmodes) were derived from predicted community metabolic structure.
Briefly, sequence reads were categorized into closest estimated genomes and closest completed genomes via the paprica pipeline (Bowman and Ducklow, 2015) and the high dimensional community and metabolic structure data were reduced to 2-D space via a self-organizing map and K-means clustering of map units (Bowman et al. 2017). The final clustering of map 110 units constitutes the modes, and each sample was assigned the mode of its closest map unit. In this approach the mode is a single categorical variable that succinctly describes key structural attributes of the sample. It is important to recognize the categorical nature of these modes, and to understand that -because of the 2-D nature of the map -there is no linear progression among modes. Thus Mode 1, for example, is not necessarily more similar to Mode 3 than it is to Mode 7. Neither mode nor fmodes is necessarily correlated to physiological traits of the bacteria (i.e., modelled HNA-and LNA compartments). In other 115 words, the relative abundance of HNA or LNA, mode, and fmodes are derived from separate observations of different parameters in the same bacterial samples and therefore independent with each other by design.
We select a nearshore Palmer LTER water-column time-series station, Station B (64.77°S, 64.05°W), in the coastal WAP as the modelling site. The Palmer LTER Station B datasets consist of roughly bi-weekly physical, chemical, and biological profiles collected from small boat via a profiling CTD and discrete water samples. Additional observational data 120 are utilized for bacterial flow cytometric (HNA and LNA) and 16S rRNA gene amplicon data collected from Arthur Harbour Station B at 10 m depth (situated 1 km from the Palmer Station B) or Palmer Station seawater intake at 6 m depth (Bowman et al. 2017). A single upper-ocean layer depth (10 m) is modelled for 4 consecutive Palmer LTER growth seasons, including observations over the Austral spring-summer season, we optimize the model each year separately over the timeframe of 125 available observations. This way, each year possesses its own unique optimized model parameter set, or a model solution. In addition to individual years, we also optimize the model for the climatological year (i.e., climatological model). The climatological year is constructed using four years (2010-11 to 2013-14), rather than the whole Palmer LTER multi-decadal record (since 1991), due to the availability of HNA and LNA biomass data only in those four years. Details on constructing the climatological year and model initialization and spin-up are found in the Supplementary Material (Text S3). 130

Data assimilation and parameter optimization
Our model utilizes a variational adjoint data assimilation scheme (Lawson et al. 1995) to minimize the misfit between observations (i.e., assimilated data, see Material and Methods 2.4) and model output by optimizing a subset of model parameters (Friedrichs 2001;Spitz, Moisan, and Abbott 2001;Ward et al. 2010). The data-assimilation scheme ( Figure 2) consists of four main steps (Glover, Jenkins, and Doney 2011). First, the model is integrated forward in time from prescribed 135 initial conditions and initial model parameter guess values (Table 1) and calculates the model-observation misfits called total cost function or cost (Material and Methods 2.5). Second, an adjoint model constructed using the Tangent linear and Adjoint Model Compiler (TAPENADE) is integrated backward in time and compute the gradients of the total cost with respect to the model parameters. Third, the computed gradients are passed to a limited-memory quasi-Newton optimization software M1QN3 3.1 (Gilbert and Lemaréchal 1989) to determine the direction and optimal step size by which the model parameters should be 140 modified to reduce the total cost. Finally, a new forward mode simulation is performed using the new set of modified parameters from the third step. These four steps are conducted in an iterative manner until the pre-set convergence criteria are satisfied ensuring the convergence of the optimized parameters and a local minimum achieved by the total cost. The pre-set criteria include the low sensitivity (gradients) of the total cost with respect to each optimized model parameter and positive eigenvalues of the Hessian matrix (Material and Methods 2.6). Every assimilation cycle, we ensure that group-specific bacterial 145 model parameters are optimized in the direction to properly represent the dynamics associated with each group (Table 1), in which we assign different magnitudes of each parameter based on our best guesses and literatures (del Giorgio and Cole 1998;Jiao et al. 2010). For instance, maximum bacterial growth rate of the HNA group (μHNA, d -1 ) should be higher than that of the LNA group (μLNA, d -1 ), so if μHNA is optimized to a smaller value than μLNA, μHNA is reset back to the original value instead of being updated. 150

Assimilated data
We assimilate Palmer LTER observational data from 10 m corresponding to compartments and flows in the model, including nitrate, phosphate, phytoplankton taxonomic specific chlorophyll (Chl) for diatoms and cryptophytes (Schofield et al. 2017), microzooplankton biomass , primary production (PP), bulk bacterial production (BP), HNA bacterial biomass, LNA bacterial biomass, semi-labile dissolved organic carbon (SDOC), particulate organic carbon (POC), and 155 particulate organic nitrogen (PON). Climatological observations of Chl (1992-93 to 2009-10) and a single year observation of https://doi.org/10.5194/bg-2020-302 Preprint. Discussion started: 2 September 2020 c Author(s) 2020. CC BY 4.0 License. microzooplankton (2010-11) are assimilated to constrain the model parameters because these data are unavailable for some study years. SDOC is calculated by subtracting the background (RDOC) concentration (40.0 mmol m -3 ) from climatological total DOC concentration. POC (PON) are assimilated to represent the model detrital pool but its measurements also contain living biomass from bottle filter experiments. For example, climatological observations show that living phytoplankton and 160 bacterial biomass account for 74% of total POC and 71% of total PON, so these fractions are used to exclude living biomass from the bulk particulate material pool. When converting Chl to phytoplankton C (N) biomass, the maximum Chl to N ratio is used along with other reference ratios (Tables S2-S6). Due to the discrepancy in the timing and location from our model experiments, the microzooplankton model-observation misfits are not discussed in this study. Krill biomass data are not assimilated due to the strong patchiness of the distribution (many zero values) that would hinder proper model optimization. 165 BP (mmol C m -3 d -1 ) is derived from 3 H-leucine incorporation rate (pmol l -1 h -1 ) data using the conversion factor of 1.5 kgC mol -1 leucine incorporated (H. W. Ducklow 2000). Group-specific bacterial biomass (mmol C m -3 ) is estimated from bacterial abundance measured by flow cytometry (i.e., bulk bacterial biomass multiplied by the fraction of each group, fHNA or fLNA, with the conversion factor of 10 fgC cell -1 ) (Fukuda et al. 1998).

Cost function and portability index 170
The total cost function or cost (J) is defined as follows to represent a misfit between observations (a " m,n ) and model output (am,n) (Luo et al. 2010): where m and n represent assimilated data types and data points, respectively, M and Nm are the total number of assimilated data types and data points for data type m, respectively, and σ m is the target error for data type m. Hereafter, we present the 175 total cost normalized by M (J' = J/M) and normalized costs of individual data types (J'm) throughout this article as the modelobservation misfit equivalent to a reduced Chi-square estimate of model goodness of fit, where J' = 1 indicates a good fit from optimization, J' >>1 indicates a poor fit due to underestimation of the error variance or the fit not fully capturing the data, and J' <<1 indicates an overfitting of the data, fitting the noise, or overestimation of the error variance. The base-10 logarithm of Chl and PP is used in Equation 1 to account for high productivity of the WAP waters and the approximate log-normal 180 distribution of those data types (Campbell 1995;Glover et al. 2018). The target error σ m is calculated for each data type m as: where a " m,n $$$$$ is the climatological mean of the observations and CV m is the adjusted coefficient of variation (CV) of the observations of each data type from 10 m depth (due to observational error and seasonal and interannual variations). The average CV of each data type at a single depth across the modelled years was 3-7 times higher compared to those across every 185 measured depth within the mixed layer over an extended year period (2002-03 to 2011Kim et al. In Review) and is therefore reduced to the level in the mixed layer to avoid an overestimated target error of each data type. The rationale behind using the adjusted CV in the target error calculation is based on Luo et al (2010), in which all properties in the mixed layer https://doi.org/10.5194/bg-2020-302 Preprint. Discussion started: 2 September 2020 c Author(s) 2020. CC BY 4.0 License.
should be completely mixed, a perfect measurement without significant errors should generate similar data values at every measured depth within the mixed layer, and the average CV of all the depth profiles can be used as CV in the target error 190 calculation. The standard deviation is used as target errors of the log-converted data types. The CV of the log-converted data type is estimated as the average of ± 1 standard deviation in log space converted back into normal space (Doney et al. 2003;Glover et al. 2018).
We compute the portability index to evaluate the broader applicability of the optimized model parameter set of each year in predicting dynamics of the other year (Friedrichs et al. 2007): 195 where J'x is the normalized cross-validation cost when a model parameter set optimized for a given year is used to simulate another year, and J'c is the normalized total cost of the climatological model. A portability index close to 1 indicates a more portable model, or a system that is not particularly sensitive to year-to-year variations in optimized model parameters, while an index <<1 indicates a less portable model, or a system that is sensitive to year-to-year variations in optimized model 200 parameters.

Uncertainty analysis
The uncertainties of the optimized parameters are estimated from a finite difference approximation of the complete Hessian matrix during iterative data assimilation processes (i.e., second derivatives of the cost function with respect to the model parameters, see Text S4 for details). When computed at the minimum of the cost function value, the square root of a diagonal 205 element in the inversed Hessian matrix is the logarithm of the relative uncertainty of the corresponding optimized parameter.
The absolute uncertainty of the constrained parameter is calculated as a•e ±σi where a is the value of optimized parameter and σi is the relative uncertainty of the corresponding optimized parameter. We then conduct Monte Carlo experiments to calculate the impact of the optimized parameter uncertainties on the model results. The Monte Carlo experiments consist of 1) creating an ensemble of parameter sets (N = 1,000) by randomly sampling values within the uncertainty ranges of the constrained 210 parameters and 2) then performing a model simulation using each parameter set. All uncertainty estimates are calculated following standard error propagation rules and presented herein as ± 1 standard deviation.

Model performance and validation
The iterative, data assimilation-parameter optimization procedure ( Figure 2) reduces by 87-91% the misfits between 215 observations and model output across all years compared to the misfit with the initial guess parameters ( Table 2). The optimized parameters satisfy the pre-set convergence criteria by reaching local cost function minima with low gradient values with regards to the total costs. The reduction in the total costs is achieved by optimizing a subset of the model parameters (i.e., https://doi.org/10.5194/bg-2020-302 Preprint. Discussion started: 2 September 2020 c Author(s) 2020. CC BY 4.0 License. 7-10 parameters with low uncertainties or constrained parameters and 12-15 parameters with large uncertainties or optimized parameters; Tables S2-5). The constrained parameters in common across all years are αDIATOM (initial slope of photosynthesis 220 vs. irradiance curve of diatoms, mol C (g Chl a) -1 d -1 (W m -2 ) -1 ), Θ (maximum Chl:N ratio, g Chl a (mol N) -1 ), μLNA (maximum LNA growth rate, d -1 ), r A max, HNA (maximum HNA active respiration rate, d -1 ), gHNA (half-saturation density of HNA bacteria in microzooplankton grazing, mmol C m -3 ), and gHNA (half-saturation density of LNA bacteria in microzooplankton grazing, Model skill is further evaluated using point-to-point comparisons (Figures S1-S5) and Taylor diagrams (Figure 3). 225 The Taylor diagrams highlight that the optimized model has better skill for 2010-11 and 2011-12, with more data types exhibiting relatively high correlation coefficients and low centred (bias removed) root-mean-square difference (RMSD). Three core variables of interest in this study, including HNA biomass, LNA biomass, and BP, show overall good model-observation agreements with relatively high correlations and low RMSD.

Cross-validation cost analyses show the overall model-observation misfit increases as expected when a set of 230
parameters optimized for one year is used to simulate another year's dynamics, indicating that each growth season is best modelled using its own unique set of optimized parameters (Table 3)

Bacterial carbon stocks and flows 235
Seasonal progression of the C stocks and flows for each bacterial group shows significant seasonal variability ( Figure S6) and interannual variability ( Figure 4A). Average C stocks and flows in the food-web are calculated over the growth season at 10m depth for each year ( Figure 5) and normalized by NPP (normalized by NPP in 1-day for C stocks) ( Figure S7). The mean HNA biomass is 6 ± 4 times (mean ± propagated standard deviation from the Monte Carlo sensitivity experiments and seasonaveraging; Figure 4B) larger than the mean LNA biomass in 2012-13, while the mean LNA biomass is 9 ± 8 times larger than 240 the mean HNA biomass in 2011-12. Bacterial carbon demand (BCD) is equivalent to total carbon flows from LDOC and SDOC pools to bacteria (BCD = BP + bacterial respiration; Figure 5) and mostly supported by the LDOC pool (77 ± 5% to 97 ± 4%) for both bacterial populations across years. Partitioning of the BP (BP = BCD -bacterial respiration; Figure 5) belonging to each bacterial group is determined during optimization. The mean HNA BP ranged from 0.01 ± 0.05 to 0.23 ± 0.25 mmol C m -3 d -1 , while the mean LNA BP ranged from 0.01 ± 0.02 to 0.08 ± 0.07 mmol C m -3 d -1 . The mean cell-specific BP (BP 245 divided by bacterial biomass or per-cell BP, d -1 ) is higher for the HNA group, ranging from 0.17 ± 0.61 to 0.40 ± 1.69 d -1 for HNA cells and 0.04 ± 0.05 to 0.16 ± 0.20 d -1 for LNA cells. The mean per-cell respiration rate (d -1 ) is also higher for the HNA group compared to the LNA group. The mean cell-specific SDOC uptake rate (d -1 ) is higher for the HNA group, while the mean cell-specific LDOC uptake rate (d -1 ) is higher for the LNA group. The mean bacterial growth efficiency, BGE (BP https://doi.org/10.5194/bg-2020-302 Preprint. Discussion started: 2 September 2020 c Author(s) 2020. CC BY 4.0 License. divided by BCD) of the HNA group is lower than that of the LNA group, ranging from 0.19 ± 0.82 to 0.38 ± 0.38 for HNA 250 cells and from 0.52 ± 1.18 to 0.62 ± 0.62 for LNA cells.
In the model, the time rate of change of the bacterial biomass is determined by the difference between the production term (i.e., total DOC uptake or BCD) and loss terms (i.e., respiration, grazing, viral mortality, and RDOC excretion). In other words, the fate of BCD is formulated as follows: where B is bacterial biomass, the sum of grazing and viral mortality is equivalent to top-down control, and four different bacterial carbon utilization efficiencies sum to 1, including the efficiencies of biomass synthesis, respiration, top-down control (grazing and viral mortality), and RDOC excretion. The largest loss term with regards to the fate of BCD is respiration for 260 HNA cells, accounting for 62 ± 7.7% to 81 ± 4.0% of the BCD (i.e., the mean respiration efficiency of 0.6 ± 0.08 to 0.8 ± 0.04 across years). Top-down control is a relatively large loss term for LNA cells, accounting for 35 ± 0.30% to 60 ± 2.5% (i.e., the mean top-down control efficiency of 0.4 ± 0.003 to 0.6 ± 0.03 across years), equally large or larger than top-down control on HNA cells (i.e., the mean top-down control efficiency of 0.1 ± 0.004 to 0.4 ± 0.03 across years).
The rest of the ecosystem state variables (stocks and flows) fall into one of three categories: 1) the variables for which 265 climatological or a single year values are assimilated (diatom-specific Chl, cryptophyte-specific Chl, and microzooplankton biomass); 2) the variables for which observational values for the given year are assimilated (nutrients, POC or detritus, and SDOC), and 3) the variables that are not directly assimilated (krill, LDOC, carbon export flux). There is a little interannual variability in the average C stocks of the first category except relatively large microzooplankton biomass in 2011-12 ( Figure   5B) and cryptophyte biomass in 2012-13 ( Figure 5C). Although nitrate is not assimilated in 2010-11 and detritus and SDOC 270 are not assimilated in 2012-13 and 2013-14, their simulated values in those unassimilated years are comparable to other assimilated years.

Bacterial physiological and taxonomic association with ecosystem functions
A property map of the emergent self-organizing map nodes (as generated by Bowman et al. 2017) shows the mode association with community structure ( To explore a potential link between mode and the key model ecosystem functions, we first extract the modelled net primary production (NPP), POC export, and BCD from our ecosystem model at the time of bacterial samples that are placed into a single mode (observed). We then perform a linear regression with mode as a factor (i.e., mode as a categorical predictor with 8 modes rather than an ordinal or continuous variable; equivalent to a one-way ANOVA with 8 different categories).
fmode does not show a significant relationship with any of the model ecosystem functions examined (all p > 0.05; not shown). 285 By contrast, 19%, 52%, and 66% of the total variance in the modelled NPP, C export, and BCD is explained by mode (Figures 7a-c). Mode 4 is associated with low NPP and low POC export, while Modes 3, 5, and 7 are associated with high NPP and high POC export. Mode 4 is associated with low BCD, while Mode 7 is associated with high BCD. Mode 6 occurs during relatively high NPP but low POC export. Mode is positively correlated to fHNA (r 2 = 0.52, p < 0.001; not shown). Similarly, we examine a potential link between fHNA and the key model ecosystem functions as described above (i.e., linear regression 290 where an observed fHNA is a predictor and the modelled ecosystem functions are dependent variables). fHNA is positively correlated to NPP (r 2 = 0.12, p = 0.01; Figure 7d) and POC export to a moderate extent (r 2 = 0.28, p <0.001; Figure 7e) and to BCD to a strong extent (r 2 = 0.45, p <0.001; Figure 7f). The stepwise addition of one predictor variable to the other predictor variable (i.e., fHNA adding to mode or vice versa) does not improve the model performance (not shown). These results suggest a clear link between modelled ecosystem functions and observed bacterial molecular data (i.e., modes). 295

Climate change experiments
We explore the response of the modelled bacterial dynamics (Results 3.2-3.3) to changing climate along the WAP (Figure 8).
Due to a varying range of portability of the optimized model solution for each year, we perform this experiment with the climatological model parameter set (Table S6) to examine an overall system response under perturbed ocean temperature (+1°C and +2°C relative to observed temperatures) and sea-ice forcing fields (10% and 20% loss of sea-ice relative to observed 300 sea-ice concentrations). The experiments are conducted under each condition separately (i.e., warming or sea-ice melting alone scenario; Figures S7-8) and simultaneously (i.e., climate change scenario; Figure 7), but the results from only the climate change scenario are discussed below, as despite different impacts of each physical forcing changes (i.e., temperature affects a variety of rate processes, while sea-ice concentration affects light levels and photosynthesis but not the mixed layer depth) climate change would cause simultaneous changes in sea ice and water temperature along the WAP. 305 The climate change experiments exhibit a combination of changes in overall ecosystem stocks and rates integrated over the growth season and shifts in the seasonal timing or phenology ( Figure 8A), compared to the base state (first row as the base state and second and third rows as anomalies in Figure 8B). Compared to other stocks and flows, HNA bacterial rates respond most to the perturbed climate conditions. In response to 2°C warming and 20% less sea ice, the maximum values increase for LDOC uptake (69 ± 2.2%; the maximum value ± standard deviation from Monte Carlo errors; results of sensitivity 310 experiments not shown), SDOC uptake (81 ± 1.9%), RDOC excretion (48 ± 1.6%), grazing (97 ± 5.0%), viral mortality (48 ± 1.6%), and respiration (69 ± 1.7%) of HNA cells, and for LDOC uptake (49 ± 3.4%), SDOC uptake (67 ± 3.9%), RDOC excretion (33 ± 2.6%), grazing (63 ± 3.9%), viral mortality (33 ± 2.6%), and respiration (49 ± 3.4%) of LNA cells. The maximum values also increased for HNA biomass (48 ± 1.6%) and LNA biomass (33 ± 2.6%). In response to 2°C warming and 20% less sea ice, the events of high NPP, POC export flux, and diatom-, microzooplankton-, and krill accumulations shift 315 to earlier in the season, with the maximum values of 50 ± 0.10%, 50 ± 0.20%, 20 ± 0.10%, and 25 ± 0.20%, and 38 ± 0.48%, respectively. More SDOC accumulates in the early spring (the maximum increase of 10 ± 0.10%) as a result of increased plankton accumulations and SDOC excretion during the same period. By contrast, LDOC becomes strongly limited throughout the growth season (the maximum decrease of 24 ± 1.0%).

Model performance and validation
In our study, only a subset of the model parameters is optimized to best simulate bacterial and other ecological patterns for each year, in accordance with other data-assimilative ecosystem model studies (Friedrichs 2001;Friedrichs et al. 2007;Friedrichs, Hood, and Wiggert 2006;Luo et al. 2010Luo et al. , 2012. In general, optimization of this class of marine ecosystem models requires adjustment of only a small number of independent model parameters to achieve well-posed model solutions, due to 325 the highly cross-correlated nature of parameters in the inherently nonlinear model equations (Fennel et al. 2001;Harmon and Challenor 1997;Matear 1996;Prunet, Minster, Echevin, et al. 1996;Prunet, Minster, Ruiz-Pino, et al. 1996). Most of the constrained parameters in our study are directly associated with bacterial processes, with overall better model-observation fits for bacterial data types, giving confidence in the simulated bacterial C stocks and flows. Despite the important biogeochemical role that heterotrophic marine bacteria play in the ocean, the vast majority of marine ecosystem models neither include bacteria 330 as a model compartment nor explicitly simulate bacterial processes. Most existing models parameterize the complex bacterial remineralization processes of the (sinking) organic matter with depth as a function of POC concentration and temperature, or by fitting with power law functions. Cellular functions, taxa, and functional gene expression of other prokaryotes, such as cyanobacteria (Hellweger 2010;Martín-Figueroa, Navarro, and Florencio 2000;Miller et al. 2013), or a diverse suite of microbial functional groups (Coles et al. 2017;Dutkiewicz et al. 2020) have been modelled so far; however, our study is the 335 first to explicitly model bacterial groups of different physiological traits.
Here, optimization sheds light on major unknown parameters associated with the bacterial grazing process, including gHNA and gLNA (half-saturation density of HNA and LNA bacteria in microzooplankton grazing, respectively; Text S1; Equation S11). Microzooplankton grazing of a given bacterial group in the model is simulated using Holling Type 2 density-dependent grazing with a preferential prey selection on diatoms, cryptophytes, and the other bacterial group, in which the single 340 microzooplankton maximum grazing rate is implemented for both bacterial groups for model simplicity purposes (Text S1; https://doi.org/10.5194/bg-2020-302 Preprint. Discussion started: 2 September 2020 c Author(s) 2020. CC BY 4.0 License.

Equation S11
). Thus, it is the half-saturation density that determined the degree of preferential grazing by microzooplankton on a certain bacterial group, the change of which would ultimately depend on biomass of each group. Due to the lack of a priori knowledge on the relative magnitude of gHNA and gLNA, we assign the identical initial guess value (Table 1) to let the data assimilation scheme find the optimized values that best fit overall observations. Compared to gLNA values, smaller 345 optimized gHNA values across all years (Tables S2-6) indicate preferential grazing of HNA cells by microzooplankton, consistent with previous speculations that grazers selectively remove larger and more active cells (Giorgio et al. 1996;Gonzalez, Sherr, and Sherr 1990;Sherr, Sherr, and McDaniel 1992), or HNA cells . Despite respiration being the largest sink for the BCD of HNA cells, the absolute amount of grazing and its values normalized by group-specific biomass are modelled to be still greater for HNA cells, compared to LNA cells (Figure 4)

Bacterial carbon stocks and flows
Our model results show significantly higher per-cell (or per C biomass) HNA BP across all years compared to the LNA group. 360 The higher per-cell rates of HNA cells in the model is largely due to the way the parameter optimization is performed to keep higher maximum cell-specific growth rates of HNA cells compared to those of LNA cells, but these per-cell rates are a function of both bulk rates and biomass stocks that are constrained from measurements and determined from trophic interactions in the model. As with phylogenetic groups (Fuchs et al. 2000;Teira et al. 2009;Yokokawa et al. 2004), cell-specific growth rates (equivalent to per-cell BP in our study) are expected to differ among distinct bacterial physiological groups, but there are 365 limited studies that focused on group-specific cell activities or growth rates (Gasol et al. 1999;Giorgio et al. 1996;GÜnter et al. 2008;Longnecker, Sherr, and Sherr 2005;Moràn, Ducklow, and Erickson 2011). Moran et al (2011) showed that HNA cells greatly outgrew LNA cells in Waquoit Bay Estuary, Massachusetts, with a cell-specific growth rate of up to 2.26 d -1 , compared to relatively slow growth of LNA cells (< 0.5 d -1 ). High cell-specific HNA BP in our model could be attributed to much larger total SDOC uptake of HNA cells than LNA cells, despite higher HNA respiration. We note that HNA cells also 370 have significantly higher SDOC uptake rates per unit C biomass compared to LNA cells. Several studies demonstrate that HNA cells depend more than LNA cells on phytoplankton substrates for growth and metabolism (Li, Jellett, and Dickie 1995;Morán et al. 2007;Scharek and Latasa 2007), but our study is the first to show the importance of the SDOC pool for the carbon demand of HNA bacteria. The hypothesis that WAP bacteria might rely on SDOC has received indirect support previously, https://doi.org/10.5194/bg-2020-302 Preprint. Discussion started: 2 September 2020 c Author(s) 2020. CC BY 4.0 License.
presumably due to LDOC limitation (Hugh W. Ducklow et al. 2011;Kim and Ducklow 2016;Luria et al. 2017). HNA cells 375 are also modelled to have relatively high per-cell respiration and respiration efficiency with regard to the fate of their carbon demand, while carbon demand of LNA cells is more likely lost to top-down control factors followed by respiration. The high HNA bacterial respiration drives an overall lower BGE of the HNA cells, despite their high BCD.
Although much of the discussion focuses on bacteria, our model also captures well the rest of the ecosystem variables.
The WAP typically exhibits strong interannual variability in phytoplankton and zooplankton biomass accumulations (Ducklow 380 et al. 2007), but the lack of their strong interannual variability in the model is due to assimilating climatological observations of diatom-and cryptophyte-specific Chl and a different year's observations of microzooplankton biomass. Krill are not assimilated, but predicted as reasonable simulated values (0.13 ± 0.03 to 0.40 ± 0.18 mmol C m -3 ) compared to the available field data (the mean krill biomass = 0.12 ± 0.03 mmol C m -3 and the maximum krill biomass = 0.57 mmol C m -3 in 2017-18 at Palmer Station; data provided by D. Steinberg). 385

Bacterial physiological and taxonomic association with ecosystem functions
In Bowman et al. (2017) fHNA and mode alone accounted for 36% and 46% of the variance in bulk BP, respectively, and together accounted for up to 51% of the variance in bulk BP. In our study, Modes 3, 5, and 7, characterized by copiotrophic taxa with large genomes and more 16S rRNA gene copies (Bowman et al., 2017), are associated with high NPP, POC export, and BCD, while Modes 4 and 6, characterized by taxa associated with more oligotrophic conditions, are associated with low 390 NPP, POC export, and BCD. Dokdonia sp. MED134, a common bacterial species of the modes associated with high NPP, POC export, and BCD, is a proteorhodopsin-containing marine flavobacterium shown to grow faster with light (Gómez- with high NPP suggests sufficient irradiance and therefore light-enhanced growth rates and cell yields of this species. Mode 395 4, by contrast, is dominated by Planktomarina temperata RCA23, a slowly growing bacterium that specializes in using complex organic substrates (Giebel et al. 2013). These attributes are consistent with this species' high occurrence during low NPP and low POC in the model. Candidatus Pelagibacter, abundant in Mode 6, is generally known as an oligotrophic specialist with a low DOC requirement, but is often observed during the Antarctic phytoplankton blooms (Delmont et al. 2014;Luria, Ducklow, and Amaral-Zettler 2014), consistent with its occurrence during high NPP periods in our model results. 400 The bacterial mode association with BCD, NPP, and particle export suggest that bacteria in the coastal WAP are controlled by resource availability. The modes predicted by high NPP and high POC export were also characterized by high BP (Bowman et al. 2017). It has been hypothesized that due to minimal inputs of terrestrial organic matter, bacteria in the WAP must ultimately rely on in situ NPP for organic matter source (Ducklow et al. 2012b). Indeed, our results show that compared to zooplankton excretion and detrital dissolution, phytoplankton excretion accounts for a major fraction of DOC 405 production. Similarly, phytoplankton account for a major fraction of detritus production, but particle export in the model is also modelled to be proportional to optimizable particle sinking speed that is most likely linked to the difference in the time-https://doi.org/10.5194/bg-2020-302 Preprint. Discussion started: 2 September 2020 c Author(s) 2020. CC BY 4.0 License. rate change of POC observations. This explains the occasional discrepancy between NPP and POC export and why Mode 6 shows a different relationship with NPP and POC export. In summary, our study provides a novel framework for the prediction of different ecosystem functions using microbial taxonomy. Certain modes represent distinct ecosystem states in the WAP and 410 such mode-state associations are reasonably explained from microbial perspectives. However, we do not address a seasonal succession and development in mode itself or the mode predictability of the key WAP ecosystem states. Future investigations should focus on directly assimilating a few dominant or seasonally distinct modes in the ecosystem model, to fully resolve seasonality of the mode-state associations along the WAP.

Climate change scenarios 415
The WAP has experienced significant atmospheric and ocean warming and resulting changes in marine ecosystem processes and further climate change is projected for the next several decades. Based on our baseline ecosystem model simulations (Results 3.2-3.3), we investigate how future climate scenarios, with decreasing sea ice and warming temperature together, might affect bacterial dynamics and ecosystem functions. Under changing climate conditions, it is expected that increased NPP and phytoplankton-and zooplankton accumulations early in the season would result in a significant build-up of DOC pools. 420 This is the case only for SDOC, but bacteria are soon strongly LDOC-limited due to their preferential and increased LDOC uptake for their primary carbon source. The growth of bacteria and increased bacterial production rates during strong LDOC limitation indicate that bacteria depend on SDOC to satisfy the rest of their carbon demand, resulting in the depletion of SDOC pool as the season progresses. In other words, bacteria are more likely resource-limited, especially labile pool of the organic matter, and SDOC subsequently plays an increasingly important role as a part of the bacterial carbon demand. This change is 425 particularly important in HNA cells, as shown by a relatively large increase of BCD via SDOC pool, compared to LNA cells.
Despite all four loss terms with regard to the fate of the BCD (i.e., respiration, grazing, viral mortality, and RDOC excretion) increase for both groups under perturbed conditions, overall high BCD leads to increased biomass of both bacterial groups.
Temperature is a major factor regulating bacterial biomass, production, and growth rates by changing the rate of enzymatic reactions (Kirchman, Morán, and Ducklow 2009;White et al. 1991). In our case, stocks and rates of both HNA-and LNA cells 430 somewhat increase under a temperature warming alone scenario ( Figure S8) but much more so under a sea-ice melting alone scenario (i.e., increased photosynthesis and resource availability). This is evident for all variables during high NPP and plankton accumulation events early in the spring ( Figure S9), except for SDOC uptake of both bacterial groups due to overall increased resource availability. This suggests that temperature per se is not a more important limiting factor for bacterial growth than resource availability (Ducklow et al. 2012a), and warming temperature would rather enhance bacterial utilization 435 of the already increased organic matter from increased phytoplankton activity. Our findings also suggest that future climate conditions along the WAP would further impact the distribution of taxonomic groups with a potential shift to more abundant HNA cells in the WAP bacterial communities due to their preferential SDOC utilization. This shift would cooccur with increased NPP and POC export, reinforcing the "high NPP-high particle export-more abundant HNA bacteria" link shown in the optimized model results for each year. The scenario of "more abundant HNA bacterial cells in more productive waters" 440 implies a relatively strong resource control on these actively growing cells. This is consistent with previous studies that show increased HNA growth rates in response to enhanced phytoplankton-derived organic substrate (Moran et al. 2010) and larger abundance of HNA cells in areas or periods where bacterial assemblages were predominantly controlled by resources, rather than grazing (Morán et al. 2007).

Conclusions 445
Heterotrophic microbial diversity has seldom been considered in detail in the formulation and analysis of marine pelagic ecosystem models reflecting in part the lack of suitable field data for model evaluation. Utilizing genomic products to prescribe taxonomic aspects of the bacterial model dynamics, our study investigates the association of bacterial abundance with different physiological states, bacterial community structure and key ecosystem functions. Our modelling approach enables the observations in different bacterial populations to constrain the group-specific processes and model parameters that have been 450 poorly understood. These include the partitioning of BP specific to HNA and LNA groups, the partitioning of the bacterial uptake of DOC pools with different lability, and the half-saturation density of each bacterial group in microzooplankton grazing. The model is an effective platform to explore the WAP microbial response to changing climate conditions, in which warming and decreasing sea ice would induce a potential shift to the dominance of HNA bacteria in more productive waters due to their increasing dependence on SDOC. 455

Code availability
The model simulation results and codes are available in a NetCDF data structure and Fortran 90 at the GitHub data repository (https://github.com/hyewon-kim-whoi).

Data availability
Complete Palmer LTER time-series data used for data assimilation are available online (http://pal.lternet.edu/data). Surface downward solar radiation flux data used for physical forcing of the model simulations can be found in the National Centers for Environmental Prediction website (https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.surface.html). The

680
Model closure terms include RDOM and higher trophic levels. The flows between the prognostic state variables with the name of the numbered flows in the legend only represent for two bacterial compartments.

Figure 2: Data assimilation scheme.
A variational adjoint method is employed for the parameter optimization and data assimilation processes (adapted from Glover et al., 2011). Gradient: the sensitivity of the total cost function with respect to model parameter from optimization. After optimization is finished, optimized model output is interpreted as a function of bacterial taxonomic modes (modes).