Modelling polar marine ecosystem functions guided by bacterial physiological and taxonomic traits

Heterotrophic marine bacteria utilize organic carbon for growth and biomass synthesis. Thus, their physiological 15 variability is key to the balance between the production and consumption of organic matter and ultimately particle export in the ocean. Here we investigated a potential link between bacterial traits and ecosystem functions in a rapidly changing polar marine ecosystem based on a bacteria-oriented ecosystem model. Using a data-assimilation scheme we utilized the observations of bacterial groups with different physiological states to constrain the group-specific bacterial ecosystem functions. We then examined the association of the modelled bacterial and other key ecosystem functions with eight recurrent 20 modes representative of different bacterial taxonomic traits. High nucleic acid (HNA) bacteria showed relatively high cell-specific productivity, respiration, and utilization of the semi-labile dissolved organic carbon pool compared to their low nucleic acid (LNA) bacteria counterparts. Both taxonomy and physiological traits reflected the variability of bacterial carbon demand, net primary production, and particle sinking flux. Numerical experiments under perturbed climate conditions showed a potential shift from LNA- to HNA-dominated bacterial communities in the warming WAP. Our study suggests that bacterial 25 diversity via different taxonomic and physiological traits can guide the modelling of the WAP ecosystem, providing insights into key bacterial and ecosystem functions under climate change. The stepwise addition of one predictor variable to the other predictor variable (i.e., f HNA adding to mode or vice versa) did not improve the model performance (not shown). These results suggest a clear link between the modelled ecosystem functions and observed bacterial taxonomic (modes) and physiological ( f HNA) traits observations.


Introduction
Microbes regulate many key ecosystem functions in the marine food web. Unicellular primary producers fix organic carbon (i.e., an ecosystem function termed primary production), while heterotrophic marine bacteria and archaea (hereafter 30 bacteria) utilize the fixed organic carbon for growth and biomass synthesis (i.e., an ecosystem function termed bacterial production) (Azam et al. 1983). Thus, the variability in the abundance and activity of bacteria is central to understanding the balance between production and consumption of organic matter and ultimately particle export. In flow cytometric analyses, bacteria cluster into two groups of cells with different nucleic acid content, including high nucleic acid (HNA) and low nucleic acid (LNA) cells (Bouvier et al. 2007;Gasol et al. 1999). These two groups are suggested to represent lineages (Schattenhofer 35 et al. 2011;Vila-Costa et al. 2012) or physiological states (Bowman et al. 2017), and HNA cells are generally larger in both cell and genome size compared to LNA cells (Bouvier et al. 2007;Calvo-Díaz and Morán 2006). The significance of HNA versus LNA cells in determining distinct ecosystem states and functions has been investigated, but much is still unknown. In a recent study along the West Antarctic Peninsula (WAP), the high dimensionality of the bacterial community structure data was reduced via emergent self-organizing maps and subdivided into a small number of bacterial modes associated with specific 40 taxonomic and functional traits (Bowman et al. 2017). Bowman et al. (2017) demonstrated that a combination of taxonomy, physiological structure (i.e., HNA and LNA cells), and abundance of bacterial communities explained up to 73% of the variance in bulk bacterial production. Their findings imply that bacterial physiological and taxonomic traits could inform a predictive ecosystem model to further explore ecologically important questions such as: Can bacterial traits reflect key ecosystem functions like primary production and particle sinking flux? If so, what are the underlying mechanisms driving such 45 bacterial trait-ecosystem function relationships? And how will these relationships be impacted by climate change?
The WAP is a rapidly warming marine ecosystem, with resulting changes in physical, ecological, and biogeochemical processes (Clarke et al. 2009;Cook et al. 2005;Ducklow et al. 2007;King 1994;Meredith and King 2005;Stammerjohn et al. 2008;Vaughan et al. 2003;Vaughan 2006;Whitehouse et al. 2008). Routine monitoring through the Palmer Long-Term Ecological Research project (Palmer LTER; since 1991) has revealed climate-driven variations in seasonal phytoplankton 50 accumulation (Saba et al. 2014;Schofield et al. 2017), bacterial dynamics (Bowman and Ducklow 2015;Ducklow et al. 2012a;Kim and Ducklow 2016;Luria et al. 2017;Luria et al. 2014), nutrient drawdown , and micro-and macrozooplankton dynamics Steinberg et al. 2015;Thibodeau et al. 2019). The wealth of Palmer LTER observations enabled the construction of a numerical marine ecosystem model for the coastal WAP region (i.e., the WAP-1D-VAR model; Kim et al. 2021), based on existing regional test-bed models of other ocean basins (Friedrichs 2001;55 Friedrichs et al. 200655 Friedrichs et al. , 2007Luo et al. 2010Luo et al. , 2012. The WAP-1D-VAR model was compared against roughly bi-weekly timeseries data over the growth season (October -March) near Palmer Station (64.77°S, 64.05°W; the mean depth of ~65 m) that recorded seasonal variations in ecological processes modulated by variations in surface light, mixed layer depth, and surface sea-ice cover. The WAP-1D-VAR model utilized a data assimilation scheme to minimize the misfits between model results and observational data via a variational adjoint method (Lawson et al. 1995), by assimilating the available Palme LTER 60 observations. Serving as a mechanistic model, assimilation of the Palmer LTER observations constrained poorly measured bacterial processes (e.g., respiration, viral and grazing mortality, growth efficiency, carbon demand, and utilization of dissolved organic matter with varying lability) and enabled model predictions of the microbial system state in changing environments. Yet, incorporating molecular observations into an ecosystem model is still a challenge due to differences in how levels of biological organization are treated in observations and models (Hellweger 2020) and the high dimensionality of microbial 65 molecular observations. One argument is that molecular-level changes may not directly translate into a clear picture of changes in community structure or resulting changes to bulk ecosystem functions.
In this study, we explored a potential link between bacterial traits and ecosystem functions in the warming coastal WAP, using a bacteria-oriented ecosystem model originally derived and modified from the WAP-1D-VAR model (Kim et al. 2021). The bacterial traits examined in this study included both physiological and taxonomic traits. For physiological traits, 70 the model explicitly simulated the dynamics of the two ubiquitous bacterial groups of differing nucleic acid content, a HNA group and a LNA group, by directly assimilating the group-specific biomass observations. For taxonomic traits, taxonomic modes derived from bacterial 16S rRNA gene sequence data (calculated by Bowman et al. 2017) were compared to model output values at the corresponding time points, with the assumption that microbial taxonomy would provide information about microbial ecosystem process and structure. In contrast to genome-scale, metabolic flux, or gene-centric models (Coles et al. 75 2017; Feist et al. 2009;Reed et al. 2014), this study combines statistical products from genomic analyses with numerical ecosystem modelling to incorporate molecular information into ecosystem-level dynamics.

Bacteria-oriented ecosystem model
The bacteria-oriented ecosystem model was originally derived and modified from the 1-D variational data assimilation 80 model for the coastal WAP region (WAP-1D-VAR v1.0; Kim et al. 2021) simulating 12 state variables, including diatoms, cryptophytes, bacteria, microzooplankton, krill, labile dissolved organic matter (LDOM), semi-labile DOM (SDOM), ammonium (NH4), nitrate (NO3), phosphate (PO4), and particulate detritus (Figure 1; model equations in Appendix A; more details about model setup in Text S1-4). Refractory DOM (RDOM) and higher trophic levels were implicitly represented as model closure terms. Distinct to this study's model compared to the WAP-1D-VAR model was the inclusion of HNA and 85 LNA bacterial compartments (as biomass) and the partitioning of the bulk bacterial productivity by each bacterial compartment. The model was forced by mixed layer depth (MLD), photosynthetically active radiation (PAR) at the ocean surface, surface sea-ice concentration, water-column temperature, and eddy diffusivity ( Figure S1) to simulate the stocks and flows of C, N, and P through the model state variables using a constant time step of 1 hour and a second-order Runge-Kutta scheme. In essence, the time rate of change of the group-specific biomass (constrained by observations; section 2.4) was 90 calculated as follows: .4.38, A.4.41), GZ C HNA is C-specific grazed 95 amount of cells by microzooplankton (mmol C m -3 d -1 ; Eq. A.4.44), and M C HNA is viral mortality (mmol C m -3 d -1 ; Eq. A.4.47) of the HNA group (the same form applies to LNA group below). dCLNA Total (bulk) bacterial production (BP; BP = BPHNA + BPLNA) was constrained by observations, and therefore, the groupspecific production (BPHNA and BPLNA, mmol C m -3 d -1 ) was determined during optimization: 100

Modelling framework
The modelling framework consisted of a dynamic (mechanistic) part and a data-driven part ( Figure 2): 1) the dynamic part as the processes associated with the data-assimilative model ( Figure 1) that made predictions of the microbial ecosystem 105 processes based on prognostic, time-evolving coupled ordinary differential equations (Appendix A); and 2) the data-driven part representing how bacterial modes (Bowman et al. 2017) were compared to optimized model outputs. The analysis in the present study relied on the two types of modes, including taxonomic modes and functional modes, described in Bowman et al. (2017): taxonomic modes (modes hereafter) were determined from 16S rRNA gene sequence abundance, while functional modes (fmodes hereafter) were derived from predicted community metabolic structure. Briefly, sequence reads were 110 categorized into closest estimated genomes and closest completed genomes via the paprica pipeline (Bowman and Ducklow, 2015) and the high dimensional community and metabolic structure data were reduced to 2-D space via a self-organizing map and K-means clustering of map units (Bowman et al. 2017). The final clustering of map units constitutes the modes, and each sample was assigned the mode of its closest map unit. In this approach the mode is a single categorical variable that succinctly describes key structural attributes of the sample. It is important to recognize the categorical nature of these modes, and to 115 understand that -because of the 2-D nature of the map -there is no linear progression among modes. Thus mode 1, for example, is not necessarily more similar to mode 3 than it is to mode 7. Neither mode nor fmodes is necessarily correlated to physiological traits of the bacteria (i.e., HNA-and LNA compartments). In other words, derived from separate observations of different parameters in the same bacterial samples, the relative abundance of HNA or LNA, mode, and fmodes are independent with each other by design. 120 We selected a nearshore Palmer LTER water-column time-series station, Station B (64.77°S, 64.05°W), in the coastal WAP as the modelling site for this study (the mean depth of ~65 m). The Palmer LTER Station B datasets consisted of roughly bi-weekly physical, chemical, and biological profiles collected via a profiling Conductivity-Temperature-Density (CTD) and rosette. Additional observational data were utilized for bacterial flow cytometric (HNA and LNA) and 16S rRNA gene amplicon data collected from Arthur Harbour Station B at 10 m depth (situated 1 km from the Palmer Station B) or Palmer 125 Station seawater intake at 6 m depth (Bowman et al. 2017). Three upper-ocean depth levels (0, 10, and 20 m) were modelled for 4 consecutive Palmer LTER growth seasons, including November 2010-March 2011(year 2010-11 hereafter), 2011-12, 2012-13, and 2013, but the only results from 10 m were analysed in detail due to the availability of bacterial traits data there. The 1-D modelling of the coastal WAP region could be justified given that the WAP is a region of relatively weak net advection compared with the Antarctic Circumpolar Current (ACC) or the subpolar gyres (Meredith et al., 2008(Meredith et al., , 2013, and 130 the CTD observations at Palmer Station did not show the evidence of abrupt changes in physical and biogeochemical tracers due to lateral advection, but showed rather laterally homogeneous temperature and salinity distributions during the Antarctic growing season for the modelled depth and years in this study (Kim and Ducklow 2016). Given the availability of the Palmer LTER observations over the Austral spring-summer season, we optimized the model each year separately over the timeframe of available observations. This way, each year possessed its own unique 135 optimized model parameter set (model) equivalent to a model solution for the minimized model-observation misfit for that year. In addition to these four years (2010-11 to 2013-14), we optimized the model for the climatological year, referred as the climatological model. The climatological year was constructed by averaging observations in the four years (2010-11 to 2013-14), rather than the whole Palmer LTER multi-decadal period (since 1991), due to the limited availability of HNA and LNA biomass data only in those four years. Details on constructing the climatological year and model initialization, spin-up, and 140 bottom boundary conditions are found in the Supplementary Material (Text S3).

Data assimilation and parameter optimization
The model utilized a variational adjoint data assimilation scheme (Lawson et al. 1995) to minimize the misfit between observations (i.e., assimilated data, section 2.4) and model output by optimizing a subset of model parameters (Friedrichs 2001;Spitz et al. 2001;Ward et al. 2010). The data-assimilation scheme ( Figure 2) consisted of four main steps (Glover et al. 145 2011). First, the model was integrated forward in time from prescribed initial conditions and initial model parameter guess values (Table 1) to calculate the model-observation misfits referred as total cost function or total cost (section 2.5). Second, an adjoint model constructed using the Tangent linear and Adjoint Model Compiler (TAPENADE) was integrated backward in time and compute the gradients of the total cost with respect to the model parameters. Third, the computed gradients were passed to a limited-memory quasi-Newton optimization software M1QN3 3.1 (Gilbert and Lemaréchal 1989) to determine the 150 direction and optimal step size by which the model parameters needed to be modified to reduce the total cost. Finally, a new forward mode simulation was performed using the new set of modified parameters from the third step. These four steps were conducted in an iterative manner until the pre-set convergence criteria were satisfied ensuring the convergence of the optimized parameters and a local minimum achieved by the total cost, via low gradients (sensitivity) of the total cost with respect to each optimized parameter and positive eigenvalues of the Hessian matrix (section 2.6). 155 The initial parameter subset submitted to optimization consisted of 10 different model parameters, with one parameter per each state variable, the change of which yielded the largest decrease in the total cost function during preliminary sensitivity tests, including αDA (initial slope of photosynthesis vs. irradiance curve of diatoms, mol C (g Chl a) -1 d -1 (W m -2 ) -1 ), αCR (initial slope of photosynthesis vs. irradiance curve of cryptophytes, mol C (g Chl a) -1 d -1 (W m -2 ) -1 ), Θ (maximum Chl:N ratio, g Chl a (mol N) -1 ), μHNA (maximum HNA growth rate, d -1 ), r A max,HNA (maximum HNA active respiration rate, d -1 ), gHNA (half-160 saturation density of HNA bacteria in microzooplankton grazing, mmol C m -3 ), μMZ (maximum microzooplankton growth rate, d -1 ), μKR (maximum krill growth rate, d -1 ), and remvKR (krill removal rate by higher-trophic levels, (mmol C m -3 ) -1 d -1 ; Tables S2-6). In most cases this initial subset led to positive eigenvalues of the inverse Hessian matrix, but in case of negative eigenvalues the corresponding parameters were eliminated from optimization (section 2.6) and a few other sensitive parameters were added and kept, one at each optimization cycle, if they further reduced the total cost function. Every assimilation cycle, 165 we ensured that group-specific bacterial model parameters were optimized in the direction to properly represent the dynamics associated with each group (Table 1), in which we assigned different magnitudes of each parameter based on the best guesses and literatures (del Giorgio and Cole 1998;Jiao et al. 2010). For instance, maximum bacterial growth rate of the HNA group (μHNA, d -1 ) was ensured to be optimized to be higher than that of the LNA group (μLNA, d -1 ), so if μHNA was optimized smaller than μLNA, μHNA was reset back to the original value instead of being updated. 170

Assimilated data
We assimilated Palmer LTER observational data from 0, 10, and 20 m corresponding to compartments and flows in the model, including NO3, PO4, phytoplankton taxonomic specific chlorophyll (Chl) for diatoms and cryptophytes (Schofield et al. 2017), microzooplankton biomass , primary production (PP), bulk BP, HNA bacterial biomass, LNA bacterial biomass, semi-labile dissolved organic carbon (SDOC), particulate organic carbon (POC), and particulate organic 175 nitrogen (PON). The group-specific Chl was not measured in 2011-12, but due to its importance in constraining the groupspecific phytoplankton dynamics, the 4-year climatological value was assimlated for 2011-12. NO3 was not assimilated in 2010-11, while POC, PON, and SDOC were not assimilated in 2012-13 and 2013-14 due to the lack of observations in those years. Krill biomass data were not assimilated due to the strong patchiness of the distribution (many zero values) that would hinder proper model optimization, while a single year measurement data of microzooplankton biomass (2010-11) was 180 assimilated for all years to at least provide constraints on phytoplankton grazing parameters. Microzooplankton modelobservation misfits were not examined due to the discrepancy in the timing and location of the data compared to this study. SDOC was calculated by subtracting the background (RDOC) concentration (40.0 mmol m -3 ) from climatological total DOC concentration. POC (PON) were assimilated to represent the model detrital pool, but its measurements contained living biomass from bottle filter experiments. Climatological observations showed that living phytoplankton and bacterial biomass 185 accounted for 74% of total POC and 71% of total PON, so these fractions were used to exclude living biomass from the bulk particulate material pool. When converting Chl to phytoplankton C (N) biomass, the maximum Chl to N ratio was used along with other reference ratios (Tables S2-6). BP (mmol C m -3 d -1 ) was derived from 3 H-leucine incorporation rate (pmol l -1 h -1 ) data using the conversion factor of 1.5 kgC mol -1 leucine incorporated (Ducklow 2000). Group-specific bacterial biomass (mmol C m -3 ) was estimated from bacterial abundance measured by flow cytometry (i.e., bulk bacterial biomass multiplied by 190 the fraction of each group, fHNA or fLNA, with the conversion factor of 10 fgC cell -1 ) (Fukuda et al. 1998).

Cost function and portability index
The total cost function or cost (J) was defined as follows to represent the misfit between observations (a ! m,n ) and model output (am,n) (Luo et al. 2010): where m and n represent assimilated data types and data points, respectively, M and Nm are the total number of assimilated data types and data points for data type m, respectively, and σ m is the target error for data type m. Hereafter, we referred the total cost as the total cost normalized by M (J' = J/M) and normalized costs of individual data types (J'm) as the modelobservation misfit equivalent to a reduced Chi-square estimate of model goodness of fit (i.e., J' = 1 as a good fit from optimization, J' >>1 as a poor fit due to underestimation of the error variance or the fit not fully capturing the data, and J' <<1 200 as an overfitting of the data, fitting the noise, or overestimation of the error variance). The base-10 logarithm of Chl and PP was used in Eq. 5 to account for high productivity of the WAP waters and the approximate log-normal distribution of those data types (Campbell 1995;Glover et al. 2018). The target error σ m was calculated for each data type m as: σ m = a ! m,n $$$$$ · CV m (6) where a ! m,n $$$$$ is the climatological mean of the observations and CV m is the adjusted coefficient of variation (CV) of the 205 observations of each data type over 0, 10, and 20 m (due to observational error and seasonal and interannual variations). The average CV of each data type at a single depth across the modelled years was higher compared to those across every measured depth within the mixed layer over an extended year period modelled in the WAP-1D-VAR model (2002-03 to 2011Kim et al. 2021) and was therefore reduced to the level in the mixed layer to avoid an overestimated target error of each data type. The rationale behind using the adjusted CV in the target error calculation was based on Luo et al (2010), in which all properties 210 in the mixed layer should be completely mixed, a perfect measurement without significant errors should generate similar data values at every measured depth within the mixed layer, and the average CV of all the depth profiles can be used as CV in the target error calculation. The standard deviation was used as target errors of the log-converted data types. The CV of the logconverted data type was estimated as the average of ± 1 standard deviation in log space converted back into normal space (Doney et al. 2003;Glover et al. 2018). 215 We computed the portability index to evaluate the broader applicability of the optimized model parameter set of each year in predicting dynamics of the other year (Friedrichs et al. 2007): x is the normalized cross-validation cost when a model parameter set optimized for a given year is used to simulate another year, and J'c is the normalized total cost of the climatological model. A portability index close to 1 indicates a more 220 portable model, or a system that is not particularly sensitive to year-to-year variations in optimized model parameters, while an index <<1 indicates a less portable model, or a system sensitive to year-to-year variations in optimized model parameters.

Uncertainty analysis
The uncertainties of the optimized parameters were estimated from a finite difference approximation of the complete Hessian matrix during iterative data assimilation processes (i.e., second derivatives of the cost function with respect to the 225 model parameters; Text S4). When computed at the minimum of the cost function value, the square root of a diagonal element in the inversed Hessian matrix is the logarithm of the relative uncertainty of the corresponding optimized parameter. The absolute uncertainty of the constrained parameter was calculated as pf ´ e ±σi where pf is the value of optimized parameter and σf is the relative uncertainty of the corresponding optimized parameter. We then conducted Monte Carlo experiments to calculate the impact of the optimized parameter uncertainties on the model results. The Monte Carlo experiments consisted of 230 1) creating an ensemble of parameter sets (N = 1,000) by randomly sampling values within the uncertainty ranges of the constrained parameters and 2) then performing a model simulation using each parameter set. All uncertainty estimates were calculated following standard error propagation rules and presented herein as ± 1 standard deviation.

Model skill assessment 235
The iterative optimization procedure ( Figure 2) reduced by 24-93% the misfits between observations and model output for each year and the climatological year, compared to the misfits obtained using the initial guess parameters ( Table 2). The optimized parameter sets satisfied the pre-set convergence criteria, including local minima achieved by the total costs, low gradients of the total costs with respect to each optimized parameter, and positive eigenvalues of the Hessian matrix. The total costs were reduced by optimizing only a subset of the model parameters (5-7 constrained parameters and 3-6 optimized 240 parameters; Tables S2-6). The optimized parameters in common across all years were αDA (initial slope of photosynthesis vs. irradiance curve of diatoms, mol C (g Chl a) -1 d -1 (W m -2 ) -1 ), μHNA (maximum HNA bacterial growth rate, d -1 ), μLNA (maximum LNA bacterial growth rate, d -1 ), and gCR (half-saturation density of cryptophytes in microzooplankton grazing, mmol C m -3 ). gHNA (half-saturation density of HNA bacteria in microzooplankton grazing, mmol C m -3 ), gMZ (half-saturation density of microzooplankton in krill grazing, mmol C m -3 ), and μKR (maximum krill growth rate, d -1 ) were next frequently optimized, at 245 least for 4 years out of a total of 5 modelled years. Because this study focused on composites of the modelled ecosystem functions as a function of bacterial mode and of the fraction of different physiological groups (section 3.2), rather than of year (Figures S2-5), we combined observations and model results from all four years together for further model skill assessment. The Taylor diagrams indicated similar model skills between the four years ( Figure 3a) and the climatological year ( Figure 3b). Three core bacterial variables in this study, 250 including HNA biomass, LNA biomass, and BP, showed overall better model-observation agreements than other data types, with relatively high correlations, low centred (bias removed) root-mean-square difference (RMSD), and normalized standard deviation closer to 1. These three variables also had better fits to the four-year seasonal cycles of the observations than other data types ( Figure S7). However, the model skill for HNA biomass slightly degraded in the climatological model ( Figure 3b), with lower correlations and normalized standard deviation and higher RMSD than the four years together ( Figure 3a). The 255 optimized models captured best the temporal and spatial (depth) variability of PP, as shown by its high correlations ( Figure 3), but the models tended to underestimate PP with relatively larger errors than for other data types ( Figure S7). By contrast, there were slight positive model biases for POC and PON ( Figure S7) and their variability was not well captured as shown by the negative correlations ( Figure 3). Cross-validation cost analyses showed increased model-observation misfits when a set of parameters optimized for 260 one year was used to simulate another year's dynamics (Tables 1-2), suggesting that each year was best modelled using its own unique set of optimized parameters. The magnitude of the cost function increase varied by year pair, with the portability index values indicating that the optimized model parameters for 2012-13 was most portable (0.76 ± 0.11), followed by those for 2013-14 (0.73 ± 0.17), 2010-11 (0.67 ± 0.08), and 2011-12 (0.61 ± 0.12; Table 2).
The rest of the model stocks and flows fell into one of three categories: 1) the variable for a single year values were assimilated (i.e., microzooplankton biomass); 2) the variables for which observational values for the given year were 280 assimilated (i.e., nutrients, POC or detritus, and SDOC), and 3) the variables that were not assimilated (i.e., krill biomass, LDOC, NH4, and particle sinking flux). There was a little interannual variability in the average microzooplankton carbon biomass ( Figure 5). NO3, POC, and SDOC in unassimilated years were modelled to values comparable to those in other assimilated years ( Figure 5). Modelled LDOC and NH4 were also within the reasonable range of their typically small value (< 1 μM). 285

Bacterial physiological and taxonomic association with ecosystem functions
A property map of the emergent self-organizing map nodes (as generated by Bowman et al. 2017) showed the mode association with community structure ( Figure 6). The coloured map units (the circles in the background) were clustered into taxonomic mode membership or modes ( Figure 6A), showing a different frequency of appearance year-to-year ( Figure 6B). Each mode was dominated by unique bacterial taxa. For example, Candidatus Pelagibacter was most abundant in mode 6 290 ( Figure 6C), Dokdonia sp. MED134 in mode 7 ( Figure 6D), Candidatus Thioglobus singularis PS1 in mode 1 ( Figure 6E), Owenweeksia hongkongensis DSM 17368 in mode 2 ( Figure 6F), Rhodobacteraceae in mode 5 ( Figure 6G), and Planktomarina temperata RCA23 in mode 4 ( Figure 6H).
To explore a potential link between mode and the key ecosystem functions, we first extracted the modelled net primary production (NPP), POC sinking flux, and BCD from the ecosystem model at the time of bacterial samples and depth (10 m) 295 that were placed into a single, observed mode. We then performed a linear regression with mode as a factor (i.e., mode as a categorical predictor with 8 modes rather than an ordinal or continuous variable; equivalent to a one-way ANOVA with 8 different categories). fmode did not have a significant relationship with any of the modelled ecosystem functions examined (all p > 0.05; not shown). By contrast, 27%, 36%, and 77% of the total variance in the modelled NPP, POC sinking flux, and BCD was explained by mode ( Figures 7A-C). In particular, modes 3, 5, and 7 were associated with 2-3 times higher NPP, 300 POC sinking flux, and BCD, compared to when mode 4 dominated (two-sample t-test with unequal sample size, p = 0.02 for NPP and p < 0.001 for POC sinking flux and BCD), or to when mode 6 dominated (p = 0.03 for NPP, p = 0.003 for POC sinking flux, and p < 0.001 for BCD). The observed mode was positively correlated to the observed fHNA (r 2 = 0.52, p < 0.001; not shown). Thus, we also examined a potential link between the observed fHNA and the key model ecosystem functions as described above (i.e., linear 305 regression with an observed fHNA as a predictor and the modelled ecosystem functions as dependent variables). The observed fHNA was positively correlated to the modelled NPP (r 2 = 0.33, p < 0.001; Figure 7d), and to a stronger extent, to the modelled POC sinking flux (r 2 = 0.51, p < 0.001; Figure 7E) and to the modelled BCD (r 2 = 0.54, p < 0.001; Figure 7F). The stepwise addition of one predictor variable to the other predictor variable (i.e., fHNA adding to mode or vice versa) did not improve the model performance (not shown). These results suggest a clear link between the modelled ecosystem functions and observed 310 bacterial taxonomic (modes) and physiological (fHNA) traits observations.

Climate change experiments
We explored the response of the modelled bacterial dynamics and ecosystem functions (sections 3.2-3.3) to changing climate along the WAP (Figure 8). Due to a varying range of portability of the optimized model solution for each year, we used the climatological model parameter set (Table S6) to simulate an overall system response under perturbed ocean 315 temperature (+0.5°C and +1.0°C relative to observed temperatures) and sea-ice forcing fields (5% and 10% loss of sea-ice relative to observed sea-ice concentrations). These experiments were conducted under each perturbed environmental condition separately (i.e., warming alone in Figure S10 versus melting alone in Figure S11) and simultaneously (i.e., climate change experiments; Figure 8). We only focused on the results from the climate change experiments in this section, given that despite different impacts of each physical forcing changes (i.e., temperature impacts on rate processes versus sea ice impacts on light 320 and photosynthesis but not MLD) climate change would cause simultaneous changes in sea ice and water temperature along the WAP. The climate change experiments resulted in a combination of changes in overall bacterial stocks and rates, as well as the key ecosystem functions, and shifts in the seasonal timing or phenology ( Figure 8A), compared to the base state (first row as the base state and second and third rows as anomalies under perturbed conditions in Figure 8B). HNA bacterial stock and 325 rates responded more strongly to the perturbed climate conditions compared to LNA bacterial stock and rates. Under combined warming/melting (+1.0°C/-10%) conditions, there were maximum increases of the HNA variables by 19-35% (29 ± 89% for biomass, 22 ± 67% for LDOC uptake, 35 ± 111% for SDOC uptake, 26 ± 79% for respiration, 25 ± 78% for BP, 29 ± 89% for viral mortality, 19 ± 26% for grazing, and 29 ± 89% for RDOC excretion), compared to the maximum increases of the LNA variables by 3-15% (3 ± 2% for biomass, 6 ± 11% for LDOC uptake, 15 ± 27% for SDOC uptake, 8 ± 3% for respiration, 7 ± 330 6% for BP, 3 ± 2% for viral mortality, 7 ± 18% for grazing, and 3 ± 2% for RDOC excretion). In contrast to most bacterial variables that increased consistently throughout the growth season, microzooplankton grazing rates showed seasonally mixed responses for both HHA and LNA groups (i.e., the maximum decreases of 8 ± 32% for HNA and of 4 ± 32% for LNA).
Similarly, there were maximum increases of NPP and POC sinking flux by 14 ± 15% and 3 ± 22%, and maximum decreases by 4 ± 11% and 3 ± 13%, respectively. SDOC exhibited the maximum increase by 2 ± 1% early in the season but shortly 335 became depleted as the season progressed. LDOC decreased always in response to the perturbed conditions, with the maximum decrease by 10 ± 43%.

Model skill assessment
Despite the important biogeochemical role that bacteria play in the ocean, the vast majority of marine ecosystem 340 models neither include bacteria as a model compartment nor explicitly simulate bacterial processes. Most existing models parameterize the complex bacterial remineralization processes of the sinking organic matter with depth by using empirical relationships, such as, by fitting the power law functions, or other similarly-derived approaches and parameterizations (Buesseler et al. 2020;Cael and Bisson, 2018). Cellular functions, taxa, and functional gene expression of other prokaryotes, such as cyanobacteria (Hellweger 2010;Martín-Figueroa et al. 2000;Miller et al. 2013), or a diverse suite of microbial 345 functional groups (Coles et al. 2017;Dutkiewicz et al. 2020) have been modelled so far; however, our study serve as the first to explicitly model bacterial groups of different physiological traits.
In this study, only a subset of the model parameters was optimized to best simulate bacterial and other ecological patterns for each year, consistent with other data assimilation modelling studies (Friedrichs 2001;Friedrichs et al. 2006Friedrichs et al. , 2007Luo et al. 2010Luo et al. , 2012. In general, optimization of this class of marine ecosystem models requires adjustment of a small number 350 of independent model parameters to achieve well-posed model solutions, due to the highly cross-correlated nature of parameters in the inherently nonlinear model equations (Fennel et al. 2001;Harmon and Challenor 1997;Matear 1996;. Most of the constrained parameters in this study were directly associated with bacterial processes, with overall better model-observation fits for bacterial data types, giving confidence in the simulated bacterial C stocks and flows. Optimization also sheds light on major unknown parameters in the bacterial grazing process, including gHNA and gLNA 355 (half-saturation density of HNA and LNA bacteria in microzooplankton grazing, respectively). Microzooplankton grazing of the given bacterial group was simulated using Holling Type 2 density-dependent grazing with a preferential prey selection on diatoms, cryptophytes, and the other bacterial group, in which the single microzooplankton maximum grazing rate is implemented for both bacterial groups for model simplicity purposes (Text S1, Tables S2-6). Thus, it was the half-saturation density that determined the degree of preferential grazing by microzooplankton on a certain bacterial group, the change of 360 which would ultimately depend on biomass of each bacterial group. Due to the lack of a priori knowledge on the relative magnitude of gHNA and gLNA, we assigned the identical initial guess value (Table 1) to let the data assimilation scheme find the values that best fit overall observations of the bacterial group-specific dynamics. Compared to gLNA, smaller optimized gHNA (Tables S2-6) reflected preferential grazing of HNA cells by microzooplankton, consistent with previous speculations that grazers selectively remove larger and more active cells (Giorgio et al. 1996;Gonzalez et al. 1990;Sherr et al. 1992), so HNA 365 cells ). Together with the higher mean cell-specific grazing rates for the HNA group (section 3.2), our results suggest preferential grazing of HNA cells by microzooplankton.
In this study, the model portability index reflects the extent to which a single model framework represented by model parameters and equations captures the observed variability in different years, given variable environmental forcing and the accompanying shift in plankton ecosystem structure. The optimized model parameter set for 2012-13 was most portable and 370 the optimized parameter set for 2011-12 was the least portable (Table 2), in which the most (n = 7 out of total 11) and the least numbers (n = 5 out of total 11) of parameters were constrained (i.e., optimized with low uncertainties), respectively (Tables  S3-4). The other two years exhibited intermediate levels of portability, wth similar portability index values characterized by the same number of constrained parameters (n = 6 out of total 10 for 2010-11 and n = 6 out of total 12 for 2013-14; Tables S2,  S5). In other words, it was the number of well-constrained parameters that mattered most in driving high model portability, 375 suggesting the connection between overfitting and portability of optimized models in this study. Also, varying degrees of model portability across the years rendered it difficult to select one particular year's model solution representing the climatological dynamics, consistent with the characteristics of the original WAP-1D-VAR model. Instead, better model skill was found by utilizing parameters from assimilating climatological observations into a more general version of the model (section 4.4). 380

Bacterial carbon stocks and flows
Assimilating each bacterial group's biomass allows for the partitioning of the BP for each group as well as other physiological processes (e.g., SDOC uptake rates) that were never measured in this study. First, optimized models yielded in significantly higher cell-specific BP of the HNA group across all years, which could be attributed to the way the parameter 385 optimization was performed to keep higher maximum cell-specific growth rates of HNA cells. However, it should also be noted that the cell-specific BP rates were driven by biomass stocks that were determined from the modelled trophic interactions. As with phylogenetic groups (Fuchs et al. 2000;Teira et al. 2009;Yokokawa et al. 2004), cell-specific bacterial growth rates are expected to differ among distinct bacterial physiological groups, but there are limited studies focusing on group-specific cell activities (Gasol et al. 1999;Giorgio et al. 1996;Günter et al. 2008;Longnecker et al. 2005;Moràn et al. 2011). Moràn et 390 al (2011) showed that HNA cells greatly outgrew LNA cells in Waquoit Bay Estuary, with a cell-specific growth rate of up to 2.26 d -1 for HNA cells versus < 0.5 d -1 for LNA cells. Second, our model results revealed HNA group's significantly higher uptake rates of both LDOC and SDOC pools than their LNA counterparts. Several studies have demonstrated that HNA cells depend more than LNA cells on phytoplankton substrates for growth and metabolism (Li et al. 1995;Morán et al. 2007;Scharek and Latasa 2007). The hypothesis that WAP bacteria might rely on SDOC has received indirect support previously, 395 presumably due to LDOC limitation Kim and Ducklow 2016;Luria et al. 2017), but our study is the first to show the importance of the SDOC pool for HNA cells' C demand. Although much of the discussion focuses on bacteria, the model also captured well the rest of the ecosystem variables. Modelled nutrient stocks were above detect limits and indicated the lack of macronutrient limitations. The WAP typically exhibits strong interannual variability (Ducklow et al. 2007), but the lack of the strong interannual variability in model 400 microzooplankton biomass is due to assimilating climatological observations. One exception is krill biomass that was modelled 3-8 times larger than the maximum value from the available field data in 2017-18 (0.57 mmol C m -3 ; not shown). It should be noted that there were inconsistences in the nature of the assimilated data types, such as a single-year observation of microzooplankton (versus each year-specific observations of others) and two unassimilated data types including krill biomass. Also, there can be compensating errors in krill grazing rate and metabolism values given that krill are mobile laterally. These 405 observational limitations made it challenging to construct a complete microbial carbon budget without significant uncertainties. A more complete assimilation of zooplankton data should be the next effort to improve the model fits and minimize uncertainties in the bacterial variables and, therefore, to expand the current study.

Bacterial physiological and taxonomic association with ecosystem functions
The positive associations of the observed fHNA with the modelled NPP and POC sinking flux suggest a relatively 410 strong resource control on these actively-growing cells, compared to slowly-growing LNA cells. This is consistent with previous studies that showed increased HNA growth rates in response to enhanced phytoplankton-derived organic substrate (Morán et al. 2010) and more abundant HNA cells in areas or periods where bacterial assemblages were predominantly controlled by resources, rather than grazing (Morán et al. 2007). It has been hypothesized that due to minimal inputs of terrestrial organic matter, bacteria must ultimately rely on in situ NPP for organic matter source in the WAP (Ducklow et al. 415 2012b), supporting the importance of resource control on actively-growing bacteiral populations. In this study, modes 3, 5, and 7, characterized by copiotrophic taxa with large genomes and more 16S rRNA gene copies (Bowman et al. 2017), were associated with high modelled NPP, POC sinking flux, and BCD, while modes 4 and 6, characterized by taxa associated with more oligotrophic conditions, were associated with low modelled NPP, POC sinking flux, and BCD. Dokdonia sp. MED134, a common bacterial species of the modes associated with high NPP, POC sinking flux, 420 and BCD, is a proteorhodopsin-containing marine flavobacterium that grows faster with light (Gómez-Consarnau et al. 2007;Kimura et al. 2011) and in conditions under which resources are abundant (Gómez-Consarnau et al. 2007). Given the coastal WAP being primarily light-limited (Ducklow et al. 2012), the correspondence of D. Dokdonia MED134 to high modelled NPP suggests light-enhanced growth rates and cell yields from sufficient irradiance. By contrast, mode 4, dominated by Planktomarina temperata RCA23, is a slowly growing bacterium that specializes in using complex organic substrates (Giebel 425 et al. 2013). These attributes are consistent with high occurrence of mode 4 during the periods of low modelled NPP and POC sinking flux. Candidatus Pelagibacter, abundant in mode 6, is generally known as an oligotrophic specialist with a low DOC requirement, but often observed during the Antarctic phytoplankton blooms (Delmont et al. 2014;Luria et al. 2014), the characteristics of which support its occurrence during the periods of high modelled NPP. In summary, our study provides a novel framework connecting the dynamics of different ecosystem functions with 430 microbial physiology and taxonomy. Certain modes represent distinct WAP ecosystem states and the mode-state associations are reasonably explained from microbial perspectives. However, we did not investigate a seasonal succession and development in mode itself or the mode association of the key WAP ecosystem states. Future investigations should focus on including a few dominant or seasonally distinct modes in data assimilation, in order to fully resolve the seasonality of the mode-ecosystem state associations along the WAP. 435

Climate change experiments
The WAP has experienced significant atmospheric and ocean warming and resulting changes in marine ecosystem processes and further climate change is projected for the next several decades. The magnitudes of the perturbations used in the climate change experiments (+0.5º/+1.0ºC compared to observed temperature fields and -5%/-10% compared to observed seaice fields) are within the range of the long-term changes in temperature and sea-ice duration along the WAP continental shelf. 440 The temperature of the ACC water that has direct access to the WAP shelf has shown a large increase after the 1980s, equivalent to a uniform warming of the upper 300 m layer by 0.7ºC (Ducklow et al., 2012). The trend in the annual ice season duration is -1. Under combined warming/melting conditions, we expected that increased NPP and phytoplankton accumulations early in the season would result in a significant build-up of DOC pools. This was the case only for SDOC, and bacteria were soon LDOC-limited due to their preferential LDOC uptake for their primary carbon source. The growth of bacteria and increased bacterial rates during LDOC limitation was because bacteria depended on SDOC to meet the rest of their carbon demand, resulting in the depletion of SDOC pool later in the season. In other words, bacteria were more likely resource-limited, 450 in particular by the labile DOC pool, and SDOC subsequently played an increasingly important role. This change was particularly important in HNA cells, as shown by a relatively large increase of their BCD via SDOC, compared to LNA cells. Temperature is often regarded as a major factor regulating bacterial physiological rates by changing the rate of enzymatic reactions (Kirchman et al. 2009;White et al. 1991). In this study the modelled stocks and rates of HNA cells increased under the warming alone experiment ( Figure S10) but equally or more than under the melting alone experiment (i.e., increased 455 photosynthesis and resource availability; Figure S11). This suggests that temperature per se is not necessarily a more important limiting factor for bacterial, at least HNA, growth than resource availability (Ducklow et al. 2012a), and warming may rather enhance HNA utilization of the already increased organic matter from the increased productivity. Also, future climate may impact the (re)distribution of bacterial taxonomic groups, with a potential shift to more abundant HNA cells in the WAP bacterial communities, due to their preferential SDOC utilization. 460 The major limitation of our climate change experiments is the duration of the simulations. An ideal set of climate change simulations should be performed for longer-term periods as well as continuous across many years, not simply limited to growth seasons. This study could not accommodate these requirements due to the limited observations and existing data gaps in each year. Despite this limitation, we were able to validate the climatological model's capacity to partly reproduce the already observed, climate-driven trends of some variables along the WAP. Under each year's forcing fields, the climatological 465 model parameter set reproduced the interannual variability fairly well compared to the observed interannual variability, except for a few cases (e.g., overestimated BP and HNA biomass in 2011-12, underestimated PP in 2012-13 and 2013-14; Table S7), providing confidence in its usage for climate change impacts. 2011-12 was characterized by the negative temperature anomaly (-0.13 ± 0.83ºC versus 0.03 ± 0.84ºC for the 4-year climatology) and the positive sea-ice anomaly (24 ± 38% versus 21 ± 29% for the 4-year climatology), with lower temperature and higher sea-ice cover than the other three years (all p < 0.05, two-470 sample t-test). This coldest year had the lowest values of BP, HNA biomass, and PP observations (Table S7), consistent with increases in the modelled BP, HNA biomass, and PP under the combined warming/melting conditions. A combination of low HNA biomass, low PP, and low POC flux was also modelled in 2011-12, largely responsible for driving the positive association of the observed fHNA with the modelled NPP and POC sinking across years (section 4.3). Sea ice did not retreat until mid-December in 2011-12 ( Figure S1), and due to subsequently low light levels PP was modelled to be low. The low modelled PP 475 drove both low HNA biomass and low particle export, reinforcing the strong resource control on fast-growing bacterial populations and the conventional "high PP-high export" paradigm along the WAP.
Finally, our climate change simulations share similar results with those performed with the WAP-1D-VAR model with one bacterial compartment (Kim et al. 2021). In the original model, simultaneous combined warming and reduced seaice conditions resulted in increased NPP, net community production, POC sinking flux, bulk bacterial productivity and 480 biomass, and SDOC, in contrast to LDOC that was strongly limited early in the season. This potential shift to a more productive and efficient export system state is partially in agreement with suggestions made by previous studies that warming may induce more recycling favourable and microbial-dominated food-webs (Moline et al. 2004;Sailley et al. 2013). Despite the increased productivity and plankton accumulations, LDOC may become strongly depleted and, therefore, bacteria may need to depend more on SDOC to meet a significant part of their carbon demand (i.e., an increasing important role of SDOC for bulk bacterial 485 communities). Most of these results convey the same story as this study's climate change experiments, thereby adding confidence in the climate change simulations. However, it should be noted that the increased complexity of bacterial dynamics in the modified model adds two important contributions to the original model including: 1) the dominance of the HNA group over the LNA group in the warming WAP waters and 2) bacterial taxonomic (i.e., mode) and physiological (i.e., fHNA) traits being a significant indicator of key WAP ecosystem functions. 490

Conclusions
Heterotrophic microbial diversity has seldom been considered in detail in the formulation and analysis of marine pelagic ecosystem models reflecting in part the lack of suitable field data for model evaluation. Utilizing genomic products to prescribe taxonomic aspects of the bacterial model dynamics, this study investigated the association of bacterial abundance with different physiological states, bacterial community structure and key ecosystem functions. The modelling approach used in the present 495 study enabled the observations in different bacterial populations to constrain the group-specific processes and model parameters that have been poorly understood. These included the partitioning of BP specific to HNA and LNA groups, the partitioning of the bacterial uptake of DOC pools with different lability, and the half-saturation density of each bacterial group in microzooplankton grazing. The model also served as an effective platform to explore the WAP microbial response to changing climate conditions, in which warming and decreasing sea ice would induce a potential shift to the dominance of HNA 500 bacteria in more productive waters due to their increasing dependence on SDOC.

Code availability
The model simulation results and codes are available in a NetCDF data structure and Fortran 90 at HHK's GitHub data repository (https://github.com/hyewon-kim-whoi). 505

Data availability
Complete Palmer LTER time-series data used for data assimilation are available online (http://pal.lternet.edu/data). Surface downward solar radiation flux data used for physical forcing of the model simulations can be found in the National Centers for Environmental Prediction website (https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.surface.html). The 510 Tangent linear and Adjoint Model Compiler (TAPENADE) used to construct an adjoint model is available online (http://wwwsop.inria.fr/tropics/).

Author contribution
HHK designed the research, performed model simulations, and wrote the manuscript. JSB provided observational data and 515 helped data analyses and interpretation. HWD, OMS, and DKS provided observational data. YWL contributed model simulations. SCD supervised the research and significantly revised the manuscript.

Competing interest
The authors declare that they have no conflict of interest. 520

Acknowledgements
This study leverages the wealth of marine data collected by the Palmer Long-Term Ecological Research (Palmer LTER) program along the west Antarctic Peninsula, and the authors thank the scientists, students, technicians, station support and logistical staff, and ship captains, officers and crew involved. This research was supported, in part, by the U.S. National Science 525 Foundation Office of Polar Programs through award NSF PLR-1440435 and the U.S. National Aeronautics and Space Administration Ocean Biology and Biogeochemistry Program through award NASA NNX14AL86G. HK was also supported by the Investment in Science Fund and the Reuben F. and Elizabeth B. Richards Endowed Fund from Woods Hole Oceanographic Institution. • Maximum photosynthesis rate: P C MAX = μDA ´ Tf ´ min(Nf,DA, Pf,DA) (A.2.6) • C-specific gross primary production: • Limitation on N and P uptake: • Chlorophyll production: • Respiration: • Maximum primary production rate: P C MAX = μCR ´ Tf ´ min(Nf,CR, Pf,CR) (A.3.6) • C-specific gross primary production: • Limitation on N and P uptake: 3.14) 620 • Respiration: • POM production by aggregation: • Grazing by microzooplankton:

Bacterial processes (for both HNA and LNA groups)
• Cellular quota ( if Nf,BAC < 1, • Bacterial P uptake: (A.4.24) 690 • Respiration:  .4.37) elseif Q C N,BAC < q C N,BAC and Q P N,BAC < q C N,BAC/q C P,BAC (i.e.,N in short) E • SDOM excretion to adjust stoichiometry: MZ (A.5.14) 760 • Remineralization of inorganic nutrients: • Respiration:  KR/q C N,KR (A.6.10) E P KR,SDOP,1 = (1 -fex,KR) ´ exKR ´ G P KR ´ Q C P,KR/q C P,KR (A.6.11) 800 • SDOM excretion to adjust stoichiometry: KR (A.6.13) E P KR,SDOP,2 = 0.5 ´ E C KR,SDOC,2 ´ Q C P,KR (A.6.14) 805 • Remineralization of inorganic nutrients: • Respiration:      Map units are colored and numbered according to taxonomic mode membership (a). Location of samples used in this study within the map (b). The map was trained with a larger set of samples, here, only those samples for which BP and flow cytometry data were available (those samples used in this study) are shown. Mode boundaries are the same as in (a). Each sample was placed within 1115 the map unit that had the most similar community structure, however, the position of each samples within the map unit is random. Relative abundance of the most abundant taxa in the microbial community structure dataset in each map unit after training (c-h). For example, P. ubique HTCC1062 (c) dominated samples associated with Mode 6, while Ca. Thioglobus singularis PS1 (e) dominated samples associated with Mode 1. The boundaries across all panels are as in (a).     Table 2: Data types, observed means, coefficient of variation, target errors, and costs before and after optimization. The observed mean ( " # ), coefficient of variation (CV), and target error (s) of each assimilated data type used for calculating the normalized cost function 1150 (unitless; Eq. 5) before (J'0) and after optimization (J'f). Data type unit: mmol m -3 for nitrate (NO3), phosphate (PO4); mmol C m -3 for diatom chlorophyll (ChlDA), cryptophyte chlorophyll (ChlCR), HNA and LNA bacterial biomass, SDOC, and POC; mmol N m -3 for PON; and mmol C m -3 d -1 for primary production (PP) and bacterial production (BP). NO3 was not assimilated in 2010-11, while SDOC, POC, and PON were not assimilated in 2012-13 and 2013-14 (denoted as '-' in the table  Table 3: Cross-validation cost and portability index. J'c as the normalized optimized cost from the climatological model (equivalent to J'f under the climatological model parameter set in Table 1) and J'x as the normalized cross-validation cost (Eq. 7) where, for example, J'x,  under 2010-11 model parameter set indicates the normalized cross-validation cost from simulating 2010-11 model parameter set against 2011-12.