As three-dimensional (3-D) aquatic ecosystem models are used more frequently for operational water quality forecasts and ecological management decisions, it is important to understand the relative strengths and limitations of existing 3-D models of varying spatial resolution and biogeochemical complexity. To this end, 2-year simulations of the Chesapeake Bay from eight hydrodynamic-oxygen models have been statistically compared to each other and to historical monitoring data. Results show that although models have difficulty resolving the variables typically thought to be the main drivers of dissolved oxygen variability (stratification, nutrients, and chlorophyll), all eight models have significant skill in reproducing the mean and seasonal variability of dissolved oxygen. In addition, models with constant net respiration rates independent of nutrient supply and temperature reproduced observed dissolved oxygen concentrations about as well as much more complex, nutrient-dependent biogeochemical models. This finding has significant ramifications for short-term hypoxia forecasts in the Chesapeake Bay, which may be possible with very simple oxygen parameterizations, in contrast to the more complex full biogeochemical models required for scenario-based forecasting. However, models have difficulty simulating correct density and oxygen mixed layer depths, which are important ecologically in terms of habitat compression. Observations indicate a much stronger correlation between the depths of the top of the pycnocline and oxycline than between their maximum vertical gradients, highlighting the importance of the mixing depth in defining the region of aerobic habitat in the Chesapeake Bay when low-oxygen bottom waters are present. Improvement in hypoxia simulations will thus depend more on the ability of models to reproduce the correct mean and variability of the depth of the physically driven surface mixed layer than the precise magnitude of the vertical density gradient.
Since the middle of the last century, anthropogenic impacts
have dramatically decreased water quality throughout the Chesapeake Bay
(Boesch et al., 2001), one of the largest estuaries in North America.
Land-use change along with the industrialization and urbanization of the
Chesapeake Bay watershed have caused dramatic increases in nutrient inputs to
the bay (Kemp et al., 2005), spurring additional primary production and
phytoplankton abundance (Harding Jr. and Perry, 1997). Because increased
primary production leads to more organic matter throughout the water column
that is eventually decomposed by bacteria, these increased nutrient inputs to
the bay have led to a corresponding decrease in dissolved oxygen (DO)
concentrations (Hagy et al., 2004). Hypoxia, generally defined as the
condition in which DO concentrations are below 2
Interest in the ecological impacts of reduced DO concentrations has been
elevated due to the observed proliferation of hypoxic events in the world's
coastal oceans, creating vast dead zone areas that compress suitable habitats
for many marine species (Diaz, 2001; Diaz and Rosenberg, 2008; Pierson et
al., 2009). Low-DO waters can greatly impact the abundance and health of
important ecological species, potentially resulting in suffocation and major
kills of fish, crabs, and shellfish (Breitburg, 2002; Ekau et al., 2010;
Levin et al., 2009). While the presence of DO concentrations
Map of the Chesapeake Bay and its watershed.
Improving the health of the Chesapeake Bay has become a priority for the Environmental Protection Agency (EPA) along with the six states and Washington, DC that make up the bay watershed (Fig. 1), and together they have committed to utilizing a suite of regulatory models to inform their management decisions (USEPA, 2010). The Chesapeake Bay Program (CBP), a regional partnership that has led and directed the restoration of the Chesapeake Bay since 1983, has undertaken an extensive modeling effort of the bay (Cerco and Cole, 1993; Cerco et al., 2002; Cerco and Noel, 2004, 2013). This modeling system is being used by the CBP to estimate the aggregate effect of changes in management practices, including land use, atmospheric deposition, animal populations, and fertilizer and manure application. Recently, the modeling system has been used to conduct scenario simulations to assess management actions needed to achieve desired bay water quality standards (USEPA, 2010). Ultimately this model was used to establish a regulatory set of total maximum daily loads of nutrients and sediment delivered from the watershed, with the goal of significantly improving water quality throughout the bay (USEPA, 2010).
Many 3-D hydrodynamic-oxygen models of varying complexity stemming from the academic research community have also been used to simulate DO concentrations throughout the Chesapeake Bay (Scully, 2010, 2013; Hong and Shen, 2013; Feng et al., 2015; Testa et al., 2014; Y. Li et al., 2015). Bever et al. (2013) specifically demonstrated that multiple models of varying complexity are able to generate skillful estimates of hypoxic volume in the bay. Some of these models are being used in the bay to simulate short-term and/or seasonal forecasts of DO conditions. Furthermore, some models are also being used to generate scenario forecasts, or projections, that assess the impact of changes in management practices on estuarine DO concentrations, in some cases taking into account the impacts of future changes in climate.
As ecosystem and water quality models are increasingly used for operational forecasts as well as scenario-based management decisions by the regulatory and academic research communities, it is important to understand the relative strengths and limitations of existing models of varying complexity. The ability to discern which variables must be most accurately simulated in order to adequately reproduce the temporal and spatial variability of bay oxygen concentrations is a necessary prerequisite for fully understanding how volumes of low-DO water are initiated and sustained within water quality models. The utilization of multiple models can also inform projections by providing independent confidence bounds for management decisions. To those ends, the overarching goals of this research are to compare the relative skill of various 3-D Chesapeake Bay models characterized by different levels of biogeochemical complexity and spatial resolution, to better understand factors limiting their ability to reproduce observed DO distributions, and to suggest approaches for the continued improvement of these models.
Eight 3-D models were evaluated in this study (Table 1), each of which includes hydrodynamic and DO components. Among the eight models, there are four different hydrodynamic base models. Models B, C, D, F, and G utilize the Regional Ocean Modeling System (ROMS; Shchepetkin and McWilliams, 2005; Haidvogel et al., 2008) that employs a structured grid with sigma layers in the vertical dimension. Specifically, Models B, C, and F use a ROMS implementation developed for the Chesapeake Bay based on Xu et al. (2012; ChesROMS). Model D employs a ROMS implementation for the Chesapeake Bay based on M. Li et al. (2005), while Model G uses the ROMS-based Chesapeake Bay Operational Forecast System (CBOFS; Lanerolle et al., 2011). Models A, E, and H each use a different hydrodynamic base model: the Curvilinear Hydrodynamics in Three Dimensions model (CH3D; Cerco et al., 2010), the Finite-Volume Community Ocean Model (FVCOM; Jiang and Xia, 2016), and the Hydrodynamic Eutrophication Model – Environmental Fluid Dynamics Code (EFDC; Park et al., 1995; Hong and Shen, 2012; Du and Shen, 2015), respectively. The only model that employs a non-sigma vertical grid is Model A and the only model utilizing an unstructured horizontal grid is Model E. While Model E contains 10 sigma vertical layers, all of the other sigma grids use 20 layers. All of the grids vary in terms of their horizontal resolution, with Models A and G utilizing the highest resolution horizontal grids.
Model characteristics.
These four hydrodynamic models are coupled to five different models used to simulate DO (Table 1). Models A, B, C, D, and E utilize full biogeochemical models that include various combinations of oxygen, phytoplankton, zooplankton, and multiple inorganic and organic nutrients as state variables. Specifically, Models A and E employ a version of the Integrated Compartment Model (ICM; Cerco et al., 2010; Jiang et al., 2015), Model B uses the Estuarine Carbon Biogeochemistry model (ECB; Feng et al., 2015), Model C uses the Biogeochemistry model (BGC; Brown et al., 2013), and Model D uses the Row–Column AESOP model (RCA; Testa et al., 2014). In terms of food web complexity the models vary considerably: Models B and C employ a single phytoplankton group whereas Model D uses two phytoplankton groups, Model E uses three, and Model A, the most complex of the participating models, uses five.
In contrast to the full biogeochemical models discussed above (Models A through E), Models F, G, and H represent oxygen dynamics as simply as possible and therefore do not utilize a full biogeochemical component. Rather, the models impose a biological oxygen consumption rate that is model-specific, but constant in both space and time. This component is referred to as a constant-respiration model (CRM). In this model, DO is introduced to the estuary via the river and ocean boundaries and is set to saturation at the estuarine surface. This constant-respiration oxygen parameterization (Scully, 2010) is simplistic, yet has been shown to adequately represent Chesapeake Bay oxygen dynamics (Scully, 2010, 2013; Bever et al., 2013).
The major difference in forcing between the eight model implementations is that Models A and B use riverine input derived from watershed models, whereas Models C–H used the measured flow from United States Geological Survey gauging stations, extrapolated using various techniques. Model A utilized the CBP's regulatory watershed model (Shenk and Linker, 2013), while Model B utilized the Dynamic Land Ecosystem Model (Yang et al., 2015a, b; Tian et al., 2015). At the open boundary with the Atlantic Ocean, Models B, C, D, F, G, and H utilize a sub-tidal elevation extrapolated from tidal stations on either side of the open boundary. Model E uses the TPXO tidal model, while Model A uses a mix of observational and model forcing (Cerco et al., 2010). While Model B utilizes wind forcing based on observations from the Thomas Point Light, Models C through H use wind estimates from the North American Regional Reanalysis (NARR).
The eight models used in this analysis have been developed for a variety of
purposes. Model A is a governmental regulatory model developed by the CBP
that has been extensively calibrated specifically to examine water quality
issues in the Chesapeake Bay (Cerco and Cole, 1993; Cerco and Noel, 2004,
2013; Cerco et al., 2010) and has been used in the development of the 2010
Chesapeake Bay Total Maximum Daily Load (USEPA, 2010). The National Oceanic
and Atmospheric Administration employs the hydrodynamic component of Model F
for operational forecasts of a variety of physical estuarine parameters for
the Chesapeake Bay
(
Finally, a ninth model is calculated as the mean of the results from the eight models described above, and is referred to here as Model Mean, or Model M.
Model simulations were compared to cruise data from the CBP for 2004 and 2005
from 13 stations along the main stem of the bay (Table 2, Fig. 2). The years
2004 and 2005 were selected to represent relatively wet and average years,
respectively, and the 13 stations were chosen as they have been found to
offer optimal estimates of bay-wide hypoxic volume (Bever et al., 2013).
Stations were sampled on up to 34 cruises over the 2 years (Table 2),
generally twice a month from April to August and once a month for the
remainder of the year. Observational data can be downloaded from the CBP
Water Quality Database (
Characteristics of observation stations (from USEPA, 2012).
Location of the CBP water quality monitoring stations used in this study.
Stratification of the density and oxygen fields was examined to identify the
maximum gradient of the pycnocline and oxycline as well as the depth of the
top of the pycnocline and oxycline. In open ocean studies, the depth of the
top of stratification is commonly referred to as the mixed layer depth (MLD),
although this term is less frequently used in the estuarine literature. As
the research presented here distinguishes between the depths of the top of
the pycnocline and that of the oxycline, these will be referred to
respectively as the density (
The CBP defines the top and bottom of stratification in order to distinguish
individual designated use areas for water quality management purposes (USEPA,
2004). They suggest that the top of the pycnocline be defined as the
shallowest occurrence of a density gradient of 0.1
While the CBP definition described above delineates between designated use
boundaries according to density, our research focuses on the relationship
between the pycnocline and oxycline, requiring an alternate definition that
can be applied to both the density and oxygen distributions. In addition, the
CBP definition often generates estimates for the depth of the top of the
pycnocline that are too shallow compared to the maximum depth of surface
mixing (Fig. 3). As a result, a percentage threshold criterion was developed
that identifies the bottom of the reasonably well-mixed layer, rather than
perfectly mixed layer, and is used in this analysis. The percentage threshold
method defines a density or DO profile as being stratified if a change of
10 % of the difference between the profile's maximum and minimum values
occurs within a single meter (Fig. 3). For example, if the maximum DO
concentration throughout the water column on an individual sampling date is
10
Density and dissolved oxygen profiles for a mid-bay station (CB4.1C)
on
Simulations of the Chesapeake Bay from the eight models described above were statistically compared to historical monitoring data using a variety of skill metrics including root-mean squared difference (RMSD), bias, standard deviation, and correlation coefficient. These metrics are illustrated on Taylor and target diagrams (Taylor, 2001; Hofmann et al., 2008; Jolliff et al., 2009), which offer a compact way of assessing model skill by displaying a number of different skill metrics. Target diagrams illustrate the bias and total RMSD of model output, which Taylor diagrams do not. Taylor diagrams include quantitative information on the standard deviations and correlations between the model output and the observations, which target diagrams do not. Both diagrams, however, represent unbiased RMSD, sometimes called “centered-pattern RMSD”. On target diagrams, a model symbol above the horizontal axis overestimates the mean of the observations and a model symbol to the right of the vertical axis overestimates the variability of the observations. (See Hofmann et al. (2008) and Jolliff et al. (2009) for a more detailed description of these diagrams.) On Taylor diagrams, a model symbol lying on the horizontal axis exactly correlates to the observations and a model symbol further from the origin than the observation symbol overestimates the standard deviation of the observations. (See Taylor (2001) for a more detailed description of these diagrams.)
Taylor and target diagrams presented here are normalized to the standard deviation of the observations, allowing multiple variables be represented on the same plot. This also conveniently allows the unit circle on a target diagram to represent the skill of a model defined as the mean of the observations. In effect, this means that if a model falls within the unit circle, it exhibits a skill that is greater than the skill obtained if one were to simply use the mean of the observations. The Taylor and target plots are either temporal (displaying model skill at a single station over the study period) or spatial (displaying model skill during a single month over the entire set of study stations). In addition, summary diagrams are presented which combine both temporal (examining the seasonal changes at each individual station) and spatial (examining differences across the bay during an individual month) variability.
Model skill was assessed using the hourly model output (daily for CH3D-ICM chlorophyll and nitrate) that was nearest in time to that of the observation and from the grid cell that encompassed the observation location. For months with two observations, each observation was individually matched to the model output and the skill statistics from those comparisons were averaged for that month. The native horizontal resolution and bathymetry of the individual model grids was preserved in the comparison so as not to bias the analysis through varying interpolation methodologies. For stratification variables, the models and observations were interpolated to a 1 m vertical grid that extended only as deep as the individual models' bathymetry or deepest observation in order to preserve the differences in bathymetric grids while allowing for a direct comparison of the observations to the models. Model–data comparisons at the bottom of the water column were not necessarily based on the same depths, since in many cases the modeled bathymetry was shallower (or at times, deeper) than the deepest data point at a given station. In order to avoid issues with extrapolation and/or grid stretching, data at the bottom of the water column were always compared with model estimates from the deepest grid cell provided by each particular model. Model–data comparisons for stratification and mixed layer depths only included stations and times for which stratification was defined to exist in both the observed and simulated fields.
An analysis of model skill of the combined temporal and spatial variability
of DO at the surface and bottom of the water column, as well as at the
observed MLD
The monthly temporal variability of bottom DO at each station over the 2 years studied is resolved similarly well by all of the models (Fig. 5a), but the models have difficulty simulating spatial DO variability during each month (Fig. 5b). Due to the stations chosen for this analysis (Fig. 2), the spatial variability being examined here is essentially the north to south variability. Most models exhibit a latitudinal gradient with respect to their skill in reproducing the temporal variability of bottom DO, with models overestimating DO at the more northern stations (Fig. 5a). Some models differ in their ability to reproduce summer (May–September) DO concentrations and winter (October–April) DO concentrations (Fig. 5b). Models B, F, and G all distinctively overestimate mean DO in the summer compared to the winter. In contrast, Models A and C perform similarly well in both seasons (Fig. 5b). In addition, all three constant respiration models, as well as Models D and E, substantially underestimate DO at several stations in the winter.
Normalized summary
All eight models generally resolve the pycnocline and oxycline with similar
skill (Fig. 6). All models consistently underestimate the mean and standard
deviation of the maximum strength of stratification within the pycnocline and
oxycline, defined herein as the maximum vertical gradients of density and
oxygen (Fig. 6a). All models, except for Model A (see Sect. 4.2), also
underestimate the mixed layer depth, regardless of whether it is computed in
terms of density or oxygen. (Note that these model symbols in Fig. 6a are
located above the
Normalized target diagrams for Models A–H demonstrating the
Normalized summary
All eight models are also characterized by similar skill in representing the
temporal and spatial variability of density stratification and MLD
All eight models reproduce the variability of bottom DO better than the variables that are generally thought of as being the primary drivers of hypoxic conditions, including stratification (Fig. 6), salinity, chlorophyll, and nitrate (Fig. 8, Table 3). However, all models reproduce patterns in temperature across the bay and through time better than any of the other variables in this model comparison (Fig. 8). All eight models, as well as the Model Mean, are characterized by very low bias in modeled temperature, and correlation coefficients of approximately 0.99; this high skill results from the very strong and predictable seasonal temperature variability. Even though the five models with full biogeochemical components (Models A, B, C, D, E) are characterized by large differences in their mechanistic approaches to modeling nitrate and chlorophyll, they produce similar total RMSDs for all of the variables examined at both the surface and the bottom (Table 3).
The mean of the eight models (Model M) has a higher model skill (lower RMSD) than any individual model across nearly every variable examined (Table 3). In addition, for nearly all observations at all stations, the 95 % confidence interval of all model hindcasts encapsulates the observed bottom DO concentration (Fig. 9), even though any individual model may overestimate or underestimate observed DO. Models generally fall into greater agreement during the summer, when DO is low, and into lesser agreement in the winter when DO is replete. While this study does not allow for a true interannual comparison, it is interesting that at station CB4.1C the model ensemble closely matches the timing of the drawdown of DO in the spring of 2004 (Fig. 9), whereas it produces a summer rather than spring initiation of hypoxic conditions in 2005. In addition, the model ensemble produces a premature relaxing of hypoxic conditions for both years at this observation station.
In order to better understand the impact of stratification on DO
concentrations throughout the water column, the relationship between the
observed pycnocline strength and MLD
Normalized target diagrams for Models A–H demonstrating the
Mean and standard deviation (SD) of observations and total normalized RMSD for each model. RMSD for each model except when not applicable (N/A).
Pycnocline and oxycline correlation statistics (all correlations
have
In examining the eight 3-D models in this study, there is not a
statistical difference between the ability of simple and complex models to
simulate the mean and monthly variability of bottom DO; in addition, models
with higher spatial resolution do not necessarily produce better estimates of
DO.
Models currently simulating hypoxia throughout Chesapeake Bay compute oxygen concentrations in essentially two distinct ways: they either utilize a simple constant respiration model or a full biogeochemical model. In this study, the relative skill of both types of models is compared. Specifically, in examining results of the comparison between five biogeochemical models (A, B, C, D, E) and three simplistic constant respiration models (F, G, H), the two groups of models performed statistically similar in their skill of reproducing bottom DO concentrations (Fig. 3, Table 3). These results support those of Bever et al. (2013) who compared three constant respiration models with the CBP regulatory model (Model A) and similarly found that all four of the models were equally skillful in terms of reproducing the seasonal variability in bottom DO throughout the bay in 2004 and 2005. Consistent with the results of Scully (2013), this result implies that the seasonal variability of DO in the Chesapeake Bay is primarily dependent on underlying hydrodynamic mechanisms which are nearly identical for all eight models, rather than on aspects related to the biogeochemical cycling which vary dramatically between models and in fact are constant in three of the eight models. It should be noted, however, that the 2 years studied here were relatively wet years and an analysis of dry years may offer different results.
Many previous studies have examined the costs and benefits of adding complexity to biogeochemical models. For example, increasing biogeochemical complexity has been found to improve skill in some biogeochemical data assimilative parameter optimization studies (Friedrichs et al., 2006, 2007; Lehmann et al., 2009; Bagniewski et al., 2011; Ward et al., 2013; Xiao and Friedrichs, 2014). The additional parameters associated with increased complexity generally provide more parameters that are available for additional tuning and subsequent improved model–data agreement. This is in contrast to the results of this analysis demonstrating that increased biogeochemical complexity does not necessarily improve model–data agreement. In this case, the increase in model complexity has likely outpaced the ability of the researchers to fully tune the model to the available observations. However, even past studies that have invoked formal parameter optimization methodologies, such as genetic algorithms and variational adjoint methods (Friedrichs et al., 2007; Ward et al., 2010; Xiao and Friedrichs, 2014), have found that under certain conditions, adding too much complexity does not necessarily improve model skill and in fact can decrease model skill and portability, primarily due to artifacts resulting from overtuning. This mirrors findings from the larger ecosystem modeling community where the best-fit models are often those with intermediate complexity (Fulton et al., 2003).
In this study, horizontal grid resolution differed significantly between
model implementations, with the most highly resolved grid (Model G) including
more than 9 times more grid cells than the lower resolution grids
(Table 1). A certain degree of resolution is clearly required to successfully
simulate dynamic processes, and a model with 8–10
Normalized summary
Time series of bottom dissolved concentrations for station CB4.1C. Red dots represent the 34 observations made during 2004–2005. Grey lines are the individual model simulations. The dark blue line represents the Model Mean while the cyan line represents the 95 % confidence interval of the model simulations.
Scatter plots comparing observations of
Accurately simulating the observed spatial variability of DO (Fig. 4b) was a greater challenge than simulating the temporal variability of DO (Fig. 4a) for all eight models participating in this intercomparison. This is especially true in the winter months when the vast majority of the bay is oxygen replete and the models have difficulty representing the observed variability from station to station. The majority of the models tend to slightly overestimate mean bottom DO in the summer whereas multiple models (e.g., Models D, E, F, and G) exhibit a strong negative bias during January and/or February of 2005, primarily at stations in the middle to southern portion of the bay's deep channel. Interestingly, increased biological complexity and higher grid resolution do not completely resolve this issue, as this is true for models utilizing full biogeochemical models (Models D, E) as well as those using highly resolved model grids (Model G). This is likely due to the ephemeral nature of the biological divers of DO.
The strong performance of the constant respiration models implies that these models may be excellent candidates for providing short-term bottom oxygen forecasts. The high DO skill of the CRM models primarily results from the fact that seasonal variations in physical processes (primarily wind mixing and temperature) play a dominant role in controlling the seasonal cycle of oxygen (Scully, 2013). Because the underlying hydrodynamic models all use similar physical forcing, the constant respiration models are able to simulate the seasonal cycle of DO with similar skill as the more complex biogeochemical models. As a result, these simple models that are easier to tune and require less in the way of computational resources than full biogeochemical models, may be efficiently used to produce short-term (on the order of days) DO forecasts. On the contrary, the more complex full biogeochemical models will be necessary for scenario-based and long-term (on the order of months to years) forecasting which requires that models respond to prescribed changes in the biogeochemical environment, such as increased rates of nutrient loading due to changes in land use, land cover, and/or climate.
Overall, model DO skill is greater than that of the variables generally
considered to drive DO variability, such as stratification, salinity, mixed
layer depth, chlorophyll, and nitrate; only modeled temperature has higher
skill than modeled DO.
Since dissolved oxygen concentrations in the Chesapeake Bay are controlled by physical processes (e.g., advection, wind mixing, heating/cooling, and stratification), as well as biological processes (e.g., photosynthesis and respiration), it is critical to understand the skill of the models in terms of how well they reproduce the many factors influencing oxygen concentrations. As expected, the five models containing a specific biogeochemical model component had more difficulty simulating the observed chlorophyll and nitrate concentrations than the physical variables (temperature and salinity), both at the surface (Table 3) and the bottom (Fig. 8). Replicating the correct location, magnitude, and timing of phytoplankton blooms and nutrient cycling is a complex issue, and as a result, these features are generally not well simulated in the models. While the models generally simulate the total amount of chlorophyll adequately, it is more uniformly spatially distributed in the models rather than in patchy blooms as in nature, leading to the underestimation of chlorophyll variability across all models. Although all models produced a relatively high correlation between observed and modeled temperature and salinity (Fig. 8), the correlation coefficients for chlorophyll and nitrate were much lower. The correlations for observed vs. modeled DO was more similar to that of the physical variables (temperature, salinity) than the biological variables (chlorophyll and nitrate), highlighting that the seasonal variability in bottom DO is regulated more by physical than biological factors. This also explains the success of the constant respiration models, which by definition contain no biological variability yet reproduce DO variability nearly as well as the most complex biogeochemical models.
In this study, model skill was also considerably higher for bottom oxygen than it was for the vertical gradient of stratification and mixed layer depths (Figs. 6, 8). The underestimation of the vertical gradient across all models is largely due to the numerical diffusion that characterizes all of these hydrodynamic models, but may also be partially due to an underestimation of the winds or a lack of diffuse freshwater input around the bay. Even though the models all underestimated the strength of stratification (Figs. 4, 6), modeled stratification in summer was strong enough to prevent mixing with the relatively well-oxygenated surface waters. This result suggests, somewhat surprisingly, that simulating the correct vertical gradient of stratification is not absolutely necessary for skillful bottom DO simulations. Models need only simulate enough stratification to effectively cut off vertical mixing in order to develop an isolated bottom layer that can then experience a draw down in oxygen via respiration. In addition, the models must also correctly simulate the horizontal advection of oxygen (Scully, 2013; Y. Li et al., 2015). The fact that bottom DO is simulated so well by the eight models analyzed here suggests that not only is the advection of oxygen well represented in the models but also the strength of stratification, i.e., the maximum vertical gradients of density and oxygen, produced by these models is sufficient. Thus, although novel and somewhat unexpected, these results are not contradictory to previous studies demonstrating the importance stratification plays in initiating summer hypoxia in the Chesapeake Bay (Murphy et al., 2011).
Model skill in terms of reproducing observed mixed layer depths was likewise
much lower than model skill of reproducing observed oxygen concentrations.
All models, except Model A, produced mixed layer depths (MLD
The two variables for which the models have greatest skill are DO and
temperature (Fig. 8). This is because oxygen variability is driven primarily
by seasonal variability in physical processes such as solubility and wind
mixing and to a lesser degree by variability in oxygen consumption (Scully,
2013). As a result, the models using a constant mean respiration rate produce
as realistic hypoxia simulations as the biogeochemically complex models.
Observations clearly show this strong seasonal variability in bottom DO
(Fig. 11a) and, to a slightly lesser extent, clear seasonal variability in DO
at the bottom of the bottom of the oxygen mixed layer (MLD
Time series of observations at station CB4.1C from 2003 to 2006 for
Most of the aerobic habitat in the bay during the summer is located above the
MLD
On average, the models miss the observed depth of the MLD
Time series of observations of dissolved oxygen and MLD
The impact of habitat compression can be substantial, as many bay species
require DO concentrations well above the traditional hypoxic threshold
(USEPA, 2010). While not all of the main stem stations develop hypoxic water
each year, most mesohaline stations experience a dramatic drawdown of oxygen
to levels during the summer that effectively remove a large portion of the
bay from habitable space (Murphy et al., 2011; Schlenger et al., 2013).
Studies have shown that some species modify their behavior based on the
oxycline depth, which acts to constrict the habitable space in the water
column (Prince and Goodyear, 2006; Pierson et al., 2009; Elliott et
al., 2013). Since species can be negatively impacted by low-DO concentrations
as high as 5
To better simulate DO conditions and summer habitat compression due to low-DO
water, simulations of the depth of the top of the pycnocline (MLD
Although the suite of models examined reproduce DO concentrations relatively well overall (Fig. 4), the models typically overestimate summer habitat compression by producing low DO concentrations too high in the water column (Fig. 6). Observations from the Chesapeake Bay Program show a strong correlation between the depths of the oxygen and density-defined mixed layers (Fig. 10b). The models analyzed here also clearly exhibit a close relationship between their skill in simulating the depths of the oxygen and density-defined mixed layers (Fig. 6). These strong relationships between the depths of the oxygen and density-defined mixed layers result from the fact that the pycnocline represents the physical barrier that leads to the development of the oxycline. Therefore, the inability of the models to accurately simulate habitat compression is an artifact of their lack of skill in simulating the depth of the density-defined mixed layer. In contrast, the strength of density stratification is not well correlated to the strength of oxygen stratification. This is because a relative wide range of intensities of density stratification is still sufficient to cut off vertical mixing, leading to the observed draw-down in bottom DO. Thus, even though all models underestimate the strength of the pycnocline, they still produce enough stratification to greatly reduce mixing. The results from this paper thus indicate that to further improve DO simulations and better estimate summertime habitat compression, it is even more critical for models to accurately simulate the depth of the top of the pycnocline than to accurately simulate the absolute strength of the pycnocline.
The multi-model ensemble approach allows for the development of a model
mean, which taken as its own model, is the most skilled model when examining
the combined suite of variables analyzed in this study.
The model skill assessment presented here demonstrates that the average of all eight models, or five models in the case of chlorophyll and nitrate, does better than any individual model if looking across the suite of variables analyzed. This finding is similar to that of other studies that examined the value of the Model Mean from a multi-model ensemble (e.g., Gneiting and Raftery, 2005; Hagedorn et al., 2005). While the concept of using a multi-model ensemble has been most extensively employed by atmospheric, climatic, and global circulation modelers, such as the Intergovernmental Panel on Climate Change (e.g., Collins et al., 2013), the tool's utility for aquatic ecosystem modeling is gaining traction (Meier et al., 2012; Trolle et al., 2014; Janssen et al., 2015). As models are increasingly used in regulatory decisions regarding aquatic ecosystems, a cohort of similarly skilled models can be used to help inform a set of confidence bounds around an environmental forecast. Due to the restrictions placed on models used in regulatory actions, utilization of a multi-model ensemble may not be realistic for all environmental and resource managers; however, multiple models can be integrated into the decision-making process even when the ultimate decision must be based on a single model. For example, a confidence interval plot could help identify where regulatory model output might be acting out of sync with other skilled water quality models of the same system, thereby informing managers of the potential shortfalls associated with the regulatory model. Furthermore, if the models tend to be predicting similar DO concentrations, a cohort of models could enhance the confidence in regulatory decisions based on a single regulatory model (Friedrichs et al., 2012; Weller et al., 2013). Comparing multiple models can also help inform how to better improve models in the future, as this study has aimed to do.
All models analyzed here exhibited a high degree of skill in simulating dissolved oxygen concentrations within the main stem of the Chesapeake Bay in 2 years corresponding to relatively wet and average years. Their high skill results from the fact that physical processes (e.g., solubility, wind-mixing, and advection) exert a first order influence on the seasonal cycle of oxygen. As a result, the models' ability to reproduce dissolved oxygen concentrations is independent of the complexity of the biogeochemical parameterizations: the simplest constant respiration models were found to reproduce observed oxygen concentrations as well as the most biologically complex models. Essentially, all models are equally capable of respiring most of the available oxygen in the lower water column during summer.
This study also suggests that for use as management tools for water quality and living resources, it is more critical for these models to adequately resolve the depth of the mixed layer than the absolute strength of stratification (as long as modeled stratification is strong enough to limit vertical mixing). This is critical because observations show that during warmer months, oxygen-depleted water fills the water column to where stratification limits further mixing, which effectively cuts off waters below the mixed layer for use by the majority of the Chesapeake Bay's most recognized and valued living resources. These results furthermore suggest that modelers should focus their efforts on improving the hydrodynamics of their models in an effort to improve simulations of mixed layer depth dynamics and variability.
These findings have significant ramifications for short-term bottom DO forecasts, which may be successful with very simple oxygen parameterizations embedded in hydrodynamic models. In contrast, scenario-based water quality forecasts are likely to benefit from more complex models, which must adequately reproduce the longer-term response of the oxygen field to changes in nutrient and organic matter loads. This study also helps to demonstrate how multiple community models from governmental agencies and academic institutions may be used together to provide a model mean and a set of confidence bounds for regulatory model results that could be used to inform management decisions.
Observations used in this analysis can be downloaded from the Chesapeake Bay
Program's Water Quality Database website at
(
This work was supported by the NOAA IOOS program as part of the Coastal Ocean Modeling Testbed. We thank Yun Li and Younjoo Lee for help with the ROMS-RCA simulations used in this analysis and Ray Najjar for his insights and comments. This is VIMS contribution 3520 and UMCES contribution 5130. Edited by: K. Fennel