Species richness and functional attributes of fish assemblages across a large-scale salinity gradient in shallow coastal areas

Coastal ecosystems are biologically productive and their diversity underlies various ecosystem services to humans. However, large-scale species richness (SR) and its regulating factors remain uncertain for many organism groups, owing not least to the fact that observed SR (SR obs) depends on sample size and inventory completeness (IC). We estimated changes in SR across a natural geographical gradient using statistical rarefaction and extrapolation methods, based on a large fish species incidence dataset compiled for shallow coastal areas (<30 m depth) from Swedish fish survey databases. The data covered a ca. 1,300 km north-south distance and a 12-fold salinity gradient along sub-basins of the Baltic Sea plus Skagerrak and, depending on sub-basin, 4 to 47 years of samplings during 1975–2021. Total fish SR obs was 144, and 74% of the observed fish species were of marine and 26% of freshwater origin. In the 10 sub-basins with sufficient data for further analysis, IC ranged from 77 to 98%, implying that ca. 2–23% of likely existing fish species had remained undetected. Sample coverage exceeded 98.5%, suggesting that undetected species represented <1.5% of incidences across the sub-basins, i.e. highly rare species. To compare sub-basins, we calculated standardised SR (SR std) and estimated SR (SR est). Sub-basin specific SR est varied between 35 ± 7 (SE) and 109 ± 6 fish species, being ca. three times higher in the most saline (salinity 29–32) compared to the least saline sub-basins (salinity <3). Analysis of functional attributes showed that differences with decreasing salinity particularly reflected a decreasing SR of benthic and demersal fish, of piscivores and invertivores, and of marine migratory species.
We conclude that, if climate change continues to cause an upper-layer freshening of the Baltic Sea, this may influence the SR, community composition and functional characteristics of fish, which in turn may affect ecosystem processes such as benthic-pelagic coupling and connectivity between coastal and open sea areas.


Introduction
Biodiversity is essential for ecosystem processes, and ultimately for the humans depending on these (IPBES, 2019). Coastal ecosystems are often biologically diverse and highly productive, providing valuable ecosystem services to humans, such as food, water purification and protection against floods (Griffiths et al., 2017; Kraufvelin et al., 2018; Pan et al., 2013). However, threats to coastal biodiversity from e.g. overfishing, habitat loss, pollution, eutrophication and climate change are many and profound (Duncan et al., 2015; Griffiths et al., 2017; Pan et al., 2013). At the same time, the number of species occurring in coastal habitats often remains uncertain (Appeltans et al., 2012), making improved understanding of coastal biodiversity especially important to support conservation and management measures (Pan et al., 2013; Rooney & McCann, 2012).
Taxonomic inventories, or species censuses, are required e.g. for the analysis of biodiversity patterns, delineation of species ranges, and prioritization of conservation efforts (Mora et al., 2008). Species richness (SR), i.e. the number of species in an ecosystem, is a classical indicator of biodiversity, also referred to as "alpha diversity" (Gotelli & Colwell, 2001; Hill, 1973).
However, since achieving complete species inventories is often impracticable with realistic sample efforts, most censuses remain incomplete and many rare species remain unknown. Consequently, it is important to consider the effect of sample size and inventory completeness (IC) on observed SR (SR obs) to avoid biased or misleading comparisons or interpretations (Chao & Chiu, 2016; Chao et al., 2020; Colwell & Coddington, 1994; Mora et al., 2008).
SR is connected to several ecosystem processes, such as productivity (Duffy et al., 2017), and the efficiency of resource use and nutrient cycling. SR may also facilitate the simultaneous provision of several ecosystem processes, i.e. an ecosystem's multifunctionality (Byrnes et al., 2014). However, since species do not contribute equally to ecosystem functioning, the diversity of species functional attributes adds another important dimension to ecosystem understanding (Duncan et al., 2015; Reiss et al., 2009). Functional diversity can enhance long-term stability, through functional redundancy and complementarity, and can help to buffer ecosystems against disturbances (O'Gorman et al., 2011).
Salinity is a key variable influencing SR in coastal areas, as natural differences in salinity among locations function as a threshold or "ecological barrier" for the distribution of freshwater and marine species, for example in the Baltic Sea (Olenin & Leppäkoski, 1999; Vuorinen et al., 2015). At the same time, global warming is intensifying the water cycle on average, which is changing the salinity regimes of marine and coastal ecosystems (Durack et al., 2012; Liblik & Lips, 2019; Meier et al., 2021). It is important to understand how salinity influences species distributions in aquatic ecosystems to be able to better predict how potential changes may affect ecosystem functioning.
The Baltic Sea, one of the world's largest brackish water bodies, exhibits a pronounced, geographically stable salinity gradient that is maintained by sporadic inflows of saline water from the North Sea through the Danish Straits and by freshwater input from large rivers, especially in the north. Hence, the Baltic Sea gradient can serve as a model for the influence of salinity on species distributions (Johannesson & Andre, 2006; Ojaveer et al., 2010). SR obs is often higher under more saline conditions, e.g. for macroalgae, benthic bacteria and benthic meio- and macrofauna (Broman et al., 2019; Klier et al., 2018; Middelboe et al., 1997; Schubert et al., 2011). In other studies, SR obs was highest at highest salinity, lowest at intermediate salinity and intermediate at lowest salinity, e.g. for phytoplankton and benthic macrofauna (Bonsdorff, 2006; Olli et al., 2019; Zettler et al., 2014), or there was no clear trend between SR obs and salinity, e.g. for bacterio-, pico- and mesoplankton (Herlemann et al., 2016; Hu et al., 2016).
The species composition of fish in the Baltic Sea is regulated by salinity as well (Olsson et al., 2012; Pekcan-Hekim et al., 2016), even though other factors, such as temperature or habitat complexity, might also influence large-scale patterns of fish SR in estuaries (Vasconcelos et al., 2015; Schubert et al., 2011). In the Baltic Sea, fish SR obs is generally higher in areas of higher compared to lower salinities (HELCOM, 2020; Hiddink & Coleby, 2012; Lappalainen et al., 2000; MacKenzie et al., 2007; Ojaveer et al., 2010; Pecuchet et al., 2016; Thorman, 1986). Various studies have also reported changes in fish SR obs or species composition over time (e.g. Ammar et al., 2021; Törnroos et al., 2019). However, despite concerns that fish SR may decline in the future due to decreasing upper layer salinity (e.g. MacKenzie et al., 2007; Pecuchet et al., 2016; Vuorinen et al., 2015), information on how the complete coastal fish assemblage varies spatially in relation to the Baltic Sea salinity gradient, including potential differences across functional groups, is lacking. Hence, there is a need to complement already existing information on the influence of salinity on various Baltic Sea organism groups with more complete information in relation to fish diversity, taxonomically and functionally. This kind of understanding for multiple trophic levels is needed to better understand and predict how changing salinity, in the Baltic Sea and in coastal areas in general (Durack et al., 2012; Liblik & Lips, 2019), may affect ecosystem structure and functioning (MacKenzie et al., 2007). For example, if different species groups are differently affected, this may also change biotic interactions such as benthic-pelagic coupling, with effects on exchanges of energy, mass or nutrients between benthic and pelagic habitats (Griffiths et al., 2017).
Moreover, understanding species richness at a broader, sub-regional scale is important to support analyses of potential species richness and species compositions at more local scales within each sub-basin.
To this end, we compiled a large dataset on fish species observations in shallow (<30 m depth) Swedish coastal areas, defined in alignment with the EU Marine Strategy Framework Directive as the area within one nautical mile beyond the national baseline (EC 2017). The data compilation was based on multiple existing sources of Swedish mapping and monitoring combined over the years 1975-2021. The extensive dataset covered fish species incidence information from 1,638 unique observations/fishing occasions, during which in total 21,670 species incidences were recorded. Geographically, the data covered 12 hydrographically distinct sub-basins, and a ca. 12-fold salinity gradient from close to freshwater conditions in the inner Baltic Sea to close to fully marine conditions at the Swedish west coast. The annual mean water temperature varies ca. 2-fold across the same area. Since SR obs is strongly dependent on sample size, which differed between sub-basins, we used statistical rarefaction-extrapolation methods to estimate IC and standardised SR per sub-basin. Further, we categorized each fish species according to origin (marine vs. freshwater) as well as three functional attributes based on coastal habitat preference, vertical preference and feeding habits, and investigated the influence of salinity (and, for comparison, temperature) on fish SR in total and within the functional attributes. We discuss the results in the context of the regulating influence of salinity on fish SR and community composition in coastal ecosystems, and potential implications for conservation and ecosystem management.

Study system
The Baltic Sea, an enclosed, essentially non-tidal brackish marine region with a maximum and mean depth of 460 and 54 m, respectively, and a water residence time of 25-40 years, is among the world's largest estuaries (area: 415,000 km²; HELCOM, 2018). Its current brackish conditions were formed by gradual narrowing of its opening to the North Sea and have been in place for ca. 3,000 years (Russell, 1985). Due to its geographically variable but locally relatively stable salinity conditions, the Baltic Sea has been called a "marine-brackish-limnic continuum" (Bonsdorff, 2006). Its surface salinity changes from <3 (psu) in the inner-most areas in the north and north-east to almost fully marine (ca. 29) in the Kattegat in the southwest (Table 1). Within this gradient, the Baltic Sea can be divided into hydrographically distinct sub-basins, mostly separated by shallow sounds or sills. To strengthen the database with respect to higher salinity areas we additionally included a North Sea sub-basin adjacent to Kattegat, i.e. Skagerrak (salinity ca. 30; Table 1). Reflecting its salinity conditions, the Baltic Sea harbors a unique fish fauna with a mixture of freshwater species (e.g. Northern pike (Esox lucius), perch (Perca fluviatilis), pikeperch (Sander lucioperca)) and marine species (e.g. cod (Gadus morhua), herring (Clupea harengus); Olsson et al., 2012). Further, many marine fish populations have adapted to the brackish conditions and differ from their Atlantic counterparts (Laikre et al., 2005), for example Baltic cod and herring populations, and one flounder species is endemic to the Baltic Sea (Momigliano et al., 2018). Hence, the Baltic Sea may also have a unique value as a refuge for evolutionary lineages, and constitute an important genetic resource for management and conservation (Johannesson & Andre, 2006).

Species richness data
The primary source of fish species data was the Swedish National database of coastal fish (KUL; www.slu.se/kul), which holds public, quality-assured data from surveys encompassing coastal fish monitoring, mapping projects and surveillance programs over the entire salinity gradient of the Baltic Sea plus Skagerrak. Coastal areas were delineated as the zone within one nautical mile beyond the national baseline (EC 2017), as available from the Water Information System Sweden ("VISS", https://extgeoportal.lansstyrelsen.se/standard/?appid=1589fd5a099a4e309035beb900d12399, in Swedish), and KUL data records for 1975 to 2021 were extracted (last KUL database access 2021-04-27). Subsequently, data from shallow depths <30 m were selected, corresponding to the main represented sampling methods in the database (Table S1), and approximately to the photic depth in the concerned coastal habitat types (Kaskela et al., 2012). This selection resulted in 154,172 data entries, i.e. individual fish that had been caught and identified to species by specialists, with the number of years with available data differing between sub-basins (Table S2). Further, quality-assured data were included from 1) a national coastal trawl survey (n=4,420), 2) the ICES-coordinated International Bottom Trawl Survey (IBTS, n=1,969) and 3) national projects using standardised methodology (n=893), all carried out in the Skagerrak, Kattegat and the Sound, selecting only hauls from <30 m depth within the concerned geographical delineations (see above). Corresponding trawl data for the inner Baltic Sea area are not collected at depth <30 m in Swedish waters. Across databases, only sampling occasions with geographical coordinates and verified sampling references were included. Hence, data collected from multiple gears were combined, including gill nets, fyke nets, seines, trap nets, low impact underwater detonations and trawls, in order to maximize the chance of including different species (Table S1).
The ambition to collate information from all available fish surveys implied some differences in predominating data collection methods across the studied geographical range. The main data sources were trawls and trap net surveys in the most saline sub-basins, i.e. Skagerrak, Kattegat and the Sound, and gill net surveys in the remaining sub-basins (Table S1). While each gear has a specific selectivity and efficiency, a previous comparison of data from gill and fyke net samplings at the Swedish west coast did not reveal consistent differences in biodiversity metrics, and the statistical approaches were chosen to minimize a potential bias when comparing SR among sub-basins (Bergström et al., 2013; Chao et al., 2020; see also search of additional data sources below, and Sects. 2.3 and 4).
Data on observed SR, SR obs, was available for all 12 sub-basins (Table 2), representing between 4 and 47 years of data, depending on sub-basin (Table S2). However, subsequent statistical analyses and comparisons were conducted only for the 10 sub-basins containing data from at least 30 sampling/fishing occasions, corresponding to several hundred fish incidences (i.e. Bothnian Bay, the Quark, Bothnian Sea, Åland Sea, N Baltic Proper, W Gotland Basin, Bornholm Basin, the Sound, Kattegat, and Skagerrak, see also Sect. 2.3). This dataset is hereafter referred to as "raw data", and contained in total 160,453 entries (i.e. fish individuals caught and determined to species) from 1,638 sampling/fishing occasions at 4,571 unique locations. E Gotland Basin and Arkona Basin were not statistically analysed since we considered these sub-basins too under-sampled, with only 13 and 7 samplings, respectively, from 9 and 4 different years, and fewer than 100 species incidences in total (Tables 2, S2). Additionally, we searched for evidence of fish species that had remained undetected in the fish surveys (i.e. incidence database, see Sect. 2.3), by identifying fish species records from three additional sources, using the same criteria for geographical and depth delineations of shallow coastal areas as above, i.e. 1) the SLU hosted national public database for citizens' reporting of species observations (SLU Swedish Species Information Centre, https://www.artportalen.se/; n=8,926, unreasonable species observations deemed as falsely identified were discarded), 2) the national archive for oceanographic data hosted by the Swedish Meteorological and Hydrological Institute (SHARKweb, https://www.smhi.se/en/services/opendata/national-archive-for-oceanographic-data; n=1,259), and 3) published inventory data from Skagerrak, Kattegat and Bornholm Basin (Pihl & Wennhage, 2002; Pihl et al., 1994; Wikström, 2009).
These "additional data sources" were used as complementary information on SR obs but could not be used in the statistical analysis since they did not have complete sampling and species incidence information. Further, our SR results were compared with the HELCOM (2020) checklist on macro-species containing information for all of the sub-basins and depths in the Baltic Sea region.

Analysis of species richness data
The raw data was first summarized to a dataset of unique fish species caught in presence/absence format, per fishing/sampling occasion in each sub-basin, and then further aggregated to an incidence frequency format (Chao et al., 2014), where incidence is occurrence/presence of a species. The resulting dataset gives the observed total incidence of each species over the number of fishing/sampling occasions per sub-basin, and is referred to as "fish incidence database". Each unique combination of a fishing/sampling location per date was defined as one sampling unit, and these were summed per sub-basin to obtain the sample sizes. Subsequently, incidence-based Hill diversity numbers of three "orders", which differ in their propensity to include or exclude relatively rarer species (Hill, 1973), were calculated to quantify the species diversity of each assemblage, i.e. 1) species richness (SR), which counts all species equally irrespective of their incidence frequency, 2) Shannon diversity (ShD), which considers the incidence frequency and can be interpreted as the effective number of frequent species, and 3) Simpson diversity (SiD), which can be interpreted as the effective number of highly frequent species (Chao et al., 2014; Chao et al., 2020).
Calculations were performed using the R package iNEXT (Chao et al., 2020; Hsieh et al., 2016), and the values are hereafter referred to as observed SR, ShD and SiD, respectively. It should be noted that, using these methods, Shannon and Simpson diversity are expressed in terms of richness, i.e. number of species, which differs from other known formats. Specifically, ShD is the exponential of Shannon's entropy index, and SiD is the inverse of Simpson's concentration index (Chao et al., 2014). SR obs is highly dependent on "sample completeness" (Colwell & Coddington, 1994; Hill, 1973) and typically underestimates the "true" SR due to undetected species, an aspect referred to as under-sampling, sampling bias or sampling problem (Chao et al., 2014; Chao & Jost, 2015; Menegotto & Rangel, 2018). Similar to Hill numbers, "sample completeness" can be calculated for different "orders" (Chao et al., 2020). The zero-order sample completeness is hereafter referred to as inventory completeness (IC). It is calculated as the ratio of SR obs to the estimated "true" SR (i.e. observed plus undetected SR, see "estimated SR" below), hence giving the proportion of detected species without considering the species incidence frequencies. We calculated IC for the data merged over time, and including both resident and migrating/visiting fish species.
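The quantities defined above can be illustrated with a short Python sketch. It is not the iNEXT implementation used in this study: it assumes that the "true" SR is approximated with the Chao2 estimator, and the function names and the example incidence vector are hypothetical.

```python
import math

def hill_numbers(incidence):
    """Observed incidence-based Hill numbers of order 0, 1 and 2 (SR, ShD, SiD)."""
    counts = [y for y in incidence if y > 0]   # species detected at least once
    total = sum(counts)                        # total number of incidences
    p = [y / total for y in counts]            # relative incidence of each species
    sr = len(counts)                                       # order 0: species richness
    shd = math.exp(-sum(pi * math.log(pi) for pi in p))    # order 1: exp(Shannon entropy)
    sid = 1.0 / sum(pi * pi for pi in p)                   # order 2: inverse Simpson
    return sr, shd, sid

def chao2(incidence, n_units):
    """Chao2 estimate of 'true' SR (assumed estimator, see lead-in)."""
    counts = [y for y in incidence if y > 0]
    q1 = sum(1 for y in counts if y == 1)      # species found in exactly one unit
    q2 = sum(1 for y in counts if y == 2)      # species found in exactly two units
    k = (n_units - 1) / n_units
    if q2 > 0:
        undetected = k * q1 * q1 / (2 * q2)
    else:
        undetected = k * q1 * (q1 - 1) / 2     # bias-corrected form when q2 == 0
    return len(counts) + undetected

def inventory_completeness(incidence, n_units):
    """Zero-order sample completeness: observed SR / estimated 'true' SR."""
    s_obs = sum(1 for y in incidence if y > 0)
    return s_obs / chao2(incidence, n_units)

# Hypothetical sub-basin: 8 species recorded over 10 sampling units.
example = [10, 8, 5, 2, 2, 1, 1, 1]
sr, shd, sid = hill_numbers(example)
ic = inventory_completeness(example, n_units=10)
```

Because ShD and SiD are expressed as effective numbers of species, an assemblage in which all species have the same incidence returns the same value for all three orders.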
The first-order sample completeness, hereafter referred to as "sample coverage" (SC), is a measure where species are weighted by their detection probabilities, giving the proportion of incidences detected from the estimated "true" incidences (Chao et al., 2020).
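A minimal sketch of how sample coverage can be estimated from incidence frequencies is given below. It assumes the commonly used coverage estimator for incidence data, which relies on the numbers of species found in exactly one (Q1) and exactly two (Q2) sampling units. The function name and example data are hypothetical, and iNEXT may treat edge cases differently.

```python
def sample_coverage(incidence, n_units):
    """Estimated sample coverage (first-order completeness) for incidence data.

    incidence: per-species number of sampling units in which the species occurred
    n_units:   total number of sampling units (fishing/sampling occasions)
    """
    counts = [y for y in incidence if y > 0]
    total = sum(counts)                        # U: total number of incidences
    q1 = sum(1 for y in counts if y == 1)      # uniques
    q2 = sum(1 for y in counts if y == 2)      # duplicates
    if q1 == 0:
        return 1.0                             # no uniques: coverage estimated as complete
    weight = (n_units - 1) * q1 / ((n_units - 1) * q1 + 2 * q2)
    return 1.0 - (q1 / total) * weight

# Hypothetical sub-basin: 8 species recorded over 10 sampling units.
coverage = sample_coverage([10, 8, 5, 2, 2, 1, 1, 1], n_units=10)
```

The coverage deficit, 1 minus SC, can be read as the estimated share of incidences belonging to undetected species, which is why a high SC combined with a lower IC points to undetected species that are very rare.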
To correct for the effect of differing sample completeness on SR obs, and allow accurate, unbiased comparisons between sub-basins, we used a coverage-based rarefaction and extrapolation method implemented for incidence data in the R package iNEXT (Chao et al., 2014, 2020; Hsieh et al., 2016). A coverage-based method was chosen because more traditional sample size-based corrections can introduce a systematic bias, since the number of samples needed to fully characterize a community depends on its SR (Chao & Jost, 2012). For each sub-basin, we obtained 1) the rarefied SR, ShD and SiD, which were standardised to the minimum observed SC across all included sub-basins (hereafter referred to as standardised values, i.e. SR std, ShD std and SiD std), and 2) the fish SR extrapolated to twice the actual sample size (hereafter referred to as estimated values, i.e. SR est, ShD est and SiD est; Chao et al., 2014, 2020; Hsieh et al., 2016). Similar analyses were also conducted for SR of fish with different functional attributes (see Sect. 2.4). All calculations were conducted using R version 4.0.4 (R Core Team, 2021).
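The extrapolation of SR to twice the actual sample size can be sketched as follows. This is an illustration only, assuming the sample-size-based extrapolation formula for incidence data with a Chao2-type estimate of the number of undetected species. It does not reproduce the coverage-based standardisation or the confidence intervals provided by iNEXT, and the function name and example data are hypothetical.

```python
def extrapolated_richness(incidence, n_units, extra_units):
    """Expected SR after adding `extra_units` sampling units to the `n_units` taken."""
    counts = [y for y in incidence if y > 0]
    s_obs = len(counts)
    q1 = sum(1 for y in counts if y == 1)      # species found in exactly one unit
    q2 = sum(1 for y in counts if y == 2)      # species found in exactly two units
    k = (n_units - 1) / n_units
    # Chao2-type estimate of the number of undetected species (assumed estimator)
    q0 = k * q1 * q1 / (2 * q2) if q2 > 0 else k * q1 * (q1 - 1) / 2
    if q0 == 0:
        return float(s_obs)                    # nothing left to detect under this estimate
    detect = q1 / (n_units * q0 + q1)          # per-unit chance of finding a new species
    return s_obs + q0 * (1 - (1 - detect) ** extra_units)

# Hypothetical sub-basin: 8 species over 10 units, extrapolated to 20 units.
example = [10, 8, 5, 2, 2, 1, 1, 1]
sr_doubled = extrapolated_richness(example, n_units=10, extra_units=10)
```

The extrapolated value always lies between the observed SR and the asymptotic estimate. A curve that is still rising steeply at double the sample size therefore indicates that the extrapolated value is likely a lower bound.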

Fish functional attributes
All observed fish species were assigned functional attributes based on ecological and behavioral traits, and were classified as being of either marine or freshwater origin (Kullander, 2002). The affinity of each species to different parts of the coastal habitat, or habitat preference, was assigned based on Elliott & Dewailly (1995) and Pihl & Wennhage (2002), however with certain adaptations to suit both marine and brackish conditions (Tables S3-S4). Applied categories were: Catadromous or anadromous migrants (CA), using coastal habitats only when migrating between marine and freshwaters for spawning and feeding; Marine juvenile migrants (MJ), using coastal habitats primarily as nursery or feeding grounds; Marine visitors (MV), occurring irregularly in the coastal area, having their primary habitat in deeper waters; Marine seasonal migrants (MS), making regular seasonal visits to coastal habitats, usually as adults; and Coastal residents (CR), spending almost their complete life cycle in coastal habitats or the littoral coastal zone. The main vertical distribution of each species in the water column, considering the adult stage, was assigned based on Elliott & Dewailly (1995) and Koli (1990) as: Pelagic (P), living mainly in the water column; Demersal (D), mainly associated with the bottom substrate; Demersal-pelagic (DP), alternating between the water column and bottom substrate; and Benthic (B), staying close to the seabed. Main feeding habits were assigned by combining information on feeding guild (Elliott & Dewailly, 1995) with trophic levels (TL) and principal diet composition (Froese and Pauly, 2021), as: Piscivores (Pi; TL 3.6-4.4); Invertivores and piscivores (IP; TL 2.9-3.9); Invertivores (I; TL 2.8-3.9); Planktivores (PL; TL 3.1-3.2) and Omnivores (O; TL 2.8-3.5).

Sea water salinity and temperature
For each sub-basin, data on ambient salinity and temperature was extracted from the "Baltic Sea Physics Reanalysis" product, as calculated by the Swedish Meteorological and Hydrological Institute (SMHI) with the coupled physical-biochemical model system NEMO-SCOBI, and available from year 1993 (CMEMS, 2021). This encompassed full coverage layers with a 4 km x 4 km grid. Monthly mean values close to the sea bed for all grid cells representing areas less than 30 m depth were first identified, and then used for calculating long-term means and standard deviations for the years 1993-2019.

Statistical analyses
Linear regressions were used to analyse the relationships between salinity and temperature, respectively, and observed, standardised and estimated SR, ShD and SiD. To test for any additional explanatory effect of temperature, after accounting for the effect of salinity, we used ANOVA to compare models with salinity as the only explanatory factor with models with salinity plus temperature as explanatory factors. Furthermore, relationships were tested between the different functional attributes and salinity. To reduce skewness and approximate normality, left-skewed response variables were log10-transformed prior to analysis, or, in two cases where the response variable included zero-values, Yeo-Johnson transformed (Yeo and Johnson, 2000). All analyses were conducted using R version 4.0.4 (R Core Team, 2021).
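The logic of the nested model comparison can be sketched in Python as follows. This is an illustration under stated assumptions, not the R code used in this study. The helper names are hypothetical, all numbers are invented for demonstration only, and the sketch stops at the F statistic, whose P value would be read from an F distribution with the corresponding degrees of freedom.

```python
import math

def fit_ols(columns, y):
    """Least-squares fit of y on an intercept plus the given predictor columns.
    Returns (coefficients, residual sum of squares)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix
    M = [[sum(X[i][r] * X[i][s] for i in range(n)) for s in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for k in range(p):                          # Gaussian elimination, partial pivoting
        piv = max(range(k, p), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, p):
            f = M[r][k] / M[k][k]
            for s in range(k, p + 1):
                M[r][s] -= f * M[k][s]
    b = [0.0] * p
    for k in reversed(range(p)):                # back substitution
        b[k] = (M[k][p] - sum(M[k][s] * b[s] for s in range(k + 1, p))) / M[k][k]
    rss = sum((y[i] - sum(X[i][j] * b[j] for j in range(p))) ** 2 for i in range(n))
    return b, rss

def r_squared(y, rss):
    """Proportion of variance explained by the fitted model."""
    mean = sum(y) / len(y)
    return 1.0 - rss / sum((v - mean) ** 2 for v in y)

def nested_f(rss_small, rss_big, n, p_small, p_big):
    """F statistic for adding predictors to a nested linear model."""
    return ((rss_small - rss_big) / (p_big - p_small)) / (rss_big / (n - p_big))

# Invented values for eight hypothetical sub-basins (not data from this study).
salinity = [3, 5, 7, 10, 15, 20, 25, 30]
temperature = [5, 6, 6, 7, 8, 8, 9, 10]
log_sr = [math.log10(v) for v in [25, 27, 30, 36, 45, 55, 68, 80]]

_, rss_sal = fit_ols([salinity], log_sr)                  # salinity only
_, rss_both = fit_ols([salinity, temperature], log_sr)    # salinity + temperature
f_stat = nested_f(rss_sal, rss_both, n=8, p_small=2, p_big=3)
```

A small F statistic in this comparison means that temperature adds little explanatory power once salinity is in the model, which is the question the ANOVA comparison addresses.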

Salinity and temperature
The annual mean salinity varied more than 12-fold in the shallow coastal areas across the studied sub-basins, from 2.7 in the northernmost Baltic Sea to 32.4 in the Skagerrak. Across the same geographical range, the annual mean water temperature varied from 4.5°C in the north to ca. 9-10°C in the Sound and outwards (Table 1).

Fish species observations and distribution
SR obs varied from 23 (Arkona Basin) to 101 (Kattegat) (Table 2, which also contains related information on e.g. sample size and species incidences per sub-basin), and amounted to 125 across sub-basins and years. Since IC was <100% (see Sect. 3.3), this can be assumed to be a lower bound estimate of the true SR. Indeed, the additional data sources contained 19 more species that were not represented in the fish incidence database, resulting in a total fish SR obs of 144 (Tables S3, S4). Of the species in the fish incidence database, 49% occurred only in the higher salinity Skagerrak-Kattegat region including the Sound, 15% occurred only in the Baltic Sea region (i.e., inside the Sound), and 36% occurred in both these regions. The species ranging most widely were herring (Clupea harengus), brown trout (Salmo trutta), European sprat (Sprattus sprattus) and eelpout (Zoarces viviparus), with incidences reported from all 12 sub-basins (Tables S3, S4).
Table 2. Summary information for the fish incidence database and additional data sources, covering 12 sub-basins in Swedish shallow coastal areas (<30 m depth). Inventory completeness (IC), sample coverage (SC), standardised (SR std) and estimated (SR est) values were statistically estimated for sub-basins with sample size >30 fishing/sampling occasions. Of these, SR std was calculated for a SC of 98.5%, which was the lowest SC among sub-basins with sufficient data (i.e. Åland Sea). For comparison, the last two columns show SR obs when also including species records from additional data sources (not included in the statistical analyses, see Sect. 2.2), or for the whole of each Baltic Sea sub-basin according to HELCOM (2020), i.e. across countries and depths. NA: not applicable; n.d.: not determined.

[Table 2 body not recovered from the source. Column groups: "Species incidence data set" and "Including additional data sources". Surviving footnotes: (beginning of footnotes missing) ... (Chao et al., 2020). c Percentage of incidences detected from the estimated "true" (i.e. observed plus undetected) incidences (Chao et al., 2020). d Considered a lower bound estimate (Chao et al., 2020). e Not included in the Baltic Sea region.]

Inventory completeness and sample coverage
The fish species IC in Swedish shallow coastal areas varied from 76.7% in Skagerrak to 97.7% in the Bothnian Sea for the assessed sub-basins (see Sect. 2.2), suggesting that ca. 2-23% of species statistically likely to be present remained undetected (Table 2). The SC exceeded 98.5% in all sub-basins (Table 2), suggesting that these undetected species were highly rare, likely representing <1.5% of incidences. The species accumulation curves (SAC) show the SR obs at the conducted sample sizes, and SR estimated for hypothetical smaller and larger sample sizes, including 95% confidence intervals. According to these, the steepest increase of accumulated species occurred with the first ca. 20 samplings in all sub-basins, and coastal fish SR was highest in the Kattegat, followed by the Skagerrak and the Sound, and lowest in the other seven sub-basins (Fig. 1a).
The SACs also visualize differences in IC between sub-basins. For the three most saline sub-basins, Skagerrak, Kattegat and the Sound, the SACs were still clearly increasing with increasing sample size even when extrapolating to double the actual sample size. Hence, SR est for these sub-basins is more uncertain and more likely biased low than for sub-basins where the curve flattened, illustrating a more complete inventory, e.g. W Gotland Basin and the Bothnian Sea (Fig. 1a). The SR est values, estimated by extrapolating the information in the fish incidence database, were similar to SR obs when the incidence data were complemented with records from the additional data sources (Table 2).

Shannon and Simpson diversity
Rarefaction and extrapolation SACs carried out for Shannon diversity (ShD) show that the effective number of frequently recorded fish species was quite well captured by the samplings in all statistically assessed sub-basins, illustrated by SACs with small remaining slopes at extrapolated higher sample size. As for SR obs, ShD was highest in Kattegat, while the remaining nine sub-basins clustered in two separate groups. The lowest ShD was noted for the Åland Sea, the Quark and Bothnian Bay (Fig. 1b, Table S5). The effective number of highly frequent species, i.e. Simpson diversity (SiD), was also well captured in all sub-basins, being highest in Kattegat, while SiD in the remaining sub-basins clustered in four groups (Fig. 1c, Table S5).

Standardised and estimated species richness
To compare coastal fish SR, ShD and SiD across sub-basins, we estimated their standardised values against the minimum observed SC among the statistically assessed sub-basins. This represented a standardisation to the SC of the Åland Sea data (98.5%; Tables 2 and S3). SR std was ca. three times higher in the relatively more saline Kattegat (SR std = 78) compared to the least saline Bothnian Bay (SR std = 24), as also confirmed by comparing the respective SR est values (Table 2). The differences were smaller for ShD and SiD. For example, based on SiD std and SiD est, the effective number of highly frequent species was ca. two times higher in coastal areas of the Kattegat compared to the Bothnian Bay (Table S3). This implies, as also seen from the SACs (Fig. 1), that the frequent and most frequent fish species were captured quite well by the samplings for all sub-basins, and that remaining uncertainties in differences across the salinity gradient are mostly due to uncertainty in the numbers of highly rare fish species. The estimated SR per sub-basin (SR est, statistical extrapolation of the fish incidence data) was very similar to the total observed species richness when also including species records from additional data sources, with a mean ratio of 1.07 ± 0.03 (calculated based on data presented in Table 2).

Relationships of SR with salinity and temperature
Species richness increased with increasing mean water salinity, which explained 37-55% of the variance in the data based on SR obs, ShD obs and SiD obs. Using the standardised or estimated values, i.e. values corrected for sample size, resulted in stronger correlations, i.e. higher explained variance (40-77%; Fig. 2a-c, Table S6). SR obs, ShD obs and SiD obs were not correlated with mean water temperature, but, using the standardised and estimated values, correlations with temperature were also significant (explaining 48-77% of the variance; Fig. 2d-f, Table S6). The slope estimates of the linear regressions differed more across observed, standardised and estimated values for SR than for ShD and SiD (Table S6). In all cases, adding temperature as explanatory variable to the regression models with salinity as explanatory variable did not improve the model (all P>0.14).

Fish functional attributes
Of the recorded fish species, 74% and 26% were of marine and freshwater origin, respectively (based on the incidence data, i.e. SR obs of 92 vs. 33 species; Table S3). In the most saline sub-basins, i.e. Skagerrak and Kattegat, the SR std of marine fish species was seven to ten times higher than that of freshwater fish species. The SR std of marine vs. freshwater fish were rather similar in the central Baltic Sea, while in the northernmost and least saline sub-basins, i.e. Bothnian Sea, the Quark and Bothnian Bay, the SR std of freshwater fish species exceeded the SR std of marine fish species by two to three times. In total, the marine fish SR std decreased by a factor of 8-11 along the salinity gradient, from 39 and 57 marine species (SR std) in Skagerrak and Kattegat to 5 in the Bothnian Bay. Freshwater fish SR std increased by a factor of 2-4 along the same gradient (Fig. 3, Table S3). These distributional patterns of freshwater vs. marine fish species were also reflected by negative univariate correlations of freshwater SR (obs, std and est) with salinity, and positive univariate correlations of marine SR with salinity (Fig. S1, Table 3).
Concerning habitat preference, half of the fish species in Swedish shallow coastal areas were classified as coastal resident species (CR; based on incidence data only, SR obs: 63 species, Table S3). This group dominated coastal fish assemblages in all sub-basins, with CR SR std of 19-30 across sub-basins (Fig. S2, Table S7), and its SR was not linearly related to salinity (Fig. 4a, Table 3). A similar result was noted for catadromous or anadromous fish species (CA), with SR std of 2-6 across sub-basins and no relation to salinity (Fig. 4b, S2, Tables 3, S7). In the more saline sub-basins, fish species classified as marine visitors or as marine juvenile or seasonal migrants contributed substantial numbers to the SR std, while these species groups were absent or contributed only little to the SR std in the Baltic Sea region (Fig. S2, Table S7). Reflecting this pattern, the SR of marine migrating or visiting fish species (i.e. MJ, MS and MV) was significantly positively related to salinity in most cases, with the strongest correlations for marine juvenile migrants (MJ; Fig. 4c-e, Table 3).

Figure caption (fragment): … and, when significant (P<0.05), the linear regression lines (solid) and 95% confidence intervals (shaded areas surrounded by dashed lines). Please note that data points (Table S7), lines and shaded confidence intervals are overlying each other within the panels in some cases. For regression equations and statistics, see …
Concerning vertical distribution, benthic fish species (B) were important contributors to SR std in the sub-basins of higher salinity, but only few or no fish species belonged to this group in the less saline sub-basins (Fig. S3; Table S7). A similar, though less pronounced, distribution pattern was also found for demersal fish species (D). Accordingly, the SR of these groups was positively related to salinity in all cases (i.e. for SR obs, SR std and SR est; Fig. 5a,b, Table 3). The SR of demersal-pelagic (DP) fish species varied between sub-basins with a SR std of 6-16, not related to salinity (Fig. 5c, Table 3). A similar picture was found for pelagic fish species (P), where SR std varied between 5 and 12 across sub-basins (Fig. S3, Table S7) and was not related to salinity (Fig. 5d, Table 3).

The two most common feeding groups, across all sub-basins, were invertivore and piscivore species (IP) as well as invertivores (I). The third-most represented feeding group was piscivores (Pi), followed by planktivores and omnivores in lower and often similar SR std (Fig. S4, Table S7). SR std of Pi and IP increased with increasing salinity, and SR std and SR est of I increased with increasing salinity (Fig. S5, Table 3).

Table 3. Simple linear regressions between salinity and SR for the different fish functional attributes (observed (obs), standardised (std) and estimated (est) SR) in Swedish shallow coastal areas. When YJ(y), the response variable was Yeo-Johnson transformed (Yeo and Johnson, 2000). n.s. = not significant.

Table 3 columns: Response variable; Parameters (± SE); n; R²; P-value.

Figure caption (fragment): … (Table S7), lines and shaded confidence intervals are overlying each other within the panels in some cases. For regression equations and statistics see Table 3. For marine visitors (e), for clarity following transformation, only the observed SR is shown (no transformation was needed for the standardised and estimated values and there were no significant …

Figure caption (fragment): … (Table S7), lines and shaded confidence intervals are overlying each other within the panels in some cases. For regression equations and statistics see Table 3.

Discussion
Data from species censuses have been called "probably the most basic data in ecology", as they are widely useful, for example to define species ranges and biodiversity patterns and to support conservation efforts (Gaston & Blackburn, 2000). A limitation for the use of taxonomic inventory data for biodiversity purposes, however, is their completeness, i.e. the fraction of species in a given location that has been sampled (Mora et al., 2008). In this study, the coastal fish taxonomic inventory completeness (IC) varied between 77% and 98% for the ten assessed sub-basins, exceeded 80% in eight sub-basins, and averaged 86% (Table 2). This average IC in shallow Swedish coastal waters is high compared to an assessment of marine fish species census data worldwide, where global IC averaged 79%, indicating that ca. 21% of fish species still remained to be described. Marine fish IC exceeded 80% in less than 2% of marine areas worldwide, with the highest IC of 92% found for reef-associated species, and the lowest of 56% for bathydemersal species (Mora et al., 2008). Similarly, a global assessment published in 2012 concluded that ca. 77% of marine fish were known. The comparatively low IC in Skagerrak (77%, Table 2) can be related to a sample size that is rather small in this context, given the relatively high number of marine migrating or visiting species, which are only rarely present in this sub-basin and hence harder to detect during sampling (Fig. S2, Table S7).
The SR of frequent and very frequent species (i.e. Shannon and Simpson diversity, ShD and SiD) were well described by the sample sizes available to date in the assessed sub-basins, with calculated ("observed") ShD and SiD being similar to both standardised and estimated values (where effects of differing sample sizes are considered; Table S5). This indicates that the remaining uncertainty in fish SR obs is caused by a number of highly rare species, corroborated by a high sample coverage (SC) across sub-basins (Table 2). This is a typical pattern, since well-known species are usually common and have large geographical ranges, whereas newly discovered species are usually (more) locally rare and geographically concentrated (Appeltans et al., 2012; Mora et al., 2008; Pimm et al., 2014). Specifically, in terms of species numbers and depending on the sub-basin, 1-37 statistically likely existing and highly rare fish species remained undetected in the samplings/fish incidence database (Table 2, SR est minus SR obs). However, the high similarity between SR est and total SR (i.e. SR obs based on incidence data plus species records from additional data sources, Table 2) suggests that the total observed species lists for fish in Swedish shallow coastal areas were close to complete for all assessed sub-basins, i.e. they included the highly rare species not detected in the incidence database. This comparison between SR est and total SR also independently validated that the SR values estimated based on the fish incidence database (SR est) were realistic.
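How SR obs, SR est, IC and SC relate to each other can be illustrated with a small incidence example. The sketch below is not the authors' workflow. It assumes the Chao2 lower-bound estimator as a stand-in for SR est, takes IC as SR obs divided by SR est, and uses a standard coverage estimator for incidence data from the rarefaction-extrapolation literature. The function name and the incidence counts are hypothetical.

```python
def incidence_summary(counts, n_units):
    """Observed richness, Chao2 estimate, inventory completeness and coverage.

    counts  -- number of sampling units in which each species was recorded
    n_units -- total number of sampling units (T)
    """
    present = [y for y in counts if y > 0]
    s_obs = len(present)
    q1 = sum(1 for y in present if y == 1)  # species found in exactly one unit
    q2 = sum(1 for y in present if y == 2)  # species found in exactly two units
    k = (n_units - 1) / n_units

    # Chao2 lower bound; bias-corrected form when no species occurs twice.
    if q2 > 0:
        undetected = k * q1 ** 2 / (2 * q2)
    else:
        undetected = k * q1 * (q1 - 1) / 2
    s_est = s_obs + undetected

    # Sample coverage: estimated share of all incidences that belongs to
    # the detected species.
    total_incidences = sum(present)
    if q1 == 0:
        coverage = 1.0
    else:
        a = (n_units - 1) * q1
        coverage = 1 - (q1 / total_incidences) * (a / (a + 2 * q2))

    return {
        "SR_obs": s_obs,
        "SR_est": s_est,
        "undetected": undetected,
        "IC": s_obs / s_est,
        "SC": coverage,
    }


# Hypothetical incidence counts for nine species in ten sampling units.
example = incidence_summary([10, 8, 5, 2, 2, 1, 1, 1, 1], n_units=10)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

In this toy example IC is clearly lower than SC, because the few species estimated to be missing account for only a small share of all incidences. The same logic underlies the contrast between IC and SC reported in Table 2.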
The most recent check-list of Baltic Sea macrospecies, i.e. containing fish species reported across Baltic countries at both shallow and deeper water depths but excluding the Skagerrak, contains 242 fish species (HELCOM, 2020). In our analyses of Swedish shallow coastal areas the total fish SR obs amounted to 144 (i.e. SR obs from incidence data plus additional data sources), also when Skagerrak is excluded. Comparing the sample-size corrected estimates of SR in coastal areas (SR est) with the sub-basin specific SR obs of HELCOM (2020) suggests that ca. 50-90% of all reported Baltic Sea fish species are found in Swedish shallow coastal areas (i.e. 100*SR est/"SR obs all countries and depths", Table 2).
Our study reinforces that SR obs is strongly dependent on SC, as relatively rare species are more likely to be missed at lower sample size/sample coverage, and that comparisons of SR obs in species assemblages without accounting for this effect can lead to biased or misleading conclusions (Chao & Chiu, 2016; Colwell & Coddington, 1994; Gotelli & Colwell, 2001; Hill, 1973; Menegotto & Rangel, 2018; Pimm et al., 2014). When sample sizes are not uniform among sites or over time, SR obs needs to be corrected for SC before valid comparisons can be made. However, such methods have so far only rarely been used for coastal and estuarine fish assemblages (Waugh et al., 2019).
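The dependence of SR obs on sample size can be made concrete with sample-based rarefaction, which gives the number of species expected in a random subset of t out of T sampling units. The sketch below uses the classical interpolation formula for incidence data. It is an illustration only, with a hypothetical function name and invented incidence counts, and it does not reproduce the coverage-based standardisation used for SR std in this study.

```python
from math import comb


def rarefied_richness(counts, n_units, t):
    """Expected number of species in t randomly chosen sampling units.

    counts  -- number of sampling units in which each species was recorded
    n_units -- total number of sampling units (T)
    t       -- size of the subsample, 0 <= t <= n_units
    """
    if not 0 <= t <= n_units:
        raise ValueError("t must lie between 0 and n_units")
    all_subsets = comb(n_units, t)
    # A species found in y units is missed if all t chosen units come from
    # the n_units - y units where it was absent.
    return sum(1 - comb(n_units - y, t) / all_subsets for y in counts if y > 0)


# Hypothetical incidence counts for nine species in ten sampling units.
counts = [10, 8, 5, 2, 2, 1, 1, 1, 1]
curve = [rarefied_richness(counts, 10, t) for t in range(11)]
for t, s in enumerate(curve):
    print(t, round(s, 2))
```

The curve rises steeply at first and flattens only slowly, because the rare species are the last to be picked up. Two sites with the same true SR but different numbers of samples would therefore show different SR obs.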
Besides the effects of sample size, SR and IC might to some extent also have been influenced by differences in the predominating sampling gear across sub-basins. Multi-mesh gill nets dominated in seven of the statistically assessed sub-basins, while trap nets and trawls dominated in the other three (Table S1). As one "sample" in each of these cases represents a different effort, due to differences in gear selectivity and sampling approach, this strictly speaking does not allow for direct comparisons (Bergström et al., 2013; Waugh et al., 2019). For example, on the Swedish west coast, gill nets typically sample a wider range of species than fyke nets, which are more selective towards demersal and demersal-pelagic species (Bergström et al., 2013). Merging data from multiple gear types into one analysis may have caused a certain bias in this regard. However, we argue that our approach was feasible given that the gears used in the different sub-basins are optimized for the locally prevailing conditions, i.e. aiming to sample the existing assemblages as completely as possible (Bergström et al., 2013), that additional data from relevant trawl surveys were also included, and that data were collected over a long time horizon. Further supporting our approach, biodiversity metrics that were standardised against catch size revealed no consistent differences when comparing gill and fyke net samplings on the Swedish west coast (Bergström et al., 2013). Our assumption also appears justified given that SR est was similar to SR obs from incidence data plus species records from additional data sources (Table 2), giving confidence that the potentially introduced bias due to differing fishing gear and methods did not strongly influence the general patterns and results of this comparative and large-scale statistical analysis.
The clear predominance of marine species in the most saline sub-basins, compared to freshwater species in the inner parts of the Baltic Sea, is in agreement with the fact that salinity functions as a threshold or "ecological barrier" for the distribution of many freshwater and marine species (Olenin & Leppäkoski, 1999; Vuorinen et al., 2015). It also corroborates patterns earlier reported for fish SR obs in three Baltic sub-basins (Hiddink & Coleby, 2012) and estuaries in general (Whitfield, 2015). The relatively small number of freshwater fish species incidences observed in the higher salinity sub-basins in our study (Fig. 3) likely stems from sampling close to freshwater tributaries, and reflects that many freshwater fish species can withstand extended exposure to certain salinity levels (<ca. 9) and tolerate brief exposure to higher salinities (>ca. 15; Peterson & Meador, 1994).
While temperature did not significantly correlate with observed SR, ShD or SiD, it was positively related with the standardised and estimated values (Fig. 2d-f, Table S6), which may indicate a temperature effect on fish biodiversity. In previous studies, temperature has shown positive correlations with SR obs in North Atlantic demersal and benthopelagic fish assemblages (Gislason et al., 2020), and with fish SR obs in the coastal Norwegian Skagerrak (Lekve et al., 2002) as well as in estuaries worldwide (Vasconcelos et al., 2015), all being examples of the often found general pattern that broader-scale SR co-varies with climatic variables such as temperature (Currie et al., 2004). However, given the clear relationship between salinity and the incidences of freshwater vs. marine fish species across the studied sub-basins (Fig. 3), we consider the studied salinity gradient to represent a case where the "physiological tolerance hypothesis" applies strongly, i.e. that SR in a particular area is limited by the number of species that can tolerate the local salinity conditions (Currie et al., 2004). In accordance, the regression models with salinity alone did not improve by adding temperature as additional explanatory variable. This conclusion is in agreement with observations from estuaries that fish SR is influenced by the broader distributions and habitat preference patterns of marine and freshwater species that can colonize these areas (Vasconcelos et al., 2015).

Besides salinity and temperature, which show a pronounced gradient over the large spatial scale of our study (Table 1) and were identified as likely main drivers here (Fig. 2), fish SR might also be influenced by other factors, such as human pressures. The cumulative pressure from human activities in the Baltic Sea, combining factors such as fishing, eutrophication and hazardous substances, is generally higher in the southern and south-western sub-basins.
These sub-basins also show both relatively higher salinity and fish SR, compared to for example the northernmost sub-basins with lower cumulative human pressure, salinity and fish SR (Tables 1, 2; Korpinen et al., 2012; HELCOM, 2018). Hence, no negative relationship between cumulative human pressure and fish SR is expected on the large spatial and temporal scales studied here.
This does not, however, contradict that variation in levels of human pressures can have effects on fish concerning other aspects, such as population sizes or growth rates (Olsson et al., 2019; Bastardie et al., 2021; Reckermann et al., 2022), or could possibly affect SR on smaller spatial scales than studied here. Evidently, since the rarefaction-extrapolation analyses that we used here are based on species incidence frequencies (Chao et al., 2020), the statistical results could potentially be influenced by human pressures that alter these frequencies, even if fish SR in itself may not be affected. However, given that the rarefied and extrapolated SR (i.e. SR std and SR est) are based on SC, where rare species are not influential, these statistics are rather robust against such effects (unless there were severe changes in the incidence frequencies of common species).
Another potential explanatory factor to consider is habitat complexity, related to e.g. diversity of substrate or habitat-forming macrophytes, which can promote higher aquatic biodiversity (Soukup et al., 2021). Differences in habitat complexity may play some role in the observed large-scale patterns in fish SR given that macroalgal SR increases with increasing salinity across the Baltic Sea, with a larger share of habitat-forming and perennial species in more marine waters (Middleboe et al., 1997; Schubert et al., 2011). Hence, a greater habitat complexity with increasing salinity may enhance fish SR, further reinforcing any salinity-induced distributional pattern.

In compiling data we have assumed that salinity changes have been minor during the time period from which samples were obtained, compared to the pronounced spatial salinity gradient (observations over 17 to 47 years, depending on sub-basin, during 1975-2021; Table S2). According to monitoring data from the Baltic Sea, temporal changes in surface salinity varied between an increase of 3 psu in the Kattegat and a decrease of 1 psu in the Bothnian Sea during the past decades (since 1980; Ammar et al., 2021), which can be considered small compared to the spatial salinity gradient ranging from 3 to 29.

Considering temporal patterns in SR and community composition, it was earlier reported that the observed fish SR increased in Kattegat, Arkona Basin and the central Baltic from 2001 onwards (Hiddink & Coleby, 2012), and that the observed SR of demersal fish increased in the Baltic Proper and the Bothnian Sea during ca. 1971-2013 (Törnroos et al., 2019). First-time observations of known fish species in sub-basins where they were not previously caught have been particularly related to increasing spring temperatures (+3-6 °C during 1980-2015; Ammar et al., 2021). Such potential temporal patterns were not analysed in our study, where we merged the fish incidence data across years to provide SR estimates that are as accurate as possible at the sub-basin scale, and focused on large-scale spatial patterns.
Along with the changes in species, our study revealed changes in fish SR for different functional groups across the studied salinity gradient. As the functional groups represent differences among species groups in e.g. use of resources or level of mobility, these changes may also lead to differences in coastal ecosystem functioning across the different sub-basins (Elliott et al., 2007; Franco et al., 2008). Migrating fish species are typically of marine origin (here, marine juvenile migrants, marine seasonal migrants or marine visitors) and cannot tolerate low salinity, which explains their predominance at higher salinities (Fig. S2) and agrees with known patterns in European estuaries in general (Elliott & Dewailly, 1995; Franco et al., 2008). This pattern, with marine open sea fish species periodically using coastal areas, may be related to enhanced prey availability, and to hiding places and the typically more turbid waters providing protection from predators (Franco et al., 2008). Moreover, the higher SR of migratory fish at higher salinity is likely relevant for the ecological connectivity between ecosystems, i.e. by transport of local "coastal" production to the open sea and vice versa (Franco et al., 2008). It also emphasizes the important role of coastal areas as nursery grounds, migration routes and refuge areas for marine fish species (Elliott et al., 2007). This connectivity is also maintained in the less saline sub-basins, though the concerned functional groups are represented by only a few species (Fig. S2; Berkström et al., 2021).

Further, benthic and demersal fish SR decreased with decreasing salinity (Fig. 5a,b, Table 3), corroborating a previously documented decrease in demersal fish SR obs from the saline Kattegat to the less saline northern Baltic Proper (Pecuchet et al., 2016), and in accordance with the high benthic preference of marine fish species in European estuaries (Elliott & Dewailly, 1995; Franco et al., 2008).
This pattern further corresponds with the observation that the SR of benthic meio- and macrofauna, which are a dominating prey for benthic and demersal fish, also decreases with decreasing salinity in the Baltic Sea (Broman et al., 2019; Zettler et al., 2014). Taken together, these patterns suggest that benthic-pelagic coupling through fish predation likely involves fewer species links, i.e. lower functional redundancy, towards the lower-salinity sub-basins. Concerning feeding habits, the general composition of feeding guilds noted in the higher-salinity sub-basins was similar to that reported on a larger European scale (Elliott & Dewailly, 1995). Also, the higher piscivorous fish SR in the more saline sub-basins (Fig. S5e) and the low omnivorous fish SR, which was unrelated to salinity (Fig. S5c), were in agreement with findings from European estuaries (Franco et al., 2008). In summary, the identified differences in functional traits of fish along the salinity gradient were largely related to the respective changes in the predominating fish origin, i.e. freshwater vs. marine species.

Conclusions
Since fish SR and a number of functional attributes changed along the salinity gradient, respective changes in the coastal fish communities may be foreseen if climate change further alters salinity conditions in the Baltic Sea. While the confidence in future salinity projections remains low (HELCOM, 2021), recent ensemble simulations estimate that the two main drivers of climate-related changes in salinity in the Baltic Sea region, increasing river runoff (leading to lower salinity) and sea level rise (leading to higher salinity), approximately compensate each other, and may result in no net salinity changes (Meier et al., 2021). Mean (depth-integrated) observed Baltic Sea salinity did not change during 1982-2016; however, vertical changes were observed, with freshening trends of the upper layer down to 40-50 m depth in most sub-basins, and increasing salinity below the halocline in the deep layer of some sub-basins (Liblik & Lips, 2019). Hence, if not considering potential phenotypical acclimation or genetic adaptation, an upper layer freshening would, based on the results from this and earlier studies (e.g. Hiddink & Coleby, 2012; MacKenzie et al., 2007; Pecuchet et al., 2016), likely lead to less species-rich native fish communities in shallow coastal areas, where more and more marine species are excluded. Further, successful recovery of overfished marine species may become less probable while certain freshwater fish species may be favored (MacKenzie et al., 2007; Peterson & Meador, 1994). Indeed, marine fish species were negatively affected by a period of freshened conditions in the Baltic Sea during ca. the 1970s-1990s (Ojaveer & Kalejs, 2005). Benthic fish species, being mostly of marine origin, may be especially vulnerable to freshening in the Baltic Sea region, where their proportion in the fish assemblage is already relatively low to date.
Besides salinity changes, fish SR and distribution may also be influenced by other climate-change related processes, including warming and resulting higher deep-water oxygen consumption rates, or changes in the Baltic Sea circulation (HELCOM, 2021; MacKenzie et al., 2007). Increasing water temperatures have already been linked to increased observed fish SR in the adjacent North Sea (Hiddink & Ter Hofstede, 2008), and in the Kattegat (Hiddink & Coleby, 2012). Further ecosystem-based assessments are needed to obtain realistic predictions of the net effect of such ongoing environmental changes on future fish SR/community composition and on how they may interact with human activities such as fishing patterns, and with conservation needs for biodiversity management.

Code and data availability
The data used in this study are publicly available via the SLU Database for Coastal Fish - KUL (https://www.slu.se/en/departments/aquatic-resources1/databases/database-for-coastal-fish-kul/), the SLU Swedish Species

Author contribution
LB, MK and BK designed the study. LB, ME and MK compiled the data. MK and LB assigned the functional characteristics to the fish species. BK analysed the data. BK and ME visualized the data. All co-authors discussed and validated the results. BK prepared the manuscript with contributions from all co-authors.
The authors declare that they have no conflict of interest.

… management, and by the Department of Aquatic Resources, Swedish University of Agricultural Sciences. Dr. Malin Werner kindly contributed to the compilation of trawl data. We acknowledge the Swedish national programs for fish, fisheries and marine environmental monitoring, hosted by SLU, that made this study possible.