Co-occurrence patterns in aquatic bacterial communities across changing permafrost landscapes

Introduction Conclusions References


Introduction
Permafrost is widespread in Arctic and boreal regions  and is estimated to contain ca. 1700 Pg of organic carbon (McGuire et al., 2009;Tarnocai et al., 2009). Permafrost thawing and erosion is evident by the northward retreat of the permafrost boundary (Thibault and Payette, 2009). In some northern regions this has led to the expansion of permafrost thaw ponds and lakes (thermokarst systems; Grosse et al., 2013), whereas in other regions there has been a contraction and loss of these waterbodies (e.g., Andresen and Lougheed, 2015). These thermokarst systems are part of circumpolar and global biogeochemical cycles (Abnizova et al., 2012;Walter et al., 2007). Although some are carbon sinks (Walter Anthony et al., 2014), others are net sources of carbon dioxide (CO 2 ) and methane (CH 4 ) to the atmosphere (Walter et al., 2008).
Microbial communities are among the main drivers of key biogeochemical processes (Ducklow, 2008), and in thermokarst systems they are composed of functionally diverse taxa Rossi et al., 2013). In particular, these systems are favorable for bacterial methanotrophs  as well as archaeal methanogens (Mondav et al., 2014), and the relative activity of these two groups will affect methane balance and the net emission of greenhouse gases. Identifying factors that shape microbial communities in these aquatic systems is therefore essential for understanding the functional significance of these permafrost thaw systems in the global carbon budget.
Aquatic bacterial communities are thought to be selected by a combination of bottom-up (resource availability) and top-down (viral lysis, grazing) controls. Less studied are bacteria-bacteria interactions (facilitation, competition), which may further contribute to nonrandom distributions observed among microbial taxa (e.g., Horner-Devine et al., 2007). Examining co-occurrence patterns has the potential to unveil ecological processes that structure bacterial communities. Specifically, patterns of co-occurrence may reveal to what extent groups of microbes share habitat preferences, to what extent there may be ecological linkages among bacterial taxa and with other planktonic organisms, and the extent of phylogenetic closeness of co-occurring bacterial taxa given that closely related taxa may share life strategies and ecological traits.
Across northern landscapes, both regional (e.g., climate and the degradation state of permafrost) and local (e.g., nutrients, dissolved organic carbon and oxygen) conditions are likely to influence the distribution of bacterial communities of thaw ponds and lakes. These thermokarst systems show a high degree of limnological (Deshpande et al., 2015) and bacterial heterogeneity , making them suitable models to investigate the co-occurrence patterns among bacterial taxa as well as their network relationships within microbial consortia. The main objectives of this study were to characterize the ecological linkages within microbial communities as a response to permafrost thawing. Our hypotheses were that (i) bacterial communities follow co-occurrence patterns along the permafrost degradation gradient, due to distinct habitat preferences among bacteria, and (ii) these habitat preferences relate to differences in the phylogenetic structure of bacterial communities.
To test the above hypotheses, we employed highthroughput sequencing of the 16S rRNA gene to determine the composition of bacterial communities in thaw ponds and lakes of Nunavik (Québec, Canada) along a north-south permafrost degradation gradient. In addition, we sampled rockbasin lakes that were under the same regional climate but whose formation was not related to climate change. We investigated the relationships among bacterial taxa and local environmental conditions by means of network analysis, which has been applied with success elsewhere to evaluate microbial distribution patterns (Barberan et al., 2012;Peura et al., 2015;Steele et al., 2011) and responses to environmental perturbation (Araújo et al., 2011). We then examined the potential linkages between the bacteria and phytoplankton, phototrophic picoplankton and zooplankton biomass in the ponds and lakes.

Study sites and sampling
Surface water (0.2 m) from 29 thermokarst ponds was collected from 1 to 13 August 2012 in two types of permafrost landscapes. Thaw ponds were located in the vicinity of Whapmagoostui-Kuujjuarapik (W-K: lat 55 • 15 N, long 77 • 45 W) and Umiujaq (lat 56 • 32 N, long 76 • 33 W), within four valleys in the eastern Canadian subarctic, Nunavik, along a north-south permafrost degradation gradient as described in Comte et al. (2015): the Sasapimakwananisikw River valley (SAS) and the Kwakwatanikapistikw River valley (KWK), in sporadic, highly degraded permafrost landscapes (< 10 % permafrost coverage; see Bhiry et al., 2011, for details), and the Sheldrake River valley (BGR) and Nastapoka River valley (NAS), which are in discontinuous permafrost landscapes (10-50 % permafrost coverage). In addition, we sampled five rock-basin lakes as "reference lakes" (RBL) in catchments near the W-K village as a fifth valley type; these waters occupy glacially scoured basins, and their origin is not related to permafrost degradation.
At each site, temperature, conductivity, dissolved oxygen and pH were measured using a 600R multiparametric probe (YSI, Yellow Springs, OH, USA). Water for dissolved organic carbon (DOC) and chlorophyll a (Chl a) was filtered through Milli-Q water-rinsed 47 mm diameter, 0.22 µm pore size acetate filters and onto GF/F filters, respectively (Whatman, GE Healthcare Life Sciences, Little Chalfont, Buckinghamshire, UK). Water samples for total phosphorus (TP) and total nitrogen (TN) were preserved with H 2 SO 4 (0.15 % final concentration) until subsequent analyses.
Samples for zooplankton were collected using a 35 µm net, fixed in ethanol (final concentration: 75 %, v/v), and stored in cold (4 • C) dark conditions until analysis by inverted microscopy. Microbial abundance samples for flow cytometry (FCM) analysis were further collected and fixed with glutaraldehyde (final concentration: 2 %, v/v) and stored frozen at −80 • C until analysis.

Chemical and plankton analyses
DOC concentrations were analyzed on a Shimadzu TOC-5000A carbon analyzer and nutrients were analyzed using standard methods (Stainton et al., 1977). Colored dissolved organic matter (CDOM) was measured by spectrophotometric analysis of absorbance at 254 nm by water filtered through 0.2 µm pore size filters and the dissolved aromatic carbon content was determined using the SUVA 254 index (Weishaar et al., 2003).
Phytoplankton biomass was estimated as Chl a concentrations, which were determined using high-performance liquid chromatography (ProStar HPLC system, Varian, Palo Alto, CA, USA) following the procedures described in Bonilla et al. (2005). Zooplankton, specifically copepods, rotifers and cladocerans, were enumerated following the Utermöhl procedure (1958) and inverted microscopy (Zeiss Axiovert, Carl Zeiss Microscopy GmbH, Jena, Germany). Bacteria, picocyanobacteria and phototrophic picoeukaryotes were enumerated using a FACScalibur flow cytometer (BD, Mississauga, ON, Canada), equipped with an argon laser, at the lowest flow rate (12 µL min −1 ), using 1 µm yellow-green microspheres (Polysciences Inc., Warrington, PA, USA) in suspension as an internal standard. Bead concentration was controlled using Trucount Absolute counting tubes (BD, Mississauga, ON, Canada). Bacteria were stained by adding 20 µL of 50X SYBR Green I (Life Technologies, Thermo Fisher Scientific, Waltham, MA, USA) to 500 µL of sample for 10 min in the dark. Bacterial cells were then discriminated on the basis of their green fluorescence (FL1) and sidescatter signals (SSC) while excited at 488 nm, whereas phototrophic picoeukaryotes and picocyanobacteria were discriminated from unstained samples on the basis of their red autofluorescence (FL3) with a threshold in orange (FL2) and SSC. The resulting data were analyzed using the CellQuest Pro software with manual gating.

Bacterial community composition
Bacterial community composition (BCC) was determined by 454-pyrosequencing of the V6-V8 regions of the 16S rRNA gene. In brief, water was sequentially filtered through a 20 µm mesh net to remove larger organisms, a 47 mm diameter, 3 µm pore size polycarbonate filter (Whatman) and a 0.2 µm pore size Sterivex unit (EMD Millipore, Billerica, MA, USA) using a peristaltic pump. The filters were preserved with 1.8 mL of RNA later (Life Technologies) and stored at −80 • C until further processing. For this study, the bacterial community composition of the free-living fraction (< 3 µm) was examined. DNA was extracted from cells collected onto Sterivex units using the PowerWater Sterivex DNA Isolation Kit (MO BIO Laboratories Inc., Carlsbad, CA, USA) following the manufacturer's instructions. Extracted DNA was amplified in three separate 20 µL PCR reactions using 1 µL of template (three concentrations: undiluted, 0.5, and 0.2 times diluted) and a Phusion high-fidelity DNA polymerase kit (New England Biolabs, Whitby, ON, USA), as well as reverse 1406R and forward 969F primers with sample-specific tags as in Comeau et al. (2011). Amplicons were purified using a PCR purification kit from Feldan (QC, Canada), quantified spectrophotometrically (Nanodrop, ND-1000, Wilmington, DE, USA) and sequenced using Roche/454 GS FLX Titanium technology at Plateforme d'Analyses Génomiques, Institut de Biologie Intégrative et des Systèmes, Université Laval (Québec, Canada). The raw reads have been deposited in the NCBI database under the accession number SRP044372.
All sequence data processing was within the QIIME v1.8.0 pipeline (Caporaso et al., 2010b). Reads were first pre-processed by removing those with a length shorter than 300 nucleotides. The remaining reads were then processed through a QIIME denoiser. Denoised sequence reads were quality-controlled and chimeras were detected using UPARSE (Edgar, 2013). Operational taxonomic unit (OTU) sequence representatives were aligned using PyNAST (Caporaso et al., 2010a) with the pre-aligned Greengenes 16S core set (DeSantis et al., 2006) as a template and taxonomically classified using the Mothur Bayesian classifier (Schloss et al., 2009). The reference database was the SILVA reference database (Pruesse et al., 2007) modified to include sequences from our in-house, curated northern 16S rRNA gene sequence database . Sequences classified as plastid or mitochondrial 16S were removed from the analyses.

Phylogenetic analyses
All phylogenetic analyses were based on a phylogenetic tree constructed with an approximate maximum-likelihood (ML) approach using FastTree v.2.1 (Price et al., 2010) following the procedures described in Monier et al. (2015). UniFrac dw4000 (weighted) and duw4000 (unweighted) distances (Lozupone and Knight, 2005) among the different microbial communities were all computed based on the OTU approximate ML phylogenetic tree. Clustering of UniFrac distances was performed using the unweighted pair group method with arithmetic mean (UPGMA) algorithm, and cluster robustness was assessed using 1000 jackknife replicates (on 75 % subsets). Beta-diversity significance was assessed using UniFrac Monte Carlo significance test on dw4000 with 10 000 randomizations, as implemented in QIIME.
We investigated community phylogenetic diversity as defined by Faith (1992), along with other diversity metrics such as phylogenetic species richness and evenness (Helmus et al., 2007), using the R package "picante" v1.5 (Kembel et al., 2010). Community phylogenetic structure was investigated with the calculation of the net relatedness index (NRI) that measures the phylogenetic relatedness for each community. Specifically NRI determines whether OTUs are more closely related to co-occurring relatives than expected by chance (Webb et al., 2002).

Statistical analyses
All statistical analyses were carried out using R 3.0.3 (R Core Team, 2014). Abiotic and biotic environmental variables were log-transformed, with the exception of pH (already on a log scale). All analyses were performed on the subsampled data set (4000 sequences per sample) with a total number of 2166 OTUs.
Dissimilarities in community composition among the different valleys were visualized using cluster and principal coordinate analyses. A rank abundance plot was generated to identify the bacterial dominants.
The taxonomic uniqueness of sites as well as the taxa that contribute the most to these compositional differences was evaluated by means of local contribution to beta-diversity (LCBD; Legendre and De Cáceres, 2013). Differences in LCBD, phylogenetic diversity, species richness and structure across spatial scales were tested using ANOVA followed by Tukey's HSD test and regression models to identify links between site uniqueness and environmental variables.
Significant associations between the abundance of bacterial OTUs and the five valleys were further assessed by correlation indices (as a measure of habitat preferences), including the point biserial correlation statistic r pb and its groupequalized value r.g. as defined by De Cáceres and Legendre (2009). Permutation tests (1000 permutations) tested the null hypothesis that the abundance of OTUs in ponds of a given valley was not different from their abundances in ponds located in other valleys. Correction for multi-testing was applied using the method of Benjamini and Hochberg (1995) that controls the false discovery rate and is a less stringent condition than Bonferroni. OTUs that were significantly associated with valleys were submitted to BLASTn search in NCBI GenBank (http://blast.ncbi.nlm.nih.gov/Blast.cgi) to identify the lowest level of classification possible. A heat map was produced to examine the variability in the ecological preference among the 30 most abundant OTUs.

Co-occurrence patterns
Co-occurrence analyses were performed using the overall data set and each of the data sets for the five individual valleys. The data were filtered by using only those OTUs with a minimum of 20 reads and that were detected in at least three different ponds. This filtering step removed poorly represented OTUs and reduced the network complexity, resulting in a core community of 294 OTUs.
Randomness in co-occurrence of OTUs in the regional and individual valley data sets was tested in a null model using the quasiswap algorithm (Miklós and Podani, 2004) and Cscore metric (Stone and Roberts, 1990) under 50 000 simulations. SES (standardized effect size) was used as a measure of OTU segregation as described in Heino and Grönroos (2013) in order to determine whether this may relate to the overall environmental heterogeneity, the heterogeneity in biotic and abiotic variables separately, or to specific environmental variables. Environmental heterogeneity was determined using homogenization of group dispersion (Anderson et al., 2006) and defined as the mean distances of ponds to the centroid (central point) of each valley. Analyses were conducted on Euclidean distances on standardized variables and based on 1000 permutations. Similarly, the homogenization of group dispersion method was used to determine whether communities among ponds within a given valley were more similar than within other valleys.
Network analyses were conducted on the filtered OTU data set. In addition, a total of eight physicochemical vari-ables (DOC; TP; TN; pH; SUVA 254 ; COND: conductivity; T: water temperature; DO: dissolved oxygen concentration) and 7 biotic variables (Chl a: phytoplankton biomass; BA: bacterial abundance; PC: abundance of picocyanobacteria; PE: abundance of phototrophic picoeukaryotes; Rot: abundance of rotifers; Clad: abundance of cladocerans; Cop: abundance of copepods) data were also included in the network. For each environmental variable, any missing data were estimated as the mean for the corresponding valley and all data were then normalized by subtracting the mean value for the overall study and dividing by the corresponding standard deviation.
To examine associations between the bacterial OTUs and their environment, we analyzed the correlations of the OTUs with each other and with biotic and abiotic variables using the maximal information coefficient (MIC; Reshef et al., 2011). The MIC value indicates the strength of the relationship between two variables and is analogous to R 2 in general linear models. MIC does not provide information on the sign of the association between two nodes, and we therefore extracted the linearity metric (MIC-ρ 2 ) from the edges of the network, which indicates the type of association: an MIC-ρ 2 value greater than 0.2 implies a strong nonlinear association and likely "non-coexistence" among OTUs (Reshef et al., 2011). Computations were carried out using MINE (Reshef et al., 2011). Following the procedure described in Peura et al. (2015), relationships with p < 0.05 were selected to construct networks, which corresponded to a MIC cutoff of 0.44 depending on the number of samples in our data set. Parameters for analysis were set to default, and false discovery rates (Benjamini and Hochberg, 1995) were below 0.03. MIC matrices were translated into networks using Cytoscape 3.2.0 (Shannon et al., 2003). Nodes represented bacterial OTUs as well as both biotic and abiotic variables, which were connected by edges that denote the strength of the relationship between two variables (MIC). The topology of the resulting undirected network was investigated using the package "igraph" (Csardi and Nepusz, 2006) in R and compared to an Erdős-Rényi random network of similar size. Following Peura et al. (2015), high degree nodes were defined as "hubs" and the implication of their removal for network topology was evaluated. Networks were then visualized in Gephi 0.8.2 (Bastian et al., 2009) using the Fruchterman-Reingold layout algorithm. Unconnected nodes were removed along with self-loops and duplicated edges.
The relationship between the connectivity of OTUs (as indicated by the degree value in the network) and their corresponding abundance was examined in generalized linear models in order to relax the normality assumptions. OTU abundance was first calculated per individual pond as the product of percent of total reads and total bacterial abundance. The total abundance of an OTU in the data set was then obtained by summing the abundance calculated for each pond. A heat map was produced to examine the variability in the ecological preference among the 30 most connected OTUs.

Bacterial phylogenetic structure
The phylogenetic composition of bacterial communities differed significantly among valleys (dw4000, UniFrac weighted significance test; p ≤ 0.01). The clustering and principal coordinate analyses (PCoA) based on weighted UniFrac distances (dw4000; Fig. 1a, b) suggested that communities within the SAS valley tend to clustered together, as did the KWK communities. However, a test for homogeneity of multivariate dispersions did not support this as no significant difference in the distance to group (valley) centroid was detected (P = 0.39, F = 1.08). Permafrost landscape type had a significant effect on phylogenetic composition (permutational analysis of variance on dw4000; R 2 = 0.31, P = 0.001). The reference lakes did not group together, likely reflecting their disparate catchment properties. The cluster analysis based on unweighted UniFrac distances indicated a stronger clustering according to permafrost landscape type (permutational analysis of variance on duw4000; R 2 = 0.51; P = 0.001) by comparison with weighted UniFrac distances (Fig. S1 in the Supplement; UniFrac unweighted significance test, p ≤ 0.01). The discrepancy between dw4000 and duw4000 patterns indicated the presence of a small number of highly abundant OTUs within different valleys (Fig. S2). In fact, only 18 OTUs had a > 1 % contribution to the total number of sequence reads.
Community phylogenetic analysis based on NRI indices showed that all site clusters had significant phylogenetic structure (positive NRI values; one-sample t test, t = 18.9, df = 33, P < 0.0001; Table S1 in the Supplement), indicating that bacterial communities within each valley were more closely related to each other than expected by chance. There was no significant difference in phylogenetic structure among valleys (ANOVA, P = 0.4; Fig. 1c), but there were large differences within individual valleys, with some ponds less phylogenetically clustered than others. For example, the NAS valley two ponds had higher NRI values than the majority of the ponds located within the valley. Ponds located within the SAS valley showed significantly higher phylogenetic species richness and diversity than the KWK, NAS and BGR valleys (PSR: P = 0.002, F = 5.6, R 2 = 0.36; PD: P < 0.0001, F = 11.3, R 2 = 0.55).

Spatial bacterial taxonomic distribution
The local contribution to beta-diversity (LCBD) values indicated the compositional uniqueness of local bacterial communities. One-way ANOVA showed that pond location had a significant influence on compositional uniqueness (F = 2.8, R 2 = 0.27, P = 0.04), with the rock-basin lakes having the highest LCBD estimates (Fig. S3). There was high variability among ponds within the same valley, and there was no significant difference in taxonomic uniqueness among permafrost valleys. Stepwise backward selection identified the best regression model for LCBD as a function of environmental variables (Table S2), with four environmental variables (F = 3.2, R 2 = 0.22, P = 0.03): DOC, conductivity, SUVA 254 and Chl a. Sites with a high degree of taxonomic uniqueness had high DOC content and conductivity but low Chl a concentrations. SUVA 254 made no significant contribution to the model (P = 0.07), and there was no relationship between LCBD, species richness and distance to the closest neighbor.
The thaw pond communities were dominated by OTUs that were assigned to Betaproteobacteria, particularly the order Burkholderiales, which was well represented in all communities (35.4 % of the total number of reads). Actinobacteria (24.5 % of total reads) were mainly represented by OTUs assigned to the family ACK-M1 (60.5 % of Actinobacteria reads). Among Bacteroidetes, which accounted for up to 15.7 % of the total number of reads, Sphingobacteriales were highly represented and were dominated by the family Chitinophagaceae that contributed up to 4.7 % of total number of reads. Other dominant OTUs were within the Verrucomicrobia (6.8 % of total reads) ( Table 1). Among the 30 most abundant taxa, some were highly associated with a specific valley, whereas others were not detected in certain valleys (Fig. 2a). This pattern remained when considering the ensemble of the 2166 OTUs (Fig. S4). Specifically, 272 OTUs (11.3 % of the 2166 detected in this data set) showed a significant association in the indicator value analysis (the point biserial statistic r.g.) considering habitat combinations. Among the 272 OTUs showing a significant habitat preference, 246 were associated with a single valley: 13, 12, 31, 99 and 91 OTUs were associated with the BGR, NAS, KWK, SAS and RBL valleys, respectively. Four OTUs were associated with the discontinuous permafrost landscape and three with the sporadic permafrost landscape (Table 2). There were distinctions between ponds located in the sporadic versus discontinuous permafrost landscapes. In particular, OTUs closely related to methanotrophs were prominent within the sporadic permafrost landscape type: OTUs closely related to Methylotenera (OTU 10) and Methylobacter (OTU 9) were among the five most abundant taxa at SAS sites (3.5 and 3.6 % of the total number of SAS reads, respectively) and OTUs assigned to methanotrophic Verrucomicrobia LD19 (in the class Methylacidiphilae) was one of the most abundant at the KWK site (Fig. 2a, 1.4 % of KWK reads).

Bacterial co-occurrence patterns
To test for differences in co-occurrence patterns between microbial communities across the permafrost landscape, we first selected OTUs that had at least 20 reads and were detected in at least three different ponds. The bacterial Table 1. Five most abundant (number of reads) OTUs across spatial scales. Finest taxonomy assignments are presented with a minimum confidence of 0.8. OTUs were not randomly distributed among the different valleys when considering the entire region (C score = 35.7, P < 0.0001, SES = 25.4). At the individual valley scale, the OTUs were not randomly distributed among ponds except for BGR valley (Table 3). No significant relationship was detected between the level of OTUs segregation, determined by SES, and the overall environmental heterogeneity, and both abiotic and biotic heterogeneity. In addition, no significant relationship between SES and individual environmental variables was detected.
The OTU co-occurrence patterns as well as the relationships among both biotic and abiotic variables were investigated by network analysis. The most connected nodes (degree > 10) were related to three abiotic variables (DOC, conductivity and TP) and one biotic variable (phototrophic picoeukaryotes). The topology of the networks is presented in Table 4. For the whole regional network, a total of 248 nodes and 968 edges were detected, which was fragmented in 3 components including 2 small components composed of 2 and 3 nodes (Fig. S5). The observed characteristic Table 2. Results of indicator species analysis. Valley refers to the valley (or combination of valleys) for which the OTUs obtained the highest correlation. We indicate the correlation value (r.g.) and its statistical significance (P ) at α = 0.05. Only OTUs with r.g. ≥ 0.6 are presented when associated with one valley (top 10 are presented for the KWK and SAS valleys). OTUs were classified at their finest taxonomic levels based on similarity to sequences in GenBank. path length of 3.06 and clustering coefficient of 0.25 were both greater than estimates originating from the random network of similar size. In addition, the observed : random network clustering coefficient ratio (log response ratio of 0.92) showed that the network had "small world" properties -i.e., the nodes were more connected than expected in a random network (Table 4). The frequency distribution of nodes followed a power law function, which indicated that the network was composed of a few highly connected nodes, as opposed to an even distribution of connectivity (Fig. S6). Four main bacterial phyla were well represented in the networks: Proteobacteria (83 nodes), Bacteroidetes (56 nodes), Actinobacteria (42 nodes), and Verrucomicrobia (24 nodes). Although edges between nodes that referred to bacterial OTUs dominated the network, connections between bacterial OTUs and both biotic and abiotic variables were detected (Fig. S5). For example, conductivity and DOC were amongst the most connected nodes, illustrating their importance in the network. The subnetwork built around DOC showed a diverse bacterial consortium with a slight dominance of Acti-  (Fig. 3a). Phototrophic picoeukaryotes were the most connected node among biotic variables. The subnetwork built around that variable showed strong co-occurrence between picoeukaryotes and Actinobacteria (Fig. 3b). The co-occurrence network around the group Chitinophagaceae showed that these OTUs were associated with different environmental variables including DOC, dissolved oxygen, conductivity, abundance of phototrophic picoeukaryotes, clado- Table 4. Topology of the thermokarst systems co-occurrence networks. Regional corresponds to a network built around the selected 294 OTUs whereas Hubs refers to a network where the most connected 24 OTUs from the whole network (Fig. S5a)  cerans and rotifers (Fig. 4a) and had recurrent, strong cooccurrences with Actinobacteria, especially with organisms closely related to ACK-M1 (Fig. 4b). The analysis of the linearity of the latter association indicated a positive cooccurrence between OTUs closely related to members affiliated with the ACK-M1 (aka AcI) group of Actinobacteria and Chitinophagaceae (Fig. 5c). Other examples of strong linkages between OTUs are given in Fig. 5, with illustrations of positive co-occurrence (Fig. 5a) and non-coexistence (Fig. 5b). In general, our results indicated that the most abundant OTUs were also the most connected ones (R 2 = 0.25, P < 0.001, Fig. S7). However, some of the most connected nodes (OTUs) had low abundance (Table S3, Fig. 2b). It is worth noting that some of these bacterial hubs showed some level of habitat preference, especially within KWK valley (Fig. 2b). In addition, these "valley-specific" hubs were mainly related to Actinobacteria and Betaproteobacteria (Fig. 2b).
We further investigated the implications of the removal of the top 24 connected OTU nodes (hubs), which represented a removal of 10 % of nodes, and the results showed a high level of fragmentation of the network and a drop in node degree (Table 4, Fig. S8).
Analysis of the network hubs further showed that the top 24 were mainly composed of Actinobacteria OTUs, in particular members of Actinomycetales and Acidimicrobiales. In addition, OTUs assigned to Betaproteobacteria represented a large fraction of these highly connected OTUs, including the typical freshwater Limnohabitans, whereas Verrucomicrobia and Bacteroidetes were represented by only a few highly connected OTUs. Interestingly, the anaerobic photosynthetic sulfur bacteria Chloroflexi were also identified as a hub in the overall network (Table S3).

Discussion
The main goal of the present study was to identify cooccurrence patterns among bacterial communities in thaw ponds and lakes in the changing subarctic landscape. Consistent with our first hypothesis, there was a nonrandom distribution of bacterial taxa across the distinct valleys sampled in this study. The results showed that thaw ponds communities from the same valley, especially those located in the sporadic permafrost landscape, tended to be more similar in terms of bacterial community composition than communities originating from ponds located in other valleys. Furthermore, the thaw ponds differed taxonomically from the rock-basin reference lakes, with specific bacterial OTUs associated with a particular valley or permafrost landscape type. Contrary to our second hypothesis, that differences in habitat preferences among bacterial communities were related to distinct phylogenetic structure, we found no evidence for differences in the community phylogenetic relatedness between the different valleys. The same bacterial phyla occurred throughout the region, and variability among ponds in the same valley was greater than the differences among valleys.

Local community composition uniqueness and habitat preference among bacterial communities
Non-random distribution patterns among bacterial taxa were detected, indicating that bacterial taxa in our study region tended to co-occur more than expected by chance. Nonrandom assembly patterns indicate the dominance of deterministic processes such as environmental filtering in shaping community composition (Horner-Devine et al., 2007). The bacterial communities of freshwater ecosystems elsewhere (Eiler et al., 2011), as well as in certain terrestrial (Barberan et al., 2012) and marine (Steele et al., 2011) ecosystems, have also been reported to have distributional patterns that relate to the environment. Such patterns may depend on niche breadth and competitive abilities (Székely et al., 2013), grazing and viral lysis susceptibilities (Chow et al., 2014;Miki, 2008) and dispersal capabilities (Fahlgren et al., 2010;Hervàs and Casamayor, 2009). The patterns described here are for the free-living fraction of bacterial assemblages, which raises the question of whether such patterns remain for the attached Figure 3. Subnetworks organized around DOC (a) and phototrophic picoeukaryotes (b). Subnetworks were extracted from the entire cooccurrence network (Fig. S5). In (a), edge color refers to the type of relationship with significant connection between OTUs and both biotic and abiotic variables presented in black, whereas relationships between bacterial taxa are presented in grey. In (b), edge color is proportional to the association strength, with strong associations shown in black. The size of the nodes is proportional to node degree (the number of connections that a node has with other nodes). fraction of the communities. The latter may represent a substantial part of the total communities given that these waterbodies can contain a large content of suspended solids. Previous studies comparing the compositional patterns in bacterial communities between the free-living and attached fractions showed that these two distinct lifestyle have a similar community composition , indicating that the patterns described here may reflect patterns for the entire community. No significant relationship was found between distribution patterns and environmental heterogeneity. This was unexpected, as previous studies have shown that thermokarst systems are heterogeneous environments with marked differences in community composition across the different valleys associated with distinct environmental variables Comte et al., 2015). In agreement with Heino and Grönroos (2013), we suggest that the relationship between distribution pattern and environmental heterogeneity may be scale-dependent such that environmental heterogeneity may have effects on the bacterial taxa distribution patterns at the overall study region scale and not at the valley scale as tested here. The results did show differences in the phylogenetic composition of bacterial communities among the different valleys, which highlight distinct habitat preferences among taxa (Figs. 2, S4). In particular, the combination of LCBD and regression analyses indicated that the compositional uniqueness of thaw ponds and lakes was positively related to DOC concentrations, a well-known determinant of bacterial communities and processes (Kritzberg et al., 2006;Ruiz-González et al., 2015). Along with the variations in permafrost degradation state across the study region, there were also differences among valleys in terms of availability and origin of carbon subsidies. The northern sites are located within the discontinuous permafrost area where most of the soil remains frozen and is thus not available for microbial degradation, while in the southern sporadic area, permafrost is highly degraded (Bouchard et al., 2014) and large amounts of ancient permafrost carbon may be available for microbial processes. Consistent with this pattern, elevated concentrations and high rates of CO 2 and CH 4 emission to the atmosphere have been reported among the southern sites within the most degraded area of permafrost (Laurion et al., 2010;Deshpande et al., 2015). This may in turn explain the significantly higher bacterial richness and diversity observed in SAS thaw ponds communities and why OTUs assigned to methanotrophic bacteria such as Methylobacter and Methylotenera were amongst the most abundant detected in this valley (Fig. 2). In addition, SAS sites originated from palsas (organic permafrost mounds) and were likely different in DOC composition relative to other valleys, where the ponds were formed by the thawing of lithalsas (mineral permafrost mounds). This is consistent with recent observation of a direct link between community composition and the degradation of terrestrially derived DOM (Logue et al., 2015).

Bacterial phylogenetic structure
The mean NRI across all communities was significantly greater than zero. This provides evidence for a dominant role of environmental filtering on community composition (Kembel, 2009). The corollary is that a set of environmental variables constrained community composition, resulting in taxa that were closer phylogenetically and more ecologically similar than if stochastic processes (including dispersal) drove community assembly. In fact, there is no corridor such as streams that connects the ponds, and thus local dispersal processes are unlikely to explain the local phylogenetic structure of the thaw pond communities . Similar results were obtained for microbial community studies in the ocean  and on groundwater communities (Stegen et al., 2012).
No significant difference in NRI was found among the different valleys, but this result likely reflects the high variability within individual valleys. In particular, two ponds in the NAS valley had higher values of NRI in comparison to their neighboring ponds. These two ponds had specific environmental characteristics including high concentrations of suspended clay particles and low phytoplankton concentrations, which may have favored certain environmental specialists. The rock-basin waters had higher NRI values than the thaw ponds, indicating that their assemblages were more ecologically similar to each other than those originating from thaw ponds and lakes. This could relate to their respective histories in that the rock-basin lakes originate from deglaciation followed by retreat of the Tyrrell Sea ca. 8000 years ago and have thus been exposed to longer-term ecological processes.
The extent of permafrost erosion (permafrost landscape type) appeared to influence phylogenetic structure. When controlling for the two outliers mentioned above (NAS-A and NAS-B), the northern communities (BGR, NAS) had a greater phylogenetic distance among co-occurring taxa than expected by chance (lower NRIs) in comparison to communities from the thaw ponds located in valleys from sporadic permafrost (KWK, SAS). This suggests that taxa from SAS valley (and to a lesser extent KWK) tend to be more ecologically similar to each other than those from northern valleys, reflecting strong environmental filtering by variables such as DOC concentration, as previously documented in this valley . These findings are in line with studies elsewhere that showed that clustered communities are mainly retrieved from environments that have constrained environmental conditions .

Network associations
The extent to which closely related bacterial taxa may coexist is still a subject of considerable discussion (Mayfield and Levine, 2010). Previous studies on aquatic microbial communities have shown that closely related taxa have coherent temporal dynamics and share similar ecological niches (Andersson et al., 2009;Eiler et al., 2011). Cooccurrence networks enable the depiction and visualization of co-occurrence patterns among OTUs, and they provide a way of identifying potential ecological niches within microbial consortia. Network analyses have recently been applied to a wide range of microbial communities and biomes, and specific associations among bacterial OTUs and with environmental variables have been reported (Barberan et al., 2012;Chow et al., 2014;Eiler et al., 2011;Steele et al., 2011).
Our results point toward the importance of environmental filtering for community assembly in thaw ponds and lakes. In co-occurrence networks, correlations between OTUs and environmental variables highlight the conditions that may favor particular assemblages. Specifically, our co-occurrence networks identified two abiotic variables (DOC and conductivity) to be among the most connected nodes (Fig. S5b), and these variables separated according to landscape type: the northern ponds located in the discontinuous permafrost landscape had high conductivity and low DOC, whereas southern sites within the sporadic permafrost landscape had high DOC and lower conductivity (Table S2; further details are given in Comte et al., 2015). The analysis of the DOC subnetwork showed that only a few OTUs were significantly and directly related to DOC; these included OTUs assigned to Actinobacteria as well as OTUs closely related to bacterial methanotrophs and taxa involved in the degradation of complex organic polymers (Fig. 3a). Among phylogenetically related microbes, unique combinations tended to co-occur (Fig. 4a). For example, some OTUs assigned to the Chitinophagaceae appeared to be significantly related to different abiotic and biotic variables, which in turn suggested niche separation.
In addition to the bottom-up factors that shape bacterial communities, recent work on microbial networks has highlighted the role of top down processes such as grazing and viral lysis in affecting prokaryotic community structure and cooccurrence patterns (Chow et al., 2014;Steele et al., 2011). In the present study, phototrophic picoeukaryote abundance (degree = 14) was the most connected biotic node. Only phototrophic picoeukaryotes were enumerated in this study, and although some may have a mixotrophic grazing capacity, their network importance may be the result of other factors, for example the release of photosynthate or their occurrence under conditions that mutually favor both themselves and certain bacterial taxa.
In general, relationships among microbes dominated the network, rather than those between microbes and abiotic or biotic environmental parameters (Fig. S5). There was overlap in terms of community composition among the different valleys (Fig. 1), with shared dominant taxa (Table 1, Fig. S2). Although this may indicate that some OTUs may respond similarly to specific environmental factors and outcompete others, some associations may be the result of substrate interdependencies. One example is the relationship between bacteria able to degrade chitin and others that take up the resulting hydrolysis products (Beier and Bertilsson, 2013). OTUs closely related to bacteria in the Chitinophagaceae, a group known to be involved in the degradation of chitin and other complex polymeric organic matter (del Rio et al., 2010), were well represented in our study area, and have also been found in other cold terrestrial environments (Franzetti et al., 2013;Ganzert et al., 2011). The subnetwork built around this group showed that these OTUs are linked to other phyla (Fig. 4a), notably certain Actinobacteria (Fig. 4b). The dominants were closely related to clade Ac1, which is known to include specialists that use hydrolysis products from chitinolytic bacteria (Beier and Bertilsson, 2011). The analysis of linearity of the associations between the corresponding OTUs showed a positive co-occurrence (Fig. 5c), consistent with bacterial network relationships. Although other examples of positive co-occurrence among bacterial OTUs were identified in the data set (Fig. 5a), there was also evidence of "non-coexistence" (sensu Reshef et al., 2011) among certain OTUs: in the northern, less degraded permafrost valley (BGR), OTU 1242 (Betaproteobacteria Limnohabitans) dominated, whereas in the southern highly degraded permafrost valleys (SAS, KWK), OTU 14 (Actinobacteria ACK-M1) dominated (Fig. 5b). These tradeoffs among OTUs were partially explained by the geographic location of the valleys, suggesting that environmental variables not only drive the composition of the bacterial assemblages within the individual valleys but may also determine the ecological associations within microbial consortia. Furthermore, the positive relationship found between the connectivity and the habitat specificity among the most abundant OTUs (Fig. 2a) is most likely driven by the dominance of highly connected OTUs in the southern highly degraded permafrost valleys in comparison to the northern less degraded permafrost valleys. In addition, the OTUs retrieved from the southern thaw ponds were closely related to specific bacterial functional groups such as methanotrophs and nitrogen fixing bacteria (Fig. 5).
The microbial networks for the thermokarst systems had "small world" properties, with only a few, highly connected nodes, which can be viewed as "keystone species". This property would render the networks more resistant to environmental change, but vulnerable to the loss of these nodal species (Montoya et al., 2006). The bacterial hubs were identified as typical freshwater, terrestrial and marine taxa (Table S3), and some of them were closely related to taxa that are involved in key biogeochemical processes such as nitrogen fixation and degradation of complex polymers, or that are known to be restricted in niche breadth, for example to cold environments. In accordance with Peura et al. (2015), the importance of a taxon in a microbial network may be less associated with its abundance but instead determined by its connectivity, as represented by node degree for example. Thus many of the hub taxa identified in this study could be defined as keystone microbial species (Table S3). These "keystone" OTUs identified as hubs were not merely the abundant OTUs (Fig. 2b), but some were rare and potentially important actors for the functioning of these ecosystems. For example, the nitrogen-fixing bacterium Beijerinckia was among the most connected nodes in the co-occurrence network despite its low relative abundance. This in turn highlights the potentially important ecological role of diazotrophs in these nutrient-rich aquatic systems.

Conclusions
The thaw ponds and lakes sampled in the present study showed large variability in their bacterial community structure, even among waterbodies in a single valley. This underscores the heterogeneous nature of permafrost aquatic environments, and is consistent with their known limnological variability. A small number of taxa occurred in high abundance and dominated many of the communities; these northern dominants included members of the betaproteobacterial order Burkholderiales and the actinobacterial family ACK-M1; other dominants included members of the Bacteroidetes family Chitinophagaceae and Verrucomicrobia. Despite this variability and the existence of common taxa, there were taxonomic differences among different valleys and between permafrost landscape types, implying some degree of habitat selection.
The bacterial networks further showed that DOC and conductivity played an important role in the co-occurrence patterns of bacterial OTUs, corresponding at least in part to differences in these two environmental variables among valleys (Table S2). Strong positive associations as well as noncoexistence among OTUs were detected, and the resultant networks were composed of a limited number of highly connected OTUs. This "small world network" property would render these communities more resistant to environmental change but sensitive to the loss of their hub OTUs, which themselves showed some degree of habitat specificity. With ongoing global warming, these waters are likely to experience the effects of increased permafrost erosion and associated changes in their chemical environment, including shifts in DOC and conductivity. If such changes eventually cause the loss of "keystone species" that form the hubs of the present microbial networks, there would be a major disruption of community structure, with potentially large biogeochemical consequences.
The Supplement related to this article is available online at doi:10.5194/bg-13-175-2016-supplement.