This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.

For further analysis and data, please refer to the accompanied github repo:

https://github.com/CoraHoerstmann/Kerguelen2021

Data:

The datasets can be downloaded from PANGEAE: https://doi.org/10.1594/PANGAEA.885896 (Hoerstmann et al. 2018)

genomic sequencing data: https://www.ebi.ac.uk/ena/browser/view/PRJEB29488 (Study name: MD206_genomics, accession no: PRJEB29488)

References for used Rpackages:

! For package version use packageVersion() and see within notebook

tidyverse: Wickham et al., (2019) (https://doi.org/10.21105/joss.01686)

ggplot2: Wickham (2016) (https://ggplot2.tidyverse.org)

vegan: Oksanen et al. 2019 (https://CRAN.R-project.org/package=vegan)

zCompositions: Palarea-Albaladejo et al. (2015) (http://dx.doi.org/10.1016/j.chemolab.2015.02.019)

iNEXT: Hsieh et al. (2020) (http://chao.stat.nthu.edu.tw/wordpress/software-download/)

geodist: Padgham & Sumner (2019) (https://CRAN.R-project.org/package=geodist)

reshape2: Wickham (2007) (http://www.jstatsoft.org/v21/i12/)

ape: Paradis & Schliep (2018)

gdm: Fitzpatrick et al. (2020) (//CRAN.R-project.org/package=gdm)

upsetR: Gehlenborg (2019) (https://CRAN.R-project.org/package=UpSetR)

phyloseq: McMurdie & Holmes (2013) (http://dx.plos.org/10.1371/journal.pone.0061217)

magrittr: Bache & Wickham (2014) (https://CRAN.R-project.org/package=magrittr)

Acknowledgement for decisions for statistical analyses:

Buttigieg PL, Ramette A (2014) A Guide to Statistical Analysis in Microbial Ecology: a community-focused, living review of multivariate data analyses. FEMS Microbiol Ecol. 90: 543–550. https://mb3is.megx.net/gustame/home

Productivity analysis

this is kept seperate from microbial analysis because in microbial analysis we only use the mean vaules of the rates. In this part of the analysis we plot the biological rates against environmental data (temperature) and check for significance between ocean provinces and watermasses.

import datasets

import, rename the columnheaders and associate provincecs and watermasses

## [1] change column names
## [1] associate provinces and watermasses

Nutrient plots

Dissolved inorganic nutrients (nitrate, phsophate and silicate) and particulate organic carbon to particulate organic nitrogen ratio are plotted against sea surface temperature. Temperature is the best predictor of watermasses and strongest correlate with environmental data and was thus used in our dataset for data representation. Samples are colored according to water mass (WM)

We performed two sample t-test to check for significant differences between ocean provinces.

## # A tibble: 13 x 11
##    Event mean_NO3 min_NO3 max_NO3 mean_P min_P max_P mean_Si min_Si max_Si     n
##  * <chr>    <dbl>   <dbl>   <dbl>  <dbl> <dbl> <dbl>   <dbl>  <dbl>  <dbl> <int>
##  1 OISO~    24.9    23.8    25.6  1.66   1.64  1.68  20.0    19.2   20.5       3
##  2 OISO~     8.12    6.98    9.53 0.75   0.71  0.8    0.493   0.365  0.685     6
##  3 OISO~     1.49    1.10    1.87 0.267  0.24  0.3    0.0267  0      0.06      3
##  4 OISO~     0       0       0    0.158  0.13  0.205  0.415   0.315  0.48      6
##  5 OISO~     0       0       0    0.09   0.05  0.17   1.42    1.36   1.48      6
##  6 OISO2     0       0       0    0.0333 0.03  0.04   0.937   0.905  0.965     6
##  7 OISO3     0       0       0    0.0617 0.055 0.07   0.698   0.68   0.73      3
##  8 OISO~    24.9    24.4    25.3  1.70   1.7   1.70  26.9    26.4   27.3       6
##  9 OISO4     0       0       0    0.103  0.1   0.11   0.06    0.035  0.075     6
## 10 OISO6    20.2    19.2    20.8  1.45   1.40  1.48   2.56    2.4    2.66      6
## 11 OISO7    22.4    22.4    22.6  1.54   1.52  1.55   6.05    5.92   6.14      6
## 12 OISO9    22.1    20.8    23.5  1.59   1.54  1.64   7.14    6.74   7.54      2
## 13 OISOE    19.2    17.7    20.0  1.40   1.34  1.43   1.23    1.10   1.30      6
## # A tibble: 6 x 11
##   Event  mean_NO3 min_NO3 max_NO3 mean_P min_P max_P mean_Si min_Si max_Si     n
## * <chr>     <dbl>   <dbl>   <dbl>  <dbl> <dbl> <dbl>   <dbl>  <dbl>  <dbl> <int>
## 1 OISO11     24.9    24.9    24.9   1.66  1.66  1.66   20.0   20.0   20.0      1
## 2 OISO37     24.9    24.9    24.9   1.70  1.70  1.70   26.9   26.9   26.9      1
## 3 OISO6      20.2    20.2    20.2   1.45  1.45  1.45    2.56   2.56   2.56     1
## 4 OISO7      22.4    22.4    22.4   1.54  1.54  1.54    6.05   6.05   6.05     1
## 5 OISO9      22.1    22.1    22.1   1.59  1.59  1.59    7.14   7.14   7.14     1
## 6 OISOE      19.2    19.2    19.2   1.40  1.40  1.40    1.23   1.23   1.23     1

Rate measurements

calculate specific PP and specific N2 fix

To calculate biomass specific primary productivity (PB) and chlorophyll-specific N2 fixation we normalized the rate measurements by chlorophyll concentration.

PB is mainly a function of irradiance but can also sometimes be used to check phtoplankton efficiency

N2 fixation, normalized by chlorophyll a, gives insights about the relative importance of N2 fixation to primary productivity. As seen in our data, although we were able to measure N2 fixation throughout the entire sampling region, the relative importance of N2 fixation to PP in the Southern Ocean (SO) is low because other N sources (nitrate) are available and PP is >> N2 fixation.

Carbonfix$PB <- Carbonfix$cfix.day/Carbonfix$chl.a_ug.l

plot rates against sea surface temperature

We plotted rate measurements and specific rate measurements against sea surface temperature and colored the samples according to water masses (WM). (Fig. 3 and Fig. A10)

## Loading required package: rcompanion
## # A tibble: 13 x 5
##    Event  mean_c.fix min_PP max_PP     n
##  * <chr>       <dbl>  <dbl>  <dbl> <int>
##  1 OISO11      1542. 1021.   2065.     3
##  2 OISO14      2636.  994.   3847.     6
##  3 OISO15      1885. 1580.   2342.     3
##  4 OISO16       378.  170.    538.     6
##  5 OISO18       301.  226.    371.     6
##  6 OISO2        190.   63.2   325.     6
##  7 OISO3        643.  350.    846.     3
##  8 OISO37      1186.  587.   1876.     6
##  9 OISO4       1069.  524.   1877.     6
## 10 OISO6       1978.  676.   4242.     6
## 11 OISO7       2129.  943.   4305.     6
## 12 OISO9       3395. 3237.   3553.     2
## 13 OISOE       2646. 2212.   3115.     6
## # A tibble: 13 x 5
##    Event  mean_PB min_PB max_PB     n
##  * <chr>    <dbl>  <dbl>  <dbl> <int>
##  1 OISO11   1683.  1115.  2255.     3
##  2 OISO14    665.   248.   979.     6
##  3 OISO15    477.   400.   593.     3
##  4 OISO16    790.   271.  1341.     6
##  5 OISO18    563.   364.   762.     6
##  6 OISO2     484.   257.   762.     6
##  7 OISO3    1215.   662.  1599.     3
##  8 OISO37    795.   118.  1628.     6
##  9 OISO4     531.   315.   841.     6
## 10 OISO6     784.   296.  1609.     6
## 11 OISO7     843.   283.  1889.     6
## 12 OISO9    1924.  1834.  2013.     2
## 13 OISOE     567.   477.   762.     6
## 
##  Pearson's product-moment correlation
## 
## data:  Cfix_all$MLD..m. and Cfix_all$mean_PB
## t = 0.7288, df = 11, p-value = 0.4814
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3814760  0.6846442
## sample estimates:
##       cor 
## 0.2146197
## # A tibble: 13 x 7
##    Event  MQR_station mean_N2.fix     min   max     n na.rm
##  * <chr>        <dbl>       <dbl>   <dbl> <dbl> <int> <lgl>
##  1 OISO11       1.22        0.892  0.225   2.20     3 TRUE 
##  2 OISO14       0.718       0.782 -0.0200  2.26     6 TRUE 
##  3 OISO15       1.16        0.239 -0.397   1.43     3 TRUE 
##  4 OISO16       1.32        1.05   0.388   2.21     6 TRUE 
##  5 OISO18       5.00        4.04   0.939   7.92     6 TRUE 
##  6 OISO2        2.62        1.58   0.701   2.88     6 TRUE 
##  7 OISO3        5.44        2.81   0.649   6.91     3 TRUE 
##  8 OISO37       1.22        1.97   0.759   3.09     6 TRUE 
##  9 OISO4        3.54        2.01   0.109   4.92     6 TRUE 
## 10 OISO6        0.879       0.927  0.174   3.25     6 TRUE 
## 11 OISO7        1.19        1.75   1.00    4.39     6 TRUE 
## 12 OISO9        0.753       0.882  0.193   2.15     3 TRUE 
## 13 OISOE        0.651       0.924  0.177   2.09     6 TRUE 
## # A tibble: 5 x 5
##   WM     mean   min   max     n
## * <chr> <dbl> <dbl> <dbl> <int>
## 1 AZ    1.34  0.892 1.97     15
## 2 ISSG  2.31  1.05  4.04     21
## 3 PFZ   1.27  0.882 1.75     14
## 4 SAF   0.601 0.239 0.782     9
## 5 STF   2.01  2.01  2.01      6

PN and delta 15N ratios (Fig. A4)

##     Event    PN_CHL province   WM
## 1   OISO3 11.482539     ISSG ISSG
## 2  OISO16 11.459569     ISSG ISSG
## 3   OISO2 12.158775     ISSG ISSG
## 4  OISO18 29.737791     ISSG ISSG
## 5   OISO4  7.197732     SSTC  STF
## 6  OISO14  5.919934     SSTC  SAF
## 7  OISO15  2.708091     SSTC  SAF
## 8   OISO7  4.701511       SO  PFZ
## 9   OISO9  6.729712       SO  PFZ
## 10  OISO6  5.616233       SO  PFZ
## 11 OISO37  2.805612       SO   AZ
## 12 OISO11 15.286551       SO   AZ
## 13  OISOE  3.942750       SO   AZ

### import microbial datasets files and data prep

metdata calculates mean for carbon fixation and N2 fixation for using it as metadata for microbial analyses.

Pigment analyses

We used High throughput pigment liquid chromatography (HPLC) to identify different pigments in our samples. Pigments can be used as markers for phytoplankton taxa and size classes and were calculated after Hirata et al. 2011 and Uitz et al. 2006. How to calculate the diagnostic pigments see supplementary table 4

We tested significance of pigment composition between ocean provinces with PERMANOVA and checked for individual pigments using Welch two-sample t-test.

Size classes had a significant second order polynominal fit.

(Fig. A5)

## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = all_pig_T0[7:27] ~ province, data = all_pig_T0, method = "bray")
##          Df SumOfSqs      R2    F Pr(>F)    
## province  2  1.74815 0.76496 17.9  0.001 ***
## Residual 11  0.53714 0.23504                
## Total    13  2.28529 1.00000                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = all_pig_T0[7:27] ~ WM, data = all_pig_T0, method = "bray")
##          Df SumOfSqs      R2      F Pr(>F)    
## WM        4  1.86158 0.81459 9.8856  0.001 ***
## Residual  9  0.42371 0.18541                  
## Total    13  2.28529 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##     Event  chl.c3    fuco     zea dv_chla province   WM
## 1  OISO10 0.08030 0.10100 0.00000 0.00000       SO   AZ
## 2  OISO11 0.03684 0.06139 0.00161 0.00000       SO   AZ
## 3  OISO14 0.10000 0.08157 0.03120 0.00000     SSTC  SAF
## 4  OISO15 0.18600 0.06843 0.04070 0.00716     SSTC  SAF
## 5  OISO16 0.00000 0.00325 0.03430 0.01553     ISSG ISSG
## 6  OISO18 0.00000 0.00000 0.05660 0.03250     ISSG ISSG
## 7   OISO2 0.00000 0.00806 0.05720 0.02627     ISSG ISSG
## 8   OISO3 0.00000 0.00614 0.03810 0.01772     ISSG ISSG
## 9  OISO37 0.07509 0.54235 0.01090 0.00000       SO   AZ
## 10  OISO4 0.02440 0.01970 0.04040 0.00000     SSTC  STF
## 11  OISO6 0.04130 0.10500 0.00417 0.00000       SO  PFZ
## 12  OISO7 0.07770 0.13400 0.00578 0.00000       SO  PFZ
## 13  OISO9 0.07260 0.07100 0.00098 0.00000       SO  PFZ
## 14  OISOE 0.14434 0.43000 0.00117 0.00000       SO   AZ
## 
## Call:
## lm(formula = pigments_T0$Microplankton ~ poly(pigments_T0$Temp, 
##     2))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.157301 -0.095663  0.006094  0.052645  0.230833 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 0.29052    0.03259   8.914  2.3e-06 ***
## poly(pigments_T0$Temp, 2)1 -0.68428    0.12194  -5.611 0.000158 ***
## poly(pigments_T0$Temp, 2)2  0.25995    0.12194   2.132 0.056411 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1219 on 11 degrees of freedom
## Multiple R-squared:  0.7661, Adjusted R-squared:  0.7236 
## F-statistic: 18.02 on 2 and 11 DF,  p-value: 0.0003384

## 
## Call:
## lm(formula = pigments_T0$Picoplankton ~ poly(pigments_T0$Temp, 
##     2))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.054313 -0.012681  0.000045  0.016527  0.062581 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                0.248327   0.008615  28.824 1.03e-11 ***
## poly(pigments_T0$Temp, 2)1 0.906679   0.032236  28.127 1.34e-11 ***
## poly(pigments_T0$Temp, 2)2 0.120994   0.032236   3.753  0.00319 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03224 on 11 degrees of freedom
## Multiple R-squared:  0.9865, Adjusted R-squared:  0.9841 
## F-statistic: 402.6 on 2 and 11 DF,  p-value: 5.162e-11

## 
## Call:
## lm(formula = pigments_T0$Nanoplankton ~ poly(pigments_T0$Temp, 
##     2))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.190742 -0.058067 -0.009707  0.074205  0.168298 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 0.46115    0.02997  15.385 8.72e-09 ***
## poly(pigments_T0$Temp, 2)1 -0.22240    0.11215  -1.983  0.07289 .  
## poly(pigments_T0$Temp, 2)2 -0.38094    0.11215  -3.397  0.00596 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1122 on 11 degrees of freedom
## Multiple R-squared:  0.5844, Adjusted R-squared:  0.5089 
## F-statistic: 7.735 on 2 and 11 DF,  p-value: 0.007989

GENOMIC ANALYSES

data transformation clr and hellinger

to account for compositionality of sequencing data, data needs to be normalized. Most robust is centered-log-ratio transformation, however, it gives negative values and thus, hellinger is better for some analyses.

We choose 3 as the appropriate row-sum cutoff to exclude low abundant ASVs.

source("C:/Users/choerstm/Documents/core_PS113/Genomics/RScripts/general_clr_hellinger_transformation.R", encoding = "UTF-8")
ASV_16S_hellinger <- hellinger(ASV_16S,3)
ASV_18S_hellinger <- hellinger(ASV_18S,3)
ASV_16S_clr <- clr(ASV_16S,3)
## No. corrected values:  27
ASV_18S_clr <- clr(ASV_18S,3)
## No. corrected values:  138
#it'll print the number of removed rows for 16S and 18S, respectively

metadata z-scoring

z scoring unifies metadata by shifting the mean of all variables to 0 and the standard deviation to 1.

meta_16S.z <- meta_16S
meta_18S.z <- meta_18S
meta_16S.z[c(5:15,17, 21, 24,28,30, 31)] <- lapply(meta_16S.z[c(5:15,17, 21, 24,28,30, 31)], function(x) {y <-scale(x, center = TRUE, scale = TRUE)})
meta_18S.z[c(5:15,17, 21, 24,28,30, 31)] <- lapply(meta_18S.z[c(5:15,17, 21, 24,28,30, 31)], function(x) {y <-scale(x, center = TRUE, scale = TRUE)})
rownames(meta_16S.z) <- meta_16S.z$Event
rownames(meta_18S.z) <- meta_18S.z$Event

diversity calculations

iNEXT focuses on three measures of Hill numbers of order q: species richness (q = 0), Shannon diversity (q = 1, the exponential of Shannon entropy) and Simpson diversity (q = 2, the inverse of Simpson concentration). For each diversity measure, iNEXT uses the observed sample of abundance or incidence data (called the “reference sample”) to compute diversity estimates and the associated 95% confidence intervals for the following two types of rarefaction and extrapolation (R/E):

  1. Sample‐size‐based R/E sampling curves: iNEXT computes diversity estimates for rarefied and extrapolated samples up to an appropriate size. This type of sampling curve plots the diversity estimates with respect to sample size.

  2. Coverage‐based R/E sampling curves: iNEXT computes diversity estimates for rarefied and extrapolated samples with sample completeness (as measured by sample coverage) up to an appropriate coverage. This type of sampling curve plots the diversity estimates with respect to sample coverage.

iNEXT also plots the above two types of sampling curves and a sample completeness curve. The sample completeness curve provides a bridge between these two types of curves.

(Fig. A6)

correlation analyses

A key objective of this study was to investigate correlations between microbial diversity (reflected in richness q=0 and Inverse Simpson q=2, the latter to better reflect taxonomic evenness across a sample).

(Fig. A7, A8)

## 
##  Pearson's product-moment correlation
## 
## data:  meta.16S.div$Temp and meta.16S.div$Richness
## t = 2.5826, df = 9, p-value = 0.02957
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.08632694 0.90004306
## sample estimates:
##       cor 
## 0.6524163
## 
##  Pearson's product-moment correlation
## 
## data:  meta.16S.div$Temp and meta.16S.div$Simpson
## t = -0.32251, df = 9, p-value = 0.7544
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.6641759  0.5267633
## sample estimates:
##        cor 
## -0.1068872
## 
##  Pearson's product-moment correlation
## 
## data:  meta.16S.div$mean_c.fix and meta.16S.div$Simpson
## t = 1.1568, df = 9, p-value = 0.2771
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3061684  0.7893074
## sample estimates:
##       cor 
## 0.3597859
## 
##  Pearson's product-moment correlation
## 
## data:  meta.16S.div$mean_N2fix and meta.16S.div$Simpson
## t = -2.9624, df = 9, p-value = 0.0159
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.9162977 -0.1776338
## sample estimates:
##        cor 
## -0.7026372
## Call:corr.test(x = meta.16S.div[c(8, 26, 28)], y = meta.16S.div$Simpson, 
##     method = "pearson", adjust = "holm")
## Correlation matrix 
##             [,1]
## Temp       -0.11
## C_umol.L    0.09
## mean_c.fix  0.36
## Sample Size 
## [1] 11
## Probability values  adjusted for multiple tests. 
##            [,1]
## Temp       1.00
## C_umol.L   1.00
## mean_c.fix 0.83
## 
##  To see confidence intervals of the correlations, print with the short=FALSE option
## 
##  Pearson's product-moment correlation
## 
## data:  meta.18S.div$Temp and meta.18S.div$Richness
## t = 4.9994, df = 10, p-value = 0.0005378
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5266713 0.9555621
## sample estimates:
##       cor 
## 0.8451269
## 
##  Pearson's product-moment correlation
## 
## data:  meta.18S.div$Temp and meta.18S.div$Simpson
## t = 4.5414, df = 10, p-value = 0.001072
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4664076 0.9480460
## sample estimates:
##       cor 
## 0.8206454
## 
##  Pearson's product-moment correlation
## 
## data:  meta.18S.div$mean_c.fix and meta.18S.div$Simpson
## t = -2.7942, df = 10, p-value = 0.01898
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8956849 -0.1423436
## sample estimates:
##        cor 
## -0.6621531
## 
##  Pearson's product-moment correlation
## 
## data:  meta.18S.div$mean_N2fix and meta.18S.div$Simpson
## t = 3.4567, df = 10, p-value = 0.006157
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2843266 0.9215218
## sample estimates:
##       cor 
## 0.7378321
## Call:corr.test(x = meta.18S.div[c(8, 26, 28)], y = meta.18S.div$Simpson, 
##     method = "pearson", adjust = "holm")
## Correlation matrix 
##             [,1]
## Temp        0.82
## C_umol.L   -0.49
## mean_c.fix -0.66
## Sample Size 
## [1] 12
## Probability values  adjusted for multiple tests. 
##            [,1]
## Temp       0.00
## C_umol.L   0.11
## mean_c.fix 0.04
## 
##  To see confidence intervals of the correlations, print with the short=FALSE option

gdm (general dissimilarity model)

another way of analysing the effect of environmental dissmilarity on microbial community dissimilarity is using a general dissimilarity model, since it is more flexible to non-linear effects.

(Fig. 4 c,d; Fig. 11)

source("D:/Kerguelen_project/analysis_with_R/Rscripts/gdm_environmental_distance.R", encoding = "UTF-8")
#decide input metadata for gdm analysis
gdm_input <- c("Event","Latitude","Longitude","Sal", "MLD..m.", "Temp", "oxygen", "NO3umol.l", "NH4_umol.l", "P_umol.l", "Si_umol.l", "chl.a_ug.l", "POCPN_ratio", "mean_c.fix", "mean_PB", "mean_N2fix")
print("prokaryotes")
## [1] "prokaryotes"
gdm_analysis(ASV_16S_hellinger, meta_16S[gdm_input])
## Warning: package 'gdm' was built under R version 4.0.5
## 
## To cite package 'gdm' in publications use:
## 
##   Matthew C. Fitzpatrick, Karel Mokany, Glenn Manion, Matthew Lisk,
##   Simon Ferrier and Diego Nieto-Lugilde (2021). gdm: Generalized
##   Dissimilarity Modeling. R package version 1.4.2.2.
##   https://CRAN.R-project.org/package=gdm
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {gdm: Generalized Dissimilarity Modeling},
##     author = {Matthew C. Fitzpatrick and Karel Mokany and Glenn Manion and Matthew Lisk and Simon Ferrier and Diego Nieto-Lugilde},
##     year = {2021},
##     note = {R package version 1.4.2.2},
##     url = {https://CRAN.R-project.org/package=gdm},
##   }
## 
## ATTENTION: This citation information has been auto-generated from the
## package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.
## 
## [1] 
## [1] 
## [1] GDM Modelling Summary
## [1] Creation Date:  Sun May 09 17:50:54 2021
## [1] 
## [1] Name:  gdm.1
## [1] 
## [1] Data:  gdmTab
## [1] 
## [1] Samples:  55
## [1] 
## [1] Geographical distance used in model fitting?  TRUE
## [1] 
## [1] NULL Deviance:  9.96217539248304
## [1] GDM Deviance:  0.338783639265586
## [1] Percent Deviance Explained:  96.5993005953176
## [1] 
## [1] Intercept:  0.0215196942572105
## [1] 
## [1] Predictor 1: Geographic
## [1] Splines: 3
## [1] Min Knot: 2.92718633674046
## [1] 50% Knot: 17.7354532857212
## [1] Max Knot: 30.3392664092262
## [1] Coefficient[1]: 0
## [1] Coefficient[2]: 0
## [1] Coefficient[3]: 0
## [1] 
## [1] Predictor 2: Sal
## [1] Splines: 3
## [1] Min Knot: 33.73584
## [1] 50% Knot: 34.39721
## [1] Max Knot: 35.85502
## [1] Coefficient[1]: 0.192002997451449
## [1] Coefficient[2]: 0.363645361448136
## [1] Coefficient[3]: 0.172358116206037
## [1] 
## [1] Predictor 3: MLD..m.
## [1] Splines: 3
## [1] Min Knot: 7.945
## [1] 50% Knot: 49.576
## [1] Max Knot: 82.281
## [1] Coefficient[1]: 0
## [1] Coefficient[2]: 0
## [1] Coefficient[3]: 0.0608797779263614
## [1] 
## [1] Predictor 4: Temp
## [1] Splines: 3
## [1] Min Knot: 3.08
## [1] 50% Knot: 11.59
## [1] Max Knot: 27.38
## [1] Coefficient[1]: 0
## [1] Coefficient[2]: 0
## [1] Coefficient[3]: 0
## [1] 
## [1] Predictor 5: oxygen
## [1] Splines: 3
## [1] Min Knot: 203.99899
## [1] 50% Knot: 274.29599
## [1] Max Knot: 324.05301
## [1] Coefficient[1]: 0
## [1] Coefficient[2]: 0.0710671106952922
## [1] Coefficient[3]: 0.635253969815272
## [1] 
## [1] Predictor 6: NO3umol.l
## [1] Splines: 3
## [1] Min Knot: 0
## [1] 50% Knot: 9.53
## [1] Max Knot: 25.585
## [1] Coefficient[1]: 0
## [1] Coefficient[2]: 0
## [1] Coefficient[3]: 0
## [1] 
## [1] Predictor 7: NH4_umol.l
## [1] Splines: 3
## [1] Min Knot: 0
## [1] 50% Knot: 0.08
## [1] Max Knot: 1.43
## [1] Coefficient[1]: 0
## [1] Coefficient[2]: 0
## [1] Coefficient[3]: 0
## [1] 
## [1] Predictor 8: P_umol.l
## [1] Splines: 3
## [1] Min Knot: 0.04
## [1] 50% Knot: 0.8
## [1] Max Knot: 1.705
## [1] Coefficient[1]: 0.219554335438145
## [1] Coefficient[2]: 0
## [1] Coefficient[3]: 0
## [1] 
## [1] Predictor 9: Si_umol.l
## [1] Splines: 3
## [1] Min Knot: 0
## [1] 50% Knot: 1.29
## [1] Max Knot: 27.34
## [1] Coefficient[1]: 0.191369066479726
## [1] Coefficient[2]: 0
## [1] Coefficient[3]: 0.0206927190764605
## [1] 
## [1] Predictor 10: chl.a_ug.l
## [1] Splines: 3
## [1] Min Knot: 0.48725
## [1] 50% Knot: 2.2327
## [1] Max Knot: 4.96042
## [1] Coefficient[1]: 0.196728675054061
## [1] Coefficient[2]: 0
## [1] Coefficient[3]: 0.256623591273278
## [1] 
## [1] Predictor 11: POCPN_ratio
## [1] Splines: 3
## [1] Min Knot: 5.90375923981978
## [1] 50% Knot: 6.17551726614751
## [1] Max Knot: 8.44946949997637
## [1] Coefficient[1]: 0.414586603405584
## [1] Coefficient[2]: 0.0983292595546666
## [1] Coefficient[3]: 0.127969937128137
## [1] 
## [1] Predictor 12: mean_c.fix
## [1] Splines: 3
## [1] Min Knot: 190.3836
## [1] 50% Knot: 1541.95352
## [1] Max Knot: 3395.06772
## [1] Coefficient[1]: 0.636704365876361
## [1] Coefficient[2]: 0.00612600848050092
## [1] Coefficient[3]: 0
## [1] 
## [1] Predictor 13: mean_PB
## [1] Splines: 3
## [1] Min Knot: 477.38724318191
## [1] 50% Knot: 665.104094312654
## [1] Max Knot: 1923.75820626583
## [1] Coefficient[1]: 0
## [1] Coefficient[2]: 0
## [1] Coefficient[3]: 0
## [1] 
## [1] Predictor 14: mean_N2fix
## [1] Splines: 3
## [1] Min Knot: 0.23895
## [1] 50% Knot: 1.58218833333333
## [1] Max Knot: 4.038265
## [1] Coefficient[1]: 0
## [1] Coefficient[2]: 0.111875297133606
## [1] Coefficient[3]: 0
## [1] 
## List of 15
##  $ dataname    : symbol gdmTab
##  $ geo         : logi TRUE
##  $ sample      : int 55
##  $ gdmdeviance : num 0.339
##  $ nulldeviance: num 9.96
##  $ explained   : num 96.6
##  $ intercept   : num 0.0215
##  $ predictors  : chr [1:14] "Geographic" "Sal" "MLD..m." "Temp" ...
##  $ coefficients: num [1:42] 0 0 0 0.192 0.364 ...
##  $ knots       : num [1:42] 2.93 17.74 30.34 33.74 34.4 ...
##  $ splines     : num [1:14] 3 3 3 3 3 3 3 3 3 3 ...
##  $ creationdate: chr "Sun May 09 17:50:54 2021"
##  $ observed    : num [1:55] 0.685 0.796 0.921 0.925 0.908 ...
##  $ predicted   : num [1:55] 0.67 0.816 0.93 0.911 0.876 ...
##  $ ecological  : num [1:55] 1.11 1.69 2.66 2.42 2.09 ...
##  - attr(*, "class")= chr [1:2] "gdm" "list"
## NULL

## List of 2
##  $ x: num [1:200, 1:14] 2.93 3.06 3.2 3.34 3.48 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:14] "Geographic" "Sal" "MLD..m." "Temp" ...
##  $ y: num [1:200, 1:14] 0 0 0 0 0 0 0 0 0 0 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:14] "Geographic" "Sal" "MLD..m." "Temp" ...
## NULL
## NULL
## NULL
print("eukaryotes")
## [1] "eukaryotes"
gdm_analysis(ASV_18S_hellinger, meta_18S[gdm_input])
## Warning: package 'gdm' was built under R version 4.0.5

## 
## To cite package 'gdm' in publications use:
## 
##   Matthew C. Fitzpatrick, Karel Mokany, Glenn Manion, Matthew Lisk,
##   Simon Ferrier and Diego Nieto-Lugilde (2021). gdm: Generalized
##   Dissimilarity Modeling. R package version 1.4.2.2.
##   https://CRAN.R-project.org/package=gdm
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {gdm: Generalized Dissimilarity Modeling},
##     author = {Matthew C. Fitzpatrick and Karel Mokany and Glenn Manion and Matthew Lisk and Simon Ferrier and Diego Nieto-Lugilde},
##     year = {2021},
##     note = {R package version 1.4.2.2},
##     url = {https://CRAN.R-project.org/package=gdm},
##   }
## 
## ATTENTION: This citation information has been auto-generated from the
## package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.
## 
## [1] 
## [1] 
## [1] GDM Modelling Summary
## [1] Creation Date:  Sun May 09 17:50:55 2021
## [1] 
## [1] Name:  gdm.1
## [1] 
## [1] Data:  gdmTab
## [1] 
## [1] Samples:  66
## [1] 
## [1] Geographical distance used in model fitting?  TRUE
## [1] 
## [1] NULL Deviance:  10.0922602806298
## [1] GDM Deviance:  0.681693129804237
## [1] Percent Deviance Explained:  93.2453869514977
## [1] 
## [1] Intercept:  0.288371586039048
## [1] 
## [1] Predictor 1: Geographic
## [1] Splines: 3
## [1] Min Knot: 5.03513663071818
## [1] 50% Knot: 17.7298314879299
## [1] Max Knot: 30.3392664092262
## [1] Coefficient[1]: 0
## [1] Coefficient[2]: 0.207551168078672
## [1] Coefficient[3]: 0
## [1] 
## [1] Predictor 2: Sal
## [1] Splines: 3
## [1] Min Knot: 33.73584
## [1] 50% Knot: 34.501835
## [1] Max Knot: 35.85502
## [1] Coefficient[1]: 0.210265649532649
## [1] Coefficient[2]: 0
## [1] Coefficient[3]: 0
## [1] 
## [1] Predictor 3: MLD..m.
## [1] Splines: 3
## [1] Min Knot: 0
## [1] 50% Knot: 48.566
## [1] Max Knot: 82.281
## [1] Coefficient[1]: 0.0413560912989392
## [1] Coefficient[2]: 0.433302265916552
## [1] Coefficient[3]: 0.0207487042140491
## [1] 
## [1] Predictor 4: Temp
## [1] Splines: 3
## [1] Min Knot: 3.08
## [1] 50% Knot: 10.87
## [1] Max Knot: 27.38
## [1] Coefficient[1]: 0
## [1] Coefficient[2]: 0.400610092223674
## [1] Coefficient[3]: 0
## [1] 
## [1] Predictor 5: oxygen
## [1] Splines: 3
## [1] Min Knot: 203.99899
## [1] 50% Knot: 280.792995
## [1] Max Knot: 324.05301
## [1] Coefficient[1]: 0
## [1] Coefficient[2]: 0
## [1] Coefficient[3]: 0.146795437409257
## [1] 
## [1] Predictor 6: NO3umol.l
## [1] Splines: 3
## [1] Min Knot: 0
## [1] 50% Knot: 10.745
## [1] Max Knot: 25.585
## [1] Coefficient[1]: 0
## [1] Coefficient[2]: 0
## [1] Coefficient[3]: 0
## [1] 
## [1] Predictor 7: NH4_umol.l
## [1] Splines: 3
## [1] Min Knot: 0
## [1] 50% Knot: 0.105
## [1] Max Knot: 1.43
## [1] Coefficient[1]: 0
## [1] Coefficient[2]: 0
## [1] Coefficient[3]: 0
## [1] 
## [1] Predictor 8: P_umol.l
## [1] Splines: 3
## [1] Min Knot: 0.04
## [1] 50% Knot: 0.845
## [1] Max Knot: 1.705
## [1] Coefficient[1]: 0.413494785996043
## [1] Coefficient[2]: 0.212583789962081
## [1] Coefficient[3]: 0
## [1] 
## [1] Predictor 9: Si_umol.l
## [1] Splines: 3
## [1] Min Knot: 0
## [1] 50% Knot: 1.3275
## [1] Max Knot: 27.34
## [1] Coefficient[1]: 0.471212795627758
## [1] Coefficient[2]: 0
## [1] Coefficient[3]: 0.389875728821161
## [1] 
## [1] Predictor 10: chl.a_ug.l
## [1] Splines: 3
## [1] Min Knot: 0.4012
## [1] 50% Knot: 1.998755
## [1] Max Knot: 4.96042
## [1] Coefficient[1]: 0.121823492050891
## [1] Coefficient[2]: 0
## [1] Coefficient[3]: 0.351256310665088
## [1] 
## [1] Predictor 11: POCPN_ratio
## [1] Splines: 3
## [1] Min Knot: 5.90375923981978
## [1] 50% Knot: 6.1985956585102
## [1] Max Knot: 8.44946949997637
## [1] Coefficient[1]: 0.0283205827227883
## [1] Coefficient[2]: 0
## [1] Coefficient[3]: 0.13814288024884
## [1] 
## [1] Predictor 12: mean_c.fix
## [1] Splines: 3
## [1] Min Knot: 190.3836
## [1] 50% Knot: 1363.77054
## [1] Max Knot: 3395.06772
## [1] Coefficient[1]: 0.169528177183697
## [1] Coefficient[2]: 0
## [1] Coefficient[3]: 0
## [1] 
## [1] Predictor 13: mean_PB
## [1] Splines: 3
## [1] Min Knot: 477.38724318191
## [1] 50% Knot: 786.977552148935
## [1] Max Knot: 1923.75820626583
## [1] Coefficient[1]: 0.152837160870431
## [1] Coefficient[2]: 0
## [1] Coefficient[3]: 0
## [1] 
## [1] Predictor 14: mean_N2fix
## [1] Splines: 3
## [1] Min Knot: 0.23895
## [1] 50% Knot: 1.3156875
## [1] Max Knot: 4.038265
## [1] Coefficient[1]: 0
## [1] Coefficient[2]: 0.0149004109553122
## [1] Coefficient[3]: 0.177511227651542
## [1] 
## List of 15
##  $ dataname    : symbol gdmTab
##  $ geo         : logi TRUE
##  $ sample      : int 66
##  $ gdmdeviance : num 0.682
##  $ nulldeviance: num 10.1
##  $ explained   : num 93.2
##  $ intercept   : num 0.288
##  $ predictors  : chr [1:14] "Geographic" "Sal" "MLD..m." "Temp" ...
##  $ coefficients: num [1:42] 0 0.208 0 0.21 0 ...
##  $ knots       : num [1:42] 5.04 17.73 30.34 33.74 34.5 ...
##  $ splines     : num [1:14] 3 3 3 3 3 3 3 3 3 3 ...
##  $ creationdate: chr "Sun May 09 17:50:55 2021"
##  $ observed    : num [1:66] 0.926 0.903 0.948 0.943 0.929 ...
##  $ predicted   : num [1:66] 0.917 0.92 0.93 0.939 0.92 ...
##  $ ecological  : num [1:66] 2.49 2.53 2.66 2.8 2.52 ...
##  - attr(*, "class")= chr [1:2] "gdm" "list"
## NULL

## List of 2
##  $ x: num [1:200, 1:14] 5.04 5.16 5.29 5.42 5.54 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:14] "Geographic" "Sal" "MLD..m." "Temp" ...
##  $ y: num [1:200, 1:14] 0.00 1.03e-05 4.14e-05 9.31e-05 1.65e-04 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:14] "Geographic" "Sal" "MLD..m." "Temp" ...
## NULL
## NULL
## NULL

multivariate analysis

RDA and PERMANOVA test

We perfromed PCA analyses on metadata and excluded data which were correlating with other variables. Input variables are listed uder “RDA input” We checked whether residulas of metadata were normally distributed and used the ones for further analysis in RDA

Significances of differences were tested using PERMANOVA

(Fig. 4 a,b; Fig. A9)

source("D:/Kerguelen_project/analysis_with_R/Rscripts/actual_multivariate_test.R", encoding = "UTF-8")
## Loading required package: ggvegan
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'ggvegan'
## Loading required package: vegan
## This is vegan 2.5-7
## Loading required package: rgr
## Loading required package: fastICA
## 
## Attaching package: 'rgr'
## The following object is masked _by_ '.GlobalEnv':
## 
##     clr
## The following object is masked from 'package:dplyr':
## 
##     syms
## Loading required package: ggplot2
## 
## Attaching package: 'ggplot2'
## The following object is masked from 'package:rgr':
## 
##     syms
## Loading required package: phyloseq
## 
## Attaching package: 'phyloseq'
## The following object is masked from 'package:raster':
## 
##     distance
## Loading required package: adespatial
## Registered S3 methods overwritten by 'adegraphics':
##   method         from
##   biplot.dudi    ade4
##   kplot.foucart  ade4
##   kplot.mcoa     ade4
##   kplot.mfa      ade4
##   kplot.pta      ade4
##   kplot.sepan    ade4
##   kplot.statis   ade4
##   scatter.coa    ade4
##   scatter.dudi   ade4
##   scatter.nipals ade4
##   scatter.pco    ade4
##   score.acm      ade4
##   score.mix      ade4
##   score.pca      ade4
##   screeplot.dudi ade4
## Registered S3 method overwritten by 'gdata':
##   method         from     
##   reorder.factor DescTools
## Registered S3 method overwritten by 'spdep':
##   method   from
##   plot.mst ape
## Registered S3 methods overwritten by 'adespatial':
##   method             from       
##   plot.multispati    adegraphics
##   print.multispati   ade4       
##   summary.multispati ade4
## Loading required package: geodist
RDA_input <- c("Event","Temp","oxygen", "NO3umol.l", "NH4_umol.l","Si_umol.l", "chl.a_ug.l","mean_c.fix", "mean_N2fix")
RDA_analysis(ASV_16S_clr, ASV_16S_hellinger, meta_16S.z, meta_16S.z[RDA_input])
## Call: rda(formula = abundance.clr.t ~ Temp + oxygen + NO3umol.l +
## NH4_umol.l + Si_umol.l + chl.a_ug.l + mean_c.fix + mean_N2fix, data =
## meta.sub)
## 
##                 Inertia Proportion Rank
## Total         2775.5079     1.0000     
## Constrained   2490.2661     0.8972    8
## Unconstrained  285.2417     0.1028    2
## Inertia is variance 
## 
## Eigenvalues for constrained axes:
##   RDA1   RDA2   RDA3   RDA4   RDA5   RDA6   RDA7   RDA8 
## 1268.8  601.4  234.0  116.4   89.8   86.4   66.8   26.6 
## 
## Eigenvalues for unconstrained axes:
##    PC1    PC2 
## 160.86 124.39
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
## 
## Model: rda(formula = abundance.clr.t ~ Temp + oxygen + NO3umol.l + NH4_umol.l + Si_umol.l + chl.a_ug.l + mean_c.fix + mean_N2fix, data = meta.sub)
##          Df Variance      F Pr(>F)  
## Model     8  2490.27 2.1826  0.071 .
## Residual  2   285.24                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 0.897229
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = abundance.hellinger.t ~ mean_c.fix, data = meta, method = "bray")
##            Df SumOfSqs     R2      F Pr(>F)  
## mean_c.fix  1  0.89007 0.3152 4.1426  0.011 *
## Residual    9  1.93371 0.6848                
## Total      10  2.82378 1.0000                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = abundance.hellinger.t ~ mean_N2fix, data = meta, method = "bray")
##            Df SumOfSqs      R2      F Pr(>F)  
## mean_N2fix  1  0.56611 0.20048 2.2567  0.074 .
## Residual    9  2.25767 0.79952                
## Total      10  2.82378 1.00000                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = abundance.hellinger.t ~ province, data = meta, method = "bray")
##          Df SumOfSqs      R2      F Pr(>F)    
## province  2   1.7729 0.62784 6.7481  0.001 ***
## Residual  8   1.0509 0.37216                  
## Total    10   2.8238 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = abundance.hellinger.t ~ WM, data = meta, method = "bray")
##          Df SumOfSqs      R2      F Pr(>F)    
## WM        4  2.09517 0.74197 4.3133  0.001 ***
## Residual  6  0.72861 0.25803                  
## Total    10  2.82378 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

RDA_analysis(ASV_18S_clr, ASV_18S_hellinger, meta_18S.z, meta_18S.z[RDA_input])
## Call: rda(formula = abundance.clr.t ~ Temp + oxygen + NO3umol.l +
## NH4_umol.l + Si_umol.l + chl.a_ug.l + mean_c.fix + mean_N2fix, data =
## meta.sub)
## 
##                 Inertia Proportion Rank
## Total         4648.3774     1.0000     
## Constrained   3757.7714     0.8084    8
## Unconstrained  890.6060     0.1916    3
## Inertia is variance 
## 
## Eigenvalues for constrained axes:
##   RDA1   RDA2   RDA3   RDA4   RDA5   RDA6   RDA7   RDA8 
## 1813.0  635.6  429.0  291.0  216.8  148.5  147.5   76.5 
## 
## Eigenvalues for unconstrained axes:
##   PC1   PC2   PC3 
## 448.6 330.3 111.8
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
## 
## Model: rda(formula = abundance.clr.t ~ Temp + oxygen + NO3umol.l + NH4_umol.l + Si_umol.l + chl.a_ug.l + mean_c.fix + mean_N2fix, data = meta.sub)
##          Df Variance      F Pr(>F)  
## Model     8   3757.8 1.5823  0.091 .
## Residual  3    890.6                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] 0.808405
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = abundance.hellinger.t ~ mean_c.fix, data = meta, method = "bray")
##            Df SumOfSqs      R2      F Pr(>F)   
## mean_c.fix  1   1.0095 0.27315 3.7581  0.006 **
## Residual   10   2.6862 0.72685                 
## Total      11   3.6957 1.00000                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = abundance.hellinger.t ~ mean_N2fix, data = meta, method = "bray")
##            Df SumOfSqs      R2     F Pr(>F)
## mean_N2fix  1   0.4957 0.13412 1.549  0.125
## Residual   10   3.2000 0.86588             
## Total      11   3.6957 1.00000             
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = abundance.hellinger.t ~ province, data = meta, method = "bray")
##          Df SumOfSqs      R2      F Pr(>F)    
## province  2   2.0031 0.54201 5.3256  0.001 ***
## Residual  9   1.6926 0.45799                  
## Total    11   3.6957 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation test for adonis under reduced model
## Terms added sequentially (first to last)
## Permutation: free
## Number of permutations: 999
## 
## adonis2(formula = abundance.hellinger.t ~ WM, data = meta, method = "bray")
##          Df SumOfSqs      R2      F Pr(>F)    
## WM        4   2.4801 0.67109 3.5706  0.001 ***
## Residual  7   1.2155 0.32891                  
## Total    11   3.6957 1.00000                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

upset plots

alternative to venn diagram but gives more information through set size and intersection size. Yet, it is only a vague estimate in our analysis, since provinces represent sum of all ASVs detected across samples but not neccessarily occur in other samples. Thus, upset plots should only be performed when differences between provinces were significant (in PERMANOVA) How well each sample fit the sum of its province is summarised below

Reference: https://jokergoo.github.io/ComplexHeatmap-reference/book/upset-plot.html

(Fig. 4 e,f)

source("D:/Kerguelen_project/analysis_with_R/Rscripts/upsetR.R", encoding = "UTF-8")
upset_16Splots(ASV_16S)
## 
## Attaching package: 'UpSetR'
## The following object is masked from 'package:lattice':
## 
##     histogram
## Loading required package: usethis
## 
## Attaching package: 'devtools'
## The following object is masked from 'package:permute':
## 
##     check
## ========================================
## ComplexHeatmap version 2.6.2
## Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
## Github page: https://github.com/jokergoo/ComplexHeatmap
## Documentation: http://jokergoo.github.io/ComplexHeatmap-reference
## 
## If you use it in published research, please cite:
## Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional 
##   genomic data. Bioinformatics 2016.
## 
## This message can be suppressed by:
##   suppressPackageStartupMessages(library(ComplexHeatmap))
## ========================================
upset_18Splots(ASV_18S)

Table A5

because it is biased you can check how well each sample is represented in the combined sample. You can only use this analysis if the PERMANOVA is actually significant

for the analysis I choose union mode: 1 means in that set and 0 is not taken into account. When there are multiple 1, the relationship is OR. Then, 1 1 0 means a set of elements in set A or B, and they can also in C or not in C (union(A, B)). Under this mode, the seven combination sets can overlap.

16S

SO S37 - 0 1 2 520 237 218 - “of total presence (455) : 48% S11 - 0 1 2 520 237 218 -”of total presence (455) : 48% SE - 0 1 2 520 202 253 - “of total presence (455) : 56% S7 - 0 1 2 520 214 241 -”of total presence (455) : 53% S9 - 0 1 2 520 228 227 - “of total presence (455) : 50% S10 - 0 1 2 520 201 254 -”of total presence (455) : 56%

SSTC S4 - 0 1 2 456 218 301 - “of total presence (519) : 58% S14 - 0 1 2 456 200 319 -”of total presence (519) : 61% S15 - 0 1 2 456 184 335 - "of total presence (519) : 65%

ISSG S3 - 0 1 2 418 188 369 - “of total presence (557) : 66% S2 - 0 1 2 418 295 262 -”of total presence (557) : 47% S18 - 0 1 2 418 254 303 - "of total presence (557) : 54%

18S

SO S37 - 0 1 2 1923 439 139 - “of total presence (578) : 24% S11 - 0 1 2 1923 342 236 -”of total presence (578) : 41% SE - 0 1 2 1923 216 362 - “of total presence (578) : 63% S7 - 0 1 2 1923 275 303 -”of total presence (578) : 52% S9 - 0 1 2 1923 229 349 - "of total presence (578) : 60%

SSTC S4 - 0 1 2 1917 263 321 - “of total presence (584) : 55% S15 - 0 1 2 1917 145 439 -”of total presence (584) : 75%

ISSG S3 - 0 1 2 586 852 1063 - “of total presence (1915) : 56% S2 - 0 1 2 586 1031 884 -”of total presence (1915) : 46% S16 - 0 1 2 586 1325 590 - “of total presence (1915) : 31% S18 - 0 1 2 586 1002 913 -”of total presence (1915) : 48%

SIMPER

when differences between provinces are significant it is valid to perform a SIMilarity PERcentage (SIMPER) analysis to see which entities contribute most to the observed dissimilarity (Clarke 1993)

[1] SIMPER analysis 16S
SSTC vs SO
otu contribution ASV kingdom phylum class genus family species
>ASV_8 -1.254761e+12 >ASV_8 Bacteria Proteobacteria Alphaproteobacteria SAR11 clade Clade II
>ASV_149 -1.608580e+12 >ASV_149 Bacteria Proteobacteria Alphaproteobacteria Sphingomonadales Sphingomonadaceae Novosphingobium
>ASV_71 -1.883222e+12 >ASV_71 Bacteria Proteobacteria Alphaproteobacteria SAR11 clade Clade II
>ASV_105 -1.946567e+12 >ASV_105 Bacteria Proteobacteria Alphaproteobacteria Rhodobacterales Rhodobacteraceae uncultured
>ASV_146 -1.946567e+12 >ASV_146 Bacteria Bacteroidetes Bacteroidia Flavobacteriales Flavobacteriaceae NS5 marine group
>ASV_156 -1.946567e+12 >ASV_156 Bacteria Cyanobacteria Oxyphotobacteria Synechococcales Cyanobiaceae Prochlorococcus MIT9313
>ASV_157 -1.946567e+12 >ASV_157 Bacteria Bacteroidetes Bacteroidia Cytophagales Cyclobacteriaceae Marinoscillum
>ASV_173 -1.946567e+12 >ASV_173 Bacteria Proteobacteria Gammaproteobacteria Cellvibrionales Porticoccaceae SAR92 clade
>ASV_223 -1.946567e+12 >ASV_223 Archaea Euryarchaeota Thermoplasmata Marine Group II
>ASV_233 -1.946567e+12 >ASV_233 Bacteria Proteobacteria Gammaproteobacteria SAR86 clade
ISSG vs SO
otu contribution ASV kingdom phylum class genus family species
>ASV_81 6.951505e+12 >ASV_81 Bacteria Proteobacteria Gammaproteobacteria Betaproteobacteriales Methylophilaceae OM43 clade
>ASV_137 6.654805e+12 >ASV_137 Bacteria Proteobacteria Gammaproteobacteria Thiotrichales Thiotrichaceae uncultured
>ASV_176 6.613872e+12 >ASV_176 Bacteria Proteobacteria Alphaproteobacteria Puniceispirillales SAR116 clade
>ASV_218 6.577739e+12 >ASV_218 Bacteria Proteobacteria Alphaproteobacteria Puniceispirillales SAR116 clade
>ASV_88 6.446653e+12 >ASV_88 Bacteria Bacteroidetes Bacteroidia Flavobacteriales NS7 marine group
>ASV_230 6.218666e+12 >ASV_230 Bacteria Proteobacteria Alphaproteobacteria Puniceispirillales SAR116 clade
>ASV_257 6.204857e+12 >ASV_257 Bacteria Bacteroidetes Bacteroidia Flavobacteriales Flavobacteriaceae NS5 marine group
>ASV_322 6.183627e+12 >ASV_322 Bacteria Proteobacteria Alphaproteobacteria Rhodobacterales Rhodobacteraceae Rhodovulum
>ASV_186 6.166903e+12 >ASV_186 Bacteria Proteobacteria Gammaproteobacteria Alteromonadales Alteromonadaceae Alteromonas
>ASV_141 6.127317e+12 >ASV_141 Bacteria Proteobacteria Alphaproteobacteria SAR11 clade Clade IV
ISSG vs SSTC
otu contribution ASV kingdom phylum class genus family species
>ASV_304 1.038439e+13 >ASV_304 Bacteria Bacteroidetes Bacteroidia Flavobacteriales Flavobacteriaceae NS2b marine group
>ASV_348 1.001642e+13 >ASV_348 Bacteria Bacteroidetes Bacteroidia Flavobacteriales Flavobacteriaceae NS4 marine group
>ASV_390 9.719405e+12 >ASV_390 Bacteria Bacteroidetes Bacteroidia Flavobacteriales Cryomorphaceae uncultured
>ASV_421 9.449232e+12 >ASV_421 Bacteria Proteobacteria Gammaproteobacteria SAR86 clade
>ASV_426 9.366773e+12 >ASV_426 Bacteria Proteobacteria Gammaproteobacteria Betaproteobacteriales Methylophilaceae OM43 clade
>ASV_550 8.424683e+12 >ASV_550 Bacteria Cyanobacteria Oxyphotobacteria Synechococcales Cyanobiaceae Prochlorococcus MIT9313
>ASV_475 7.989456e+12 >ASV_475 Bacteria Bacteroidetes Bacteroidia Flavobacteriales Flavobacteriaceae NS5 marine group
>ASV_618 7.989456e+12 >ASV_618 Bacteria Proteobacteria Gammaproteobacteria SAR86 clade
>ASV_645 7.767083e+12 >ASV_645 Bacteria Proteobacteria Gammaproteobacteria Cellvibrionales Halieaceae OM60(NOR5) clade
>ASV_663 7.645234e+12 >ASV_663 Bacteria Proteobacteria Deltaproteobacteria PB19
[1] SIMPER analysis 18S
SSTC vs SO
otu contribution ASV Confidence Taxon clade kingdom phylum genus class family species
ASV_48 8.868194e+12 ASV_48 1.0000000 Eukaryota Alveolata Dinoflagellata Dinophyceae Gonyaulacales Gonyaulacaceae Gonyaulax
ASV_31 8.529136e+12 ASV_31 0.8023807 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X Raphid-pennate Fragilariopsis
ASV_53 8.519608e+12 ASV_53 0.9999952 Eukaryota Alveolata Dinoflagellata Syndiniales Dino-Group-I Dino-Group-I-Clade-1 Dino-Group-I-Clade-1_X Dino-Group-I-Clade-1_X_sp.
ASV_11 8.314783e+12 ASV_11 0.9999997 Eukaryota Alveolata Dinoflagellata Syndiniales Dino-Group-I Dino-Group-I-Clade-1 Dino-Group-I-Clade-1_X Dino-Group-I-Clade-1_X_sp.
ASV_76 8.230866e+12 ASV_76 0.9417963 Eukaryota Alveolata Dinoflagellata Dinophyceae Dinophyceae_X Dinophyceae_XX Dinophyceae_XXX Dinophyceae_XXX_sp.
ASV_50 8.223533e+12 ASV_50 0.9928710 Eukaryota Alveolata Dinoflagellata Dinophyceae Dinophyceae_X Dinophyceae_XX Dinophyceae_XXX Dinophyceae_XXX_sp.
ASV_22 8.192052e+12 ASV_22 0.9361058 Eukaryota Alveolata Dinoflagellata Dinophyceae Dinophyceae_X Dinophyceae_XX Dinophyceae_XXX Dinophyceae_XXX_sp.
ASV_94 8.182193e+12 ASV_94 0.9563041 Eukaryota Archaeplastida Chlorophyta Chloropicophyceae Chloropicales Chloropicaceae Chloroparvula Chloroparvula_pacifica
ASV_51 7.943274e+12 ASV_51 0.9992372 Eukaryota Alveolata Dinoflagellata Dinophyceae Dinophyceae_X Dinophyceae_XX Dinophyceae_XXX Dinophyceae_XXX_sp.
ASV_170 7.862158e+12 ASV_170 0.9707207 Eukaryota Alveolata Dinoflagellata Dinophyceae Dinophyceae_X Dinophyceae_XX Dinophyceae_XXX Dinophyceae_XXX_sp.
ISSG vs SO
otu contribution ASV Confidence Taxon clade kingdom phylum genus class family species
ASV_33 3.672376e+13 ASV_33 1.0000000 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X Polar-centric-Mediophyceae Chaetoceros
ASV_10 3.621182e+13 ASV_10 0.9994667 Eukaryota Hacrobia Haptophyta Prymnesiophyceae Phaeocystales Phaeocystaceae Phaeocystis Phaeocystis_rex
ASV_25 3.620973e+13 ASV_25 0.9997619 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X Radial-centric-basal-Coscinodiscophyceae Actinocyclus Actinocyclus_curvatulus
ASV_134 3.465054e+13 ASV_134 0.9999991 Eukaryota Alveolata Dinoflagellata Syndiniales Dino-Group-I Dino-Group-I-Clade-1 Dino-Group-I-Clade-1_X Dino-Group-I-Clade-1_X_sp.
ASV_78 3.413063e+13 ASV_78 1.0000000 Eukaryota
ASV_31 3.354308e+13 ASV_31 0.8023807 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X Raphid-pennate Fragilariopsis
ASV_26 3.297447e+13 ASV_26 0.9994394 Eukaryota Hacrobia Haptophyta Prymnesiophyceae Phaeocystales Phaeocystaceae Phaeocystis Phaeocystis_rex
ASV_217 3.289846e+13 ASV_217 0.9317330 Eukaryota Hacrobia Haptophyta Prymnesiophyceae Prymnesiales Chrysochromulinaceae Chrysochromulina
ASV_206 3.284959e+13 ASV_206 0.8823163 Eukaryota Hacrobia Picozoa Picozoa_X Picozoa_XX Picozoa_XXX Picomonas Picomonas_judraskeda
ASV_18 3.283145e+13 ASV_18 0.9999653 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X Raphid-pennate
ISSG vs SSTC
otu contribution ASV Confidence Taxon clade kingdom phylum genus class family species
ASV_23 1.127798e+13 ASV_23 0.9891629 Eukaryota Archaeplastida Chlorophyta Chloropicophyceae Chloropicales Chloropicaceae Chloroparvula Chloroparvula_pacifica
ASV_94 1.003371e+13 ASV_94 0.9563041 Eukaryota Archaeplastida Chlorophyta Chloropicophyceae Chloropicales Chloropicaceae Chloroparvula Chloroparvula_pacifica
ASV_170 9.797047e+12 ASV_170 0.9707207 Eukaryota Alveolata Dinoflagellata Dinophyceae Dinophyceae_X Dinophyceae_XX Dinophyceae_XXX Dinophyceae_XXX_sp.
ASV_176 9.153583e+12 ASV_176 1.0000000 Eukaryota
ASV_162 9.111550e+12 ASV_162 0.9542062 Eukaryota Alveolata Dinoflagellata Dinophyceae Dinophyceae_X Dinophyceae_XX Dinophyceae_XXX Dinophyceae_XXX_sp.
ASV_2 8.983205e+12 ASV_2 0.9975556 Eukaryota Alveolata Dinoflagellata Dinophyceae Dinophyceae_X Dinophyceae_XX Dinophyceae_XXX Dinophyceae_XXX_sp.
ASV_66 8.840453e+12 ASV_66 0.7037306 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X Polar-centric-Mediophyceae
ASV_18 8.792462e+12 ASV_18 0.9999653 Eukaryota Stramenopiles Ochrophyta Bacillariophyta Bacillariophyta_X Raphid-pennate
ASV_5 8.757161e+12 ASV_5 0.9999997 Eukaryota Archaeplastida Chlorophyta Palmophyllophyceae Prasinococcales Prasinococcales-Clade-B Prasinoderma Prasinoderma_singularis
ASV_375 8.753520e+12 ASV_375 0.9423032 Eukaryota Alveolata Dinoflagellata Dinophyceae Gonyaulacales Pyrophacaceae Fragilidium
## 
##  Welch Two Sample t-test
## 
## data:  Bacteroidetes_SO$Abundance and Bacteroidetes_ISSG$Abundance
## t = 4.5828, df = 396.05, p-value = 6.157e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.3565637 0.8923132
## sample estimates:
## mean of x mean of y 
##  3.285904  2.661465 
## 
## 
##  Welch Two Sample t-test
## 
## data:  Cyano_SO$Abundance and Cyano_ISSG$Abundance
## t = -4.7419, df = 42.62, p-value = 2.385e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.953793 -1.190715
## sample estimates:
## mean of x mean of y 
##  1.660626  3.732879 
## 
## 
##  Welch Two Sample t-test
## 
## data:  Cyano_SO$Abundance and Cyano_SSTC$Abundance
## t = -3.8645, df = 42.821, p-value = 0.0003723
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.7558208 -0.8656909
## sample estimates:
## mean of x mean of y 
##  1.660626  3.471382