the Creative Commons Attribution 4.0 License.
Weekly reconstruction of pH and total alkalinity in an upwelling-dominated coastal ecosystem through neural networks (ATpH-NN): The case of Ría de Vigo (NW Spain) between 1992 and 2019
Abstract. The short- and long-term variability of the seawater carbon dioxide (CO2) system differs widely among ecosystems, reflecting the processes characteristic of each area. The high variability of coastal ecosystems, their ecological and economic significance, the anthropogenic influence on them and their behavior as sources or sinks of atmospheric CO2 highlight the importance of better understanding the processes that underlie the variability and alteration of the CO2 system at different spatiotemporal scales. Achieving this with confidence requires high-frequency data sustained over several years in different regions. In this work, we contribute to this need by configuring and training two neural networks capable of modeling the weekly variability of pH and total alkalinity (AT) in the upper 50 m of the water column of the Ría de Vigo (NW Spain), with errors of 0.031 pH units and 10.9 µmol kg−1, respectively. With these networks, we generated weekly time series of pH and AT at seven locations in the Ría de Vigo in three depth ranges (0–5 m, 5–10 m and 10–15 m), which adequately reproduce independent discrete measurements. A first analysis of the time series reveals high short-term variability, which is larger at the inner stations of the Ría de Vigo. The lowest values of pH and AT were obtained in the inner zone, with a progressive increase towards the outer/middle zone of the ría. The mean seasonal cycle also reflects the gradient between both zones, with a larger amplitude and variability of both variables in the inner zone. On the other hand, the long-term trends derived from the pH time series indicate stronger acidification than that reported for the open ocean, with surface trends ranging from −0.020 pH units per year in the outer/middle zone to −0.032 pH units per year in the inner zone.
In addition, positive long-term trends of AT were obtained, ranging from 0.39 µmol kg−1 per year in the outer/middle zone to 2.86 µmol kg−1 per year in the inner zone. These results show the changing conditions, in both short- and long-term variability, and the spatial differences between the inner and outer/middle zones to which the organisms of the Ría de Vigo are exposed. The neural networks and the database provided in this study offer the opportunity to evaluate the CO2 system in an environment of high ecological and economic relevance, to validate high-resolution regional biogeochemical models and to evaluate the impacts on organisms of the Ría de Vigo by refining the ranges of the biogeochemical variables included in experiments.
This preprint has been withdrawn.
Interactive discussion
Status: closed
RC1: 'Comment on bg-2021-33', Anonymous Referee #1, 09 Mar 2021
Summary and overall impression
The manuscript makes use of ship measurements of physical and biogeochemical properties (temperature, salinity, phosphate, nitrate, silicate, dissolved oxygen, pH, total alkalinity AT), in combination with measurements taken at 7 time-series stations (same variables as above but without pH and AT) in the Ría de Vigo, an embayment in NW Spain. They use the ship data to train a feed-forward neural network to then estimate time-series of pH and AT at the 7 stations. They validate the time-series estimates with ship measurements of pH and AT near the time-series stations. They then present the time-series estimates with a focus on the seasonal cycle and long-term trends.
This is an interesting study, presenting a new dataset of high-frequency time series of pH and AT in the upper 15 m of the Ría de Vigo. The findings of the paper help to quantify and better understand the variability and long-term trend in regional acidification, and could be important for stakeholders with an economic or ecological interest in the Ría de Vigo. It may also be of interest to scientists who want to estimate time series based on available proxy data in different regions. However, I do not believe this paper is ready for publication yet. I have some major concerns about the method and its validation, as well as about some of the results, as outlined in the general comments below.
General comments
For training the network, data from the Ría de Vigo and beyond (offshore Galicia) were used. There should be a discussion of whether the statistical relationships between the predictors and the target variables in the training data hold both within and outside the Ría de Vigo. Due to the special conditions in the embayment, the statistical relationships from outside the embayment might not hold inside it. Presumably, the effect of freshwater influx would be much stronger inside the embayment than offshore. The fact that salinity is a weak predictor of AT in the network (Table 2), while salinity is highly correlated with AT in the embayment (L.276 & L.315), is a further concern.
The training data extend down to 50 m, while the labeling data only extend down to 15 m. Similarly, the periods covered by the two datasets differ considerably (1976–2018 vs. 1992–2019). There should be a discussion of why this is acceptable. As both time and depth are inputs to the network, this is probably not an issue, and it is likely a good choice because it increases the amount of available training data. Nonetheless, this should be discussed.
The statistical properties of different set-ups, including different predictors and numbers of neurons, are presented (Table 1), but the final neural network is trained only once. If a different set of training, validation and testing data were chosen, i.e., if the network were run again, the statistics could change and the output would be slightly different, or in the worst case completely different. Thus, the robustness of the training should be checked, e.g., with a bootstrapping approach in which the data assigned to “training”, “validation” and “testing” differ in each run. I suspect that the statistics might change slightly when re-running the network, as I find it odd that cW and sW have very different weights when they have a very strong statistical relationship to each other. It might also be that, in a different run, salinity would be a more important driver of AT than it currently is.
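The re-running the referee asks for can be sketched as repeated random re-splitting (the referee calls it bootstrapping; strictly, bootstrapping resamples with replacement). This is a minimal illustration under stated assumptions: a one-predictor least-squares fit stands in for the neural network, and the 70/15/15 split fractions and the names `fit_line`, `rmse` and `repeated_split_rmse` are hypothetical, not taken from the manuscript.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; a stand-in for training the network."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rmse(pred, obs):
    """Root-mean-square error between predictions and observations."""
    return (sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)) ** 0.5


def repeated_split_rmse(xs, ys, n_runs=20, frac_train=0.7, frac_val=0.15, seed=0):
    """Reassign samples to training/validation/test sets in every run and
    return the mean and standard deviation of the test RMSE across runs."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    n_train = int(frac_train * len(idx))
    n_val = int(frac_val * len(idx))
    scores = []
    for _ in range(n_runs):
        rng.shuffle(idx)  # a new random assignment for this run
        train = idx[:n_train]
        # idx[n_train:n_train + n_val] would be the validation set used for early
        # stopping in a network; the closed-form stand-in fit does not need it.
        test = idx[n_train + n_val:]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        pred = [a + b * xs[i] for i in test]
        scores.append(rmse(pred, [ys[i] for i in test]))
    return statistics.fmean(scores), statistics.stdev(scores)
```

A standard deviation that is large relative to the mean would indicate that the reported statistics depend on the particular random split.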
I appreciate the transparency in presenting how the final set-up was chosen (e.g., which predictors and how many neurons). However, I think that simply letting the network “decide” on the best performance can be dangerous if the result does not make physical or biogeochemical sense. E.g., what if the set-up with only DTS had been the best according to the tests? Then the biogeochemistry would have been completely ignored by the network, and there might have been some overfitting. In addition, the list of tested predictor combinations does not include all possible combinations; e.g., DTS are predictors in all of them. My suggestion would be to use all available predictors that make physical and biogeochemical sense (T/S, all nutrients, and oxygen). If some of them do not have a strong statistical relationship with the target variables, the network will weight them less. The presented method can then be used to test which position or time parameters further improve the network (as well as to test the number of neurons).
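The protocol suggested above (a fixed, physically motivated predictor set plus every combination of optional position and time predictors, crossed with the neuron counts) can be enumerated as a sketch. The reading of the abbreviations, in particular LL as latitude/longitude, is an assumption, and the names `BASE`, `OPTIONAL` and `candidate_setups` are hypothetical.

```python
from itertools import combinations

# Hypothetical reading of the predictor abbreviations in Table 1:
# T temperature, S salinity, P phosphate, N nitrate, Si silicate, O oxygen,
# LL latitude/longitude, D depth, Y year, W week of the year.
BASE = ("T", "S", "P", "N", "Si", "O")   # always included (physically motivated)
OPTIONAL = ("LL", "D", "Y", "W")         # position/time predictors to test
NEURONS = (28, 34, 40, 46, 52)           # hidden-layer sizes tested in the manuscript


def candidate_setups(base=BASE, optional=OPTIONAL, neurons=NEURONS):
    """Yield (predictors, hidden neurons) for every subset of the optional predictors."""
    for k in range(len(optional) + 1):
        for extra in combinations(optional, k):
            for n_hidden in neurons:
                yield base + extra, n_hidden


setups = list(candidate_setups())
print(len(setups))  # 2**4 subsets x 5 neuron counts = 80
```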
The time-series estimates are validated with observations from the IIM database (i.e., the training data) near each of the stations. It is not stated explicitly whether these data are independent, i.e., not part of the training dataset. As nothing is stated otherwise, I assume that the validation data are part of the training data and are thus not independent. This is a major concern, as validation should be conducted with independent data. Further, the validation of the time-series estimates is very limited, which is understandable given the data sparsity. However, with the current testing, we do not know whether the trend and the interannual variability are correctly captured.
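A minimal check for the independence the referee asks about is to flag validation records that also occur in the training set. This sketch assumes records are dicts and that matching date, station and depth identify the same sample (hypothetical field names); exact matching is only a lower bound, since samples taken close together in time can still be correlated.

```python
def default_key(record):
    """Identify a sample by date, station and depth (hypothetical field names)."""
    return (record["date"], record["station"], record["depth"])


def split_by_independence(validation, training, key=default_key):
    """Separate validation records into those absent from and present in the training set."""
    train_keys = {key(r) for r in training}
    independent = [r for r in validation if key(r) not in train_keys]
    shared = [r for r in validation if key(r) in train_keys]
    return independent, shared
```

Only statistics computed on the `independent` list would count as an independent validation.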
Table 5 shows that the trend in the pH increases with depth. This is contrary to what I would have expected and should be discussed. Does the trend in pH also increase with depth at other locations, e.g., BATS?
Specific Comments
Title: Consider adding “surface” to the title (unless the Ría de Vigo is only ~15 m deep…)
Title: Consider changing “coastal ecosystem” to “coastal embayment”
Abstract: Mention (for non-Spanish speakers) that the Ría de Vigo is an embayment.
L.22: From Fig. 5, I cannot see that AT is significantly lower in the inner stations compared to the outer ones. What is this statement based on?
L.28: -0.0020 (or -0.0019?) and -0.0032
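The referee's correction can be checked with a rough magnitude estimate. Assuming the trend applies over the 27 years from 1992 to 2019, the values printed in the abstract and the values the referee suggests imply very different total changes:

```latex
\Delta\mathrm{pH} \approx -0.032\ \mathrm{yr}^{-1} \times 27\ \mathrm{yr} \approx -0.86
\qquad \text{vs.} \qquad
\Delta\mathrm{pH} \approx -0.0032\ \mathrm{yr}^{-1} \times 27\ \mathrm{yr} \approx -0.086
```

A decline of almost one pH unit would be an order of magnitude larger than the open-ocean trends of −0.0017 to −0.0026 pH units per year mentioned in the L.334 comment (about −0.046 to −0.070 over 27 years), whereas the second value is of the same order.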
L.28-30: Add uncertainties
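For the requested uncertainties, one simple option is the standard error of an ordinary least-squares slope fitted to each weekly series. The sketch below (hypothetical function name, not the authors' procedure) assumes time in decimal years; because weekly estimates are serially correlated, the standard error it reports is likely too small, and an autocorrelation-adjusted error would be more defensible.

```python
import math
import statistics


def trend_with_se(t, y):
    """Ordinary least-squares slope of y on t and its standard error.

    t: decimal years; y: e.g. weekly pH estimates. The standard error assumes
    independent residuals, which weekly series rarely satisfy."""
    n = len(t)
    mt, my = statistics.fmean(t), statistics.fmean(y)
    stt = sum((ti - mt) ** 2 for ti in t)
    slope = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y)) / stt
    intercept = my - slope * mt
    rss = sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))
    se = math.sqrt(rss / (n - 2) / stt)
    return slope, se
```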
L.32: What use is the trained neural network to the community? Can it be applied to other data? If so, this should be stated; if not, that part of the sentence should be removed.
Introduction: Explain the significance of knowing about changes in AT (why not, e.g., DIC?).
L.85: I would have liked to find out how deep the Ría de Vigo is. Is it only 15 m? That matters for the interpretation of the findings.
L.87: Name the citations
L.94: Consider discussing the benefit of high-frequency local data. Why would monthly fields not have been “good enough” to investigate the seasonal cycle and long-term trend?
L.98: Consider specifying what these ecological and economic services are
L.103: Add “e.g.” before the references, as there are a few more
L.109: Remove “global”
L.132: Should be Sect. 3.1(?)
Eqs. 1 and 2: Specify that “week” here refers to the week of the year, going from 1 to 53 (if that is the case)
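The week definition matters because a cyclic encoding depends on its period. Assuming Eqs. 1 and 2 define cW and sW as the cosine and sine of 2π·week/53 (an assumption; the equations are not reproduced here), the encoding places the last week of one year next to the first week of the next. `encode_week` and `week_distance` are hypothetical names.

```python
import math

WEEKS_PER_YEAR = 53  # assumption: week of the year runs from 1 to 53


def encode_week(week, period=WEEKS_PER_YEAR):
    """Map week of the year onto the unit circle as (cos, sin)."""
    angle = 2.0 * math.pi * week / period
    return math.cos(angle), math.sin(angle)


def week_distance(w1, w2):
    """Euclidean distance between two encoded weeks."""
    c1, s1 = encode_week(w1)
    c2, s2 = encode_week(w2)
    return math.hypot(c1 - c2, s1 - s2)
```

With this encoding, `week_distance(53, 1)` equals `week_distance(1, 2)`, while weeks half a year apart are almost two units apart.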
L.142: It should be stated why neuron numbers between 28 and 52 were tested. In the lead author’s previous study, the approach of Velo et al. (2013) was followed, in which 32, 64, 128 and 264 neurons were tested.
L.197: The details of the Monte Carlo simulations have to be stated here.
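As an illustration of what a description of the Monte Carlo simulations typically specifies (which inputs are perturbed, with what error distribution, and how many draws), here is a minimal sketch that assumes Gaussian input errors propagated through an already trained model. `monte_carlo_sd` and the toy model are hypothetical, not the authors' set-up.

```python
import random
import statistics


def monte_carlo_sd(model, inputs, input_sd, n_draws=1000, seed=0):
    """Propagate input uncertainty through `model` by random perturbation.

    Each draw adds Gaussian noise with the given standard deviation to every
    input; the standard deviation of the resulting outputs is returned."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_draws):
        perturbed = [x + rng.gauss(0.0, sd) for x, sd in zip(inputs, input_sd)]
        outputs.append(model(perturbed))
    return statistics.stdev(outputs)


def toy_model(v):
    """Hypothetical linear stand-in for a trained network."""
    return 2.0 * v[0] + 0.5 * v[1]


print(monte_carlo_sd(toy_model, [35.0, 8.0], [0.1, 0.0]))  # close to 2 * 0.1 = 0.2
```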
Table 1: It looks like network 4 has better statistics than network 3.
L.232: It would be interesting to know if that is a specific region in the embayment, maybe it is a region with lots of freshwater influx. What implication does this finding have on the uncertainty of the time-series estimates of pH and AT?
L.324: It should be stated where the DIC data comes from. Was this directly measured on the ships? Which dataset is it from?
L.326: Is this difference significant? (0.89 vs. 0.91)
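If 0.89 and 0.91 are correlation coefficients (the comment does not say; they could also be r² values), one standard check is Fisher's z-test. This sketch assumes the two coefficients come from independent samples; if both were computed on the same observations, a test for dependent correlations would be needed instead. The sample sizes are hypothetical.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Fisher z-test of H0: rho1 == rho2 for two independent samples.

    Returns the z statistic and the two-sided p-value."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p
```

With hypothetical sample sizes, the same pair of values is far from significant for n = 30 (p ≈ 0.7) but significant for n = 1000 (p ≈ 0.02), so the answer hinges on the number of observations behind each value.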
L.334: in L. 47, it says it’s between -0.0017 and -0.0026
L.341: How does a lack of data increase the pH?
L.356: There is also a global trend in deoxygenation linked to climate change (e.g., Keeling et al., 2010)
L.373: Add “in the upper 15 m”
L.383: Consider adding implications of these findings, e.g., for the economic and ecological services of the embayment
L.384: Consider stating explicitly what other questions could be answered with the data
Fig. 2: The lines in the insets differ from those in the large plots in (a) and (b), and the x-axis label is missing
Figs. 4 and 5: The trend line and a running mean could be added, as well as the RMSE for each subplot.
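A trend line is an ordinary least-squares fit; the two remaining quantities can be computed as in this minimal sketch. The function names are hypothetical, and the default 52-week window (roughly one year of weekly estimates) is an assumption.

```python
def running_mean(values, window=52):
    """Moving average over `window` consecutive points, approximately centred
    (offset by half a step for an even window).

    Returns None where the full window does not fit at the series edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = i - half
        hi = lo + window
        if lo < 0 or hi > len(values):
            out.append(None)
        else:
            out.append(sum(values[lo:hi]) / window)
    return out


def rmse(estimates, observations):
    """Root-mean-square difference between paired estimates and observations."""
    pairs = list(zip(estimates, observations))
    return (sum((e - o) ** 2 for e, o in pairs) / len(pairs)) ** 0.5
```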
All figures should have a higher resolution.
Table B1: Why is P not included as a predictor for Si and N? Why is Si not a predictor for P?
Fig. S5 and S6: Consider changing the y-axis to make it easier to see the data points
Citation: https://doi.org/10.5194/bg-2021-33-RC1
RC2: 'Comment on bg-2021-33', Anonymous Referee #2, 10 Mar 2021
The reconstruction of regional pH and AT time series could be important for policy makers and the public to evaluate and understand long-term acidification in local coastal regions. In this study, the authors use a neural network approach to reconstruct pH and AT time series in the Ría de Vigo from 1992 to 2019. The seasonal cycle and long-term trends of pH and AT are then determined and discussed using the reconstructed time series.
I have some major concerns that need to be addressed before the publication of this manuscript.
Concerns:
The training, testing, and validation datasets all come from the IIM database and are separated randomly. However, randomly splitting the IIM data into three datasets can be very risky because of potential autocorrelation among the three datasets, especially for periodic data. If the testing dataset is correlated with the training dataset, it is not suitable for assessing the generalization of the network. In addition, this could compromise the determination of long-term trends. A completely independent dataset could be used for testing. If an independent dataset is not available, the model could be built on anomalies with the periodic cycles removed. Alternatively, the training and testing datasets could be separated so that there is no time overlap between them.
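Both alternatives the referee proposes can be sketched briefly. The functions below are hypothetical: records are assumed to be dicts with a "year" field, and the seasonal cycle is assumed to be the mean for each week of the year. Holding out whole years removes time overlap, but not autocorrelation across the boundary between adjacent training and test years.

```python
from collections import defaultdict


def time_blocked_split(records, test_years):
    """Put every record from `test_years` in the test set and the rest in training,
    so that no year contributes to both."""
    test_years = set(test_years)
    train = [r for r in records if r["year"] not in test_years]
    test = [r for r in records if r["year"] in test_years]
    return train, test


def weekly_anomalies(weeks, values):
    """Remove the mean seasonal cycle by subtracting each week-of-year mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for w, v in zip(weeks, values):
        sums[w] += v
        counts[w] += 1
    climatology = {w: sums[w] / counts[w] for w in sums}
    return [v - climatology[w] for w, v in zip(weeks, values)]
```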
Please explain why you chose 28, 34, 40, 46, and 52 as the tested numbers of neurons in the hidden layer. The number of neurons in the hidden layer seems too large given the number of input variables, which will likely lead to overfitting. Please justify that overfitting is not an issue for this model.
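To make the overfitting question concrete, the number of trainable parameters of a one-hidden-layer network can be compared with the number of training samples. The sketch assumes a fully connected network with biases and a single output; the 12 inputs in the example are a hypothetical count (e.g., latitude and longitude, depth, temperature, salinity, three nutrients, oxygen, year and two week terms), not a figure from the manuscript.

```python
def n_parameters(n_inputs, n_hidden, n_outputs=1):
    """Weights plus biases of a fully connected network with one hidden layer."""
    hidden_layer = n_hidden * (n_inputs + 1)   # input weights + one bias per hidden neuron
    output_layer = n_outputs * (n_hidden + 1)  # hidden weights + one bias per output
    return hidden_layer + output_layer


for n_hidden in (28, 34, 40, 46, 52):
    print(n_hidden, n_parameters(12, n_hidden))
```

Under these assumptions the tested networks have between 393 and 729 parameters; these counts would then be compared with the number of training samples, which this comment does not give.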
I am particularly concerned about the relative importance of the input variables for AT. I assume that the overall connection weights have been calculated. The weak connection between salinity and AT is a red flag for the neural network. However, both the strong salinity-AT correlation and the relative importance of the inputs derived from the connection weights, which contradict each other, have been used to explain the change in AT.
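For reference, one common definition of an input's overall connection weight (Olden's connection weight approach) sums, over hidden neurons, the product of the input-to-hidden and hidden-to-output weights. Whether the manuscript used this definition is not stated here, so the sketch is illustrative only. The toy example shows how contributions of opposite sign through different hidden neurons can cancel, so the sum can be small even when individual connections are large; weights are also only comparable when inputs are normalized to similar ranges.

```python
def connection_weight_importance(w_input_hidden, w_hidden_output):
    """Overall connection weight of each input (Olden's connection weight approach).

    w_input_hidden[i][j]: weight from input i to hidden neuron j
    w_hidden_output[j]:   weight from hidden neuron j to the single output
    """
    return [
        sum(w_ij * w_hidden_output[j] for j, w_ij in enumerate(row))
        for row in w_input_hidden
    ]


# Toy weights: input 0 has large connections of opposite sign that cancel.
print(connection_weight_importance([[1.0, -1.0], [0.5, 0.5]], [2.0, 2.0]))  # [0.0, 2.0]
```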
The description of the best-network selection should be clarified. According to Table 1, LLDTSPNSiOYW trained with 34 neurons, rather than DTSPNSiOYW with 28 neurons, provides the lowest RMSE and highest r2 on the test dataset for pH, so there is no need to trade off between RMSE and r2 when selecting the pH model. For AT, DTSPNSiOW with 46 neurons is better than LLDTSPNSiYW with 52 neurons in terms of both RMSE and r2 on the testing dataset. The best networks in Table 1 therefore differ from the best networks stated in lines 225-231. In addition, it is not clear what “the most accurate representation of the validation data” means. As stated in lines 155-157, the validation dataset is used to stop the training process when performance on this dataset does not improve for six consecutive iterations. So why is it also used as one of the criteria for determining the best networks? It should be made clearer how the best networks are selected.
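The stopping rule described in lines 155-157 can be sketched as follows, as a hypothetical implementation assuming that "iterations" are training epochs and that the weights from the best validation epoch are kept. Because the validation losses decide which weights are kept, the validation set has already influenced the fitted network, which is why reusing it to rank networks is questionable; the test set has not been used in this way.

```python
def early_stopping_epoch(val_losses, patience=6):
    """Return the epoch whose weights would be kept under early stopping.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; the best epoch seen so far is kept."""
    best_loss, best_epoch, n_without_improvement = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, n_without_improvement = loss, epoch, 0
        else:
            n_without_improvement += 1
            if n_without_improvement >= patience:
                break  # later epochs are never evaluated
    return best_epoch


losses = [5.0, 4.0, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 2.0]
print(early_stopping_epoch(losses))              # 2: stops after six non-improving epochs
print(early_stopping_epoch(losses, patience=7))  # 9: one more epoch reaches the later minimum
```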
It is worth calculating and discussing the long-term changes in the seasonal cycle amplitudes of pH and AT. The pH seasonal cycle amplitude change can be predicted according to the change in buffer capacity. Therefore, it could be used as a potential way to check the robustness of the model.
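A minimal way to obtain the year-by-year amplitude series suggested above is the peak-to-trough range within each year of the weekly estimates; fitting an annual harmonic for each year would be a less noise-sensitive alternative. The function name is hypothetical, and a trend in the resulting amplitudes could then be estimated with an ordinary least-squares fit.

```python
from collections import defaultdict


def yearly_seasonal_amplitude(years, values):
    """Peak-to-trough range of the values within each calendar year."""
    by_year = defaultdict(list)
    for year, value in zip(years, values):
        by_year[year].append(value)
    return {year: max(vs) - min(vs) for year, vs in sorted(by_year.items())}
```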
Below are some minor comments:
Section 2.2: What are the uncertainties of the pH and AT measurements?
Provide a table with the number of observations in the training data for each depth range (0–5 m, 5–10 m and 10–15 m) and zone (inner and outer/middle). This will be helpful for the discussion of model performance.
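The requested table amounts to a count of training observations per depth range and zone. This sketch assumes each observation is reduced to a (depth in metres, zone label) pair and uses half-open depth bins, so an observation at exactly 15 m falls outside the last bin; both choices, and the function names, are assumptions.

```python
from collections import Counter

# Half-open depth bins [top, bottom) in metres.
DEPTH_BINS = ((0, 5, "0-5 m"), (5, 10, "5-10 m"), (10, 15, "10-15 m"))


def depth_bin(depth_m):
    """Return the label of the depth range containing depth_m, or None."""
    for top, bottom, label in DEPTH_BINS:
        if top <= depth_m < bottom:
            return label
    return None


def count_by_depth_and_zone(observations):
    """observations: iterable of (depth_m, zone) pairs, e.g. (3.0, "inner")."""
    counts = Counter()
    for depth_m, zone in observations:
        label = depth_bin(depth_m)
        if label is not None:
            counts[(label, zone)] += 1
    return counts
```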
Figure 2, specify in the caption the differences between (a) and (b) as well as between (c) and (d).
Citation: https://doi.org/10.5194/bg-2021-33-RC2
Data sets
INTECMAR_NN-DATABASE.csv Broullón, D., Pérez, F. F., and Doval, M. D. http://dx.doi.org/10.20350/digitalCSIC/12642
Model code and software
Source_code.rar Broullón, D., Pérez, F. F., and Doval, M. D. http://dx.doi.org/10.20350/digitalCSIC/12642
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote
---|---|---|---|---|---|---
803 | 347 | 51 | 1,201 | 147 | 46 | 41
Cited
2 citations as recorded by crossref.
- Advancing real-time pH sensing capabilities to monitor coastal acidification as measured in a productive and dynamic estuary (Ría de Arousa, NW Spain) A. Velo & X. Padin 10.3389/fmars.2022.941359
- pH trends and seasonal cycle in the coastal Balearic Sea reconstructed through machine learning S. Flecha et al. 10.1038/s41598-022-17253-5
Daniel Broullón
Fiz F. Pérez
María Dolores Doval