This work is distributed under the Creative Commons Attribution 4.0 License.
Predicting dominant terrestrial biomes at a global scale using machine learning algorithms, climate variable indices, and extreme climate indices
Abstract. Several methods have been proposed for modelling global biome distribution. Climate data are typically summarised in terms of a few climate indices. However, with the recent advancement of machine learning algorithms, such summarisation is no longer required. Extreme climate events such as intense droughts and very low temperatures cannot be captured by monthly mean climate data, which may limit the accuracy of predicted biome boundaries. In this study, I assessed the influences of machine learning algorithms, climate variable indices, and extreme climate indices on the accuracy and robustness of global biome modelling. I found that the random forest and convolutional neural network algorithms produced highly accurate models for reconstructing the global biome distribution. However, the convolutional neural network algorithm was preferable, because the random forest algorithm substantially overfitted the training data relative to the other machine learning algorithms examined. Including indexed climate data slightly reduced model accuracy, whereas including extreme climate data slightly improved it. However, when extreme climate data were included, there were significant deviations in the distribution of values between the observed and predicted climate; this fatally reduced the robustness of the models, which was evaluated in terms of prediction consistency. Therefore, I recommend that extreme climate data not be considered in global-scale biome prediction applications.
Withdrawal notice
This preprint has been withdrawn.
- Preprint (2243 KB)
- Supplement (1992 KB)
Interactive discussion
Status: closed
-
RC1: 'Comment on bg-2023-106', Anonymous Referee #1, 08 Feb 2024
Review: SATO: Predicting dominant terrestrial biomes at a global scale using machine learning algorithms, climate variable indices, and extreme climate indices
The author presents an ML study for modeling land use based on climate data. ML models are used out of the box, trained, and then evaluated against land-use data. Land-cover predictions based on future climate are also calculated. Overall, I feel that the current manuscript is underdeveloped and would benefit from more careful model setup and validation.

General comments

Model setup: This work used several ML models out of the box, without parameter refinement, which I think severely limits the value of the derived conclusions. I would generally expect some careful and thoughtful testing of different models using a parameter grid search or similar, to make sure that the models are indeed used to their full capability. For example, the RF model is severely overfitted (100 % training accuracy), which is avoidable by limiting tree depth, etc. Hence the conclusion that the CNN is preferable is not really supported here.
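As a minimal sketch of the kind of grid search being suggested (the synthetic data and parameter grid below are placeholders, not the study's actual features or settings):

```python
# Sketch of hyperparameter tuning for an RF classifier via grid search.
# The synthetic data stands in for the climate-index features and biome labels;
# the grid values are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Stand-in for a (grid cells x climate indices) feature table with biome class labels.
X, y = make_classification(n_samples=2000, n_features=30, n_informative=10,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = {
    "max_depth": [5, 10, 20, None],   # limiting depth is one way to curb 100 % training accuracy
    "min_samples_leaf": [1, 5, 20],
    "n_estimators": [200, 500],
}

search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy", n_jobs=-1)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
print("training accuracy:", round(search.score(X_train, y_train), 3))
print("test accuracy:", round(search.score(X_test, y_test), 3))
```

Comparing the training and test scores of the tuned model would show directly whether the overfitting can be removed without hurting test accuracy.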
Feature selection: The author states that, because of ML, there is little necessity to limit the number of climate indices used to model land cover. However, a large number of the climate indices used in this study are highly correlated and add little additional information. In the case of Naive Bayes that is likely a problem, because it assumes independence of features, while random forests and SVM are less affected by this. It would have been useful to determine feature importance for the ML models. Right now, we are mostly talking about black boxes, for which we don't know which features were most predictive and what causes the models to fail in the edge cases.
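The two diagnostics asked for here could look roughly like the following (placeholder data again; the index names are hypothetical):

```python
# Sketch of the requested diagnostics: pairwise correlation among the climate
# indices and permutation feature importance for a fitted classifier.
# The synthetic table is a placeholder for the study's climate-index data.
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
features = pd.DataFrame(X, columns=[f"index_{i}" for i in range(X.shape[1])])
X_train, X_test, y_train, y_test = train_test_split(features, y, random_state=0)

# 1. Count strongly correlated index pairs (|r| > 0.8), which carry redundant information.
corr = features.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
print("strongly correlated pairs:", int((upper > 0.8).sum().sum()))

# 2. Permutation importance on held-out data; this works for any fitted model (RF, SVM, NV).
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranking = pd.Series(result.importances_mean, index=features.columns)
print(ranking.sort_values(ascending=False).head())
```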
Model validation: The manuscript lumps model validation into simple accuracy scores and presents maps of deviations. It is not clear to me what the maps show (the true land use or the false ML land use). It is also difficult to integrate over the different classes. Here something like a confusion matrix would be helpful. Generally speaking, additional validation of the ML models is needed. Another useful validation exercise would be to compare results to a very simple model (i.e. a classical P-T relationship).
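A per-class confusion matrix of the kind suggested could be produced along these lines (the labels and random predictions below are only placeholders for the observed and predicted biomes on the test set):

```python
# Sketch of a per-class agreement summary (confusion matrix plus per-class scores)
# instead of a single overall accuracy. y_true / y_pred are placeholders for the
# observed and ML-predicted biome classes of the test cells.
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix

biomes = ["boreal forest", "tropical rainforest", "deciduous broadleaf forest",
          "grassland", "desert"]   # illustrative subset of the biome classes
rng = np.random.default_rng(0)
y_true = rng.choice(biomes, size=500)
y_pred = np.where(rng.random(500) < 0.7, y_true, rng.choice(biomes, size=500))

cm = pd.DataFrame(confusion_matrix(y_true, y_pred, labels=biomes),
                  index=biomes, columns=biomes)   # rows: observed, columns: predicted
print(cm)
print(classification_report(y_true, y_pred, labels=biomes))
```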
Purpose: I am struggling to understand the purpose of this study. If it is to evaluate the ML models, then more careful model setup and validation is desired. If it is to show predicted change of land use with climate, then importance of features for the prediction and a careful analysis of model behaviors is needed. I am trying to understand what the use case for these models is and what is being learned that can be applied elsewhere.

Specific comments

2.2. Climate data: I would prefer to have the variable tables moved into the main body. This makes it much easier to understand what is being used.
L 88: "Tn10p, Tx10p, Tn90p, Tx90p, WSDI, and CSDI" > these need to be defined.
L101: Citation software artifact.
L110: "they were transformed onto 10 min × 10 min grids through conservative interpolation and then resampled to 50 km × 50 km grids using nearest-neighbour interpolation" > I am a bit confused here, because a 10 × 10 min cell is smaller than a 50 × 50 km cell, so I don't understand how nearest-neighbour interpolation makes sense.
L129: "I used the default model parameters for simplicity and to prevent potential overfitting, i.e., training the model too closely to a particular dataset, thereby creating a model that might fail to fit additional data or reliably predict future observations." > This is almost certainly a bad choice and causes an unfair comparison to the CNN.
L136 (Sato and Ise, 2022): I am still a bit confused about this paper. CNNs are being used to evaluate graphical information. I still don't understand the advantage of encoding numeric values as graphical information, which is then turned back into numeric information. How is this advantageous compared to, for example, an ANN architecture? CNNs are powerful because photos etc. have very complex abstract features, which have to be learned by the model, but the synthetic images from climate data only carry a few bits of information.
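To make the point concrete, a purely hypothetical encoding (not the one used by Sato and Ise, 2022, which is not described here) might tile each climate value into a grayscale patch:

```python
# Purely hypothetical illustration of turning numeric climate values into an image:
# rescale a per-cell vector of indices to 0-255 and tile each value into an 8x8 patch.
# This is NOT the encoding of Sato and Ise (2022); it only shows why such synthetic
# images carry far less structure than the photographs CNNs are usually applied to.
import numpy as np
from PIL import Image

climate_vector = np.array([12.3, -4.1, 880.0, 35.2, 0.6, 210.0])   # hypothetical indices
lo, hi = climate_vector.min(), climate_vector.max()
scaled = np.uint8(255 * (climate_vector - lo) / (hi - lo))

patches = [np.full((8, 8), v, dtype=np.uint8) for v in scaled]
img = Image.fromarray(np.hstack(patches))   # an 8 x 48 grayscale image
img.save("cell_encoding.png")               # holds only six distinct grey levels
```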
L167: "The low test accuracy of the NV model was caused by an overestimation of areas dominated by boreal forest, tropical rainforest, and deciduous broadleaf forest" > generally, confusion matrices or Sankey diagrams would help to understand patterns in the misclassification of the models.

L202: "Excluding models trained with the NV algorithm and CEI dataset produced highly coincident PNV distributions under a future climate (accuracy, 51.7%–82.8%, Table 5)." > Is 52 % agreement really highly coincident? Again, it would be good to present disagreement/agreement by class rather than maps. That would actually be helpful in understanding these models.
L235: "Adding extreme climate data improved test accuracy rates slightly but it can fatally reduce model robustness, which was defined as the consistency of model prediction under forecast climate conditions" > This is a good point, but it is also expected, since statistical models have a hard time making predictions/extrapolations outside the training space. In a warming climate, extreme values will lie outside what the models were trained on.
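One quick way to quantify this would be to count how many future-climate cells fall outside the training range of each index, roughly as follows (the two tables and index names are placeholders):

```python
# Sketch of a simple extrapolation check: for each climate index, count how many
# future-climate grid cells fall outside the min-max range seen in the training data.
# Both DataFrames are placeholders for the historical and projected climate tables.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
cols = ["mean_temp", "annual_precip", "tx90p", "csdi"]   # hypothetical index names
train = pd.DataFrame(rng.normal(0.0, 1.0, size=(5000, 4)), columns=cols)
future = pd.DataFrame(rng.normal(0.5, 1.2, size=(5000, 4)), columns=cols)  # warmer, more variable

lo, hi = train.min(), train.max()
outside = future.lt(lo) | future.gt(hi)
print("fraction of cells outside the training range, per index:")
print(outside.mean().round(3))
print("fraction outside for at least one index:", round(outside.any(axis=1).mean(), 3))
```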
Tables 4+5: I don't understand the meaning of the *. Is this just for emphasis (i.e. considered better based on the author's judgement)?
Figures 2-9 should have color bars added.
Citation: https://doi.org/10.5194/bg-2023-106-RC1
-
AC1: 'Reply on RC1', Hisashi Sato, 23 Feb 2024
The comment was uploaded in the form of a supplement: https://bg.copernicus.org/preprints/bg-2023-106/bg-2023-106-AC1-supplement.pdf
-
RC2: 'Comment on bg-2023-106', Anonymous Referee #3, 16 Feb 2024
Referee report
This manuscript aims to predict the global biome distribution using machine learning methods based on climate characteristics and estimates their accuracy. Although CNNs are a promising technology for image-based vegetation classification, they remain underutilized for climate envelope modeling, so I think it is good to see them being used here.
Unfortunately, however, I have some concerns regarding the comparison of model performance. The authors need to clearly state that different types of input data were used for the CNN and the other models. The authors should also explain how these differences in data sources affected the results of the model performance comparisons.

I have four main concerns:

1. The input data for training are not clear. The author explains that the CNN uses graphical images of climate data as training data. I am not sure what "graphical images of climate data" means. Is this RGB-transformed climate data? If so, the authors need to clearly explain how they converted the climate data into graphical image data. The input data for RF, SVM, and NV are also unclear. Are these models trained on the climate data variables themselves? If so, the authors also need to clearly state that the CNN and the other models were trained on different types of data. The authors should provide a more detailed explanation of how the models used in this study were trained.

2. One of my main concerns is the fairness of the model performance comparison. My understanding is that, in this paper, the CNN model was trained on graphical images of climate data while the other models were trained on the climate data itself. Therefore, differences in accuracy or robustness between the CNN and the other models seem to reflect not only model performance but also differences in the input data. Since a model performance comparison generally aims to evaluate the performance of algorithms, I am not completely sure whether the model performance comparison in this study is truly meaningful. First, the author needs to clarify the reason for converting the climate data into graphical images. Second, the authors should also explain how these differences in data sources affected the results of the model performance comparisons.

3. P5 L129-132: The authors should explain the model settings so that other researchers can check the validity of the methods without checking the code. I feel that the descriptions of the settings of the machine learning methods other than the CNN are insufficient. The authors should clarify the settings of these methods in more detail, such as the parameter values or the type of kernel employed.

4. P5 L129-132: I also have a concern regarding the parameter optimization of the ML models. In this study, the author used the ML models without optimizing their parameters. I would suggest that the author try optimizing the ML parameters using a commonly used method such as grid search.
Minor points
P4 L101: Please check the format of references.
Citation: https://doi.org/10.5194/bg-2023-106-RC2
-
AC2: 'Reply on RC2', Hisashi Sato, 15 Mar 2024
-
EC1: 'Comment on bg-2023-106', Semeena Valiyaveetil Shamsudheen, 28 Mar 2024
The author has attended to the comments from both reviewers in a neat and constructive manner. I will recommend publication after submission of a revised draft incorporating all the comments from the reviewers.
Best regards,
Semeena
Citation: https://doi.org/10.5194/bg-2023-106-EC1
-
AC3: 'Reply on EC1', Hisashi Sato, 28 Mar 2024
Dear Editor,
Thank you very much for handling my manuscript.
I am ready to submit a revised manuscript, which incorporates all the comments from the reviewers.
Best,
Hisashi SATO
Citation: https://doi.org/10.5194/bg-2023-106-AC3
Viewed
| HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
|---|---|---|---|---|---|---|
| 386 | 95 | 50 | 531 | 43 | 29 | 42 |