This work is distributed under the Creative Commons Attribution 4.0 License.
Predicting dominant terrestrial biomes at a global scale using machine learning algorithms, climate variable indices, and extreme climate indices
Abstract. Several methods have been proposed for modelling global biome distribution. Climate data are typically summarised in terms of a few climate indices, but with the recent advancement of machine learning algorithms, such summarisation is no longer required. Extreme climate events such as intense droughts and very low temperatures cannot be captured by monthly mean climate data, which may limit the accuracy of modelled biome boundaries. In this study, I assessed the influences of machine learning algorithms, climate variable indices, and extreme climate indices on the accuracy and robustness of global biome modelling. I found that the random forest and convolutional neural network algorithms produced highly accurate models for reconstructing the global biome distribution. However, the convolutional neural network algorithm was preferable, because the random forest algorithm substantially overfitted the training data relative to the other machine learning algorithms examined. Including indexed climate data slightly reduced model accuracy, whereas including extreme climate data slightly improved it. However, when extreme climate data were included, there were significant deviations between the distributions of observed and predicted climate values; this fatally reduced the robustness of the models, evaluated here as prediction consistency. I therefore recommend that extreme climate data not be used in global-scale biome prediction applications.
Withdrawal notice
This preprint has been withdrawn.
Interactive discussion
Status: closed
-
RC1: 'Comment on bg-2023-106', Anonymous Referee #1, 08 Feb 2024
Review: SATO: Predicting dominant terrestrial biomes at a global scale using machine learning algorithms, climate variable indices, and extreme climate indices
The author presents an ML study for modeling land use based on climate data. ML models are used out of the box, trained, and then evaluated against land-use data. Land-cover predictions based on future climate are also calculated. Overall, I feel that the current manuscript is underdeveloped and would benefit from more careful model setup and validation.
General comments
Model setup: This work used several ML models out of the box, without parameter refinement, which I think severely limits the value of the derived conclusions. I would generally expect some careful and thoughtful testing of different models, using a parameter grid search or similar, to make sure that the models are indeed used to their full capability. For example, the RF model is severely overfitted (100 % training accuracy), which is avoidable by limiting tree depths, etc. Hence the conclusion that the CNN is preferable is not really supported here.
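For illustration, a minimal sketch of the kind of parameter search the reviewer suggests, assuming the models are scikit-learn estimators; X_train, y_train, and the grid values are placeholders, not the study's actual setup:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    param_grid = {
        "max_depth": [5, 10, 20, None],   # capping depth curbs the 100 % training accuracy
        "min_samples_leaf": [1, 5, 10],
        "n_estimators": [100, 300],
    }
    search = GridSearchCV(RandomForestClassifier(random_state=0),
                          param_grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)          # climate features and biome labels (placeholders)
    print(search.best_params_, search.best_score_)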
Feature selection: The author states that because of ML, there is little necessity to limit the number of climate indices used to model land cover. However, a large number of the climate indices used in this study are highly correlated and add little additional information. In the case of Naive Bayes, that is likely a problem, because it assumes independence of features, while random forests and SVMs are less affected by this. It would have been useful to determine feature importance for the ML models. Right now, we are mostly talking about black boxes, for which we don't know what features were most predictive and what causes the models to fail in the edge cases.
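A sketch of the feature-importance check suggested here, using permutation importance so that the same procedure applies to RF, SVM, and NB alike (scikit-learn assumed; model, X_test, y_test, and feature_names are placeholders):

    from sklearn.inspection import permutation_importance

    result = permutation_importance(model, X_test, y_test,
                                    n_repeats=10, random_state=0)
    ranked = sorted(zip(feature_names, result.importances_mean),
                    key=lambda pair: -pair[1])
    for name, drop in ranked:
        print(f"{name}: mean accuracy drop {drop:.3f}")  # larger drop = more predictive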
Model validation: The manuscript lumps model validation into simple accuracy scores and presents maps of deviations. It is not clear to me what the maps show (the true land use or the false ML land use). It is also difficult to integrate over the different classes; here something like a confusion matrix would be helpful. Generally speaking, additional validation of the ML models is needed. Another useful validation exercise would be to compare results to the simplest possible model (i.e., a classical precipitation-temperature relationship).
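A sketch of the per-class evaluation the reviewer asks for (scikit-learn assumed; y_true, y_pred, and biome_names are placeholders):

    from sklearn.metrics import confusion_matrix, classification_report

    cm = confusion_matrix(y_true, y_pred)  # rows: observed biome, columns: predicted biome
    print(cm)
    print(classification_report(y_true, y_pred, target_names=biome_names))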
Purpose: I am struggling to understand the purpose of this study. If it is to evaluate the ML models, then more careful model setup and validation is desired. If it is to show the predicted change of land use with climate, then the importance of features for the prediction and a careful analysis of model behaviors is needed. I am trying to understand what the use case for these models is and what is being learned that can be applied elsewhere.
Specific comments
2.2. Climate data: I would prefer to have the variable tables moved into the main body. This makes it much easier to understand what is being used.
L 88: "Tn10p, Tx10p, Tn90p, Tx90p, WSDI, and CSDI" > need to be defined
L101: Citation software artifact
L110: "they were transformed onto 10 min × 10 min grids through conservative interpolation and then resampled to 50 km × 50 km grids using nearest-neighbour interpolation" > I am a bit confused here, because 10 × 10 minutes is smaller than 50 × 50 km, so I don't understand how nearest-neighbour interpolation makes sense.
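For reference, the two-step regridding described in the manuscript could look like the following sketch; xESMF is one common tool for this and is an assumption here, not necessarily what the author used (ds_climate, grid_10min, and grid_50km are placeholder xarray Datasets):

    import xesmf as xe

    # Step 1: conservative remapping of the native data onto a 10 min x 10 min grid
    to_10min = xe.Regridder(ds_climate, grid_10min, method="conservative")
    # Step 2: nearest-neighbour resampling onto the coarser 50 km x 50 km grid
    to_50km = xe.Regridder(grid_10min, grid_50km, method="nearest_s2d")
    climate_50km = to_50km(to_10min(ds_climate))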
L129: "I used the default model parameters for simplicity and to prevent potential overfitting, i.e., training the model too closely to a particular dataset, thereby creating a model that might fail to fit additional data or reliably predict future observations." > This is almost certainly a bad choice and causes an unfair comparison to the CNN.
L 136: (Sato and Ise, 2022): I am still a bit confused about this paper. CNNs are being used to evaluate graphical information. I still don't understand the advantage of coding numeric values as graphical information, which is then turned back into numeric information. How is this advantageous compared to, for example, an ANN architecture? CNNs are powerful because photos etc. have very complex abstract features, which have to be learned by the model, but the synthetic images from climate data only have a few bits of information.
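For concreteness, one plausible way to render a vector of climate indices as a small RGB image for a CNN; this is a sketch only, not necessarily the encoding described in Sato and Ise (2022):

    import numpy as np

    def indices_to_image(values, lo, hi, band_width=8, height=32):
        """Scale each climate index to 0-255 and draw it as a vertical grey band."""
        scaled = np.clip((np.asarray(values) - lo) / (hi - lo), 0.0, 1.0) * 255
        bands = np.repeat(scaled.astype(np.uint8), band_width)  # one band per index
        img = np.tile(bands, (height, 1))     # shape: (height, n_indices * band_width)
        return np.stack([img] * 3, axis=-1)   # replicate greyscale into RGB channels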
L167: "The low test accuracy of the NV model was caused by an overestimation of areas dominated by boreal forest, tropical rainforest, and deciduous broadleaf forest" > generally, confusion matrices or Sankey diagrams would help to understand patterns in misclassification of the models.
L202: "Excluding models trained with the NV algorithm and CEI dataset produced highly coincident PNV distributions under a future climate (accuracy, 51.7%–82.8%, Table 5)." > Is 52 % agreement really highly coincident? Again, it would be good to actually present disagreement/agreement by class rather than maps. That would actually be helpful in understanding these models.
L235: "Adding extreme climate data improved test accuracy rates slightly but it can fatally reduce model robustness, which was defined as the consistency of model prediction under forecast climate conditions" > This is a good point, but also expected, since statistical models will have a hard time making predictions/extrapolations outside the training space. In a warming climate, extreme values will be outside what the models are trained on.
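The reviewer's point can be checked directly by flagging grid cells whose future climate falls outside the range spanned by the training data (a minimal per-feature extrapolation test; X_train and X_future are placeholder numpy arrays):

    import numpy as np

    train_min, train_max = X_train.min(axis=0), X_train.max(axis=0)
    outside = (X_future < train_min) | (X_future > train_max)
    share = outside.any(axis=1).mean()  # fraction of cells extrapolating in >= 1 feature
    print(f"{share:.1%} of grid cells fall outside the training envelope")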
Tables 4+5: I don't understand the meaning of the *. Is this just for emphasis (i.e., considered better based on the author's judgement)?
Figures 2-9 should have color bars added.
Citation: https://doi.org/10.5194/bg-2023-106-RC1
-
AC1: 'Reply on RC1', Hisashi Sato, 23 Feb 2024
The comment was uploaded in the form of a supplement: https://bg.copernicus.org/preprints/bg-2023-106/bg-2023-106-AC1-supplement.pdf
-
RC2: 'Comment on bg-2023-106', Anonymous Referee #3, 16 Feb 2024
Referee report
This manuscript aims to predict the global biome distribution using machine learning methods based on climate characteristics and estimates its accuracy. Although the CNN is a promising technology for image-based vegetation classification, it remains underutilized for climate envelope modeling. I think it is good to see it being used here.
Unfortunately, however, I have some concerns regarding the comparison of model performance. The authors need to clearly state that different types of input data were used for the CNN and the other models. The authors should also explain how these differences in data sources affected the results of the model performance comparisons.
I have four main concerns:
The input data for training are not clear — the author explains that the CNN uses graphical images of climate data as training data. I am not sure what graphical images of climate data means. Is this RGB-transformed climate data? If so, the authors need to clearly explain how they converted the climate data to graphical image data. The input data for RF, SVM, and NV are also unclear. Are these models trained on the climate variables themselves? If so, the authors also need to clearly state that the CNN and the other models were trained on different types of data. The authors should provide a more detailed explanation of how the models used in this study were trained.
One of my main concerns is the fairness of the model performance comparison. My understanding is that, in this paper, the CNN model was trained on graphical images of climate data while the other models were trained on the climate data itself. Therefore, differences in accuracy or robustness between the CNN and the other models seem to reflect not only model performance but also differences in input data. Since model performance comparisons generally aim to evaluate the performance of algorithms, I am not completely sure whether the model performance comparison in this study is truly meaningful. First, the author needs to clarify why the climate data were converted into graphical images. Second, the authors should also explain how these differences in data sources affected the results of the model performance comparisons.
P5 L129-132: The authors should explain the model settings so that other researchers can check the validity of the methods without inspecting the code. I feel that the descriptions of the settings of the machine learning methods other than the CNN are insufficient. The authors should provide more detail about these settings, such as the parameter values or the type of kernel employed.
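If the RF/SVM/NV models are scikit-learn estimators (an assumption), the settings RC2 asks for can be reported directly; `models` is a placeholder dict of fitted classifiers:

    for name, model in models.items():
        # get_params() lists every hyperparameter, e.g. SVC's kernel='rbf', C=1.0
        print(name, model.get_params())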
P5 L129-132: I also have a concern regarding the parameter optimization of the ML models. In this study, the author used the ML models without optimizing their parameters. I would suggest that the author try optimizing the parameters using a commonly used method such as grid search.
Minor points
P4 L101: Please check the format of references.
Citation: https://doi.org/10.5194/bg-2023-106-RC2
-
AC2: 'Reply on RC2', Hisashi Sato, 15 Mar 2024
-
EC1: 'Comment on bg-2023-106', Semeena Valiyaveetil Shamsudheen, 28 Mar 2024
The author has attended to the comments from both reviewers in a neat and constructive manner. I will recommend publication after submission of a revised draft incorporating all the comments from the reviewers.
Best regards,
Semeena
Citation: https://doi.org/10.5194/bg-2023-106-EC1
-
AC3: 'Reply on EC1', Hisashi Sato, 28 Mar 2024
Dear Editor,
Thank you very much for handling my manuscript.
I am ready to submit a revised manuscript, which incorporates all the comments from the reviewers.
Best,
Hisashi SATO
Citation: https://doi.org/10.5194/bg-2023-106-AC3