<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">BG</journal-id><journal-title-group>
    <journal-title>Biogeosciences</journal-title>
    <abbrev-journal-title abbrev-type="publisher">BG</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Biogeosciences</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1726-4189</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/bg-23-2661-2026</article-id><title-group><article-title>A machine learning approach to driver attribution of dissolved organic matter dynamics in two contrasting freshwater systems</article-title><alt-title>A machine learning approach to driver attribution</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Mercado-Bettín</surname><given-names>Daniel</given-names></name>
          <email>daniel.mercado@ceab.csic.es</email>
        <ext-link>https://orcid.org/0000-0003-4572-3029</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2 aff3">
          <name><surname>Paíz</surname><given-names>Ricardo</given-names></name>
          
        <ext-link>https://orcid.org/0009-0001-4507-0139</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>McCarthy</surname><given-names>Valerie</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff3">
          <name><surname>Jennings</surname><given-names>Eleanor</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff4">
          <name><surname>de Eyto</surname><given-names>Elvira</given-names></name>
          
        <ext-link>https://orcid.org/0000-0003-2281-2491</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff5">
          <name><surname>Gallegos</surname><given-names>Angeles M.</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff4">
          <name><surname>Dillane</surname><given-names>Mary</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff5">
          <name><surname>Garcia</surname><given-names>Juan C.</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff5">
          <name><surname>Rodríguez</surname><given-names>José J.</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Marcé</surname><given-names>Rafael</given-names></name>
          
        <ext-link>https://orcid.org/0000-0002-7416-4652</ext-link></contrib>
        <aff id="aff1"><label>1</label><institution>Centre for Advanced Studies of Blanes, Spanish National Research Council,  Carrer Accés Cala Sant Francesc, 14, 17300, Blanes, Spain</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>School of History and Geography, Dublin City University, D09 YT18, Dublin 9, Ireland</institution>
        </aff>
        <aff id="aff3"><label>3</label><institution>Centre for Freshwater and Environmental Studies, Dundalk Institute of Technology, A91 K584 Dundalk, Co. Louth, Ireland</institution>
        </aff>
        <aff id="aff4"><label>4</label><institution>Fisheries &amp; Ecosystem Advisory Services, Marine Institute, F28 PF65 Newport, Co. Mayo, Ireland</institution>
        </aff>
        <aff id="aff5"><label>5</label><institution>Ens d’Abastament d’Aigua Ter-Llobregat, Ctra. Aigües, 6, 08440, Cardedeu, Spain</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Daniel Mercado-Bettín (daniel.mercado@ceab.csic.es)</corresp></author-notes><pub-date><day>20</day><month>April</month><year>2026</year></pub-date>
      
      <volume>23</volume>
      <issue>8</issue>
      <fpage>2661</fpage><lpage>2685</lpage>
      <history>
        <date date-type="received"><day>19</day><month>August</month><year>2025</year></date>
           <date date-type="rev-request"><day>1</day><month>September</month><year>2025</year></date>
           <date date-type="rev-recd"><day>10</day><month>March</month><year>2026</year></date>
           <date date-type="accepted"><day>31</day><month>March</month><year>2026</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2026 Daniel Mercado-Bettín et al.</copyright-statement>
        <copyright-year>2026</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026.html">This article is available from https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026.html</self-uri><self-uri xlink:href="https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026.pdf">The full text article is available as a PDF file from https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e193">Predicting water quality variables in lakes is critical for effective ecosystem management under climatic and human pressures. Dissolved organic matter (DOM) serves as an energy source for aquatic ecosystems and plays a key role in their biogeochemical cycles. However, predicting DOM is challenging due to complex interactions between multiple potential drivers in the aquatic environment and its surrounding terrestrial landscape. This study establishes an open and scalable workflow to identify potential drivers and predict fluorescent DOM (fDOM) in the surface layer of lakes by exploring the use of supervised machine learning models, including random forest, boosting methods, <inline-formula><mml:math id="M1" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-nearest neighbors, support vector regression, and a linear model. The workflow was validated in two contrasting systems: one natural lake in Ireland with a relatively undisturbed catchment, and one reservoir in Spain with a more human-influenced catchment. A total of 24 potential drivers were obtained from global reanalysis data and from lake and river process-based modelling. SHapley Additive exPlanations (SHAP) analyses were conducted for the most influential drivers identified, with soil moisture, soil temperature, and Julian day being common to both study sites.
The best prediction was obtained when using the CatBoost model (during the hold-out testing period, Irish site: KGE <inline-formula><mml:math id="M2" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 0.68, <inline-formula><mml:math id="M3" display="inline"><mml:mrow><mml:msup><mml:mi>r</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0.50</mml:mn></mml:mrow></mml:math></inline-formula>; Spanish site: KGE <inline-formula><mml:math id="M4" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 0.66, <inline-formula><mml:math id="M5" display="inline"><mml:mrow><mml:msup><mml:mi>r</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0.54</mml:mn></mml:mrow></mml:math></inline-formula>). Interestingly, when only using drivers from globally accessible climate and soil reanalysis data plus Julian day, the prediction capacity was maintained at both sites, showcasing potential for scalability. Our findings highlight the complex interplay of environmental drivers and processes that govern DOM dynamics in lakes, and contribute to the modelling of carbon cycling in aquatic ecosystems.</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>HORIZON EUROPE Food, Bioeconomy, Natural Resources, Agriculture and Environment</funding-source>
<award-id>101081728</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d2e256">Lakes are an essential component of global biogeochemical cycles, sustain biodiversity, and provide critical ecosystem services, e.g., water supply, fishing and irrigation. However, their water quality is increasingly at risk due to climatic change and human pressures <xref ref-type="bibr" rid="bib1.bibx2" id="paren.1"/>. A key water quality variable is dissolved organic matter (DOM), which influences light penetration, energy, oxygen dynamics and nutrient availability in any lake <xref ref-type="bibr" rid="bib1.bibx47" id="paren.2"/>. The dynamics of DOM in lakes are driven by both external processes in the terrestrial environment and internal processes. Land cover, climate, and topography regulate (either increase or decrease) carbon production in the catchment and carbon inputs into the lake <xref ref-type="bibr" rid="bib1.bibx28" id="paren.3"/>. In the water body, the quantity and quality of DOM are controlled by physical and biogeochemical mechanisms such as photodegradation, microbial processing and mixing dynamics, but can also be impacted by water abstraction or dam regulation <xref ref-type="bibr" rid="bib1.bibx53" id="paren.4"/>.</p>
      <p id="d2e271">Increases in the concentrations of DOM can affect ecosystem stability and human water use (e.g., raw drinking water quality) by reducing oxygen levels, altering microbial communities and nutrient cycling <xref ref-type="bibr" rid="bib1.bibx25" id="paren.5"/>. DOM is also a precursor to disinfection byproducts (DBPs) during drinking water treatment, substances which have negative human health implications <xref ref-type="bibr" rid="bib1.bibx26" id="paren.6"/>. Understanding the dynamics of DOM in lakes is essential for water quality management, especially as climate-driven processes are expected to increasingly influence DOM in freshwater systems <xref ref-type="bibr" rid="bib1.bibx8" id="paren.7"/>. Moreover, the occurrence of extreme events such as eutrophication, algal blooms and hypoxic events, for which levels of DOM play a key role, is also expected to increase <xref ref-type="bibr" rid="bib1.bibx15" id="paren.8"/>. Hence, predicting DOM in lake water can improve water quality mitigation protocols and support adaptive water use management strategies.</p>
      <p id="d2e287">Predicting DOM dynamics remains a challenge as it results from complex interactions in the environment, including multiple biogeochemical processes <xref ref-type="bibr" rid="bib1.bibx52" id="paren.9"/>. Modelling tools offer an approach to simulate DOM in lake water. Process-based models have traditionally been used to better understand lake water quality, including DOM dynamics <xref ref-type="bibr" rid="bib1.bibx31" id="paren.10"/>. However, they generally require a large number of model parameters and governing equations, i.e., extensive parameterisation, to represent these dynamics. On the other hand, machine learning (ML) models do not rely on parameter calibration but instead incorporate large numbers of driver variables and large amounts of data. This functionality can leverage the increasing amount of data being collected through satellite imagery, high-frequency monitoring, and global climate and environmental modelling initiatives <xref ref-type="bibr" rid="bib1.bibx36 bib1.bibx49 bib1.bibx1" id="paren.11"/>.</p>
      <p id="d2e299">ML models have emerged as potential tools for modelling complex environmental variables, including those related to hydrology <xref ref-type="bibr" rid="bib1.bibx37" id="paren.12"/> and water quality <xref ref-type="bibr" rid="bib1.bibx16" id="paren.13"/>. They have been recently employed in environmental applications, showing good predictive capabilities due to their ability to handle high-dimensional data, and capture nonlinear relationships <xref ref-type="bibr" rid="bib1.bibx27" id="paren.14"/>, for a diversity of parameters in lakes such as chlorophyll-<inline-formula><mml:math id="M6" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx5" id="paren.15"/>, turbidity <xref ref-type="bibr" rid="bib1.bibx55" id="paren.16"/>, and nutrient concentrations (e.g., phosphorus) <xref ref-type="bibr" rid="bib1.bibx16" id="paren.17"/>, suggesting potential for predicting DOM <xref ref-type="bibr" rid="bib1.bibx19" id="paren.18"/>.</p>
      <p id="d2e332">This study introduces a workflow for predicting fluorescent DOM (fDOM) (a proxy for DOM) in lakes using a suite of supervised ML models driven by potential drivers either extracted from reanalysis data (climate and soil variables) or taken from the outputs of lake and catchment process-based models. The workflow was tested in two different study sites, one in Ireland and one in Spain, that represent contrasting settings for both the potential drivers and DOM dynamics. Model performance was first evaluated at each site using the most influential drivers to predict fDOM. Subsequently, a second simulation was performed using a subset of these drivers, limited to those sourced from reanalysis data, to evaluate the predictive capacity of the model in the context of higher scalability. A primary intention of the study is to use ML models to produce a robust, accessible, and reproducible workflow, rather than to conduct an extensive model benchmark comparison. The aims of this study are: (1) to find the most influential drivers (features) in predicting fDOM, and analyse their importance in two contrasting sites; and (2) to test how accurately supervised ML models can predict lake fDOM when driven by reanalysis-based data, and hydrologic and lake modelling outputs.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Materials and methods</title>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Study sites</title>
      <p id="d2e350">Lough Feeagh and Sau Reservoir are located in western Ireland (53°56<sup>′</sup> N, 9°35<sup>′</sup> W) and northeastern Spain (41°58<sup>′</sup> N, 2°23<sup>′</sup> E), respectively (Fig. 1). The study sites have contrasting attributes. Feeagh (depth of 46.8 m and area of 3.95 km<sup>2</sup>) is a monomictic and oligotrophic lake surrounded by a relatively undisturbed landscape, while Sau (depth of 70 m and area of 5.8 km<sup>2</sup>) is a eutrophic system subjected to human activities and water abstraction. Feeagh has two primary inflows, the Black and the Glenamong rivers, while Sau has one, the Ter river. The catchment of Feeagh is relatively small (84 km<sup>2</sup>), with mid-range hills, and dominated by peatland. The catchment of Sau, in contrast, is larger (1525 km<sup>2</sup>), with varying topography and land uses (Figs. 1, A1).</p>

      <fig id="F1" specific-use="star"><label>Figure 1</label><caption><p id="d2e428">Two contrasting freshwater ecosystems: Lough Feeagh (Ireland) and Sau Reservoir (Spain). The figure shows their locations and catchment land cover. Lough Feeagh is a humic oligotrophic lake with a peatland-dominated catchment and temperate oceanic climate, whereas Sau is a eutrophic reservoir with a human-influenced catchment and Mediterranean climate. The CORINE Land Cover 2018 <ext-link xlink:href="https://doi.org/10.2909/960998c1-1870-4e82-8051-6485205ebbac">https://doi.org/10.2909/960998c1-1870-4e82-8051-6485205ebbac</ext-link> <xref ref-type="bibr" rid="bib1.bibx13" id="paren.19"/> classes were grouped into the categories shown for visual clarity, following the aggregation approach described in the Appendix (Fig. A2).</p></caption>
          <graphic xlink:href="https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026-f01.png"/>

        </fig>

      <p id="d2e443">The dynamics of DOM in both study sites have been previously explored in <xref ref-type="bibr" rid="bib1.bibx46" id="text.20"/>, which identified that natural dynamics related to soil temperature, river discharge and drought were important drivers in Feeagh, and in <xref ref-type="bibr" rid="bib1.bibx30" id="text.21"/>, which showed that human activities (e.g., wastewater effluents and agricultural runoff) were important for Sau, in addition to its environmental dynamics. Catchment hydrology is key for carbon transport into both study sites, and contributes to a distinct seasonality related to climate. Feeagh has a wet temperate climate, with cooler air temperatures and higher rainfall, which occurs on more than 75 % of days in the year. The variability of carbon inputs reflects the sensitivity of a peatland-dominated landscape, which exacerbates climate-induced carbon release from the catchment into the lake. In contrast, Sau has a Mediterranean climate, characterised by hot, dry summers and mild, wet winters, dictating water availability, thermal stratification, and organic matter fluxes in the reservoir.</p>
</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Prediction workflow</title>
      <p id="d2e460">A five-step workflow was implemented to predict fDOM at each site (Fig. 2). First, all potential drivers for fDOM were collected as input data for each site. Second, an exploratory data analysis of fDOM was carried out and the ML models were trained using 85 % of the available time series data (1978 out of 2328 fDOM measurements for Feeagh, and 653 out of 769 for Sau), while the remaining 15 % (350 out of 2328 for Feeagh, and 116 out of 769 for Sau), covering the subsequent part of the time series (hold-out period), was reserved for independent testing, so that performance and potential overfitting could be evaluated by comparison with the training period.</p>
      <p id="d2e463">Third, a set of drivers was selected for each site based on the variable importance extracted from the ML models, retaining only those drivers that exceeded an importance threshold of 5 %. SHAP and partial dependence analyses were applied to these specific drivers to evaluate how fDOM predictions at each site varied as a function of individually changing the selected input variables. Fourth, we ran simulations using (i) the drivers with a higher variable importance (<inline-formula><mml:math id="M15" display="inline"><mml:mo lspace="0mm">&gt;</mml:mo></mml:math></inline-formula> 5 %), and (ii) only the drivers extracted from globally accessible reanalysis data, and assessed model performance. Fifth, the outputs of these simulations were compared (Fig. 2).</p>
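      <p>As a consistency check on the training and testing split described above, the reported training counts equal 85 % of each record rounded down to a whole number of measurements, with the remainder forming the hold-out set. The rounding rule is an assumption inferred from the reported numbers, not a documented detail of the workflow scripts:
        <disp-formula><tex-math><![CDATA[
\begin{aligned}
\text{Feeagh:}\quad n_{\mathrm{train}} &= \lfloor 0.85 \times 2328 \rfloor = \lfloor 1978.8 \rfloor = 1978, & n_{\mathrm{test}} &= 2328 - 1978 = 350,\\
\text{Sau:}\quad n_{\mathrm{train}} &= \lfloor 0.85 \times 769 \rfloor = \lfloor 653.65 \rfloor = 653, & n_{\mathrm{test}} &= 769 - 653 = 116.
\end{aligned}
]]></tex-math></disp-formula>
      </p>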
      <p id="d2e473">The same workflow was applied to both sites using the same data sources, allowing for comparison. Following the FAIR principles, all data and workflow scripts are available and fully reproducible here: <ext-link xlink:href="https://doi.org/10.5281/zenodo.19354955" ext-link-type="DOI">10.5281/zenodo.19354955</ext-link> <xref ref-type="bibr" rid="bib1.bibx32" id="paren.22"/>. Large language models were used in this study to optimise the code, improve the final plots, and carry out basic proofreading of the text.</p>

      <fig id="F2" specific-use="star"><label>Figure 2</label><caption><p id="d2e485">The workflow includes five steps: (1) compilation of input variables including meteorology (yellow) and soil (green) reanalysis data from ERA5 (the fifth-generation atmospheric reanalysis product produced by ECMWF), the Generalised Watershed Loading Functions Model (GWLF) outputs (light blue), the General Lake Model (GLM) outputs (blue), and Julian day (light magenta); (2) data inspection and model training using a split between training (85 %) and testing (15 %) periods; (3) selection of influential drivers based on feature-importance metrics; (4) model simulations using selected drivers and a reduced subset based on globally available reanalysis variables and Julian day; and (5) comparison of model outputs. DOC stands for dissolved organic carbon. The workflow is available at: <ext-link xlink:href="https://doi.org/10.5281/zenodo.19354955">https://doi.org/10.5281/zenodo.19354955</ext-link> <xref ref-type="bibr" rid="bib1.bibx32" id="paren.23"/>.</p></caption>
          <graphic xlink:href="https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026-f02.png"/>

        </fig>

</sec>
<sec id="Ch1.S2.SS3">
  <label>2.3</label><title>Data</title>
<sec id="Ch1.S2.SS3.SSS1">
  <label>2.3.1</label><title>Target variable (fDOM)</title>
      <p id="d2e515">Daily surface fDOM values were obtained from high-frequency data for both sites (2 min-resolution data averaged to obtain daily values before analysis). For Feeagh, data were measured at 0.9 m depth, and for Sau, an average value was calculated from the measurements at depths of 0–5 m. fDOM data were expressed as quinine sulfate units (QSU) for the analysis. In Feeagh, the data spanned from 1 May 2012 to 19 November 2019 (<inline-formula><mml:math id="M16" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2328</mml:mn></mml:mrow></mml:math></inline-formula>), and for Sau from 4 February 2017 to 2 March 2020 (<inline-formula><mml:math id="M17" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">769</mml:mn></mml:mrow></mml:math></inline-formula>), with some gaps. All the other data (i.e., driver data) used in the workflow of this study were constrained by the availability of fDOM data. Following exploratory data analysis (Fig. A3), no transformation was applied to the fDOM data prior to training the ML models in the main manuscript. Given the moderate skewness of fDOM at Sau, results for this site with a log<sub>10</sub> transformation are provided in the Appendix for comparison (this is revisited in Results).</p>
      <p id="d2e551">In Feeagh, fDOM data were collected using a Seapoint UV fluorometer sensor (Seapoint Sensors Inc., Exeter, NH, USA) and water temperature data were measured using a Hach Environmental Hydrolab Data Sonde X5 (UK OTT Hydrometry Ltd). In Sau, fDOM and water temperature data were collected using an fDOM Digital Smart Sensor and Multiparameter Sonde (YSI EXO sonde, Yellow Springs, OH, USA), respectively. Raw fDOM data were corrected for water temperature at both sites based on relationships established for each sensor <xref ref-type="bibr" rid="bib1.bibx45" id="paren.24"/>. Details about the fDOM corrections can be found in the Appendix (Figs. A4, A5 and A6).</p>
</sec>
<sec id="Ch1.S2.SS3.SSS2">
  <label>2.3.2</label><title>Driver data</title>
      <p id="d2e565">All input data variables, including their respective units and sources, are displayed in Step 1 of Fig. 2. These were grouped into five categories: (1) meteorology, (2) soil, (3) process-based hydrological modelling, (4) process-based lake modelling and (5) Julian day. Daily values of meteorology and soil variables were extracted from the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis v5 (ERA5) dataset <xref ref-type="bibr" rid="bib1.bibx18" id="paren.25"/>. This gridded dataset provides (pseudo) observations at a global scale with a spatial resolution of 0.25°. Eight meteorological variables (traditionally employed in water modelling studies) and soil temperature and soil moisture data at four depths (0–7, 7–28, 28–100, and 100–255 cm) were extracted for the grid cell that contained each water body.</p>
      <p id="d2e571">Daily values of inflow discharge and inflow DOC concentration into each site were generated using the Generalised Watershed Loading Functions Model (GWLF) coupled with a DOC module (GWLF-DOC). The GWLF-DOC version and calibration strategy applied are described in <xref ref-type="bibr" rid="bib1.bibx38" id="text.26"/>. Calibration results can be found in the Appendix. Daily values of five key lake variables (see Step 1, Fig. 2) were obtained from the General Lake Model (GLM) <xref ref-type="bibr" rid="bib1.bibx20" id="paren.27"/> run for both sites. Calibration strategies applied are described in <xref ref-type="bibr" rid="bib1.bibx33 bib1.bibx39" id="text.28"/>. Calibration results can be found in the Appendix. In addition, the cosine of Julian day was included in the driver data inputs, given that seasonality is expected to influence DOM predictions; the cosine was used rather than the raw day number to avoid an abrupt numerical change at the start of each year. The cosine value maps to the approximate calendar timing: values close to 1 correspond to boreal winter, 0 to spring/autumn, and <inline-formula><mml:math id="M19" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> to boreal summer.</p>
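      <p>A minimal sketch of this seasonal encoding is given here, assuming a fixed 365 d period and writing <italic>d</italic> for the Julian day and <italic>s</italic> for the encoded driver. Both symbols and the period are illustrative assumptions, and the workflow scripts may treat leap years differently:
        <disp-formula><tex-math><![CDATA[
s(d) = \cos\!\left(\frac{2\pi d}{365}\right)
]]></tex-math></disp-formula>
      Under this assumption, <italic>s</italic> is close to 1 in early January (<italic>d</italic> = 1), close to 0 in early April and early October (<italic>d</italic> near 91 and 274), and close to −1 in early July (<italic>d</italic> near 183), which is consistent with the mapping stated above.</p>
      <p>The driver categories described in this section can also be tallied. Assuming one ERA5 variable per depth layer for each of soil temperature and soil moisture, and counting inflow discharge and inflow DOC concentration as the two GWLF-DOC outputs, the sum agrees with the 24 potential drivers reported in the Abstract:
        <disp-formula><tex-math><![CDATA[
\underbrace{8}_{\text{meteorology}} + \underbrace{4 + 4}_{\text{soil temperature and moisture}} + \underbrace{2}_{\text{GWLF-DOC}} + \underbrace{5}_{\text{GLM}} + \underbrace{1}_{\text{Julian day}} = 24
]]></tex-math></disp-formula>
      </p>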
</sec>
</sec>
<sec id="Ch1.S2.SS4">
  <label>2.4</label><title>Supervised Machine Learning</title>
      <p id="d2e602">Supervised ML models have advantages and limitations for time series prediction. In addition to capturing non-linear relationships typical in aquatic systems and water quality predictions <xref ref-type="bibr" rid="bib1.bibx21 bib1.bibx42" id="paren.29"/>, they provide flexibility to assess multiple drivers, temporal indicators, and variables external to the system <xref ref-type="bibr" rid="bib1.bibx41 bib1.bibx43" id="paren.30"/>. ML models do not require a fixed set of drivers to predict fDOM effectively, unlike process-based models, which typically rely on predefined inputs. Additionally, they require hyperparameter tuning rather than parameter calibration, which simplifies the modelling process. While some may argue that the lack of parameterisation suggests a “black box” approach, supervised ML can provide insights into the potential drivers for predicting a target variable <xref ref-type="bibr" rid="bib1.bibx3 bib1.bibx35" id="paren.31"/>.</p>
      <p id="d2e614">However, due to the intrinsic autocorrelation in time series, e.g., when predicting DOM, these models tend to overfit when using out-of-bag samples during training. To overcome this issue, robust validation and training are required. Here, we used a hold-out period for validation during testing at both study sites. Prior to selecting this method, we compared it with two alternative validation methods using random forest: (1) 5-fold cross-validation and (2) rolling window cross-validation with a two-year training period, a one-year testing period, and a window shift every 90 days (see Fig. A7).</p>
      <p id="d2e617">Supervised machine-learning models were applied to predict fDOM dynamics, including Random Forest (RF) <xref ref-type="bibr" rid="bib1.bibx4" id="paren.32"/>, eXtreme Gradient Boosting (XGBoost) <xref ref-type="bibr" rid="bib1.bibx6" id="paren.33"/>, Light Gradient Boosting (LGB) <xref ref-type="bibr" rid="bib1.bibx24" id="paren.34"/>, CatBoost (CTB) <xref ref-type="bibr" rid="bib1.bibx40" id="paren.35"/>, <inline-formula><mml:math id="M20" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-Nearest Neighbors (KNN) <xref ref-type="bibr" rid="bib1.bibx14" id="paren.36"/>, Support Vector Regression (SVR) <xref ref-type="bibr" rid="bib1.bibx7" id="paren.37"/>, and a linear model. The three gradient-boosting frameworks were included to assess the robustness of results across closely related implementations. Given their similarities, the main results shown focus on the highest-performing model among the three. Neural networks were not included because they typically require more complex architecture design and calibration, which can increase methodological complexity and may reduce reproducibility of the workflow. In contrast, the selected models allow more reproducible implementation due to their comparatively simpler hyperparameter tuning.</p>
<sec id="Ch1.S2.SS4.SSS1">
  <label>2.4.1</label><title>Hyperparameter tuning</title>
      <p id="d2e653">Hyperparameter tuning was implemented in all ML models to improve accuracy and generalization, and reduce the risk of overfitting. It was conducted in R using the caret framework where possible: for RF <inline-formula><mml:math id="M21" display="inline"><mml:mrow><mml:mi>m</mml:mi><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>y</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">4</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">6</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>; for XGBoost <inline-formula><mml:math id="M22" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>s</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">1000</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2000</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M23" display="inline"><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">3</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">6</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M24" display="inline"><mml:mrow><mml:mi>e</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">0.01</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0.05</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0.1</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>, 
<inline-formula><mml:math id="M25" display="inline"><mml:mrow><mml:mi>g</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>l</mml:mi><mml:mi>s</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mi>p</mml:mi><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi>b</mml:mi><mml:mi>y</mml:mi><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>e</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M27" display="inline"><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi>c</mml:mi><mml:mi>h</mml:mi><mml:mi>i</mml:mi><mml:mi>l</mml:mi><mml:mi>d</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi>w</mml:mi><mml:mi>e</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi><mml:mi>h</mml:mi><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M28" display="inline"><mml:mrow><mml:mi>s</mml:mi><mml:mi>u</mml:mi><mml:mi>b</mml:mi><mml:mi>s</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mi>p</mml:mi><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>; for kNN <inline-formula><mml:math id="M29" display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">3</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">5</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">7</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>; for SVR <inline-formula><mml:math id="M30" 
display="inline"><mml:mrow><mml:mi>C</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">0.1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">10</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M31" display="inline"><mml:mrow><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">0.01</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0.1</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>. Additional model-specific implementations were made: for LGB <inline-formula><mml:math id="M32" display="inline"><mml:mrow><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>n</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">0.01</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0.05</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0.1</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M33" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mi>u</mml:mi><mml:mi>m</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:mi>v</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">15</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">31</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">63</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M34" 
display="inline"><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">5</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">10</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">−</mml:mi><mml:mn mathvariant="normal">1</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M35" display="inline"><mml:mrow><mml:mi>f</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>u</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi>f</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>c</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">0.8</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M36" display="inline"><mml:mrow><mml:mi>b</mml:mi><mml:mi>a</mml:mi><mml:mi>g</mml:mi><mml:mi>g</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi>f</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>c</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">0.8</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M37" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mn 
mathvariant="normal">2000</mml:mn></mml:mrow></mml:math></inline-formula> (fixed); and for CTB <inline-formula><mml:math id="M38" display="inline"><mml:mrow><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">300</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">500</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1000</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M39" display="inline"><mml:mrow><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>p</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">4</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">6</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">8</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M40" display="inline"><mml:mrow><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>n</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi><mml:mi mathvariant="italic">_</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">0.01</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0.05</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0.1</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M41" display="inline"><mml:mrow><mml:mi>l</mml:mi><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">_</mml:mi><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:mi>f</mml:mi><mml:mi 
mathvariant="italic">_</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>g</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">3</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">5</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>. Models were trained on the training subset and evaluated on the hold-out test set using RMSE. The parameter configuration minimising test RMSE was selected. The hyperparameter tuning can be reproduced by using the code “2_hyperparameter_tuning.R” in the repository.</p>
      <p id="d2e1429">For each case study, the dataset was chronologically split into training (85 %) and hold-out test (15 %) subsets. The final 15 % of observations were reserved as an independent test set to evaluate predictive performance, mimicking a forecasting exercise. Hyperparameter tuning was conducted exclusively on the training subset. Within the training data, model selection was performed using 5-fold cross-validation. The data were randomly partitioned into five equally sized folds; four folds were used for model fitting and one for validation, iteratively. The average cross-validated performance was used to select the optimal hyperparameter configuration.</p>
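      <p>The splitting and selection scheme described above can be sketched in a few lines of Python. This is a minimal illustration and not the code in the repository (which is written in R): the function names and the <italic>fit_and_score</italic> callback, which is assumed to fit a model on the given rows and return a validation error such as RMSE, are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import itertools
import random


def chronological_split(n_obs, test_fraction=0.15):
    """Return (train_idx, test_idx); the final test_fraction of rows is held out."""
    n_test = round(n_obs * test_fraction)
    split = n_obs - n_test
    return list(range(split)), list(range(split, n_obs))


def random_folds(indices, k=5, seed=0):
    """Randomly partition indices into k folds whose sizes differ by at most one."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def grid_search_cv(train_idx, grid, fit_and_score, k=5, seed=0):
    """Return the configuration with the lowest mean validation error.

    fit_and_score(params, fit_idx, val_idx) is a hypothetical callback that
    fits a model on fit_idx and returns its error (e.g. RMSE) on val_idx.
    Only training rows are used, so the hold-out set never influences tuning.
    """
    folds = random_folds(train_idx, k, seed)
    names = sorted(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for i, val_idx in enumerate(folds):
            # All folds except fold i are used for fitting.
            fit_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
            scores.append(fit_and_score(params, fit_idx, val_idx))
        mean_score = sum(scores) / k
        if mean_score < best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```
]]></preformat>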
</sec>
<sec id="Ch1.S2.SS4.SSS2">
  <label>2.4.2</label><title>Performance metrics</title>
      <p id="d2e1441">To evaluate the models and facilitate inter-model comparison, three complementary performance metrics were computed for both the training and test datasets: the coefficient of determination (<inline-formula><mml:math id="M42" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>), the root mean square error (RMSE), and the Kling–Gupta efficiency (KGE). These metrics were selected because they capture complementary aspects of model performance: correlation strength (<inline-formula><mml:math id="M43" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>), error magnitude (RMSE), and combined accuracy in correlation, bias, and variability (KGE). In addition, they are widely used in water-quality modelling of aquatic ecosystems, allowing comparison of the results with other studies.</p>
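      <p>For reference, the three metrics can be written out as below. This is a sketch under two assumptions that the text does not state explicitly: the coefficient of determination is taken as the squared Pearson correlation (consistent with its description as a measure of correlation strength), and KGE follows the original formulation with a correlation term, a variability ratio and a bias ratio. The function names are illustrative.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def _mean(values):
    return sum(values) / len(values)


def _std(values):
    # Population standard deviation.
    mu = _mean(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))


def pearson_r(obs, sim):
    """Pearson correlation between observed and simulated values."""
    mu_o, mu_s = _mean(obs), _mean(sim)
    cov = sum((o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)) / len(obs)
    return cov / (_std(obs) * _std(sim))


def r_squared(obs, sim):
    """Coefficient of determination, assumed here to be the squared correlation."""
    return pearson_r(obs, sim) ** 2


def rmse(obs, sim):
    """Root mean square error, in the units of the target variable."""
    return math.sqrt(_mean([(o - s) ** 2 for o, s in zip(obs, sim)]))


def kge(obs, sim):
    """Kling-Gupta efficiency: 1 minus the distance of (r, alpha, beta) from (1, 1, 1)."""
    r = pearson_r(obs, sim)
    alpha = _std(sim) / _std(obs)    # variability ratio
    beta = _mean(sim) / _mean(obs)   # bias ratio
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```
]]></preformat>
      <p>Under these definitions, a simulation that is perfectly correlated with the observations but twice as large keeps a squared correlation of 1 while its KGE becomes negative, which illustrates why the metrics are reported together.</p>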
</sec>
</sec>
<sec id="Ch1.S2.SS5">
  <label>2.5</label><title>Driver importance and SHAP analysis</title>
      <p id="d2e1476">To assess the influence of the most important drivers, we produced SHapley Additive exPlanations (SHAP) plots using the best-performing boosting method. A SHAP value measures how much a single driver (feature) value contributes to moving a prediction away from the average value. In the SHAP plots, the <inline-formula><mml:math id="M44" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula>-axis lists the input drivers ranked by overall importance (from top to bottom), the <inline-formula><mml:math id="M45" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula>-axis shows the SHAP value representing the impact on model output for a single prediction, each point is a single data instance, and the color reflects the driver value (blue <inline-formula><mml:math id="M46" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> low, red <inline-formula><mml:math id="M47" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> high). In addition, partial dependence plots were generated (see the Appendix) to support and illustrate the individual influence of each driver on fDOM predictions by varying the driver's values across its entire range while holding all other drivers constant at their average values. These partial dependence plots were generated using the outputs from the Random Forest model. The SHAP analysis was implemented with the shap package in Python and the partial dependence plots with the pdp package in R.</p>
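      <p>The quantity behind these plots can be illustrated with a brute-force calculation for a model with a few drivers. The sketch below is not how the shap package computes its values (it relies on far more efficient algorithms) and it is not the code used in this study; it assumes that drivers outside a coalition are replaced by their values in a background dataset, and the names <italic>predict</italic>, <italic>x</italic> and <italic>background</italic> are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one instance x.

    predict    : function mapping a list of driver values to a prediction
    x          : driver values of the instance being explained
    background : list of reference rows used to represent "absent" drivers
    The values sum to predict(x) minus the mean prediction over background.
    """
    n = len(x)

    def value(subset):
        # Mean prediction when drivers in `subset` are fixed to x and the
        # remaining drivers take the values of each background row in turn.
        total = 0.0
        for row in background:
            mixed = [x[j] if j in subset else row[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                contribution += weight * (value(members | {i}) - value(members))
        phi.append(contribution)
    return phi
```
]]></preformat>
      <p>The cost of this enumeration grows exponentially with the number of drivers, so it is only practical as a check on small examples.</p>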
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Results</title>
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Driver attribution</title>
      <p id="d2e1523">The most influential predictors of fDOM at each site were identified from all 24 potential drivers based on the 5 % threshold of the variable importance extracted from the CTB model. This resulted in seven influential drivers being identified for Feeagh and five for Sau (Fig. 3), four of which were common to both sites. This selection was consistent across multiple machine-learning models (Fig. A8).</p>
      <p id="d2e1526">The variables for the deepest soil layer were relevant for both study sites. Soil temperature and soil moisture at 100–255 cm were the most influential drivers for Feeagh. Similarly, for Sau, soil moisture and temperature at similar depths were the most influential drivers. Another key driver shared between Feeagh and Sau was Julian day. Lake volume was only important for Sau, while solar radiation, the amount of carbon entering the water body (indicated by the DOC inflow concentration) and both soil moisture and temperature at 28–100 cm were only influential for Feeagh.</p>
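      <p>The threshold-based selection of influential drivers amounts to a simple filter on relative importance. The sketch below assumes that raw importance scores are rescaled to percentages of their total and that a driver is retained when its share exceeds 5 %; the function name and the example driver names are hypothetical and the numbers in any call are illustrative, not results from this study.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def select_drivers(importance, threshold_pct=5.0):
    """Return drivers whose share of total importance exceeds threshold_pct.

    importance : dict mapping driver name to a non-negative importance score
    The result maps each retained driver to its share (in percent), ordered
    from most to least important.
    """
    total = sum(importance.values())
    shares = {name: 100.0 * score / total for name, score in importance.items()}
    kept = {name: share for name, share in shares.items() if share > threshold_pct}
    return dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))
```
]]></preformat>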

      <fig id="F3"><label>Figure 3</label><caption><p id="d2e1531">Selection of influential drivers for predicting fDOM. Twenty-four drivers from multiple sources were used to train the machine-learning models, including climate variables, soil variables, hydrologic and water-quality model outputs, lake model outputs, and cosine-transformed Julian day (see Fig. 2 for more details). Feature importance is shown for the best-performing model (CatBoost) using gain contribution, and drivers exceeding 5 % importance are highlighted.</p></caption>
          <graphic xlink:href="https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026-f03.png"/>

        </fig>


</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Driver influence on fDOM predictions</title>
      <p id="d2e1550">Figure 4 presents SHAP beeswarm plots for the most influential drivers selected in Fig. 3, enabling the assessment of the individual effect of each driver on fDOM predictions.</p>

      <fig id="F4"><label>Figure 4</label><caption><p id="d2e1555">SHAP beeswarm plots showing the contribution of key drivers to fDOM predictions for <bold>(a)</bold> Lough Feeagh and <bold>(b)</bold> Sau Reservoir based on the CatBoost model (CTB). Points represent individual observations, coloured by feature value, and are ordered by mean absolute SHAP value.</p></caption>
          <graphic xlink:href="https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026-f04.png"/>

        </fig>

      <p id="d2e1570">Seasonal patterns in fDOM concentrations were observed in both Feeagh and Sau, with higher values in winter and lower values in summer, as reflected in the influence of Julian day. However, the key predictors and their effects differed substantially, shaped by contrasting catchment and climate characteristics. In Feeagh, where precipitation is relatively high and sustained year-round, deep soil temperature (100–255 cm) was the dominant and potentially limiting predictor, with fDOM increasing with temperature up to a threshold, beyond which a drop in the water table may counteract the effect (see partial dependence plots; Figs. A9 and A10). In addition, Feeagh showed minimal influence of mid-depth soil temperature (28–100 cm) and solar radiation on fDOM. This temperature–fDOM relationship was less relevant in Sau, where deep soil temperature (100–255 cm) had less explanatory power. Instead, fDOM dynamics in Sau were driven primarily by soil moisture at both 28–100 and 100–255 cm depths, potentially reflecting limitation by water stress.</p>
      <p id="d2e1574">Water availability also shaped the role of other predictors differently across the two sites. For instance, surface water temperature in Feeagh showed a clear threshold behaviour, with fDOM increasing relatively linearly beyond 6.5 °C and stabilizing around 7.5 °C, while in Sau, water volume acted as a surrogate for fDOM production (see partial dependence plots; Figs. A9 and A10). Lower volumes corresponded to reduced fDOM values, which increased and stabilized beyond 146 hm<sup>3</sup>. Interestingly, inflow DOC concentration influenced Feeagh more than Sau, likely due to differences in the hydroclimatological processes governing these relationships. These differences highlight how catchment water availability fundamentally alters the relative importance and behavior of fDOM drivers.</p>
</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Predicting DOM using Supervised Machine Learning</title>
      <p id="d2e1594">Table 1 presents a comparative evaluation of the seven ML and statistical models employed in this study. In Lough Feeagh, CatBoost (CTB) demonstrated the best overall performance, with the highest <inline-formula><mml:math id="M49" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> (0.51) and KGE (0.69), and a relatively low RMSE (8.29 QSU). Similarly, in Sau Reservoir, CTB had the highest KGE (0.66), one of the lowest RMSE values (7.11 QSU), and an <inline-formula><mml:math id="M50" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of 0.54, exceeded only by SVR (0.63). While some models such as RF and LM showed moderate performance at Feeagh, others such as SVR and XGB performed poorly in Sau, with SVR even having a negative KGE (<inline-formula><mml:math id="M51" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>0.82) and the highest RMSE (17.05 QSU), suggesting significant model bias. Overall, CatBoost (CTB) offered the best balance across the three metrics at both sites, supporting its selection for fDOM prediction using the selected environmental drivers.</p>
      <p id="d2e1626">The results during the training phase (Table A2) confirm that most ML models, especially XGB, RF, and KNN, performed well during training. However, performance dropped for some models (e.g., XGB at Sau Reservoir) during the testing phase, underscoring the importance of evaluating models on independent test data to assess generalisability and overfitting. CatBoost (CTB), again, showed a more stable performance, with slightly lower training metrics (especially in Feeagh) than the other ML models and better generalisation when compared with the testing metrics. Compared with Sau, Feeagh exhibited more consistent model performance, with less overfitting between training and testing phases, across the different validation methods (Fig. A7). This difference is likely attributable to the limited amount of available data at Sau. In addition, we assessed whether this overfitting could be related to the moderate skewness of the fDOM data in Sau; however, similar results and performance metrics were found when a log transformation (to remove skewness) was applied before fitting the ML models (see Fig. A11 and Tables A3 and A4).</p>

<table-wrap id="T1" specific-use="star"><label>Table 1</label><caption><p id="d2e1632">Model comparison for predicting fDOM using all selected drivers at both study sites. Statistical metrics (<inline-formula><mml:math id="M52" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>, RMSE and KGE) were calculated to compare the performance of the models during the testing (hold-out) period. The table supports the selection of the best model for each study site among Random Forest (RF), eXtreme Gradient Boosting (XGB), Light Gradient Boosting (LGB), CatBoost (CTB), <inline-formula><mml:math id="M53" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-Nearest Neighbors (KNN), Support Vector Regression (SVR) and the linear model (LM). Additionally, we tested other boosting methods; their performance metrics for the testing phase can be found in Table A2. Table A1 contains the results during the training period for comparison.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="15">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right" colsep="1"/>
     <oasis:colspec colnum="9" colname="col9" align="right"/>
     <oasis:colspec colnum="10" colname="col10" align="right"/>
     <oasis:colspec colnum="11" colname="col11" align="right"/>
     <oasis:colspec colnum="12" colname="col12" align="right"/>
     <oasis:colspec colnum="13" colname="col13" align="right"/>
     <oasis:colspec colnum="14" colname="col14" align="right"/>
     <oasis:colspec colnum="15" colname="col15" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry namest="col2" nameend="col8" align="center" colsep="1">Lough Feeagh </oasis:entry>
         <oasis:entry namest="col9" nameend="col15" align="center">Sau Reservoir </oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Model/Metric</oasis:entry>
         <oasis:entry colname="col2">RF</oasis:entry>
         <oasis:entry colname="col3">XGB</oasis:entry>
         <oasis:entry colname="col4">LGB</oasis:entry>
         <oasis:entry colname="col5">CTB</oasis:entry>
         <oasis:entry colname="col6">LM</oasis:entry>
         <oasis:entry colname="col7">KNN</oasis:entry>
         <oasis:entry colname="col8">SVR</oasis:entry>
         <oasis:entry colname="col9">RF</oasis:entry>
         <oasis:entry colname="col10">XGB</oasis:entry>
         <oasis:entry colname="col11">LGB</oasis:entry>
         <oasis:entry colname="col12">CTB</oasis:entry>
         <oasis:entry colname="col13">LM</oasis:entry>
         <oasis:entry colname="col14">KNN</oasis:entry>
         <oasis:entry colname="col15">SVR</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M54" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">0.37</oasis:entry>
         <oasis:entry colname="col3">0.41</oasis:entry>
         <oasis:entry colname="col4">0.47</oasis:entry>
         <oasis:entry colname="col5">0.51</oasis:entry>
         <oasis:entry colname="col6">0.40</oasis:entry>
         <oasis:entry colname="col7">0.36</oasis:entry>
         <oasis:entry colname="col8">0.34</oasis:entry>
         <oasis:entry colname="col9">0.53</oasis:entry>
         <oasis:entry colname="col10">0.11</oasis:entry>
         <oasis:entry colname="col11">0.45</oasis:entry>
         <oasis:entry colname="col12">0.54</oasis:entry>
         <oasis:entry colname="col13">0.45</oasis:entry>
         <oasis:entry colname="col14">0.33</oasis:entry>
         <oasis:entry colname="col15">0.63</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">RMSE (QSU)</oasis:entry>
         <oasis:entry colname="col2">9.04</oasis:entry>
         <oasis:entry colname="col3">9.27</oasis:entry>
         <oasis:entry colname="col4">8.22</oasis:entry>
         <oasis:entry colname="col5">8.29</oasis:entry>
         <oasis:entry colname="col6">8.52</oasis:entry>
         <oasis:entry colname="col7">9.85</oasis:entry>
         <oasis:entry colname="col8">10.54</oasis:entry>
         <oasis:entry colname="col9">8.24</oasis:entry>
         <oasis:entry colname="col10">12.4</oasis:entry>
         <oasis:entry colname="col11">9.98</oasis:entry>
         <oasis:entry colname="col12">7.11</oasis:entry>
         <oasis:entry colname="col13">6.58</oasis:entry>
         <oasis:entry colname="col14">9.23</oasis:entry>
         <oasis:entry colname="col15">17.05</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">KGE</oasis:entry>
         <oasis:entry colname="col2">0.55</oasis:entry>
         <oasis:entry colname="col3">0.52</oasis:entry>
         <oasis:entry colname="col4">0.63</oasis:entry>
         <oasis:entry colname="col5">0.69</oasis:entry>
         <oasis:entry colname="col6">0.55</oasis:entry>
         <oasis:entry colname="col7">0.12</oasis:entry>
         <oasis:entry colname="col8">0.30</oasis:entry>
         <oasis:entry colname="col9">0.55</oasis:entry>
         <oasis:entry colname="col10">0.21</oasis:entry>
         <oasis:entry colname="col11">0.33</oasis:entry>
         <oasis:entry colname="col12">0.66</oasis:entry>
         <oasis:entry colname="col13">0.31</oasis:entry>
         <oasis:entry colname="col14">0.43</oasis:entry>
         <oasis:entry colname="col15"><inline-formula><mml:math id="M55" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.82</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

<sec id="Ch1.S3.SS3.SSS1">
  <label>3.3.1</label><title>Model prediction using all drivers compared to a reduced set</title>
      <p id="d2e1931">Figure 5 presents the fDOM prediction performance of the CatBoost model (best ML model overall) for Feeagh and Sau, using different input configurations. The models were trained on 85 % of the time series (blue points) and tested on the remaining 15 % using a hold-out approach (violet and green points). Two scenarios were compared: one using the most influential drivers (8 for Feeagh, 5 for Sau), and a second using only a reduced subset of reanalysis-based and easily accessible drivers, specifically soil temperature, soil moisture, and Julian day.</p>
      <p id="d2e1934">For both lakes, the reduced driver models showed only a modest decline in predictive performance during the testing period. For example, in Feeagh, the model using all influential drivers achieved <inline-formula><mml:math id="M56" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.51</mml:mn></mml:mrow></mml:math></inline-formula> and KGE <inline-formula><mml:math id="M57" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.69, while the reduced model still attained <inline-formula><mml:math id="M58" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.48</mml:mn></mml:mrow></mml:math></inline-formula> and KGE <inline-formula><mml:math id="M59" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.67. Similarly, in Sau, the full model scored <inline-formula><mml:math id="M60" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.54</mml:mn></mml:mrow></mml:math></inline-formula> and KGE = 0.66, whereas the reduced model maintained a comparable <inline-formula><mml:math id="M61" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.50</mml:mn></mml:mrow></mml:math></inline-formula> and KGE <inline-formula><mml:math id="M62" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.65. Although the training performance was higher in Sau compared with the testing performance, indicating potential overfitting, the CatBoost model provided informative and generalisable predictions in both study sites.</p>

      <fig id="F5"><label>Figure 5</label><caption><p id="d2e2021">fDOM predictions obtained from the CatBoost model using different driver sets. Panels show training (blue; 85 % of the time series) and testing periods (violet or green; 15 %). <bold>(a)</bold> Feeagh: simulations using the most influential drivers (violet) and a reduced subset of reanalysis-based drivers <inline-formula><mml:math id="M63" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> Julian day (green). <bold>(b)</bold> Sau: simulations using the most influential drivers (violet) and a reduced subset of reanalysis-based drivers <inline-formula><mml:math id="M64" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> Julian day (green). Model performance metrics (<inline-formula><mml:math id="M65" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>, KGE, RMSE) are reported for both periods.</p></caption>
            <graphic xlink:href="https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026-f05.png"/>

          </fig>


</sec>
</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Discussion</title>
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>Driver attribution in contrasting conditions</title>
      <p id="d2e2080">The most influential drivers for fDOM identified for each site suggest potential carbon-producing processes that drive DOM dynamics in each lake. Interestingly, the most influential drivers were related to temperature and moisture at the deepest soil layers, indicating a common relevance for carbon inputs from terrestrial processes in both catchments. The Irish site is dominated by peat, a highly organic soil known to be sensitive to temperature-induced carbon release, especially as temperature levels rise. This is supported by the relation of soil temperatures with fDOM at this site <xref ref-type="bibr" rid="bib1.bibx46" id="paren.38"/>. In contrast, water availability constraints are much more pronounced in drier soils such as those of the Spanish site, as supported by the partial dependence relationship of soil moisture with fDOM <xref ref-type="bibr" rid="bib1.bibx50" id="paren.39"/>. A plausible explanation is that, in Sau, increased soil moisture could boost biological activity, enhancing fDOM production, while in Feeagh, soil moisture showed a bell-shaped relationship, suggesting a more nuanced interplay between oxygen availability and microbial processes (Figs. 4, A9 and A10). In addition, higher soil moisture can reflect stronger soil–lake hydrological connectivity, particularly during wet periods and flushing events, facilitating the transport of terrestrially derived DOM into the lake.</p>
      <p id="d2e2089">DOM dynamics are driven by physical and biogeochemical processes in the soil that are sensitive to changes in temperature and moisture, e.g., microbial processes that break down organic matter <xref ref-type="bibr" rid="bib1.bibx23" id="paren.40"/>. The fact that the deepest soil layers were more important in the model for both sites than the shallower ones could be linked to potential carbon attenuation processes, such as soil organic matter decomposition and retention in the soil, including sorption processes <xref ref-type="bibr" rid="bib1.bibx12 bib1.bibx44" id="paren.41"/>. In any case, the production of carbon in the catchment that eventually ends up in the lake requires concurrent downstream transport, governed by rainfall events.</p>
      <p id="d2e2098">Climate and topography (see Fig. A1) dictate the flushing of accumulated DOM during rainfall events, but can also influence sustained baseflow DOM contributions. In Feeagh, carbon exports from the catchment have been observed regularly throughout the entire annual cycle, with seasonal variability <xref ref-type="bibr" rid="bib1.bibx10" id="paren.42"/>. In contrast, DOM in Sau accumulates primarily during the summer and is mainly flushed out via surface runoff during the wetter winter months <xref ref-type="bibr" rid="bib1.bibx30" id="paren.43"/>. These patterns are supported by the relationship between inflow DOC concentration and fDOM at both sites (Figs. 4, A9 and A10), which shows a slight increase in predicted fDOM under lower carbon input conditions. Thus, inflow DOC concentration could reflect discharge pulses and dilution effects driven by precipitation <xref ref-type="bibr" rid="bib1.bibx22" id="paren.44"/>, following the characteristic seasonality of each site.</p>
      <p id="d2e2110">Seasonality plays a crucial role in fDOM predictions, as evidenced by the relationship of the Julian day driver with DOM dynamics at both sites (Fig. 4). At the Irish site, DOM seasonality is primarily shaped by natural environmental processes, whereas at the Spanish site, human influence plays a much greater role. This distinction helps explain why surface lake water temperature and solar radiation, two variables typically linked to strong seasonal patterns, were important only for Feeagh, while reservoir volume was significant only for Sau (Figs. 3, 4, A9 and A10). Volume and soil variables produce a similar effect on fDOM as Julian day at Sau, given that higher volumes closer to the winter season can lead to higher fDOM values. Incorporating Julian day into the workflow offers a simple yet effective way to represent seasonality, potentially replacing seasonal variables (e.g., air temperature) (see the correlation matrix of all drivers in Fig. A12). This suggests that machine learning approaches open up opportunities to assess diverse drivers under contrasting conditions.</p>
      <p id="d2e2114">Improving the accuracy of DOM predictions in lakes can enhance efforts to reduce further water quality deterioration and support lake management. This study demonstrated a feasible approach for simulating daily fDOM in two contrasting lakes, especially when using the CatBoost model given its good generalisability. The performance metrics (Fig. 5) obtained at each site for the different model simulations lie in a similar or better range than those of comparable studies that modelled carbon dynamics in lakes <xref ref-type="bibr" rid="bib1.bibx17 bib1.bibx29 bib1.bibx55 bib1.bibx54" id="paren.45"><named-content content-type="pre">e.g.,</named-content></xref>. It is important to be aware, however, that previous studies are based on different frameworks. The differences include the machine learning algorithms used, the target variable for quantifying carbon dynamics (most studies have focused on DOC, whereas fDOM is the target variable here), the input driver data, and site-specific conditions. While both fDOM and DOC are widely used indicators of dissolved organic matter in lakes, they differ in measurement principles and in the fractions of organic matter they represent. Both variables, however, remain ecologically relevant for understanding DOM sources, transport, and transformation processes, which aligns with the context of this work.</p>
</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>Scalability</title>
      <p id="d2e2130">Our results suggest high potential for scalability: predictive performance remained consistent across different driver sets even for two contrasting study sites, and the models performed well using only globally available reanalysis data and Julian day (Fig. 5). Importantly, this consistency held even when specific highly influential drivers were removed from the driver set. For instance, in Sau, where human intervention makes future reservoir outflows difficult to predict, avoiding reliance on water volume as a driver proved advantageous: although volume was identified as an influential variable, removing it from the set of drivers did not degrade model performance. It is likely that soil moisture at the deepest layer, a variable that showed a behavior similar to that of volume (Fig. 4), helped maintain the predictive performance when volume was removed from the driver set. In addition, for Feeagh, the predictive capacity was also maintained when using only meteorological and soil drivers. This suggests that a large driver dataset, such as the 24-variable set used in this study, is not necessary to produce an accurate prediction when modelling fDOM using supervised machine learning.</p>
</sec>
<sec id="Ch1.S4.SS3">
  <label>4.3</label><title>Limitations and future research</title>
      <p id="d2e2141">Our approach offers the opportunity to validate and deploy a workflow capable of delivering daily DOM predictions in both undisturbed and anthropized sites, even when only limited data on input drivers are available, while at the same time providing insights into the dominant drivers. However, site-specific model validation is essential, including identification of appropriate training and testing periods, hyperparameter tuning for each study site, and assessment of overfitting. In terms of driver attribution, the relationships identified using machine learning may not always reflect process-level links <xref ref-type="bibr" rid="bib1.bibx48" id="paren.46"/>. In our case, however, many of these same drivers had already been identified for river DOC levels in the Feeagh catchment <xref ref-type="bibr" rid="bib1.bibx10 bib1.bibx46" id="paren.47"><named-content content-type="pre">e.g.,</named-content></xref>. While the workflow can be easily replicated, fDOM data or data for another proxy for DOM are required. Such proxies of DOM are increasingly being incorporated into water quality monitoring programmes, which is convenient for testing workflows such as the one described <xref ref-type="bibr" rid="bib1.bibx9" id="paren.48"/>.</p>
      <p id="d2e2155">The workflow presented here is not recommended for climate change studies, as the drivers of DOM variability can significantly change under entirely new and unrecorded climatic conditions. Consequently, supervised machine learning may fail to capture the signal from the time series. Moreover, the method's reliance on historical patterns limits its ability to extrapolate beyond the range of observed environmental conditions <xref ref-type="bibr" rid="bib1.bibx34" id="paren.49"/>. Future research could expand the application of this framework to a broader range of lakes and integrate additional drivers, such as remote sensing-derived terrestrial and aquatic quantity and quality parameters <xref ref-type="bibr" rid="bib1.bibx11" id="paren.50"/>.</p>
</sec>
</sec>
<sec id="Ch1.S5" sec-type="conclusions">
  <label>5</label><title>Conclusions</title>
      <p id="d2e2174">By identifying the key environmental drivers of lake dissolved organic matter (DOM) dynamics, this study presents an open, robust and scalable workflow for daily DOM prediction using different ML algorithms. Validated at two hydroclimatically contrasting sites in Ireland (Lough Feeagh) and Spain (Sau Reservoir), the approach revealed that deep soil temperature is the dominant driver in the peat-rich, temperate Irish catchment, whereas deep soil moisture plays a more critical role in the drier, Mediterranean setting of the Spanish site. These primary drivers are further shaped by hydrological processes, seasonal variability, and human activities.</p>
      <p id="d2e2177">The workflow showed good predictive performance even when based solely on globally available reanalysis data, supporting its potential applicability to other freshwater systems worldwide. In addition to expanding the set of approaches available for lake DOM prediction, the workflow offers transparent driver attribution, contributing valuable insights into the natural and anthropogenic processes governing carbon cycling in aquatic ecosystems.</p>
</sec>

      
      </body>
    <back><app-group>

<app id="App1.Ch1.S1">
  <label>Appendix A</label><title/>
<sec id="App1.Ch1.S1.SS1">
  <label>A1</label><title>Topography of Feeagh and Sau catchments</title>
      <p id="d2e2198">Figure A1 contains the elevation range for the two contrasting freshwater ecosystems: Lough Feeagh and Sau Reservoir.</p>

      <fig id="FA1"><label>Figure A1</label><caption><p id="d2e2203">Elevation range for the two contrasting freshwater ecosystems: Lough Feeagh and Sau Reservoir.</p></caption>
          
          <graphic xlink:href="https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026-f06.png"/>

        </fig>


</sec>
<sec id="App1.Ch1.S1.SS2">
  <label>A2</label><title>CORINE land-cover classes and aggregation for both study catchments</title>
      <p id="d2e2224">Figure A2 lists all original CORINE Land Cover 2018 classes identified within each catchment and their aggregation into the grouped categories used in Fig. 1. The tables in the figure show how individual CORINE classes were consolidated to represent dominant catchment characteristics while maintaining consistency with the original CORINE classification. This information supports the visual interpretation presented in Fig. 1.</p>

      <fig id="FA2"><label>Figure A2</label><caption><p id="d2e2229"><bold>(a)</bold> CORINE Land Cover 2018 classes identified in Feeagh and the grouped categories shown for visual clarity in the main manuscript in Fig. 1; <bold>(b)</bold> CORINE Land Cover 2018 classes identified in Sau and the grouped categories shown for visual clarity in the main manuscript in Fig. 1.</p></caption>
          
          <graphic xlink:href="https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026-f07.png"/>

        </fig>


</sec>
<sec id="App1.Ch1.S1.SS3">
  <label>A3</label><title>Data inspection and exploration</title>
      <p id="d2e2255">The target variable (fDOM) at Feeagh exhibits low to moderate skewness (approximately 0.5) with relatively few outliers, whereas fDOM at Sau displays higher skewness (greater than 1 but below 2) and several extreme values (Fig. A3). Zero-inflation does not appear to be an issue at either site.</p>

      <fig id="FA3"><label>Figure A3</label><caption><p id="d2e2260">Exploratory data analysis for fDOM in both study sites: <bold>(a)</bold> density plot of fDOM in Feeagh, <bold>(b)</bold> density plot of fDOM in Sau, and <bold>(c)</bold> density plot of fDOM in Sau after log<sub>10</sub> transformation. The red vertical line separates the outliers.</p></caption>
          
          <graphic xlink:href="https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026-f08.png"/>

        </fig>
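      <p>The skewness check and the log<sub>10</sub> transformation described above can be illustrated with a short Python sketch. The estimator used by the authors is not stated, so the moment-based (Fisher-Pearson) coefficient below is an assumption, the function names <monospace>skewness</monospace> and <monospace>log10_transform</monospace> are hypothetical, and the example numbers are made up rather than taken from either site.</p>
      <preformat><![CDATA[
```python
import math


def skewness(values):
    """Moment-based (Fisher-Pearson) sample skewness, m3 / m2**1.5."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("values are constant")
    return m3 / m2 ** 1.5


def log10_transform(values):
    """Apply log10 to strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("log10 requires strictly positive values")
    return [math.log10(v) for v in values]


# Made-up, right-skewed numbers for illustration only (not site data).
example = [1.0, 10.0, 100.0, 1000.0, 100000.0]
print("raw skewness:", skewness(example))
print("log10 skewness:", skewness(log10_transform(example)))
```
]]></preformat>
      <p>For a right-skewed series such as the made-up example, the transformed values are less skewed than the raw ones, which is the motivation for the log<sub>10</sub> variant shown in panel (c).</p>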


</sec>
<sec id="App1.Ch1.S1.SS4">
  <label>A4</label><title>Water temperature correction of fDOM for both study sites</title>
      <p id="d2e2299">In Feeagh, fDOM data was corrected for the temperature quenching effect in previous scientific studies. The raw measurements were corrected using the compensation approach described by <xref ref-type="bibr" rid="bib1.bibx51 bib1.bibx45" id="paren.51"/>. Corrected fluorescence was calculated relative to a reference temperature of 20 °C using: <inline-formula><mml:math id="M67" display="inline"><mml:mrow><mml:mtext>Corrected fDOM</mml:mtext><mml:mo>=</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mtext>raw fDOM</mml:mtext><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>+</mml:mo><mml:mo>(</mml:mo><mml:mi>T</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">20</mml:mn><mml:mo>)</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi>k</mml:mi></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:math></inline-formula> where <inline-formula><mml:math id="M68" display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula> is the sonde water temperature (°C) and <inline-formula><mml:math id="M69" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> is the temperature correction coefficient (<inline-formula><mml:math id="M70" display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="normal">−</mml:mi><mml:mn mathvariant="normal">0.0168</mml:mn></mml:mrow></mml:math></inline-formula> for Feeagh, derived as the average of monthly values reported by <xref ref-type="bibr" rid="bib1.bibx45" id="text.52"/>).</p>
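      <p>As an illustration, the compensation formula above can be written as a short Python function. This is a minimal sketch and not the authors' processing code: the function name <monospace>correct_fdom</monospace> and its argument names are hypothetical, and only the reference temperature of 20 °C and the Feeagh coefficient <monospace>k = -0.0168</monospace> are taken from the text.</p>
      <preformat><![CDATA[
```python
def correct_fdom(raw_fdom, water_temp_c, k=-0.0168, ref_temp_c=20.0):
    """Temperature-compensate one fDOM reading to the reference temperature.

    Implements: corrected = raw / (1 + (T - T_ref) * k)
    """
    denominator = 1.0 + (water_temp_c - ref_temp_c) * k
    if denominator <= 0.0:
        raise ValueError("temperature outside the usable range for this k")
    return raw_fdom / denominator


# Illustrative readings only (not measured data).
for temp_c in (10.0, 20.0, 25.0):
    print(temp_c, correct_fdom(100.0, temp_c))
```
]]></preformat>
      <p>Because <monospace>k</monospace> is negative, readings taken above 20 °C are scaled up and readings taken below 20 °C are scaled down, which is the expected direction for thermal quenching of fluorescence.</p>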

      <fig id="FA4"><label>Figure A4</label><caption><p id="d2e2371">Upper panel: Raw (blue) and temperature-corrected (red) fDOM measurements for Feeagh; lower panel: surface water temperature sonde measurements at Feeagh.</p></caption>
          <graphic xlink:href="https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026-f09.png"/>

        </fig>

      <p id="d2e2380">In Sau, the fDOM data were corrected for the temperature quenching effect following a test provided by the fDOM sensor manufacturer, in which the temperature of a 300 QSU water sample was varied to quantify the effect of temperature on the measurement. The results are shown in Fig. A5.</p><fig id="FA5"><label>Figure A5</label><caption><p id="d2e2386">Relationship between fDOM and temperature for a 300 QSU sample, provided by the manufacturer of the fDOM sensor in Sau.</p></caption>
          
          <graphic xlink:href="https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026-f10.png"/>

        </fig>

      <fig id="FA6"><label>Figure A6</label><caption><p id="d2e2399">Uncorrected (blue) and corrected (red) fDOM data for Sau.</p></caption>
          
          <graphic xlink:href="https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026-f11.png"/>

        </fig>

      <p id="d2e2410">The fDOM data in Sau were then corrected using this linear regression (Fig. A5) and the surface water temperature of the reservoir. Figure A6 presents the uncorrected values in blue and the corrected values in red (8 negative values were removed from the total sample of 777).</p>
</sec>
<sec id="App1.Ch1.S1.SS5">
  <label>A5</label><title>Hydrologic Modelling</title>
      <p id="d2e2422">Daily time series of inflow discharge and inflow DOC concentration into each site were generated using the Generalised Watershed Loading Functions Model (GWLF) coupled with a DOC module (GWLF-DOC). This model simulates catchment hydrology (water balance and water distribution among the different hydrological pathways) and DOC dynamics (DOC production and DOC washout) at a daily time step. The model input requirements include daily time series of two meteorological variables (total precipitation and air temperature), as well as land cover, land use, and soil characterisation.</p>
      <p id="d2e2429">GWLF-DOC was applied to Feeagh based on previous model applications in the Irish catchment <xref ref-type="bibr" rid="bib1.bibx38" id="paren.53"/>, for which measured discharge data were used to calibrate and validate the hydrology (2013–2018 and 2019–2023, respectively), and DOC concentration data were used to calibrate the DOC module (2016–2023). In Sau, observed inflow discharge data were used to calibrate and validate the hydrology (2008–2011 and 2011–2024, respectively), and measured DOC concentration data were used to calibrate and validate the DOC module (2008–2014 and 2016–2018, respectively) using the same calibration strategy as for the Irish site. Calibration results were satisfactory for both hydrology (Feeagh: <inline-formula><mml:math id="M71" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.64</mml:mn></mml:mrow></mml:math></inline-formula> and NSE <inline-formula><mml:math id="M72" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.64; Sau: <inline-formula><mml:math id="M73" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.66</mml:mn></mml:mrow></mml:math></inline-formula> and NSE <inline-formula><mml:math id="M74" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.66) and DOC (Feeagh: <inline-formula><mml:math id="M75" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.45</mml:mn></mml:mrow></mml:math></inline-formula> and NSE <inline-formula><mml:math id="M76" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.47; Sau: <inline-formula><mml:math id="M77" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.44</mml:mn></mml:mrow></mml:math></inline-formula> and NSE <inline-formula><mml:math id="M78" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.40). Similarly, validation results were satisfactory for both hydrology (Feeagh: <inline-formula><mml:math id="M79" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.60</mml:mn></mml:mrow></mml:math></inline-formula> and NSE <inline-formula><mml:math id="M80" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.60; Sau: <inline-formula><mml:math id="M81" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.42</mml:mn></mml:mrow></mml:math></inline-formula> and NSE <inline-formula><mml:math id="M82" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.42) and DOC in the case of Sau (<inline-formula><mml:math id="M83" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.50</mml:mn></mml:mrow></mml:math></inline-formula> and NSE <inline-formula><mml:math id="M84" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.46).</p>
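      <p>For readers less familiar with the Nash-Sutcliffe efficiency (NSE) reported above, the following Python sketch shows its standard definition, one minus the ratio of the squared simulation error to the variance of the observations about their mean. It is an illustration only: the function name <monospace>nash_sutcliffe</monospace> is hypothetical, and the software the authors used to compute the metric is not stated here.</p>
      <preformat><![CDATA[
```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2)."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("observations are constant; NSE is undefined")
    return 1.0 - residual / spread


# Illustrative values only (not model output from either catchment).
print(nash_sutcliffe([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```
]]></preformat>
      <p>An NSE of 1 indicates a perfect fit, whereas an NSE of 0 means the simulation is no better than using the mean of the observations.</p>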
</sec>
<sec id="App1.Ch1.S1.SS6">
  <label>A6</label><title>Lake Modelling</title>
      <p id="d2e2599">Daily time series of five key lake variables (see Fig. 2) were obtained from the General Lake Model (GLM) run for each site. GLM is an open-source, one-dimensional hydrodynamic model designed to simulate the vertical stratification and water balance of lakes and reservoirs. It calculates vertical profiles of temperature and density by accounting for factors such as inflows and outflows, mixing processes, and surface heating and cooling <xref ref-type="bibr" rid="bib1.bibx20" id="paren.54"/>. GLM was calibrated and validated by evaluating the fit of modelled water temperature against measured water temperature profile data in Feeagh (2010–2015 and 2016–2017, respectively) and Sau (1997–2007 and 2008–2018, respectively). The calibration strategy was based on previous lake modelling deployments at each site <xref ref-type="bibr" rid="bib1.bibx33 bib1.bibx39" id="paren.55"/>. Model performance was satisfactory for both sites.</p>
</sec>
<sec id="App1.Ch1.S1.SS7">
  <label>A7</label><title>Comparison of validation methods using Random Forest</title>
      <p id="d2e2616">To select the most suitable validation method, we compared three approaches: the hold-out period method used in the main manuscript, <inline-formula><mml:math id="M85" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-fold cross-validation using <inline-formula><mml:math id="M86" display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula>, and rolling-window cross-validation using a training window of two years, a testing window of one year, and a window shift of 90 d. Results are shown in Fig. A7.</p><fig id="FA7"><label>Figure A7</label><caption><p id="d2e2640">Comparison of validation methods using Random Forest for both study sites: the hold-out period method used in the main manuscript, <inline-formula><mml:math id="M87" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-fold cross-validation using <inline-formula><mml:math id="M88" display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula>, and rolling-window cross-validation using a training window of two years, a testing window of one year, and a window shift of 90 d. For the rolling-window method, a clear analysis is not possible for Sau Reservoir because of the limited data.</p></caption>
          
          <graphic xlink:href="https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026-f12.png"/>

        </fig>
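      <p>The rolling-window scheme compared above (two-year training window, one-year testing window, 90 d shift) can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: the function name <monospace>rolling_windows</monospace> is hypothetical, a year is approximated as 365 d (leap days are ignored), and the example date range is arbitrary.</p>
      <preformat><![CDATA[
```python
from datetime import date, timedelta


def rolling_windows(start, end, train_days=730, test_days=365, step_days=90):
    """Return (train_start, train_end, test_start, test_end) tuples.

    End dates are exclusive, and each testing window starts exactly where
    its training window ends, so no testing day is seen during training.
    """
    windows = []
    train_start = start
    while True:
        train_end = train_start + timedelta(days=train_days)
        test_end = train_end + timedelta(days=test_days)
        if test_end > end:
            break  # not enough data left for a full testing window
        windows.append((train_start, train_end, train_end, test_end))
        train_start = train_start + timedelta(days=step_days)
    return windows


# Arbitrary four-year example period (assumption, not the study period).
for window in rolling_windows(date(2015, 1, 1), date(2019, 1, 1)):
    print(window)
```
]]></preformat>
      <p>A short record yields very few windows under this scheme, which is consistent with the difficulty noted for Sau Reservoir in the caption of Fig. A7.</p>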


</sec>
<sec id="App1.Ch1.S1.SS8">
  <label>A8</label><title>Identification of most influential drivers</title>

      <fig id="FA8"><label>Figure A8</label><caption><p id="d2e2682">Selection of influential drivers for predicting fDOM. Twenty-four drivers from multiple sources were used to train the machine-learning models, including climate variables, soil variables, hydrologic and water-quality model outputs, lake model outputs, and cosine-transformed Julian day. Feature importance is shown for four models (Random Forest, Light Gradient Boosting, eXtreme Gradient Boosting and CatBoost), and drivers exceeding 5 % importance are highlighted.</p></caption>
          
          <graphic xlink:href="https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026-f13.png"/>

        </fig>
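      <p>The 5 % selection rule used in Fig. A8 can be illustrated with the Python sketch below. It assumes that raw importance scores are first rescaled to percentages of their total, which may differ from how each library reports importance; the function name <monospace>influential_drivers</monospace> and the placeholder driver names and scores are hypothetical.</p>
      <preformat><![CDATA[
```python
def influential_drivers(importance, threshold_pct=5.0):
    """Keep drivers whose share of total importance exceeds the threshold.

    `importance` maps driver name -> non-negative raw importance score.
    Returns a dict of percentage shares, sorted from most to least important.
    """
    total = sum(importance.values())
    if total <= 0:
        raise ValueError("total importance must be positive")
    shares = {name: 100.0 * value / total for name, value in importance.items()}
    selected = {name: s for name, s in shares.items() if s > threshold_pct}
    return dict(sorted(selected.items(), key=lambda item: item[1], reverse=True))


# Placeholder scores for illustration only (not results from the study).
scores = {"driver_a": 40.0, "driver_b": 30.0, "driver_c": 26.0, "driver_d": 4.0}
print(influential_drivers(scores))
```
]]></preformat>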


</sec>
<sec id="App1.Ch1.S1.SS9">
  <label>A9</label><title>Partial dependence plots for the most influential drivers</title>

      <fig id="FA9"><label>Figure A9</label><caption><p id="d2e2705">Partial dependence plots for the most influential drivers (feature importance <inline-formula><mml:math id="M89" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 5 %, estimated using node purity) based on Random Forest model outputs for Lough Feeagh. Cosine-transformed Julian day (cyday) values are shown for seasonality; approximate calendar timing corresponds to cos(Julian day) <inline-formula><mml:math id="M90" display="inline"><mml:mo>≈</mml:mo></mml:math></inline-formula> 1 (winter), 0 (spring/autumn), and −1 (summer).</p></caption>
          
          <graphic xlink:href="https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026-f14.png"/>

        </fig>

<fig id="FA10"><label>Figure A10</label><caption><p id="d2e2733">Partial dependence plots for the most influential drivers (feature importance <inline-formula><mml:math id="M91" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 5 %, estimated using node purity) based on Random Forest model outputs for Sau Reservoir. Cosine-transformed Julian day (cyday) values are shown for seasonality; approximate calendar timing corresponds to cos(Julian day) <inline-formula><mml:math id="M92" display="inline"><mml:mo>≈</mml:mo></mml:math></inline-formula> 1 (winter), 0 (spring/autumn), and <inline-formula><mml:math id="M93" display="inline"><mml:mi mathvariant="normal">−</mml:mi></mml:math></inline-formula>1 (summer). </p></caption>
          
          <graphic xlink:href="https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026-f15.png"/>

        </fig>
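      <p>The cosine transformation of Julian day referred to in the captions of Figs. A9 and A10 can be sketched in Python as below. The exact period and phase used by the authors are not given in this appendix, so the 365 d period with no phase shift is an assumption chosen to reproduce the stated behaviour (approximately 1 in winter, 0 in spring and autumn, and −1 in summer); the function name <monospace>cosine_julian_day</monospace> is hypothetical.</p>
      <preformat><![CDATA[
```python
import math
from datetime import date


def cosine_julian_day(day_of_year, days_in_year=365.0):
    """Map day of year onto [-1, 1] so late December and early January are adjacent."""
    return math.cos(2.0 * math.pi * day_of_year / days_in_year)


# Day of year for a few calendar dates (the year is arbitrary).
for d in (date(2021, 1, 1), date(2021, 4, 1), date(2021, 7, 1), date(2021, 10, 1)):
    doy = d.timetuple().tm_yday
    print(d.isoformat(), doy, cosine_julian_day(doy))
```
]]></preformat>
      <p>Unlike the raw day number, the transformed value has no discontinuity between 31 December and 1 January, but it assigns the same value to spring and autumn days that are equally distant from midwinter.</p>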


</sec>
<sec id="App1.Ch1.S1.SS10">
  <label>A10</label><title>Results for Sau Reservoir using log<sub>10</sub> transformation</title>

      <fig id="FA11"><label>Figure A11</label><caption><p id="d2e2787">fDOM predictions obtained from the boosting (CatBoost) model using the most influential drivers with initial log<sub>10</sub> transformation of the fDOM data.</p></caption>
          
          <graphic xlink:href="https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026-f16.png"/>

        </fig>

</sec>
<sec id="App1.Ch1.S1.SS11">
  <label>A11</label><title>Correlation matrix of all drivers and fDOM for Feeagh and Sau</title>
      <p id="d2e2816">The correlation matrix of all drivers is presented in Fig. A12.</p>

      <fig id="FA12"><label>Figure A12</label><caption><p id="d2e2821">Correlation matrix of all drivers and fDOM for Feeagh and Sau.</p></caption>
          
          <graphic xlink:href="https://bg.copernicus.org/articles/23/2661/2026/bg-23-2661-2026-f17.png"/>

        </fig>


</sec>
<sec id="App1.Ch1.S1.SS12">
  <label>A12</label><title>Model comparison to predict fDOM using the most influential drivers in both study sites</title>
      <p id="d2e2842">Resulting metrics during testing and training periods for all models are shown in Tables A1 and A2, respectively.</p>

<table-wrap id="TA1"><label>Table A1</label><caption><p id="d2e2849">Model comparison during testing to predict fDOM using all selected drivers in both study sites. Statistical metrics (<inline-formula><mml:math id="M96" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>, RMSE and KGE) were calculated to compare the performance of the models Random Forest (RF), eXtreme Gradient Boosting (XGB), Light Gradient Boosting (LGB), CatBoost (CTB), <inline-formula><mml:math id="M97" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-Nearest Neighbors (KNN), Support Vector Regression (SVR) and linear model (LM) during the testing period.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="15">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right" colsep="1"/>
     <oasis:colspec colnum="9" colname="col9" align="right"/>
     <oasis:colspec colnum="10" colname="col10" align="right"/>
     <oasis:colspec colnum="11" colname="col11" align="right"/>
     <oasis:colspec colnum="12" colname="col12" align="right"/>
     <oasis:colspec colnum="13" colname="col13" align="right"/>
     <oasis:colspec colnum="14" colname="col14" align="right"/>
     <oasis:colspec colnum="15" colname="col15" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry namest="col2" nameend="col8" align="center" colsep="1">Lough Feeagh </oasis:entry>
         <oasis:entry namest="col9" nameend="col15" align="center">Sau Reservoir </oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Model/Metric</oasis:entry>
         <oasis:entry colname="col2">RF</oasis:entry>
         <oasis:entry colname="col3">XGB</oasis:entry>
         <oasis:entry colname="col4">LGB</oasis:entry>
         <oasis:entry colname="col5">CTB</oasis:entry>
         <oasis:entry colname="col6">LM</oasis:entry>
         <oasis:entry colname="col7">KNN</oasis:entry>
         <oasis:entry colname="col8">SVR</oasis:entry>
         <oasis:entry colname="col9">RF</oasis:entry>
         <oasis:entry colname="col10">XGB</oasis:entry>
         <oasis:entry colname="col11">LGB</oasis:entry>
         <oasis:entry colname="col12">CTB</oasis:entry>
         <oasis:entry colname="col13">LM</oasis:entry>
         <oasis:entry colname="col14">KNN</oasis:entry>
         <oasis:entry colname="col15">SVR</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M98" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">0.37</oasis:entry>
         <oasis:entry colname="col3">0.40</oasis:entry>
         <oasis:entry colname="col4">0.40</oasis:entry>
         <oasis:entry colname="col5">0.50</oasis:entry>
         <oasis:entry colname="col6">0.40</oasis:entry>
         <oasis:entry colname="col7">0.36</oasis:entry>
         <oasis:entry colname="col8">0.34</oasis:entry>
         <oasis:entry colname="col9">0.53</oasis:entry>
         <oasis:entry colname="col10">0.09</oasis:entry>
         <oasis:entry colname="col11">0.25</oasis:entry>
         <oasis:entry colname="col12">0.54</oasis:entry>
         <oasis:entry colname="col13">0.45</oasis:entry>
         <oasis:entry colname="col14">0.33</oasis:entry>
         <oasis:entry colname="col15">0.63</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">RMSE (QSU)</oasis:entry>
         <oasis:entry colname="col2">9.07</oasis:entry>
         <oasis:entry colname="col3">9.28</oasis:entry>
         <oasis:entry colname="col4">8.97</oasis:entry>
         <oasis:entry colname="col5">8.17</oasis:entry>
         <oasis:entry colname="col6">8.52</oasis:entry>
         <oasis:entry colname="col7">9.85</oasis:entry>
         <oasis:entry colname="col8">10.54</oasis:entry>
         <oasis:entry colname="col9">8.51</oasis:entry>
         <oasis:entry colname="col10">12.22</oasis:entry>
         <oasis:entry colname="col11">11.35</oasis:entry>
         <oasis:entry colname="col12">7.11</oasis:entry>
         <oasis:entry colname="col13">6.58</oasis:entry>
         <oasis:entry colname="col14">9.23</oasis:entry>
         <oasis:entry colname="col15">17.05</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">KGE</oasis:entry>
         <oasis:entry colname="col2">0.54</oasis:entry>
         <oasis:entry colname="col3">0.51</oasis:entry>
         <oasis:entry colname="col4">0.50</oasis:entry>
         <oasis:entry colname="col5">0.68</oasis:entry>
         <oasis:entry colname="col6">0.55</oasis:entry>
         <oasis:entry colname="col7">0.12</oasis:entry>
         <oasis:entry colname="col8">0.30</oasis:entry>
         <oasis:entry colname="col9">0.55</oasis:entry>
         <oasis:entry colname="col10">0.25</oasis:entry>
         <oasis:entry colname="col11">0.22</oasis:entry>
         <oasis:entry colname="col12">0.66</oasis:entry>
         <oasis:entry colname="col13">0.31</oasis:entry>
         <oasis:entry colname="col14">0.43</oasis:entry>
         <oasis:entry colname="col15"><inline-formula><mml:math id="M99" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.82</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

<table-wrap id="TA2"><label>Table A2</label><caption><p id="d2e3144">Model comparison during training to predict fDOM using all selected drivers in both study sites. Statistical metrics (<inline-formula><mml:math id="M100" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>, RMSE and KGE) were calculated to compare the performance of the models Random Forest (RF), eXtreme Gradient Boosting (XGB), Light Gradient Boosting (LGB), CatBoost (CTB), <inline-formula><mml:math id="M101" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-Nearest Neighbors (KNN), Support Vector Regression (SVR) and linear model (LM) during the training period.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="15">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right" colsep="1"/>
     <oasis:colspec colnum="9" colname="col9" align="right"/>
     <oasis:colspec colnum="10" colname="col10" align="right"/>
     <oasis:colspec colnum="11" colname="col11" align="right"/>
     <oasis:colspec colnum="12" colname="col12" align="right"/>
     <oasis:colspec colnum="13" colname="col13" align="right"/>
     <oasis:colspec colnum="14" colname="col14" align="right"/>
     <oasis:colspec colnum="15" colname="col15" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry namest="col2" nameend="col8" align="center" colsep="1">Lough Feeagh </oasis:entry>
         <oasis:entry namest="col9" nameend="col15" align="center">Sau Reservoir </oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Model/Metric</oasis:entry>
         <oasis:entry colname="col2">RF</oasis:entry>
         <oasis:entry colname="col3">XGB</oasis:entry>
         <oasis:entry colname="col4">LGB</oasis:entry>
         <oasis:entry colname="col5">CTB</oasis:entry>
         <oasis:entry colname="col6">LM</oasis:entry>
         <oasis:entry colname="col7">KNN</oasis:entry>
         <oasis:entry colname="col8">SVR</oasis:entry>
         <oasis:entry colname="col9">RF</oasis:entry>
         <oasis:entry colname="col10">XGB</oasis:entry>
         <oasis:entry colname="col11">LGB</oasis:entry>
         <oasis:entry colname="col12">CTB</oasis:entry>
         <oasis:entry colname="col13">LM</oasis:entry>
         <oasis:entry colname="col14">KNN</oasis:entry>
         <oasis:entry colname="col15">SVR</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M102" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">0.99</oasis:entry>
         <oasis:entry colname="col3">1.00</oasis:entry>
         <oasis:entry colname="col4">0.99</oasis:entry>
         <oasis:entry colname="col5">0.75</oasis:entry>
         <oasis:entry colname="col6">0.22</oasis:entry>
         <oasis:entry colname="col7">0.99</oasis:entry>
         <oasis:entry colname="col8">0.89</oasis:entry>
         <oasis:entry colname="col9">0.99</oasis:entry>
         <oasis:entry colname="col10">1.00</oasis:entry>
         <oasis:entry colname="col11">1.00</oasis:entry>
         <oasis:entry colname="col12">0.99</oasis:entry>
         <oasis:entry colname="col13">0.67</oasis:entry>
         <oasis:entry colname="col14">0.98</oasis:entry>
         <oasis:entry colname="col15">0.98</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">RMSE (QSU)</oasis:entry>
         <oasis:entry colname="col2">0.97</oasis:entry>
         <oasis:entry colname="col3">0.00</oasis:entry>
         <oasis:entry colname="col4">0.86</oasis:entry>
         <oasis:entry colname="col5">5.04</oasis:entry>
         <oasis:entry colname="col6">7.87</oasis:entry>
         <oasis:entry colname="col7">0.92</oasis:entry>
         <oasis:entry colname="col8">2.95</oasis:entry>
         <oasis:entry colname="col9">0.81</oasis:entry>
         <oasis:entry colname="col10">0.01</oasis:entry>
         <oasis:entry colname="col11">0.17</oasis:entry>
         <oasis:entry colname="col12">0.72</oasis:entry>
         <oasis:entry colname="col13">5.43</oasis:entry>
         <oasis:entry colname="col14">1.17</oasis:entry>
         <oasis:entry colname="col15">1.38</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">KGE</oasis:entry>
         <oasis:entry colname="col2">0.95</oasis:entry>
         <oasis:entry colname="col3">1.00</oasis:entry>
         <oasis:entry colname="col4">0.98</oasis:entry>
         <oasis:entry colname="col5">0.58</oasis:entry>
         <oasis:entry colname="col6">0.25</oasis:entry>
         <oasis:entry colname="col7">0.98</oasis:entry>
         <oasis:entry colname="col8">0.90</oasis:entry>
         <oasis:entry colname="col9">0.97</oasis:entry>
         <oasis:entry colname="col10">1.00</oasis:entry>
         <oasis:entry colname="col11">1.00</oasis:entry>
         <oasis:entry colname="col12">0.99</oasis:entry>
         <oasis:entry colname="col13">0.75</oasis:entry>
         <oasis:entry colname="col14">0.99</oasis:entry>
         <oasis:entry colname="col15">0.98</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
<sec id="App1.Ch1.S1.SS13">
  <label>A13</label><title>Model comparison to predict fDOM using the most influential drivers in Sau with previous application of log<sub>10</sub> transformation</title>
      <p id="d2e3447">Tables A3 and A4 present the testing and training performance metrics, respectively, for the Sau site after applying a log<sub>10</sub> transformation.</p>

<table-wrap id="TA3"><label>Table A3</label><caption><p id="d2e3463">Model comparison during testing to predict fDOM using the most influential drivers in Sau with previous application of log<sub>10</sub> transformation. Statistical metrics (<inline-formula><mml:math id="M106" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>, RMSE and KGE) were calculated to compare the performance of the models Random Forest (RF), eXtreme Gradient Boosting (XGB), Light Gradient Boosting (LGB), CatBoost (CTB), <inline-formula><mml:math id="M107" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-Nearest Neighbors (KNN), Support Vector Regression (SVR) and linear model (LM) during the testing period.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="8">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry namest="col2" nameend="col8" align="center">Sau Reservoir </oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Model/Metric</oasis:entry>
         <oasis:entry colname="col2">RF</oasis:entry>
         <oasis:entry colname="col3">XGB</oasis:entry>
         <oasis:entry colname="col4">LGB</oasis:entry>
         <oasis:entry colname="col5">CTB</oasis:entry>
         <oasis:entry colname="col6">LM</oasis:entry>
         <oasis:entry colname="col7">KNN</oasis:entry>
         <oasis:entry colname="col8">SVR</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M108" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">0.51</oasis:entry>
         <oasis:entry colname="col3">0.18</oasis:entry>
         <oasis:entry colname="col4">0.22</oasis:entry>
         <oasis:entry colname="col5">0.43</oasis:entry>
         <oasis:entry colname="col6">0.40</oasis:entry>
         <oasis:entry colname="col7">0.26</oasis:entry>
         <oasis:entry colname="col8">0.58</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">RMSE (QSU)</oasis:entry>
         <oasis:entry colname="col2">14.24</oasis:entry>
         <oasis:entry colname="col3">16.56</oasis:entry>
         <oasis:entry colname="col4">13.07</oasis:entry>
         <oasis:entry colname="col5">11.50</oasis:entry>
         <oasis:entry colname="col6">10.91</oasis:entry>
         <oasis:entry colname="col7">9.75</oasis:entry>
         <oasis:entry colname="col8">35.23</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">KGE</oasis:entry>
         <oasis:entry colname="col2">0.63</oasis:entry>
         <oasis:entry colname="col3">0.38</oasis:entry>
         <oasis:entry colname="col4">0.34</oasis:entry>
         <oasis:entry colname="col5">0.59</oasis:entry>
         <oasis:entry colname="col6">0.36</oasis:entry>
         <oasis:entry colname="col7">0.32</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M109" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>1.35</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

<table-wrap id="TA4"><label>Table A4</label><caption><p id="d2e3661">Model comparison during training to predict fDOM using the most influential drivers in Sau with previous application of log<sub>10</sub> transformation. Statistical metrics (<inline-formula><mml:math id="M111" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>, RMSE and KGE) were calculated to compare the performance of the models Random Forest (RF), eXtreme Gradient Boosting (XGB), Light Gradient Boosting (LGB), CatBoost (CTB), <inline-formula><mml:math id="M112" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-Nearest Neighbors (KNN), Support Vector Regression (SVR) and linear model (LM) during the training period.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="8">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry namest="col2" nameend="col8" align="center">Sau Reservoir </oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Model/Metric</oasis:entry>
         <oasis:entry colname="col2">RF</oasis:entry>
         <oasis:entry colname="col3">XGB</oasis:entry>
         <oasis:entry colname="col4">LGB</oasis:entry>
         <oasis:entry colname="col5">CTB</oasis:entry>
         <oasis:entry colname="col6">LM</oasis:entry>
         <oasis:entry colname="col7">KNN</oasis:entry>
         <oasis:entry colname="col8">SVR</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M113" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">0.98</oasis:entry>
         <oasis:entry colname="col3">1.00</oasis:entry>
         <oasis:entry colname="col4">0.99</oasis:entry>
         <oasis:entry colname="col5">1.00</oasis:entry>
         <oasis:entry colname="col6">0.58</oasis:entry>
         <oasis:entry colname="col7">0.94</oasis:entry>
         <oasis:entry colname="col8">0.87</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">RMSE (QSU)</oasis:entry>
         <oasis:entry colname="col2">1.05</oasis:entry>
         <oasis:entry colname="col3">0.07</oasis:entry>
         <oasis:entry colname="col4">0.74</oasis:entry>
         <oasis:entry colname="col5">0.54</oasis:entry>
         <oasis:entry colname="col6">4.65</oasis:entry>
         <oasis:entry colname="col7">1.22</oasis:entry>
         <oasis:entry colname="col8">1.39</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">KGE</oasis:entry>
         <oasis:entry colname="col2">0.94</oasis:entry>
         <oasis:entry colname="col3">1.00</oasis:entry>
         <oasis:entry colname="col4">0.98</oasis:entry>
         <oasis:entry colname="col5">0.99</oasis:entry>
         <oasis:entry colname="col6">0.66</oasis:entry>
         <oasis:entry colname="col7">0.94</oasis:entry>
         <oasis:entry colname="col8">0.89</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
</app>
  </app-group><notes notes-type="codedataavailability"><title>Code and data availability</title>

      <p id="d2e3853">All data and code used in this study are available at <ext-link xlink:href="https://doi.org/10.5281/zenodo.19354955" ext-link-type="DOI">10.5281/zenodo.19354955</ext-link> <xref ref-type="bibr" rid="bib1.bibx32" id="paren.56"/>.</p>
  </notes><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e3865">DMB wrote the original draft and conducted the main analysis. RP contributed to the main analysis and to the writing and revision of the manuscript. DMB, RP, VM, EJ, and RM conceptualized and designed the study. VM, EJ, EE, and RM contributed to the writing and revision of the manuscript. EE, AG, MD, JG, and JJ collected and provided in-situ data and offered expert feedback.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e3871">The contact author has declared that none of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e3877">Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.</p>
  </notes><ack><title>Acknowledgements</title><p id="d2e3883">This research was funded by the European Commission through the Horizon Europe funding program under Grant Agreement number 101081728 (<ext-link xlink:href="https://doi.org/10.3030/101081728" ext-link-type="DOI">10.3030/101081728</ext-link>), as part of the “Innovative tools to control organic matter and disinfection byproducts in drinking water” (intoDBP) project (<uri>https://intodbp.eu/</uri>, last access: April 2026). DMB and RM were funded by the Catalan Water Agency (Agència Catalana de l'Aigua – ACA) as part of the NEREIDA project, number RDI001/24/000044 (subvenció R+D+I de la convocatòria ACC/1362/2024).</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e3895">This research has been supported by the EU HORIZON EUROPE Food, Bioeconomy, Natural Resources, Agriculture and Environment (grant no. 101081728). The article processing charges for this open-access publication were covered by the CSIC Open Access Publication Support Initiative through its Unit of Information Resources for Research (URICI).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e3906">This paper was edited by Bertrand Guenet and reviewed by Thelma Panaïotis, Shuo Chen, and one anonymous referee.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Asadollah et al.(2025)Asadollah, Safaeinia, Jarahizadeh, Alcalá, Sharafati, and Jodar-Abellan</label><mixed-citation>Asadollah, S. B. H. S., Safaeinia, A., Jarahizadeh, S., Alcalá, F. J., Sharafati, A., and Jodar-Abellan, A.: Dissolved organic carbon estimation in lakes: Improving machine learning with data augmentation on fusion of multi-sensor remote sensing observations, Water Res., 277, 123350, <ext-link xlink:href="https://doi.org/10.1016/j.watres.2025.123350" ext-link-type="DOI">10.1016/j.watres.2025.123350</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Bhateria and Jain(2016)</label><mixed-citation>Bhateria, R. and Jain, D.: Water quality assessment of lake water: A review, Sustainable Water Resources Management, 2, 161–173, <ext-link xlink:href="https://doi.org/10.1007/s40899-015-0014-7" ext-link-type="DOI">10.1007/s40899-015-0014-7</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Biau and Scornet(2016)</label><mixed-citation> Biau, G. and Scornet, E.: A random forest guided tour, Test, 25, 197–227, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Breiman(2001)</label><mixed-citation>Breiman, L.: Random Forests, Mach. Learn., 45, 5–32, <ext-link xlink:href="https://doi.org/10.1023/A:1010933404324" ext-link-type="DOI">10.1023/A:1010933404324</ext-link>, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Chen et al.(2024)Chen, Chen, Yao, He, Zhang, Li, and Lin</label><mixed-citation>Chen, C., Chen, Q., Yao, S., He, M., Zhang, J., Li, G., and Lin, Y.: Combining physical-based model and machine learning to forecast chlorophyll-<inline-formula><mml:math id="M114" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula> concentration in freshwater lakes, Sci. Total Environ., 907, 168097, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2023.168097" ext-link-type="DOI">10.1016/j.scitotenv.2023.168097</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>Chen and Guestrin(2016)</label><mixed-citation>Chen, T. and Guestrin, C.: XGBoost: A Scalable Tree Boosting System, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794, <ext-link xlink:href="https://doi.org/10.1145/2939672.2939785" ext-link-type="DOI">10.1145/2939672.2939785</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Cortes and Vapnik(1995)</label><mixed-citation>Cortes, C. and Vapnik, V.: Support-vector networks, Mach. Learn., 20, 273–297, <ext-link xlink:href="https://doi.org/10.1007/BF00994018" ext-link-type="DOI">10.1007/BF00994018</ext-link>, 1995.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Creed et al.(2018)Creed, Bergström, Trick, Grimm, Hessen, Karlsson, Kidd, Kritzberg, McKnight, Freeman, Senar, Andersson, Ask, Berggren, Cherif, Giesler, Hotchkiss, Kortelainen, Palta, and Weyhenmeyer</label><mixed-citation>Creed, I. F., Bergström, A.-K., Trick, C. G., Grimm, N. B., Hessen, D. O., Karlsson, J., Kidd, K. A., Kritzberg, E., McKnight, D. M., Freeman, E. C., Senar, O. E., Andersson, A., Ask, J., Berggren, M., Cherif, M., Giesler, R., Hotchkiss, E. R., Kortelainen, P., Palta, M. M., and Weyhenmeyer, G. A.: Global change-driven effects on dissolved organic matter composition: Implications for food webs of northern lakes, Glob. Change Biol., 24, 3692–3714, <ext-link xlink:href="https://doi.org/10.1111/gcb.14129" ext-link-type="DOI">10.1111/gcb.14129</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Downing et al.(2012)Downing, Pellerin, Bergamaschi, Saraceno, and Kraus</label><mixed-citation>Downing, B. D., Pellerin, B. A., Bergamaschi, B. A., Saraceno, J. F., and Kraus, T. E. C.: Seeing the light: The effects of particles, dissolved materials, and temperature on in situ measurements of DOM fluorescence in rivers and streams, Limnol. Oceanogr.-Meth., 10, 767–775, <ext-link xlink:href="https://doi.org/10.4319/lom.2012.10.767" ext-link-type="DOI">10.4319/lom.2012.10.767</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Doyle et al.(2019)Doyle, de Eyto, Dillane, Poole, McCarthy, Ryder, and Jennings</label><mixed-citation>Doyle, B. C., de Eyto, E., Dillane, M., Poole, R., McCarthy, V., Ryder, E., and Jennings, E.: Synchrony in catchment stream colour levels is driven by both local and regional climate, Biogeosciences, 16, 1053–1071, <ext-link xlink:href="https://doi.org/10.5194/bg-16-1053-2019" ext-link-type="DOI">10.5194/bg-16-1053-2019</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Duan et al.(2025)Duan, Cao, Luo, and Shen</label><mixed-citation>Duan, H., Cao, Z., Luo, J., and Shen, M.: AI-driven opportunities and challenges in lake remote sensing, Information Geography, 100014, <ext-link xlink:href="https://doi.org/10.1016/j.infgeo.2025.100014" ext-link-type="DOI">10.1016/j.infgeo.2025.100014</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Dubeux et al.(2024)Dubeux, Lira Junior, Simili, Bretas, Trumpp, Bizzuti, Garcia, Oduor, Queiroz, Acuña, and Mendes</label><mixed-citation>Dubeux, J. C. B., Lira Junior, M. D. A., Simili, F. F., Bretas, I. L., Trumpp, K. R., Bizzuti, B. E., Garcia, L., Oduor, K. T., Queiroz, L. M. D., Acuña, J. P., and Mendes, C. T. E.: Deep soil organic carbon: A review, CABI Reviews, 19, <ext-link xlink:href="https://doi.org/10.1079/cabireviews.2024.0024" ext-link-type="DOI">10.1079/cabireviews.2024.0024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>EEA(2020)</label><mixed-citation>European Environment Agency (EEA): CORINE Land Cover 2018 (vector/raster 100 m), Europe, 6-yearly – version 2020_20u1, European Environment Agency (EEA), <ext-link xlink:href="https://doi.org/10.2909/71c95a07-e296-44fc-b22b-415f42acfdf0" ext-link-type="DOI">10.2909/71c95a07-e296-44fc-b22b-415f42acfdf0</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Fix(1985)</label><mixed-citation>Fix, E.: Discriminatory Analysis: Nonparametric Discrimination, Consistency Properties, Tech. rep., USAF School of Aviation Medicine, <ext-link xlink:href="https://doi.org/10.1037/e471672008-001" ext-link-type="DOI">10.1037/e471672008-001</ext-link>, 1985.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Gobler(2020)</label><mixed-citation>Gobler, C. J.: Climate Change and Harmful Algal Blooms: Insights and perspective, Harmful Algae, 91, 101731, <ext-link xlink:href="https://doi.org/10.1016/j.hal.2019.101731" ext-link-type="DOI">10.1016/j.hal.2019.101731</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Hanson et al.(2020)Hanson, Stillman, Jia, Karpatne, Dugan, Carey, Stachelek, Ward, Zhang, Read, and Kumar</label><mixed-citation>Hanson, P. C., Stillman, A. B., Jia, X., Karpatne, A., Dugan, H. A., Carey, C. C., Stachelek, J., Ward, N. K., Zhang, Y., Read, J. S., and Kumar, V.: Predicting lake surface water phosphorus dynamics using process-guided machine learning, Ecol. Model., 430, 109136, <ext-link xlink:href="https://doi.org/10.1016/j.ecolmodel.2020.109136" ext-link-type="DOI">10.1016/j.ecolmodel.2020.109136</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Harkort and Duan(2023)</label><mixed-citation>Harkort, L. and Duan, Z.: Estimation of dissolved organic carbon from inland waters at a large scale using satellite data and machine learning methods, Water Res., 229, 119478, <ext-link xlink:href="https://doi.org/10.1016/j.watres.2022.119478" ext-link-type="DOI">10.1016/j.watres.2022.119478</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Hersbach et al.(Accessed on July 2025)Hersbach, Bell, Berrisford, Biavati, Horányi, Muñoz Sabater, Nicolas, Peubey, Radu, Rozum, Schepers, Simmons, Soci, Dee, and Thépaut</label><mixed-citation>Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A., Muñoz Sabater, J., Nicolas, J., Peubey, C., Radu, R., Rozum, I., Schepers, D., Simmons, A., Soci, C., Dee, D., and Thépaut, J.-N.: ERA5 hourly data on single levels from 1940 to present, Copernicus Climate Change Service (C3S) Climate Data Store (CDS) [data set], <ext-link xlink:href="https://doi.org/10.24381/cds.adbb2d47" ext-link-type="DOI">10.24381/cds.adbb2d47</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Herzsprung et al.(2020)Herzsprung, Wentzky, Kamjunke, von Tümpling, Wilske, Friese, Boehrer, Reemtsma, Rinke, and Lechtenfeld</label><mixed-citation>Herzsprung, P., Wentzky, V., Kamjunke, N., von Tümpling, W., Wilske, C., Friese, K., Boehrer, B., Reemtsma, T., Rinke, K., and Lechtenfeld, O. J.: Improved Understanding of Dissolved Organic Matter Processing in Freshwater Using Complementary Experimental and Machine Learning Approaches, Environ. Sci. Technol., 54, 13556–13565, <ext-link xlink:href="https://doi.org/10.1021/acs.est.0c02383" ext-link-type="DOI">10.1021/acs.est.0c02383</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Hipsey et al.(2019)Hipsey, Bruce, Boon, Busch, Carey, Hamilton, Hanson, Read, de Sousa, Weber, and Winslow</label><mixed-citation>Hipsey, M. R., Bruce, L. C., Boon, C., Busch, B., Carey, C. C., Hamilton, D. P., Hanson, P. C., Read, J. S., de Sousa, E., Weber, M., and Winslow, L. A.: A General Lake Model (GLM 3.0) for linking with high-frequency sensor data from the Global Lake Ecological Observatory Network (GLEON), Geosci. Model Dev., 12, 473–523, <ext-link xlink:href="https://doi.org/10.5194/gmd-12-473-2019" ext-link-type="DOI">10.5194/gmd-12-473-2019</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Hollister et al.(2016)Hollister, Milstead, and Kreakie</label><mixed-citation>Hollister, J. W., Milstead, W. B., and Kreakie, B. J.: Modeling lake trophic state: A random forest approach, Ecosphere, 7, e01321, <ext-link xlink:href="https://doi.org/10.1002/ecs2.1321" ext-link-type="DOI">10.1002/ecs2.1321</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Jennings et al.(2020)Jennings, de Eyto, Moore, Dillane, Ryder, Allott, Nic Aonghusa, Rouen, Poole, and Pierson</label><mixed-citation>Jennings, E., de Eyto, E., Moore, T., Dillane, M., Ryder, E., Allott, N., Nic Aonghusa, C., Rouen, M., Poole, R., and Pierson, D. C.: From Highs to Lows: Changes in Dissolved Organic Carbon in a Peatland Catchment and Lake Following Extreme Flow Events, Water, 12, 2843, <ext-link xlink:href="https://doi.org/10.3390/w12102843" ext-link-type="DOI">10.3390/w12102843</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Kalbitz et al.(2000)Kalbitz, Solinger, Park, Michalzik, and Matzner</label><mixed-citation> Kalbitz, K., Solinger, S., Park, J.-H., Michalzik, B., and Matzner, E.: Controls on the dynamics of dissolved organic matter in soils: A review, Soil Sci., 165, 277–300, 2000.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Ke et al.(2017)Ke, Meng, Finley, Wang, Chen, Ma, Ye, and Liu</label><mixed-citation>Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.-Y.: LightGBM: A highly efficient gradient boosting decision tree, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, 3149–3157, <uri>https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree</uri> (last access: April 2026), 2017.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Lake et al.(2000)Lake, Palmer, Biro, Cole, Covich, Dahm, Gibert, Goedkoop, Martens, and Verhoeven</label><mixed-citation>Lake, P. S., Palmer, M. A., Biro, P., Cole, J., Covich, A. P., Dahm, C., Gibert, J., Goedkoop, W., Martens, K., and Verhoeven, J.: Global Change and the Biodiversity of Freshwater Ecosystems: Impacts on Linkages between Above-Sediment and Sediment Biota, BioScience, 50, 1099–1107, <ext-link xlink:href="https://doi.org/10.1641/0006-3568(2000)050[1099:GCATBO]2.0.CO;2" ext-link-type="DOI">10.1641/0006-3568(2000)050[1099:GCATBO]2.0.CO;2</ext-link>, 2000.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Li et al.(2014)Li, Zhao, Mao, Liu, and Qu</label><mixed-citation>Li, A., Zhao, X., Mao, R., Liu, H., and Qu, J.: Characterization of dissolved organic matter from surface waters with low to high dissolved organic carbon and the related disinfection byproduct formation potential, J. Hazard. Mater., 271, 228–235, <ext-link xlink:href="https://doi.org/10.1016/j.jhazmat.2014.02.009" ext-link-type="DOI">10.1016/j.jhazmat.2014.02.009</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Li et al.(2016)Li, Yang, Wan, Dai, and Zhang</label><mixed-citation>Li, B., Yang, G., Wan, R., Dai, X., and Zhang, Y.: Comparison of random forests and other statistical methods for the prediction of lake water level: A case study of the Poyang Lake in China, Hydrol. Res., 47, 69–83, <ext-link xlink:href="https://doi.org/10.2166/nh.2016.264" ext-link-type="DOI">10.2166/nh.2016.264</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>Li et al.(2015)Li, del Giorgio, Parkes, and Prairie</label><mixed-citation>Li, M., del Giorgio, P. A., Parkes, A. H., and Prairie, Y. T.: The relative influence of topography and land cover on inorganic and organic carbon exports from catchments in southern Quebec, Canada, J. Geophys. Res.-Biogeo., 120, 2562–2578, <ext-link xlink:href="https://doi.org/10.1002/2015JG003073" ext-link-type="DOI">10.1002/2015JG003073</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>Liu et al.(2021)Liu, Yu, Xiao, Qi, and Duan</label><mixed-citation>Liu, D., Yu, S., Xiao, Q., Qi, T., and Duan, H.: Satellite estimation of dissolved organic carbon in eutrophic Lake Taihu, China, Remote Sens. Environ., 264, 112572, <ext-link xlink:href="https://doi.org/10.1016/j.rse.2021.112572" ext-link-type="DOI">10.1016/j.rse.2021.112572</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Marcé et al.(2021)Marcé, Verdura, and Leung</label><mixed-citation>Marcé, R., Verdura, L., and Leung, N.: Dissolved organic matter spectroscopy reveals a hot spot of organic matter changes at the river–reservoir boundary, Aquat. Sci., 83, 67, <ext-link xlink:href="https://doi.org/10.1007/s00027-021-00823-6" ext-link-type="DOI">10.1007/s00027-021-00823-6</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx31"><label>McCullough et al.(2018)McCullough, Dugan, Farrell, Morales-Williams, Ouyang, Roberts, Scordo, Bartlett, Burke, Doubek et al.</label><mixed-citation> McCullough, I. M., Dugan, H. A., Farrell, K. J., Morales-Williams, A. M., Ouyang, Z., Roberts, D., Scordo, F., Bartlett, S. L., Burke, S. M., Doubek, J. P., Krivak-Tetley, F. E., Skaff, N. K., Summers, J. C., Weathers, K. C., and Hanson, P. C.: Dynamic modeling of organic carbon fates in lake ecosystems, Ecol. Model., 386, 71–82, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>Mercado-Bettín(2026)</label><mixed-citation>Mercado-Bettín, D.: Repository: A machine learning approach to driver attribution of dissolved organic matter dynamics in two contrasting freshwater systems (v1.0.0), Zenodo [data set] and [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.19354955" ext-link-type="DOI">10.5281/zenodo.19354955</ext-link>, 2026.</mixed-citation></ref>
      <ref id="bib1.bibx33"><label>Mercado-Bettín et al.(2021)Mercado-Bettín, Clayer, Shikhani, Moore, Frías, Jackson-Blake, Sample, Iturbide, Herrera, French, Norling, Rinke, and Marcé</label><mixed-citation>Mercado-Bettín, D., Clayer, F., Shikhani, M., Moore, T. N., Frías, M. D., Jackson-Blake, L., Sample, J., Iturbide, M., Herrera, S., French, A. S., Norling, M. D., Rinke, K., and Marcé, R.: Forecasting water temperature in lakes and reservoirs using seasonal climate prediction, Water Res., 201, 117286, <ext-link xlink:href="https://doi.org/10.1016/j.watres.2021.117286" ext-link-type="DOI">10.1016/j.watres.2021.117286</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx34"><label>Mi et al.(2024)Mi, Tilahun, Flörke, Dürr, and Rinke</label><mixed-citation>Mi, C., Tilahun, A. B., Flörke, M., Dürr, H. H., and Rinke, K.: Climate warming effects in stratified reservoirs: Thorough assessment for opportunities and limits of machine learning techniques versus process-based models in thermal structure projections, J. Clean. Prod., 454, 142347, <ext-link xlink:href="https://doi.org/10.1016/j.jclepro.2024.142347" ext-link-type="DOI">10.1016/j.jclepro.2024.142347</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx35"><label>Molnar(2020)</label><mixed-citation> Molnar, C.: Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, Lulu.com, ISBN 978-0-244-76852-2, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx36"><label>Müller et al.(2024)Müller, D'Andrilli, Silverman, Bier, Barnard, Lee, Richard, Tanentzap, Wang, de Melo, and Lu</label><mixed-citation>Müller, M., D'Andrilli, J., Silverman, V., Bier, R. L., Barnard, M. A., Lee, M. C. M., Richard, F., Tanentzap, A. J., Wang, J., de Melo, M., and Lu, Y.: Machine-learning based approach to examine ecological processes influencing the diversity of riverine dissolved organic matter composition, Frontiers in Water, 6, <ext-link xlink:href="https://doi.org/10.3389/frwa.2024.1379284" ext-link-type="DOI">10.3389/frwa.2024.1379284</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx37"><label>Nearing et al.(2021)Nearing, Kratzert, Sampson, Pelissier, Klotz, Frame, Prieto, and Gupta</label><mixed-citation>Nearing, G. S., Kratzert, F., Sampson, A. K., Pelissier, C. S., Klotz, D., Frame, J. M., Prieto, C., and Gupta, H. V.: What Role Does Hydrological Science Play in the Age of Machine Learning?, Water Resour. Res., 57, e2020WR028091, <ext-link xlink:href="https://doi.org/10.1029/2020WR028091" ext-link-type="DOI">10.1029/2020WR028091</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx38"><label>Paíz et al.(2025a)Paíz, Pierson, Lindqvist, Naden, de Eyto, Dillane, McCarthy, Linnane, and Jennings</label><mixed-citation>Paíz, R., Pierson, D. C., Lindqvist, K., Naden, P. S., de Eyto, E., Dillane, M., McCarthy, V., Linnane, S., and Jennings, E.: Accounting for model parameter uncertainty provides more robust projections of dissolved organic carbon dynamics to aid drinking water management, Water Res., 276, 123238, <ext-link xlink:href="https://doi.org/10.1016/j.watres.2025.123238" ext-link-type="DOI">10.1016/j.watres.2025.123238</ext-link>, 2025a.</mixed-citation></ref>
      <ref id="bib1.bibx39"><label>Paíz et al.(2025b)Paíz, Thomas, Carey, de Eyto, Jones, Delany, Poole, Nixon, Dillane, McCarthy, Linnane, and Jennings</label><mixed-citation>Paíz, R., Thomas, R. Q., Carey, C. C., de Eyto, E., Jones, I. D., Delany, A. D., Poole, R., Nixon, P., Dillane, M., McCarthy, V., Linnane, S., and Jennings, E.: Near-term lake water temperature forecasts can be used to anticipate the ecological dynamics of freshwater species, Ecosphere, 16, e70335, <ext-link xlink:href="https://doi.org/10.1002/ecs2.70335" ext-link-type="DOI">10.1002/ecs2.70335</ext-link>, 2025b.</mixed-citation></ref>
      <ref id="bib1.bibx40"><label>Prokhorenkova et al.(2019)Prokhorenkova, Gusev, Vorobev, Dorogush, and Gulin</label><mixed-citation>Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., and Gulin, A.: CatBoost: Unbiased boosting with categorical features, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.1706.09516" ext-link-type="DOI">10.48550/arXiv.1706.09516</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx41"><label>Qi(2012)</label><mixed-citation>Qi, Y.: Random Forest for Bioinformatics, Springer, 307–323, <ext-link xlink:href="https://doi.org/10.1007/978-1-4419-9326-7_11" ext-link-type="DOI">10.1007/978-1-4419-9326-7_11</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx42"><label>Regier et al.(2023)Regier, Duggan, Myers-Pigg, and Ward</label><mixed-citation>Regier, P., Duggan, M., Myers-Pigg, A., and Ward, N.: Effects of random forest modeling decisions on biogeochemical time series predictions, Limnol. Oceanogr.-Meth., 21, 40–52, <ext-link xlink:href="https://doi.org/10.1002/lom3.10523" ext-link-type="DOI">10.1002/lom3.10523</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx43"><label>Rodriguez-Galiano et al.(2015)Rodriguez-Galiano, Sanchez-Castillo, Chica-Olmo, and Chica-Rivas</label><mixed-citation>Rodriguez-Galiano, V., Sanchez-Castillo, M., Chica-Olmo, M., and Chica-Rivas, M.: Machine learning predictive models for mineral prospectivity: An evaluation of neural networks, random forest, regression trees and support vector machines, Ore Geol. Rev., 71, 804–818, <ext-link xlink:href="https://doi.org/10.1016/j.oregeorev.2015.01.001" ext-link-type="DOI">10.1016/j.oregeorev.2015.01.001</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx44"><label>Rumpel and Kögel-Knabner(2011)</label><mixed-citation>Rumpel, C. and Kögel-Knabner, I.: Deep soil organic matter – A key but poorly understood component of terrestrial C cycle, Plant Soil, 338, 143–158, <ext-link xlink:href="https://doi.org/10.1007/s11104-010-0391-5" ext-link-type="DOI">10.1007/s11104-010-0391-5</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx45"><label>Ryder et al.(2012)Ryder, Jennings, de Eyto, Dillane, NicAonghusa, Pierson, Moore, Rouen, and Poole</label><mixed-citation>Ryder, E., Jennings, E., de Eyto, E., Dillane, M., NicAonghusa, C., Pierson, D. C., Moore, K., Rouen, M., and Poole, R.: Temperature quenching of CDOM fluorescence sensors: Temporal and spatial variability in the temperature response and a recommended temperature correction equation, Limnol. Oceanogr.-Meth., 10, 1004–1010, <ext-link xlink:href="https://doi.org/10.4319/lom.2012.10.1004" ext-link-type="DOI">10.4319/lom.2012.10.1004</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx46"><label>Ryder et al.(2014)Ryder, de Eyto, Dillane, Poole, and Jennings</label><mixed-citation>Ryder, E., de Eyto, E., Dillane, M., Poole, R., and Jennings, E.: Identifying the role of environmental drivers in organic carbon export from a forested peat catchment, Sci. Total Environ., 490, 28–36, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2014.04.091" ext-link-type="DOI">10.1016/j.scitotenv.2014.04.091</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx47"><label>Solomon et al.(2015)Solomon, Jones, Weidel, Buffam, Fork, Karlsson, Larsen, Lennon, Read, Sadro, and Saros</label><mixed-citation>Solomon, C. T., Jones, S. E., Weidel, B., Buffam, I., Fork, M. L., Karlsson, J., Larsen, S., Lennon, J. T., Read, J. S., Sadro, S., and Saros, J. E.: Ecosystem consequences of changing inputs of terrestrial dissolved organic matter to lakes: Current knowledge and future challenges, Ecosystems, 18, 376–389, <ext-link xlink:href="https://doi.org/10.1007/s10021-015-9848-y" ext-link-type="DOI">10.1007/s10021-015-9848-y</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx48"><label>Sullivan(2022)</label><mixed-citation>Sullivan, E.: Understanding from Machine Learning Models, Brit. J. Philos. Sci., 73, 109–133, <ext-link xlink:href="https://doi.org/10.1093/bjps/axz035" ext-link-type="DOI">10.1093/bjps/axz035</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx49"><label>Toming et al.(2020)Toming, Kotta, Uuemaa, Sobek, Kutser, and Tranvik</label><mixed-citation>Toming, K., Kotta, J., Uuemaa, E., Sobek, S., Kutser, T., and Tranvik, L. J.: Predicting lake dissolved organic carbon at a global scale, Sci. Rep., 10, 8471, <ext-link xlink:href="https://doi.org/10.1038/s41598-020-65010-3" ext-link-type="DOI">10.1038/s41598-020-65010-3</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx50"><label>Šimek et al.(2011)Šimek, Comerma, García, Nedoma, Marcé, and Armengol</label><mixed-citation>Šimek, K., Comerma, M., García, J.-C., Nedoma, J., Marcé, R., and Armengol, J.: The Effect of River Water Circulation on the Distribution and Functioning of Reservoir Microbial Communities as Determined by a Relative Distance Approach, Ecosystems, 14, 1–14, <ext-link xlink:href="https://doi.org/10.1007/s10021-010-9388-4" ext-link-type="DOI">10.1007/s10021-010-9388-4</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx51"><label>Watras et al.(2011)Watras, Hanson, Stacy, Morrison, Mather, Hu, and Milewski</label><mixed-citation>Watras, C., Hanson, P., Stacy, T., Morrison, K., Mather, J., Hu, Y.-H., and Milewski, P.: A temperature compensation method for CDOM fluorescence sensors in freshwater, Limnol. Oceanogr.-Meth., 9, 296–301, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx52"><label>Weyhenmeyer and Karlsson(2009)</label><mixed-citation>Weyhenmeyer, G. A. and Karlsson, J.: Nonlinear response of dissolved organic carbon concentrations in boreal lakes to increasing temperatures, Limnol. Oceanogr., 54, 2513–2519, <ext-link xlink:href="https://doi.org/10.4319/lo.2009.54.6_part_2.2513" ext-link-type="DOI">10.4319/lo.2009.54.6_part_2.2513</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx53"><label>Xenopoulos et al.(2021)Xenopoulos, Barnes, Boodoo, Butman, Catalán, D'Amario, Fasching, Kothawala, Pisani, Solomon, Spencer, Williams, and Wilson</label><mixed-citation>Xenopoulos, M. A., Barnes, R. T., Boodoo, K. S., Butman, D., Catalán, N., D'Amario, S. C., Fasching, C., Kothawala, D. N., Pisani, O., Solomon, C. T., Spencer, R. G. M., Williams, C. J., and Wilson, H. F.: How humans alter dissolved organic matter composition in freshwater: Relevance for the Earth's biogeochemistry, Biogeochemistry, 154, 323–348, <ext-link xlink:href="https://doi.org/10.1007/s10533-021-00753-3" ext-link-type="DOI">10.1007/s10533-021-00753-3</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx54"><label>Zhang et al.(2024)Zhang, Shi, Wang, Wang, Zhang, Qin, Zhu, Dong, and Zhang</label><mixed-citation>Zhang, D., Shi, K., Wang, W., Wang, X., Zhang, Y., Qin, B., Zhu, M., Dong, B., and Zhang, Y.: An optical mechanism-based deep learning approach for deriving water trophic state of China's lakes from Landsat images, Water Res., 252, 121181, <ext-link xlink:href="https://doi.org/10.1016/j.watres.2024.121181" ext-link-type="DOI">10.1016/j.watres.2024.121181</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx55"><label>Zhang et al.(2021)Zhang, Yao, Wu, Huang, Zhou, Yang, and Liu</label><mixed-citation>Zhang, Y., Yao, X., Wu, Q., Huang, Y., Zhou, Z., Yang, J., and Liu, X.: Turbidity prediction of lake-type raw water using random forest model based on meteorological data: A case study of Tai lake, China, J. Environ. Manage., 290, 112657, <ext-link xlink:href="https://doi.org/10.1016/j.jenvman.2021.112657" ext-link-type="DOI">10.1016/j.jenvman.2021.112657</ext-link>, 2021.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>A machine learning approach to driver attribution of dissolved organic matter dynamics in two contrasting freshwater systems</article-title-html>
<abstract-html/>
<ref-html id="bib1.bib1"><label>Asadollah et al.(2025)Asadollah, Safaeinia, Jarahizadeh, Alcalá,
Sharafati, and Jodar-Abellan</label><mixed-citation>
      
Asadollah, S. B. H. S., Safaeinia, A., Jarahizadeh, S., Alcalá, F. J.,
Sharafati, A., and Jodar-Abellan, A.: Dissolved organic carbon estimation in
lakes: Improving machine learning with data augmentation on fusion of
multi-sensor remote sensing observations, Water Research, 277, 123350, <a href="https://doi.org/10.1016/j.watres.2025.123350" target="_blank">https://doi.org/10.1016/j.watres.2025.123350</a>,
2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>Bhateria and Jain(2016)</label><mixed-citation>
      
Bhateria, R. and Jain, D.: Water quality assessment of lake water: A review,
Sustainable Water Resources Management, 2, 161–173,
<a href="https://doi.org/10.1007/s40899-015-0014-7" target="_blank">https://doi.org/10.1007/s40899-015-0014-7</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>Biau and Scornet(2016)</label><mixed-citation>
      
Biau, G. and Scornet, E.: A random forest guided tour, Test, 25, 197–227,
2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>Breiman(2001)</label><mixed-citation>
      
Breiman, L.: Random Forests, Mach. Learn., 45, 5–32,
<a href="https://doi.org/10.1023/A:1010933404324" target="_blank">https://doi.org/10.1023/A:1010933404324</a>, 2001.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>Chen et al.(2024)Chen, Chen, Yao, He, Zhang, Li, and Lin</label><mixed-citation>
      
Chen, C., Chen, Q., Yao, S., He, M., Zhang, J., Li, G., and Lin, Y.: Combining
physical-based model and machine learning to forecast chlorophyll-<i>a</i>
concentration in freshwater lakes, Sci. Total Environ., 907,
168097, <a href="https://doi.org/10.1016/j.scitotenv.2023.168097" target="_blank">https://doi.org/10.1016/j.scitotenv.2023.168097</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>Chen and Guestrin(2016)</label><mixed-citation>
      
Chen, T. and Guestrin, C.: XGBoost: A Scalable Tree Boosting System, in:
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, 785–794, <a href="https://doi.org/10.1145/2939672.2939785" target="_blank">https://doi.org/10.1145/2939672.2939785</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>Cortes and Vapnik(1995)</label><mixed-citation>
      
Cortes, C. and Vapnik, V.: Support-vector networks, Mach. Learn., 20,
273–297, <a href="https://doi.org/10.1007/BF00994018" target="_blank">https://doi.org/10.1007/BF00994018</a>, 1995.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>Creed et al.(2018)Creed, Bergström, Trick, Grimm, Hessen,
Karlsson, Kidd, Kritzberg, McKnight, Freeman, Senar, Andersson, Ask,
Berggren, Cherif, Giesler, Hotchkiss, Kortelainen, Palta, and
Weyhenmeyer</label><mixed-citation>
      
Creed, I. F., Bergström, A.-K., Trick, C. G., Grimm, N. B., Hessen, D. O.,
Karlsson, J., Kidd, K. A., Kritzberg, E., McKnight, D. M., Freeman, E. C.,
Senar, O. E., Andersson, A., Ask, J., Berggren, M., Cherif, M., Giesler, R.,
Hotchkiss, E. R., Kortelainen, P., Palta, M. M., and Weyhenmeyer, G. A.:
Global change-driven effects on dissolved organic matter composition:
Implications for food webs of northern lakes, Glob. Change Biol., 24,
3692–3714, <a href="https://doi.org/10.1111/gcb.14129" target="_blank">https://doi.org/10.1111/gcb.14129</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>Downing et al.(2012)Downing, Pellerin, Bergamaschi, Saraceno, and
Kraus</label><mixed-citation>
      
Downing, B. D., Pellerin, B. A., Bergamaschi, B. A., Saraceno, J. F., and
Kraus, T. E. C.: Seeing the light: The effects of particles, dissolved
materials, and temperature on in situ measurements of DOM fluorescence in
rivers and streams, Limnol. Oceanogr.-Meth., 10, 767–775,
<a href="https://doi.org/10.4319/lom.2012.10.767" target="_blank">https://doi.org/10.4319/lom.2012.10.767</a>, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>Doyle et al.(2019)Doyle, de Eyto, Dillane, Poole, McCarthy, Ryder,
and Jennings</label><mixed-citation>
      
Doyle, B. C., de Eyto, E., Dillane, M., Poole, R., McCarthy, V., Ryder, E., and Jennings, E.: Synchrony in catchment stream colour levels is driven by both local and regional climate, Biogeosciences, 16, 1053–1071, <a href="https://doi.org/10.5194/bg-16-1053-2019" target="_blank">https://doi.org/10.5194/bg-16-1053-2019</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>Duan et al.(2025)Duan, Cao, Luo, and Shen</label><mixed-citation>
      
Duan, H., Cao, Z., Luo, J., and Shen, M.: AI-driven opportunities and
challenges in lake remote sensing, Information Geography, 100014,
<a href="https://doi.org/10.1016/j.infgeo.2025.100014" target="_blank">https://doi.org/10.1016/j.infgeo.2025.100014</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>Dubeux et al.(2024)Dubeux, Lira Junior, Simili, Bretas, Trumpp,
Bizzuti, Garcia, Oduor, Queiroz, Acuña, and Mendes</label><mixed-citation>
      
Dubeux, J. C. B., Lira Junior, M. D. A., Simili, F. F., Bretas, I. L., Trumpp,
K. R., Bizzuti, B. E., Garcia, L., Oduor, K. T., Queiroz, L. M. D.,
Acuña, J. P., and Mendes, C. T. E.: Deep soil organic carbon: A review,
CABI Reviews, 19, <a href="https://doi.org/10.1079/cabireviews.2024.0024" target="_blank">https://doi.org/10.1079/cabireviews.2024.0024</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>EEA(2020)</label><mixed-citation>
      
European Environment Agency (EEA):
CORINE Land Cover 2018 (vector/raster 100&thinsp;m), Europe, 6-yearly – version 2020_20u1, European Environment Agency (EEA),
<a href="https://doi.org/10.2909/71c95a07-e296-44fc-b22b-415f42acfdf0" target="_blank">https://doi.org/10.2909/71c95a07-e296-44fc-b22b-415f42acfdf0</a>,
2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>Fix(1985)</label><mixed-citation>
      
Fix, E.: Discriminatory Analysis: Nonparametric Discrimination, Consistency
Properties, Tech. rep., USAF School of Aviation Medicine, <a href="https://doi.org/10.1037/e471672008-001" target="_blank">https://doi.org/10.1037/e471672008-001</a>, 1985.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>Gobler(2020)</label><mixed-citation>
      
Gobler, C. J.: Climate Change and Harmful Algal Blooms: Insights and
perspective, Harmful Algae, 91, 101731, <a href="https://doi.org/10.1016/j.hal.2019.101731" target="_blank">https://doi.org/10.1016/j.hal.2019.101731</a>,
2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>Hanson et al.(2020)Hanson, Stillman, Jia, Karpatne, Dugan, Carey,
Stachelek, Ward, Zhang, Read, and Kumar</label><mixed-citation>
      
Hanson, P. C., Stillman, A. B., Jia, X., Karpatne, A., Dugan, H. A., Carey,
C. C., Stachelek, J., Ward, N. K., Zhang, Y., Read, J. S., and Kumar, V.:
Predicting lake surface water phosphorus dynamics using process-guided
machine learning, Ecol. Model., 430, 109136,
<a href="https://doi.org/10.1016/j.ecolmodel.2020.109136" target="_blank">https://doi.org/10.1016/j.ecolmodel.2020.109136</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>Harkort and Duan(2023)</label><mixed-citation>
      
Harkort, L. and Duan, Z.: Estimation of dissolved organic carbon from inland
waters at a large scale using satellite data and machine learning methods,
Water Res., 229, 119478, <a href="https://doi.org/10.1016/j.watres.2022.119478" target="_blank">https://doi.org/10.1016/j.watres.2022.119478</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>Hersbach et al.(Accessed on July 2025)Hersbach, Bell, Berrisford,
Biavati, Horányi, Muñoz Sabater, Nicolas, Peubey, Radu, Rozum,
Schepers, Simmons, Soci, Dee, and Thépaut</label><mixed-citation>
      
Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A.,
Muñoz Sabater, J., Nicolas, J., Peubey, C., Radu, R., Rozum, I.,
Schepers, D., Simmons, A., Soci, C., Dee, D., and Thépaut, J.-N.: ERA5
hourly data on single levels from 1940 to present, Copernicus Climate Change
Service (C3S) Climate Data Store (CDS) [data set], <a href="https://doi.org/10.24381/cds.adbb2d47" target="_blank">https://doi.org/10.24381/cds.adbb2d47</a>,
2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>Herzsprung et al.(2020)Herzsprung, Wentzky, Kamjunke, von
Tümpling, Wilske, Friese, Boehrer, Reemtsma, Rinke, and
Lechtenfeld</label><mixed-citation>
      
Herzsprung, P., Wentzky, V., Kamjunke, N., von Tümpling, W., Wilske, C.,
Friese, K., Boehrer, B., Reemtsma, T., Rinke, K., and Lechtenfeld, O. J.:
Improved Understanding of Dissolved Organic Matter Processing in Freshwater
Using Complementary Experimental and Machine Learning Approaches,
Environ. Sci. Technol., 54, 13556–13565,
<a href="https://doi.org/10.1021/acs.est.0c02383" target="_blank">https://doi.org/10.1021/acs.est.0c02383</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>Hipsey et al.(2019)Hipsey, Bruce, Boon, Busch, Carey, Hamilton,
Hanson, Read, de Sousa, Weber, and Winslow</label><mixed-citation>
      
Hipsey, M. R., Bruce, L. C., Boon, C., Busch, B., Carey, C. C., Hamilton, D. P., Hanson, P. C., Read, J. S., de Sousa, E., Weber, M., and Winslow, L. A.: A General Lake Model (GLM 3.0) for linking with high-frequency sensor data from the Global Lake Ecological Observatory Network (GLEON), Geosci. Model Dev., 12, 473–523, <a href="https://doi.org/10.5194/gmd-12-473-2019" target="_blank">https://doi.org/10.5194/gmd-12-473-2019</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>Hollister et al.(2016)Hollister, Milstead, and
Kreakie</label><mixed-citation>
      
Hollister, J. W., Milstead, W. B., and Kreakie, B. J.: Modeling lake trophic
state: A random forest approach, Ecosphere, 7, e01321,
<a href="https://doi.org/10.1002/ecs2.1321" target="_blank">https://doi.org/10.1002/ecs2.1321</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>Jennings et al.(2020)Jennings, de Eyto, Moore, Dillane, Ryder,
Allott, Nic Aonghusa, Rouen, Poole, and Pierson</label><mixed-citation>
      
Jennings, E., de Eyto, E., Moore, T., Dillane, M., Ryder, E., Allott, N.,
Nic Aonghusa, C., Rouen, M., Poole, R., and Pierson, D. C.: From Highs to
Lows: Changes in Dissolved Organic Carbon in a Peatland Catchment and Lake
Following Extreme Flow Events, Water, 12, 2843, <a href="https://doi.org/10.3390/w12102843" target="_blank">https://doi.org/10.3390/w12102843</a>,
2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>Kalbitz et al.(2000)Kalbitz, Solinger, Park, Michalzik, and
Matzner</label><mixed-citation>
      
Kalbitz, K., Solinger, S., Park, J.-H., Michalzik, B., and Matzner, E.:
Controls on the dynamics of dissolved organic matter in soils: A review, Soil
Sci., 165, 277–300, 2000.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>Ke et al.(2017)Ke, Meng, Finley, Wang, Chen, Ma, Ye, and
Liu</label><mixed-citation>
      
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu,
T.-Y.: LightGBM: A highly efficient gradient boosting decision tree, in:
Proceedings of the 31st International Conference on Neural Information
Processing Systems, 3149–3157, <a href="https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree" target="_blank">https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree</a> (last access: April 2026), 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>Lake et al.(2000)Lake, Palmer, Biro, Cole, Covich, Dahm, Gibert,
Goedkoop, Martens, and Verhoeven</label><mixed-citation>
      
Lake, P. S., Palmer, M. A., Biro, P., Cole, J., Covich, A. P., Dahm, C.,
Gibert, J., Goedkoop, W., Martens, K., and Verhoeven, J.: Global Change and
the Biodiversity of Freshwater Ecosystems: Impacts on Linkages between
Above-Sediment and Sediment Biota, BioScience, 50, 1099–1107,
<a href="https://doi.org/10.1641/0006-3568(2000)050[1099:GCATBO]2.0.CO;2" target="_blank">https://doi.org/10.1641/0006-3568(2000)050[1099:GCATBO]2.0.CO;2</a>, 2000.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>Li et al.(2014)Li, Zhao, Mao, Liu, and Qu</label><mixed-citation>
      
Li, A., Zhao, X., Mao, R., Liu, H., and Qu, J.: Characterization of dissolved
organic matter from surface waters with low to high dissolved organic carbon
and the related disinfection byproduct formation potential, J.
Hazard. Mater., 271, 228–235, <a href="https://doi.org/10.1016/j.jhazmat.2014.02.009" target="_blank">https://doi.org/10.1016/j.jhazmat.2014.02.009</a>,
2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>Li et al.(2016)Li, Yang, Wan, Dai, and Zhang</label><mixed-citation>
      
Li, B., Yang, G., Wan, R., Dai, X., and Zhang, Y.: Comparison of random forests
and other statistical methods for the prediction of lake water level: A case
study of the Poyang Lake in China, Hydrol. Res., 47, 69–83,
<a href="https://doi.org/10.2166/nh.2016.264" target="_blank">https://doi.org/10.2166/nh.2016.264</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>Li et al.(2015)Li, del Giorgio, Parkes, and Prairie</label><mixed-citation>
      
Li, M., del Giorgio, P. A., Parkes, A. H., and Prairie, Y. T.: The relative
influence of topography and land cover on inorganic and organic carbon
exports from catchments in southern Quebec, Canada, J. Geophys.
Res.-Biogeo., 120, 2562–2578, <a href="https://doi.org/10.1002/2015JG003073" target="_blank">https://doi.org/10.1002/2015JG003073</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>Liu et al.(2021)Liu, Yu, Xiao, Qi, and Duan</label><mixed-citation>
      
Liu, D., Yu, S., Xiao, Q., Qi, T., and Duan, H.: Satellite estimation of
dissolved organic carbon in eutrophic Lake Taihu, China, Remote Sens.
Environ., 264, 112572, <a href="https://doi.org/10.1016/j.rse.2021.112572" target="_blank">https://doi.org/10.1016/j.rse.2021.112572</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>Marcé et al.(2021)Marcé, Verdura, and Leung</label><mixed-citation>
      
Marcé, R., Verdura, L., and Leung, N.: Dissolved organic matter
spectroscopy reveals a hot spot of organic matter changes at the
river–reservoir boundary, Aquat. Sci., 83, 67,
<a href="https://doi.org/10.1007/s00027-021-00823-6" target="_blank">https://doi.org/10.1007/s00027-021-00823-6</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>McCullough et al.(2018)McCullough, Dugan, Farrell, Morales-Williams,
Ouyang, Roberts, Scordo, Bartlett, Burke, Doubek
et al.</label><mixed-citation>
      
McCullough, I. M., Dugan, H. A., Farrell, K. J., Morales-Williams, A. M.,
Ouyang, Z., Roberts, D., Scordo, F., Bartlett, S. L., Burke, S. M., Doubek,
J. P., Krivak-Tetley, F. E., Skaff, N. K., Summers, J. C., Weathers, K. C., and
Hanson, P. C.: Dynamic modeling of organic carbon fates in lake ecosystems,
Ecol. Model., 386, 71–82, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>Mercado-Bettín(2026)</label><mixed-citation>
      
Mercado-Bettín, D.: Repository: A machine learning approach to driver attribution of dissolved organic matter dynamics in two contrasting freshwater systems (v1.0.0), Zenodo [data set] and [code], <a href="https://doi.org/10.5281/zenodo.19354955" target="_blank">https://doi.org/10.5281/zenodo.19354955</a>, 2026.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>Mercado-Bettín et al.(2021)Mercado-Bettín, Clayer, Shikhani,
Moore, Frías, Jackson-Blake, Sample, Iturbide, Herrera, French, Norling,
Rinke, and Marcé</label><mixed-citation>
      
Mercado-Bettín, D., Clayer, F., Shikhani, M., Moore, T. N., Frías,
M. D., Jackson-Blake, L., Sample, J., Iturbide, M., Herrera, S., French,
A. S., Norling, M. D., Rinke, K., and Marcé, R.: Forecasting water
temperature in lakes and reservoirs using seasonal climate prediction, Water
Res., 201, 117286, <a href="https://doi.org/10.1016/j.watres.2021.117286" target="_blank">https://doi.org/10.1016/j.watres.2021.117286</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>Mi et al.(2024)Mi, Tilahun, Flörke, Dürr, and Rinke</label><mixed-citation>
      
Mi, C., Tilahun, A. B., Flörke, M., Dürr, H. H., and Rinke, K.: Climate
warming effects in stratified reservoirs: Thorough assessment for
opportunities and limits of machine learning techniques versus process-based
models in thermal structure projections, J. Clean. Prod., 454,
142347, <a href="https://doi.org/10.1016/j.jclepro.2024.142347" target="_blank">https://doi.org/10.1016/j.jclepro.2024.142347</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>Molnar(2020)</label><mixed-citation>
      
Molnar, C.: Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, Lulu.com, ISBN 978-0-244-76852-2, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>Müller et al.(2024)Müller, D'Andrilli, Silverman, Bier,
Barnard, Lee, Richard, Tanentzap, Wang, de Melo, and Lu</label><mixed-citation>
      
Müller, M., D'Andrilli, J., Silverman, V., Bier, R. L., Barnard, M. A.,
Lee, M. C. M., Richard, F., Tanentzap, A. J., Wang, J., de Melo, M., and Lu,
Y.: Machine-learning based approach to examine ecological processes
influencing the diversity of riverine dissolved organic matter composition,
Frontiers in Water, 6, <a href="https://doi.org/10.3389/frwa.2024.1379284" target="_blank">https://doi.org/10.3389/frwa.2024.1379284</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>Nearing et al.(2021)Nearing, Kratzert, Sampson, Pelissier, Klotz,
Frame, Prieto, and Gupta</label><mixed-citation>
      
Nearing, G. S., Kratzert, F., Sampson, A. K., Pelissier, C. S., Klotz, D.,
Frame, J. M., Prieto, C., and Gupta, H. V.: What Role Does Hydrological
Science Play in the Age of Machine Learning?, Water Resour. Res., 57,
e2020WR028091, <a href="https://doi.org/10.1029/2020WR028091" target="_blank">https://doi.org/10.1029/2020WR028091</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>Paíz et al.(2025a)Paíz, Pierson, Lindqvist,
Naden, de Eyto, Dillane, McCarthy, Linnane, and Jennings</label><mixed-citation>
      
Paíz, R., Pierson, D. C., Lindqvist, K., Naden, P. S., de Eyto, E.,
Dillane, M., McCarthy, V., Linnane, S., and Jennings, E.: Accounting for
model parameter uncertainty provides more robust projections of dissolved
organic carbon dynamics to aid drinking water management, Water Res.,
276, 123238, <a href="https://doi.org/10.1016/j.watres.2025.123238" target="_blank">https://doi.org/10.1016/j.watres.2025.123238</a>, 2025a.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>Paíz et al.(2025b)Paíz, Thomas, Carey, de Eyto,
Jones, Delany, Poole, Nixon, Dillane, McCarthy, Linnane, and
Jennings</label><mixed-citation>
      
Paíz, R., Thomas, R. Q., Carey, C. C., de Eyto, E., Jones, I. D., Delany,
A. D., Poole, R., Nixon, P., Dillane, M., McCarthy, V., Linnane, S., and
Jennings, E.: Near-term lake water temperature forecasts can be used to
anticipate the ecological dynamics of freshwater species, Ecosphere, 16,
e70335, <a href="https://doi.org/10.1002/ecs2.70335" target="_blank">https://doi.org/10.1002/ecs2.70335</a>, 2025b.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>Prokhorenkova et al.(2019)Prokhorenkova, Gusev, Vorobev, Dorogush,
and Gulin</label><mixed-citation>
      
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., and Gulin, A.:
CatBoost: Unbiased boosting with categorical features, arXiv [preprint], <a href="https://doi.org/10.48550/arXiv.1706.09516" target="_blank">https://doi.org/10.48550/arXiv.1706.09516</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>Qi(2012)</label><mixed-citation>
      
Qi, Y.: Random Forest for Bioinformatics, Springer, 307–323,
<a href="https://doi.org/10.1007/978-1-4419-9326-7_11" target="_blank">https://doi.org/10.1007/978-1-4419-9326-7_11</a>, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>Regier et al.(2023)Regier, Duggan, Myers-Pigg, and Ward</label><mixed-citation>
      
Regier, P., Duggan, M., Myers-Pigg, A., and Ward, N.: Effects of random forest
modeling decisions on biogeochemical time series predictions, Limnol.
Oceanogr.-Meth., 21, 40–52, <a href="https://doi.org/10.1002/lom3.10523" target="_blank">https://doi.org/10.1002/lom3.10523</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>Rodriguez-Galiano et al.(2015)Rodriguez-Galiano, Sanchez-Castillo,
Chica-Olmo, and Chica-Rivas</label><mixed-citation>
      
Rodriguez-Galiano, V., Sanchez-Castillo, M., Chica-Olmo, M., and Chica-Rivas,
M.: Machine learning predictive models for mineral prospectivity: An
evaluation of neural networks, random forest, regression trees and support
vector machines, Ore Geol. Rev., 71, 804–818,
<a href="https://doi.org/10.1016/j.oregeorev.2015.01.001" target="_blank">https://doi.org/10.1016/j.oregeorev.2015.01.001</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>Rumpel and Kögel-Knabner(2011)</label><mixed-citation>
      
Rumpel, C. and Kögel-Knabner, I.: Deep soil organic matter – A key but
poorly understood component of terrestrial C cycle, Plant Soil, 338,
143–158, <a href="https://doi.org/10.1007/s11104-010-0391-5" target="_blank">https://doi.org/10.1007/s11104-010-0391-5</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>Ryder et al.(2012)Ryder, Jennings, de Eyto, Dillane, NicAonghusa,
Pierson, Moore, Rouen, and Poole</label><mixed-citation>
      
Ryder, E., Jennings, E., de Eyto, E., Dillane, M., NicAonghusa, C., Pierson,
D. C., Moore, K., Rouen, M., and Poole, R.: Temperature quenching of CDOM
fluorescence sensors: Temporal and spatial variability in the temperature
response and a recommended temperature correction equation, Limnol.
Oceanogr.-Meth., 10, 1004–1010, <a href="https://doi.org/10.4319/lom.2012.10.1004" target="_blank">https://doi.org/10.4319/lom.2012.10.1004</a>, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>Ryder et al.(2014)Ryder, de Eyto, Dillane, Poole, and
Jennings</label><mixed-citation>
      
Ryder, E., de Eyto, E., Dillane, M., Poole, R., and Jennings, E.: Identifying
the role of environmental drivers in organic carbon export from a forested
peat catchment, Sci. Total Environ., 490, 28–36,
<a href="https://doi.org/10.1016/j.scitotenv.2014.04.091" target="_blank">https://doi.org/10.1016/j.scitotenv.2014.04.091</a>, 2014.


    </mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>Solomon et al.(2015)Solomon, Jones, Weidel, Buffam, Fork, Karlsson,
Larsen, Lennon, Read, Sadro, and Saros</label><mixed-citation>
      
Solomon, C. T., Jones, S. E., Weidel, B., Buffam, I., Fork, M. L., Karlsson,
J., Larsen, S., Lennon, J. T., Read, J. S., Sadro, S., and Saros, J. E.:
Ecosystem consequences of changing inputs of terrestrial dissolved organic
matter to lakes: Current knowledge and future challenges, Ecosystems, 18,
376–389, <a href="https://doi.org/10.1007/s10021-015-9848-y" target="_blank">https://doi.org/10.1007/s10021-015-9848-y</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>Sullivan(2022)</label><mixed-citation>
      
Sullivan, E.: Understanding from Machine Learning Models, Brit. J.
Philos. Sci., 73, 109–133, <a href="https://doi.org/10.1093/bjps/axz035" target="_blank">https://doi.org/10.1093/bjps/axz035</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>Toming et al.(2020)Toming, Kotta, Uuemaa, Sobek, Kutser, and
Tranvik</label><mixed-citation>
      
Toming, K., Kotta, J., Uuemaa, E., Sobek, S., Kutser, T., and Tranvik, L. J.:
Predicting lake dissolved organic carbon at a global scale, Sci.
Rep., 10, 8471, <a href="https://doi.org/10.1038/s41598-020-65010-3" target="_blank">https://doi.org/10.1038/s41598-020-65010-3</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib50"><label>Šimek et al.(2011)Šimek, Comerma, García, Nedoma,
Marcé, and Armengol</label><mixed-citation>
      
Šimek, K., Comerma, M., García, J.-C., Nedoma, J., Marcé, R., and
Armengol, J.: The Effect of River Water Circulation on the Distribution and
Functioning of Reservoir Microbial Communities as Determined by a Relative
Distance Approach, Ecosystems, 14, 1–14, <a href="https://doi.org/10.1007/s10021-010-9388-4" target="_blank">https://doi.org/10.1007/s10021-010-9388-4</a>,
2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib51"><label>Watras et al.(2011)Watras, Hanson, Stacy, Morrison, Mather, Hu, and
Milewski</label><mixed-citation>
      
Watras, C., Hanson, P., Stacy, T., Morrison, K., Mather, J., Hu, Y.-H., and
Milewski, P.: A temperature compensation method for CDOM fluorescence sensors
in freshwater, Limnol. Oceanogr.-Meth., 9, 296–301, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib52"><label>Weyhenmeyer and Karlsson(2009)</label><mixed-citation>
      
Weyhenmeyer, G. A. and Karlsson, J.: Nonlinear response of dissolved organic
carbon concentrations in boreal lakes to increasing temperatures, Limnol.
Oceanogr., 54, 2513–2519, <a href="https://doi.org/10.4319/lo.2009.54.6_part_2.2513" target="_blank">https://doi.org/10.4319/lo.2009.54.6_part_2.2513</a>,
2009.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib53"><label>Xenopoulos et al.(2021)Xenopoulos, Barnes, Boodoo, Butman,
Catalán, D'Amario, Fasching, Kothawala, Pisani, Solomon, Spencer,
Williams, and Wilson</label><mixed-citation>
      
Xenopoulos, M. A., Barnes, R. T., Boodoo, K. S., Butman, D., Catalán, N.,
D'Amario, S. C., Fasching, C., Kothawala, D. N., Pisani, O., Solomon, C. T.,
Spencer, R. G. M., Williams, C. J., and Wilson, H. F.: How humans alter
dissolved organic matter composition in freshwater: Relevance for the Earth's
biogeochemistry, Biogeochemistry, 154, 323–348,
<a href="https://doi.org/10.1007/s10533-021-00753-3" target="_blank">https://doi.org/10.1007/s10533-021-00753-3</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib54"><label>Zhang et al.(2024)Zhang, Shi, Wang, Wang, Zhang, Qin, Zhu, Dong, and
Zhang</label><mixed-citation>
      
Zhang, D., Shi, K., Wang, W., Wang, X., Zhang, Y., Qin, B., Zhu, M., Dong, B.,
and Zhang, Y.: An optical mechanism-based deep learning approach for deriving
water trophic state of China's lakes from Landsat images, Water Res.,
252, 121181, <a href="https://doi.org/10.1016/j.watres.2024.121181" target="_blank">https://doi.org/10.1016/j.watres.2024.121181</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib55"><label>Zhang et al.(2021)Zhang, Yao, Wu, Huang, Zhou, Yang, and
Liu</label><mixed-citation>
      
Zhang, Y., Yao, X., Wu, Q., Huang, Y., Zhou, Z., Yang, J., and Liu, X.:
Turbidity prediction of lake-type raw water using random forest model based
on meteorological data: A case study of Tai lake, China, J.
Environ. Manage., 290, 112657, <a href="https://doi.org/10.1016/j.jenvman.2021.112657" target="_blank">https://doi.org/10.1016/j.jenvman.2021.112657</a>,
2021.

    </mixed-citation></ref-html>--></article>
