Assessing the performance of a suite of machine learning models for daily river water temperature prediction

In this study, different versions of feedforward neural network (FFNN), Gaussian process regression (GPR), and decision tree (DT) models were developed to estimate daily river water temperature using air temperature (Ta), flow discharge (Q), and the day of year (DOY) as predictors. The proposed models were assessed using observed data from eight river stations, and modelling results were compared with the air2stream model. Model performances were evaluated using four indicators in this study: the coefficient of correlation (R), the Willmott index of agreement (d), the root mean squared error (RMSE), and the mean absolute error (MAE). Results indicated that the three machine learning models had similar performance when only Ta was used as the predictor. When the day of year was included as model input, the performances of the three machine learning models dramatically improved. Including flow discharge instead of day of year, as an additional predictor, provided a lower gain in model accuracy, thereby showing the relatively minor role of flow discharge in river water temperature prediction. However, an increase in the relative importance of flow discharge was noticed for stations with high altitude catchments (Rhône, Dischmabach and Cedar) which are influenced by cold water releases from hydropower or snow melting, suggesting the dependence of the role of flow discharge on the hydrological characteristics of such rivers. The air2stream model outperformed the three machine learning models for most of the studied rivers except for the cases where including flow discharge as a predictor provided the highest benefits. The DT model outperformed the FFNN and GPR models in the calibration phase, however in the validation phase, its performance slightly decreased. In general, the FFNN model performed slightly better than GPR model. In summary, the overall modelling results showed that the three machine learning models performed well for river water temperature modelling.


INTRODUCTION
Water temperature is one of the key indicators to determine the overall health of aquatic ecosystems since it impacts various physical and bio-chemical processes in rivers (Caissie, 2006). Clear examples are the significant impact of stream water temperature on dissolved oxygen dynamics (Marzadri, Tonina & Bellin, 2013) and algae growth (Goldman & Carpenter, 1974). Understanding the processes regulating water temperature and how thermal regimes have changed in the past as well as how they can be modified in the future is therefore of utmost importance for ecological applications. This is particularly relevant considering that the increase of air temperature as a result of climate change, extreme events and anthropogenic pressures concur in impacting river thermal dynamics (Nelson & Palmer, 2007;Mantua et al., 2010;Cai et al., 2018;Piccolroaz et al., 2018). Ongoing warming of river water temperature has been observed by several authors in the past decades as a consequence of global warming (Moatar & Gailhard, 2006;Acuña et al., 2008;Tian & Yang, 2016;Weyhenmeyer et al., 2017), and according to the Report of European Environment Agency (2012), which stated that water temperature of some European rivers and lakes has increased from 1 to 3 • C during the last century, this trend is expected in the future as well. Besides the significant role of climatic change, anthropogenic activities (land use change, damming, thermal releases, etc.) can also strongly affect river thermal dynamics (Ozaki, Fukushima & Kojiri, 2008;Hester & Doyle, 2011;Casado et al., 2013;Wang et al., 2017;Kędra & Wiejaczka, 2018).
Thermal dynamics in rivers depend on multiple factors (Blaen et al., 2013;Cai et al., 2018). River temperature follows a diurnal cycle and a seasonal cycle, which are the result of heat inputs and outputs under specific meteorological (air temperature, humidity, wind and solar radiation) and hydrological conditions (depth, flow discharge and volume of groundwater exchange), as schematized in Fig. 1. Despite that many factors affect river water temperatures, studies have demonstrated that there exists a strong relationship between water temperature and air temperature (Webb, Clack & Walling, 2003;Hadzima-Nyarko, Rabi & Šperac, 2014;Toffolon & Piccolroaz, 2015). However, the modelling of their relationship is not as trivial as it seems, especially when climatic change is to be considered (Piotrowski et al., 2016).
The modeling and forecasting of water temperature under different spatial and temporal scales is usually solved by two types of models: deterministic and statistical models. Deterministic models apply energy budget approaches to predict river water temperature (Sinokrot & Stefan, 1993;Zhu, Du & Luo, 2018;Du et al., 2018), while statistical models mainly predict river water temperature using air temperature data (Stefan & Preud'homme, 1993;Mohseni & Stefan, 1999;Webb, Clack & Walling, 2003;Caissie, 2006). Another group of statistical models are the so called non-parametric models. Artificial Neural Networks (ANN), which belong to this group, have demonstrated to be a good mathematical tool to characterize, model and predict a great amount of non-linear processes, and their applications in river water temperature predictions have been well documented (Sahoo, Schladow & Reuter, 2009;Hadzima-Nyarko, Rabi & Šperac, 2014;Deweber & Wagner, 2014;Rabi, Hadzima-Nyarko & Sperac, 2015;Temizyurek & Dadasercelik, 2018;Zhu, Nyarko &  Hadzima- Nyarko, 2018;Zhu et al., 2019). The Gaussian process regression (GPR) model has been widely used in engineering (Sun, Wang & Xu, 2014;Kang et al., 2015;Tan et al., 2016). Its applications in hydrology are also quite impressive (Rasouli, Hsieh & Cannon, 2012;Grbić, Kurtagić & Slišković, 2013;Sun, Wang & Xu, 2014;Zhu, Nyarko & Hadzima-Nyarko, 2018). Currently, GPR models have only been applied for river temperature prediction by Grbić, Kurtagić &Slišković (2013) andZhu, Nyarko &Hadzima-Nyarko (2018), and modelling results showed that GPR models can be efficiently used for river water temperature predictions. Decision Tree (DT) models are supervised machine learning approaches, which are very popular in machine learning (Pradhan, 2013;Rutkowski et al., 2014;Tsangaratos & Ilia, 2016). They have also been frequently used in hydrology (Balk & Elder, 2000;Schärer, Page & Beven, 2006;Tehrany, Pradhan & Jebur, 2013;Kumar et al., 2013), and have been applied only once in water temperature modeling (Zhu, Nyarko & Hadzima-Nyarko, 2018). Further investigations about these robust approaches in river water temperature modelling are needed since accurate simulation of river water temperature plays an important role in water resources management. Previously, ANN, GPR and DT models were compared in water temperature modeling of the Missouri River, USA (Zhu, Nyarko & Hadzima-Nyarko, 2018) using only air temperature (T a ) as a predictor. However, some researches have also tried to model river water temperature by considering multiple factors, such as river flow discharge (Webb, Clack & Walling, 2003;Arismendi et al., 2014;Laanaya, St-Hilaire & Gloaguen, 2017), solar radiation (Sahoo, Schladow & Reuter, 2009), riparian shade (Johnson, Wilby & Toone, 2014), landform attributes, and forested land cover (Deweber & Wagner, 2014). Air temperature and river flow discharge are generally the most available variables for modeling temperatures in rivers, and they have been shown to have the greatest impact on water temperature (Webb, Clack & Walling, 2003). In this research, ANN, GPR and DT models were developed for eight river stations characterized by different hydrological conditions using T a , flow discharge (Q) and day of the year (DOY ) as predictors. Modelling performances were assessed by comparing with air2stream, a hybrid statistical-physical based model (Toffolon & Piccolroaz, 2015;Piccolroaz et al., 2016). The aim of this study is to contribute to water temperature modelling for river systems.

Study sites and data
The measurements of daily water temperature (T w ), flow discharge (Q) and air temperature (T a ) were obtained from seven rivers in Europe and USA: (i) one river in Croatia, (ii) three rivers in Switzerland, and (iii) three rivers in USA. The main characteristics of the rivers with the period of records, and the calibration and validation periods are briefly summarized in Table 1.

The rescaled adjusted partial sums (RAPS) method
In order to detect and quantify trends and fluctuations in time series, the widely used rescaled adjusted partial sums (RAPS) method (Bonacci, Trninić & Roje-Bonacci, 2008;Bonacci & Oskoruš, 2010;Basarin et al., 2016) was employed in this study. The RAPS method can highlight trends, shifts, data clustering, irregular fluctuations and periodicities in the time series (Bonacci, Trninić & Roje-Bonacci, 2008), and is defined as: Descriptions of the variables are summarized in Table 2.

Machine learning models
In this study, we compared three machine learning models: feedforward artificial neural network (FFNN), GPR and DT. Detailed descriptions of these models can be found in Zhu, Nyarko & Hadzima-Nyarko (2018).

Air2stream
Air2stream is a hybrid model which combines a physically based structure with a stochastic calibration of parameters (Toffolon & Piccolroaz, 2015; source code available at https://github.com/spiccolroaz/air2stream). The model has been tested in various river systems, and generally it performs well. The equation for the air2stream model can be expressed as (the 8-parameter version): Descriptions of the variables can be found in Table 2. In this model version, T w is estimated from T a , Q and a sinusoidal term.
Assuming that the effect of flow discharge can be approximately retained using only a constant value and by combining the constant terms and those proportional to T w , a 5-parameter model version can be obtained (Toffolon & Piccolroaz, 2015). In this model version, T w is estimated from T a and a sinusoidal term.
Disregarding the second term on the right hand of Eq.
(2) and assuming that the effect of the flow discharge can be approximately retained using only a constant value, a 3-parameter model version can be obtained (Toffolon & Piccolroaz, 2015). In this model version, T w is estimated from T a only.
The detailed information about the model can be found in Toffolon & Piccolroaz (2015) and Piccolroaz et al. (2016).

Performance indices
Model performances were evaluated using four indicators in this study: the coefficient of correlation (R), the Willmott index of agreement (d), the root mean squared error (RMSE), and the mean absolute error (MAE).
Descriptions of the variables can be found in Table 2.

Model versions
The models were compared using three different versions: (i) version 1: FFNN1, GPR1 and DT1 using only T a as input variable, (ii) version 2: FFNN2, GPR2 and DT2 with T a and DOY as input variables, and (iii) version 3: FFNN3, GPR3 and DT3 with T a , Q, and DOY as input variables. For comparison, the 3-parameter, 5-parameter, and 8-parameter version air2stream models were used correspondingly.

Seasonal dynamics of T a , T w and Q
Detailed statistics for the variables selected (T w , Q and T a ) can be found in Table 3. Mean annual T w for Rhône at Sion and Dischmabach at Davos are smaller compared with T w for other river stations since Sion lies at the bottom of a populated Alpine valley, and Davos is located in a steep glacial valley (Dischma). Mean annual T a for Dischmabach at Davos is extremely colder than the other stations. Mentue at Yvonand, Dischmabach at Davos, Fanno and Irondequoit have smaller annual mean flow discharge (Q < 10.0 m 3 /s). Figure 2 shows the temporal variations of annual averaged T w and T a for the studied river stations. It can be seen that the increasing rates of T w are smaller than that of T a in general. T a displays an increasing trend except for Mentue at Yvonand and Fanno, and the Table 3 Basic statistical characteristics of the mean annual water temperature (T w : • C), air temperature (T a : • C), and flow discharge (Q: m 3 /s) at the eight gauging and meteorological stations. increasing rates range from 0.0384 to 0.0562 • C/year. T w displays an increasing trend except for Mentue at Yvonand and Irondequoit, and the increasing rates vary between 0.0052 and 0.0449 • C/year. Generally, T w and T a display the same varying pattern. However, for Fanno, T w has an increasing trend with a decreasing trend of T a , while for Irondequoit, T w has a decreasing trend with an increasing trend of T a , which indicates that the temporal variations of T w may not be explained only by T a . It is shown in Fig 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 Figure 4 presents the seasonal dynamics of T w , T a and dimensionless flow discharge (DQ: the ratio of daily flow discharge to annually averaged flow discharge) for the eight river stations through the climatological year. Results showed that the seasonal variations of T w and T a are almost synchronous for Drava at Botovo and Donji Miholjac, Mentue at Yvonand, Fanno and Irondequoit, indicating the insignificant role of Q. The damped responses of T w to variations in T a can be found in Rhône at Sion, Dischmabach at Davos and Cedar, especially for Rhône River at Sion since it is impacted by cold water releases from high altitude hydropower reservoir, indicating the significant role of Q in these stations.

Model performances
All the four models (GPR, DT, FFNN and air2strteam) were applied and compared for the eight stations, and modelling performances were evaluated using RMSE, MAE, R and d values. The developed models were able to successfully predict T w using T a , Q, and DOY as inputs. Tables 4-11 present the statistical indicators for the eight stations. For the eight studied stations, the FFNN, GPR and DT models performed with relatively similar accuracy when only T a was used as predictor, with DT model performing slightly worse than FFNN and GPR. When DOY was included as model input, modelling performances of the three FFNN, GPR and DT models dramatically improved in terms of higher R, d and lower . The comparison between the improvement in river T w prediction obtained using model version 3 and version 2 indicates that the inclusion of discharge provided a lower gain than including DOY for all the cases (Tables 4-11). The relative importance of Q increases for the studied station in the Rhône, Dischmabach and Cedar Rivers, which suggests that the role of Q on river T w modelling is more relevant for high altitude rivers (cold water release from hydropower reservoirs, snow melting and etc.). However, for lowland hydrological stations impacted by hydropower reservoirs, such as the Drava River at Botovo, the influence of Q is not that remarkable.
Results at Botovo station are reported in Table 4. Figure 5 illustrates the scatterplot of measured and calculated values of T w using the best models with T a , Q, and DOY as input variables (version 3). During the validation phase, inspection of statistical metrics for different models shows that the RMSE and MAE are lowest for the three versions of the air2strteam model (Table 4), and the best accuracy was obtained using air2strteam3 (RMSE = 0.891 • C and MAE = 0.724 • C). Averaged RMSE and MAE values for the three air2strteam models are 0.986 • C and 0.797 • C respectively. Compared to the three machine learning models, air2strteam3 is more accurate, followed by GPR3 (RMSE = 1.302 • C and MAE = 1.042 • C) and FFNN3 (RMSE = 1.307 • C and MAE = 1.040 • C) with similar accuracy, and finally the DT3 is ranked in the third place with the highest RMSE (1.366 • C) and MAE (1.086 • C) values respectively.
Results at Donji Miholjac station are reported in Table 5. Figure 6 illustrates the scatterplot of measured and calculated values of T w using the best models with T a , Q, and DOY as input variables. Two important points must be highlighted. Firstly, using the air2strteam, there is no significant difference between air2strteam1, air2strteam2 and air2strteam3. The R and d values were slightly improved between air2strteam1 and air2strteam3 (0.3% and 0.1%). In addition, the RMSE and MAE did not change significantly between the two models: 8.9% reduction for the RMSE, and 9.27% reduction for the MAE. Secondly, by comparing the accuracy of the three machine learning models, it is clear that the FFNN, GPR and DT models worked with slight difference. In addition, DT1 possess   Figure 7 illustrates the scatterplot of measured and calculated values of T w using the best models with T a , Q, and DOY as input variables. As can be seen from Table 6, during the validation phase, the FFNN1, GPR1 and DT1 were practically identical. The difference in the RMSE and MAE values were small and marginal. The FFNN1 model was slightly more accurate than the GPR1 and DT1, with MAE of 0.982 • C and RMSE of 1.307 • C. However, using only T a as input variable, air2strteam1 was more accurate than the three machine learning models. The difference in the performance of these algorithms for the prediction of T w is discussed   (Table 7) showed that, for the models using only T a as input variable, the three machine learning models were more accurate than the air2strteam model and FFNN1 (R = 0.927 and d = 0.962) and GPR1 (R = 0.926 and d = 0.957) performed the best and slightly better than the DT1 model (R = 0.923 and d = 0.956). According to the RMSE and MAE values, the air2strteam1 was the worst model with RMSE and MAE values equal to 1.044 • C and 0.812 • C, respectively. The FFNN1 gave  Table 9 Performances of different model versions in modelling water temperature (T w ) for Cedar station.  Figure. 8 illustrates the scatterplot of measured and calculated values of T w using the best models (version 3) at Rhône station.

Model version Training (Calibration) Validation
A summary of the model performances at Dischmabach station is provided in Table 8 and Fig. 9. It can be observed that GPR1 slightly outperformed FFNN1 and DT1 according to all measures. A RMSE of 0.887 • C was observed in estimated T w using GPR1, while the respective values for FFNN1 and DT1 were 0.896 • C and 0.972 • C, respectively. GPR1 produced the highest R and d between the measured and calculated T w (R = 0.951 and d = 0.974), while the air2strteam1 yielded R of 0.945 and d of 0.972, less than the values obtained using the GRP1. To show the importance of including the DOY and Q as inputs  Table 11 Performances of different model versions in modelling water temperature (T w ) for Irondequoit station.  to the models, the corresponding performances are illustrated and compared. Comparing the FFNN2, GPR2 and DT2 models with the DOY as input in addition to the T a , it can be concluded that: (i) the three machine learning models provided the same accuracy with only marginal difference and (ii) the air2strteam2 yielded less accuracy compared to the FFNN2, GPR2 and DT2 models, with high RMSE and MAE, and low R and d values. Results obtained using the applied models at Cedar station are illustrated in Table 9. Figure 10 illustrates the scatterplot of measured and calculated values of T w using the best models. Overall, using only the T a as input, the maximum R and d values of 0.967 and 0.981 respectively during the validation phase were obtained using the air2strteam1 model. In addition, the RMSE and MAE values of the FFNN1, GRP1 and DT1 models obtained were 1.211 • C and 0.933 • C, 1.248 • C and 0.978 • C, and 1.243 • C and 0.955 • C, which were respectively 13.54% and 10.075%, 16.10% and 14.21%, and 15.76% and 12.14%, greater than the values obtained using the air2strteam1 model. Thus, it was demonstrated that the air2strteam1 model was more accurate than the machine learning models. With the inclusion of the DOY as input variable combined with the T a , the model performances varied accordingly and were significantly improved. The best accuracy was obtained with the FFNN2 model that had a significant decrease of the RMSE and MAE compared to the FFNN1 by 33.94% and 33.23%, respectively. Comparing the overall accuracy of the models using the version 2, it is clear from Table 9, that the four models provided the same accuracy with very marginal difference. Finally, as shown in Table 9, R and d values for the FFNN3, GPR3, DT3 and air2strteam3 reached up to 0.983, 0.985, 0.989, and 0.983, respectively, which were relatively higher compared to the values obtained using the versions 1 and 2. Results at the Fanno station are reported at Fig. 11 and Table 10. The RMSE and MAE of the four models for this station were relatively low. It was indicated that the T w calculated by air2strteam1 agreed well with the measured value, with R and d values of 0.984 and 0.989 during the validation phase. Compared to the machine learning models, air2strteam1 was the best model. Moreover, from the modeling accuracy reported in Table 10, the FFNN1, GPR1 and DT1 were less accurate than the air2strteam1. Analogously for the version 2 of models, air2strteam2 was the best model and yielded the highest R and d values ( Table 11 presents the performance metrics for the three machine learning models and air2strteam for Irondequoit River. Figure 12 illustrates the scatterplot of measured and calculated values of T w using the best models. Superiority of the air2strteam for predicting T w is evident for all three versions of models. During the validation phase, the superiority of the air2strteam1 is indicated by higher R (0.984) versus 0.966, 0.967, and 0.965 for FFNN1, GPR1 and DT1, respectively. This superior performance is confirmed by lower RMSE and MAE values (1.160 versus 1.867 • C, 1.926 • C, and 1.976 • C), and (0.890 versus 1.459 • C, 1.430 • C, and 1.387 • C) for FFNN1, GPR1 and DT1, respectively. As is shown in Table 11, air2strteam2 performs best, as indicated by higher R and d, and lower RMSE and MAE values. It is important to highlight the significant improvement of the models by the inclusion of the DOY as input variable. By adding the DOY to the model inputs, the R value was increased from 0.989 to 0.993 for air2strteam2, from 0.965 to 0.985 for DT1, from 0.967 to 0.986 for GPR2, and 0.966 to 0.986 for FFNN2, respectively (Table 11). The RMSE was decreased by 35.99%, 34.58%, 34.51% and 19.05%, when using FFNN2, GPR2, DT2 and air2strteam2, respectively. Similarly, the MAE was decreased by 33.23%, 29.51%, 28.78% and 21.12%, when using FFNN2, GPR2, DT2 and air2strteam2, respectively.

DISCUSSION
For the eight studied stations, the three machine learning models (FFNN1, GPR1 and DT1) performed comparatively when only T a was used as predictor. R and d values were larger than 0.9, indicating good modelling performances. RMSE and MAE ranged from 0.733 to 2.574 • C and 0.553 to 2.015 • C in the calibration phase, and varied from 0.864 to 2.732 • C and 0.651 to 2.314 • C in the validation period. The air2stream1 model significantly outperformed the three machine learning models for most of the studied stations, except for Rhône at Sion and Dischmabach at Davos. Flow discharge (releases of cold water from high altitude hydropower reservoirs and snow melting from the high altitude areas) significantly impacted water temperature dynamics for Rhône River at Sion (Fig. 4D) and Dischmabach River at Davos (Fig. 4E). However, for air2stream1 model with three parameters (Eq. (4)), the effect of flow discharge was neglected (Toffolon & Piccolroaz, 2015;Piccolroaz et al., 2016), resulting in poor modelling performances.
When DOY was included as model input, modelling performances of the three machine learning models (FFNN2, GPR2 and DT2) dramatically improved in term of higher R, d and lower , which confirmed the results of Heddam (2016), Heddam &Kisi (2017), andZhu et al. (2019). In their researches, the inclusion of the components of the Gregorian calendar significantly improved the performance of the machine learning models in the case of dissolved oxygen and water temperature modelling. R and d values varied from 0.952 to 0.993 and 0.975 to 0.996 in the calibration phase, while for the validation phase, R and d values ranged from 0.936 to 0.986 and 0.964 to 0.993 respectively. RMSE and MAE ranged from 0.387 to 1.481 • C and 0.293 to 1.178 • C in the calibration phase, and varied from 0.507 to 1.783 • C and 0.394 to 1.422 • C in the validation phase. Modelling results indicate that the inclusion of DOY is significant for river water temperature simulation since it provides additional information on the seasonality of the river thermal dynamics compared to that encapsulated in the T a time series. The air2stream2 model performed better than the air2stream1 model, which indicates that the sinusoidal annual term in Eq. (2) is important for river water temperature prediction since this annual periodicity can mimic the effect of lateral inflows and heat fluxes (Toffolon & Piccolroaz, 2015;Piccolroaz et al., 2016). The air2stream2 model outperformed the three machine learning models in most of the studied stations, except for Rhône at Sion, Dischmabach at Davos and Cedar River, which were impacted by cold water releases from high altitude hydropower reservoirs or snow melting.
The comparison between the improvement in river T w prediction obtained using model version 3 and version 2 indicated that the inclusion of flow discharge provided a lower gain than including DOY for all the cases (Tables 4-11). This result indicates that in the proposed models Q plays a minor role, and the addition of DOY significantly contributes to better capture the seasonal pattern of river thermal dynamics. The relative importance of Q increases for the studied station in the Rhône, Dischmabach and Cedar rivers. The results also suggested that the role of Q on T w modelling is more relevant for high altitude rivers impacted by cold water release from hydropower reservoirs or snow melting. However, for lowland hydrological stations impacted by hydropower reservoirs, such as the Drava River at Botovo, the influence of Q is not that remarkable. The air2stream3 model outperformed the three machine learning models in most of the studied stations, except for Rhône at Sion, Dischmabach at Davos and Cedar River.
For all the three model versions, DT model outperformed FFNN and GPR models in the calibration phase, while for the validation phase, its performance slightly decreased. Generally, FFNN model performed slightly better than GPR model, while the overall difference can be neglected. The RMSE values for the eight studied stations range from 0.956 to 1.366 • C, 1.014 to 1.69 • C, 0.611 to 0.947 • C, 0.427 to 0.763 • C, 0.325 to 0.536 • C, 0.448 to 0.779 • C, 0.755 to 1.225 • C, and 0.825 to 1.339 • C respectively for all the version 3 machine learning models, which are reasonable compared with Jackson et al. (2018) (1.57 • C) and Sohrabi et al. (2017) (1.25 • C), and far better than that of Temizyurek and Dadaser-Celik (2018) (2.10-2.64 • C). Overall, the three machine learning models (FFNN3, GPR3 and DT3) performed well for river water temperature modelling.

CONCLUSIONS
In this study, different versions of FFNN, GPR and DT models were developed to simulate daily river water temperatures using T a , Q, and DOY as predictors. The models were assessed in various river systems, and modelling results were compared with air2stream model. For the eight studied river stations, the FFNN, GPR and DT models performed similarly when only T a was used as predictor. When DOY was included as input, modelling performances of the FFNN, GPR and DT models dramatically improved in term of higher R, d and lower RMSE and MAE values. The inclusion of Q provided a lower gain than including DOY for all the cases, which indicates that in the proposed models Q plays a minor role. However, the relative importance of Q increases for the studied station in the Rhône River, Dischmabach River and Cedar River, which suggested that the role of flow discharge on river water temperature modelling is more relevant for high altitude rivers impacted by cold water releases from hydropower or snow melting. The air2stream model outperformed the three machine learning models in most of the studied rivers except for the Rhône River, Dischmabach River and Cedar River. For the eight studied river stations, RMSE values of MLPNN3, GPR3 and DT3 models ranged from 0.398 to 1.690 • C, 0.418 to 1.592 • C, and 0.325 to 1.668 • C respectively. For the Mentue River, Rhône River, Dischmabach River, and Cedar River, the RMSE values are lower than 1.0 • C. Overall, FFNN, GPR and DT models performed well for river water temperature modelling. website: https://waterdata.usgs.gov/nwis/inventory. Readers can input the USGS station number (for example, for Cedar River, users can input 12119000) and select the data for output. In this study, we selected daily water temperature and flow discharge. The data can be output to a TXT file for use.

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/ peerj.7065#supplemental-information.