Session: Using Machine Learning to Quantify and Improve Earth System Predictions
Bayesian and hybrid machine learning modeling for improving predictability of streamflow in data-scarce watersheds
Wednesday, August 4, 2021
Link To Share This Presentation: https://cdmcd.co/5mqzGM
Dan Lu, Computational Sciences and Engineering Division and Climate Change Science Institute, Oak Ridge National Laboratory, Oak Ridge, TN; Goutam Konapala, NASA Goddard Space Flight Center; Scott Painter and Sudershan Gangrade, Oak Ridge National Laboratory; Shih-Chieh Kao, Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN
Background/Question/Methods
Hydrologic predictions in rural watersheds are important but challenging because of data scarcity. Long Short-Term Memory (LSTM) networks are a promising machine learning approach and have demonstrated good performance in streamflow prediction. However, because LSTMs are data-hungry, most applications have focused on well-monitored watersheds with abundant, high-quality observations. In this work, we investigate the predictive capabilities of LSTM in poorly monitored watersheds with short observation records. LSTM applications in data-scarce locations face three main challenges: overfitting, uncertainty quantification (UQ), and out-of-distribution prediction. To address them, we evaluate different regularization techniques to prevent overfitting, apply a Bayesian LSTM for UQ, and introduce a physics-informed hybrid LSTM to enhance out-of-distribution prediction.

Results/Conclusions
Through case studies in two diverse sets of watersheds, with and without snow influence, we demonstrate that: (1) when the hydrologic variability in the prediction period is similar to that in the calibration period, LSTM models can reasonably predict daily streamflow, with Nash-Sutcliffe efficiency above 0.8 even with only two years of calibration data; (2) when the hydrologic variability in the prediction and calibration periods differs dramatically, LSTM alone does not predict well, but the hybrid model improves out-of-distribution prediction and achieves acceptable generalization accuracy; (3) an L2-norm penalty and dropout can mitigate overfitting, and the Bayesian and hybrid LSTMs show no overfitting; and (4) the Bayesian LSTM provides useful uncertainty information that improves the understanding and credibility of predictions. These insights have important implications for streamflow simulation in watersheds where data quality and availability are critical issues.
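
For readers unfamiliar with the skill score quoted above, the following is a minimal sketch of the standard Nash-Sutcliffe efficiency (NSE), defined as one minus the ratio of the squared simulation error to the variance of the observations about their mean. The function name and the sample flow values are hypothetical and chosen only for illustration; they are not data or code from this study.

```python
from typing import Sequence


def nash_sutcliffe_efficiency(observed: Sequence[float],
                              simulated: Sequence[float]) -> float:
    """Return NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    NSE is 1 for a perfect match, 0 when the simulation is no better
    than always predicting the observed mean, and negative when worse.
    """
    if len(observed) == 0 or len(observed) != len(simulated):
        raise ValueError("observed and simulated must be non-empty and equal in length")

    mean_obs = sum(observed) / len(observed)
    # Sum of squared errors between observations and simulation.
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of observations from their mean.
    obs_variability = sum((o - mean_obs) ** 2 for o in observed)
    if obs_variability == 0:
        raise ValueError("NSE is undefined when all observations are identical")

    return 1.0 - squared_error / obs_variability


# Hypothetical daily streamflow values (arbitrary units), for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
simulated = [1.1, 1.9, 3.2, 3.8, 5.1]
nse = nash_sutcliffe_efficiency(observed, simulated)
print(f"NSE = {nse:.3f}")
```

Under this definition, the threshold of 0.8 reported above means the squared prediction error is less than 20% of the variability of the observed streamflow about its mean.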