Missing Data Imputation for Multisite Rainfall Networks: A Comparison between Geostatistical Interpolation and Pattern-Based Estimation on Different Terrain Types

Oriani F., Stisen S., Demirel M. C., Mariethoz G.

JOURNAL OF HYDROMETEOROLOGY, vol.21, pp.2325-2341, 2020 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 21
  • Publication Date: 2020
  • Doi Number: 10.1175/jhm-d-19-0220.1
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Agricultural & Environmental Science Database, Aqualine, Aquatic Science & Fisheries Abstracts (ASFA), Arctic & Antarctic Regions, CAB Abstracts, Environment Index, Geobase, Pollution Abstracts, Veterinary Science Database
  • Page Numbers: pp.2325-2341
  • Keywords: Hydrometeorology, Numerical analysis/modeling, Pattern detection, Statistical techniques, Hydrologic models, TIME-SERIES, SPATIAL INTERPOLATION, PRECIPITATION DATA, SIMULATION, SATELLITE, GAUGE, VARIABILITY, MODELS, PERFORMANCE, GENERATION
  • Istanbul Technical University Affiliated: Yes


Missing rainfall data are a major limitation for distributed hydrological modeling and climate studies. Practitioners need reliable approaches that can be employed on a daily basis, often with data too sparse in space to feed complex predictive models. In this study we compare different automatic approaches for missing data imputation, including geostatistical interpolation and pattern-based estimation algorithms. We introduce two pattern-based approaches based on the analysis of historical data patterns: (i) an iterative version of K-nearest neighbor (IKNN) and (ii) a new algorithm called vector sampling (VS) that combines concepts of multiple-point statistics and resampling. Both algorithms can draw estimations from variably incomplete data patterns, allowing the target dataset to serve at the same time as the training dataset. Tested on five case studies from Denmark, Australia, and Switzerland, the algorithms perform differently in a way that seems to be related to the terrain type: on flat terrains with spatially homogeneous rain events, geostatistical interpolation tends to minimize the average error, while in mountainous regions with nonstationary rainfall statistics, data mining can better recover the rainfall patterns. The VS algorithm, requiring minimal parameterization, turns out to be a convenient option for routine application on complex and poorly gauged terrains.
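
To make the idea of pattern-based estimation concrete, the following is a minimal sketch of a generic iterative K-nearest-neighbor imputation on a days-by-stations table. It is an illustration only and is not the IKNN or VS algorithm of the paper, whose details are not given in this record. The function name `iterative_knn_impute`, the parameters `k` and `n_iter`, the use of `None` for missing values, and the RMS distance are all assumptions made for the example. The sketch also assumes every station has at least one observed value.

```python
import math

def iterative_knn_impute(rows, k=3, n_iter=5):
    """Fill None entries in a days x stations table (hypothetical helper).

    Each day with gaps is compared to all other days using only the
    stations observed on that day; gaps are set to the mean of the k
    most similar days. The same table is both target and training data.
    """
    n_days, n_sta = len(rows), len(rows[0])
    missing = [[v is None for v in row] for row in rows]

    # Initial guess: per-station mean of the observed values
    # (assumes each station has at least one observation).
    means = []
    for s in range(n_sta):
        obs = [rows[t][s] for t in range(n_days) if not missing[t][s]]
        means.append(sum(obs) / len(obs))
    filled = [[means[s] if missing[t][s] else float(rows[t][s])
               for s in range(n_sta)] for t in range(n_days)]

    for _ in range(n_iter):
        updated = [row[:] for row in filled]
        for t in range(n_days):
            gaps = [s for s in range(n_sta) if missing[t][s]]
            seen = [s for s in range(n_sta) if not missing[t][s]]
            if not gaps or not seen:
                continue  # nothing to fill, or no observed station to match on

            def dist(u):
                # RMS difference over the stations observed on day t;
                # other days contribute their current (possibly imputed) values.
                return math.sqrt(sum((filled[u][s] - filled[t][s]) ** 2
                                     for s in seen) / len(seen))

            nearest = sorted((u for u in range(n_days) if u != t), key=dist)[:k]
            for s in gaps:
                updated[t][s] = sum(filled[u][s] for u in nearest) / len(nearest)
        filled = updated  # estimates from this pass feed the next pass
    return filled

# Usage: the last day matches the two "light rain" days, so its gap becomes 3.0.
table = [[1, 2, 3], [1, 2, 3], [10, 20, 30], [10, 20, 30], [1, 2, None]]
result = iterative_knn_impute(table, k=2)
```

Because distances are computed only over the stations observed on the day being filled, incomplete days can still act as neighbors for one another, which is one simple way to let the target dataset double as the training dataset.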