hydrographr: an R package for scalable hydrographic data processing

Abstract

1. Freshwater ecosystems are considered biodiversity hotspots, but assessing the spatial distribution of species remains challenging. One major obstacle is the complex geospatial processing of large amounts of data, such as stream network, sub-catchment and basin data, that are needed to address the longitudinal connectivity among water bodies. Workflows therefore need to be scalable, especially when working across large spatial extents and at high spatial resolution. This in turn requires advanced command-line GIS skills and programming language integration, which is often a challenge for freshwater researchers.
2. To address this challenge, we developed the package hydrographr, which provides scalable hydrographic data processing in R. The package contains functions for downloading the high-resolution Hydrography90m data, for processing, reading and extracting information, and for assessing network distances and connectivity. While the functions are, by default, tailored towards the Hydrography90m data, they can also be generalized to other data and purposes, such as efficient cropping and merging of raster and vector data, point-raster extraction, raster reclassification, and data aggregation. The package depends on the open-source software GDAL/OGR and GRASS GIS and on the AWK programming language in the Linux environment, allowing seamless language integration. Because the data are processed outside R, hydrographr allows users to create scalable geoprocessing workflows.
3. We illustrate the hydrographr functions using two workflows that focus on (i) a freshwater species distribution modelling approach, and (ii) assessing stream connectivity given the fragmentation by dams. We also provide a detailed guide for the initial installation of the required software. Windows users first need to enable the Windows Subsystem for Linux (WSL) feature and can then follow the same software installation steps as Linux users. hydrographr is maintained on GitHub at https://github.com/glowabio/hydrographr.
4. hydrographr provides a set of key functions for processing freshwater geospatial data. We expect that the package will support the freshwater research community through easy-to-use wrapper functions that capitalize on powerful open-source command-line software, which would otherwise involve a steep learning curve. Users can thus perform freshwater-specific longitudinal connectivity and network analyses across large geographic extents while staying within the R environment.

Publication
Methods in Ecology and Evolution, 14, 2953-2963