Get CAMELS dataset.
This class first downloads the CAMELS dataset if it has not already been downloaded.
The selected attributes for the selected station ids are then fetched and returned
to the user through the fetch method.
station/gauge_ids or a specified station. It can also be used to
fetch all attributes of several stations, either by providing
their gauge_ids or by simply requesting data for a number of stations
(e.g. 20), which are then chosen randomly.
-fetch_dynamic_features:
fetches specified dynamic attributes of one specified station. If the
dynamic attribute is not specified, all dynamic attributes will be
fetched for the specified station. If station is not specified, the
specified dynamic attributes will be fetched for all stations.
-fetch_static_features:
works the same as fetch_dynamic_features but for static attributes.
Here if the category is not specified then static attributes of
the specified station for all categories are returned.
stations – if str, it is taken to be a station name/gauge_id.
If list, it is a list of station/gauge_ids. If int, it is the
number of stations/gauge_ids for which the user wants data.
If None (default), attributes of all available stations are
fetched. If float, it is the fraction of stations for which
the user wants data.
dynamic_features – If not None, then it is the attributes to be
fetched. If None, then all available attributes are fetched
static_features – list of static attributes to be fetched. None
means no static attribute will be fetched.
st – starting date of data to be returned. If None, the data will be
returned from where it is available.
en – end date of data to be returned. If None, then the data will be
returned till the date data is available.
as_dataframe – whether to return dynamic attributes as pandas
dataframe or as xarray dataset.
kwargs – keyword arguments to read the files
Returns:
If both static and dynamic features are obtained then it returns a
dictionary whose keys are station/gauge_ids and values are the
attributes and dataframes.
Otherwise either dynamic or static features are returned.
Examples
>>> from ai4water.datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> # get data of 10% of stations
>>> df = dataset.fetch(stations=0.1, as_dataframe=True)  # returns a multiindex dataframe
>>> # fetch data of 5 (randomly selected) stations
>>> df = dataset.fetch(stations=5, as_dataframe=True)
>>> # fetch data of 3 selected stations
>>> df = dataset.fetch(stations=['912101A', '912105A', '915011A'], as_dataframe=True)
>>> # fetch data of a single station
>>> df = dataset.fetch(stations='318076', as_dataframe=True)
>>> # get both static and dynamic features as dictionary
>>> data = dataset.fetch(1, static_features="all", as_dataframe=True)  # -> dict
>>> data['dynamic']
>>> # get only selected dynamic features
>>> df = dataset.fetch(stations='318076',
...     dynamic_features=['streamflow_MLd', 'solarrad_AWAP'], as_dataframe=True)
>>> # fetch data between selected periods
>>> df = dataset.fetch(stations='318076', st="20010101", en="20101231", as_dataframe=True)
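The rules for the stations argument described above can be sketched in plain Python. This is only an illustration: resolve_stations, its seed parameter and the made-up station ids are hypothetical and not part of the library, and both the random choice for a float and the truncation of the fraction to a whole number are assumptions.

```python
import random
from typing import List, Optional, Sequence, Union

# Hypothetical helper, not part of the library: it only mirrors the rules
# described above for the `stations` argument.
StationSpec = Union[None, str, int, float, Sequence[str]]


def resolve_stations(
    stations: StationSpec,
    available: Sequence[str],
    seed: Optional[int] = None,
) -> List[str]:
    """Turn a `stations` specification into an explicit list of station ids."""
    rng = random.Random(seed)
    pool = list(available)
    if stations is None:
        # None: all available stations
        return pool
    if isinstance(stations, str):
        # str: a single station name/gauge_id
        return [stations]
    if isinstance(stations, bool):
        # bool is a subclass of int, so reject it before the int branch
        raise TypeError("stations must not be a bool")
    if isinstance(stations, int):
        # int: that many stations, chosen randomly
        return rng.sample(pool, stations)
    if isinstance(stations, float):
        # float: a fraction of the stations.
        # Assumption: the count is truncated and the choice is random.
        return rng.sample(pool, int(len(pool) * stations))
    # anything else is treated as an explicit list of ids
    return list(stations)


ids = [f"stn_{i}" for i in range(20)]  # made-up ids for illustration
print(resolve_stations("stn_3", ids))
print(len(resolve_stations(5, ids, seed=0)))
print(len(resolve_stations(0.1, ids, seed=0)))
```

Checking bool before int avoids silently treating True as "one station".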
station – station id/gauge id for which the data is to be fetched.
dynamic_features (str/list, optional) – names of dynamic features/attributes to fetch
static_features – names of static features/attributes to be fetched
as_ts (bool) – whether static attributes are to be converted into a time
series or not. If yes then the returned time series will be of
the same length as that of the dynamic attributes.
st (str, optional) – starting point from which the data is to be fetched. By default
the data will be fetched from where it is available.
en (str, optional) – end point of data to be fetched. By default the data will be fetched
Returns:
dataframe if as_ts is True else it returns a dictionary of static and
dynamic attributes for a station/gauge_id
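To make the as_ts option concrete, here is a minimal sketch of what "converting static attributes into a time series" could look like: each constant value is simply repeated once per time step, so the result has the same length as the dynamic data. The helper names and the attribute values are hypothetical, and plain dictionaries stand in for the dataframe the library returns.

```python
from datetime import date, timedelta
from typing import Dict, List


def daily_index(start: date, n_steps: int) -> List[date]:
    """Build n_steps consecutive daily time stamps beginning at start."""
    return [start + timedelta(days=i) for i in range(n_steps)]


def static_as_timeseries(
    static: Dict[str, float], time_index: List[date]
) -> Dict[date, Dict[str, float]]:
    """Repeat the constant static values once for every time step."""
    return {t: dict(static) for t in time_index}


index = daily_index(date(2001, 1, 1), 3)
# Made-up static attributes of one station
static = {"area": 12.5, "elev_mean": 340.0}
series = static_as_timeseries(static, index)
for t, row in series.items():
    print(t, row)
```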
stations – list of stations for which data is to be fetched.
dynamic_features – list of dynamic attributes to be fetched.
If 'all', then all dynamic attributes will be fetched.
static_features – list of static attributes to be fetched.
If 'all', then all static attributes will be fetched. If None,
then no static attribute will be fetched.
st – start of data to be fetched.
en – end of data to be fetched.
as_dataframe – whether to return the data as a pandas DataFrame. The default
is an xr.Dataset object
dict (kwargs) – additional keyword arguments
Returns:
Dynamic and static features of multiple stations. Dynamic features
are returned as an xr.Dataset by default; if as_dataframe is True, they
are returned as a pandas DataFrame with a MultiIndex. The xr.Dataset
has one data variable per station, and for each station the DataArray
has dimensions (time, dynamic_features), where the length of time is
defined by st and en. When the returned object is a pandas DataFrame,
the first index level is time and the second is dynamic_features.
Static attributes are always returned as a pandas DataFrame of shape
(stations, static_features). If dynamic_features is None, dynamic
features are not returned and the result consists only of static
features. The same holds for static_features. If neither is None,
the returned object is a dictionary with static and dynamic keys.
Raises:
ValueError – if both dynamic_features and static_features are None
Examples
>>> from ai4water.datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> # find out station ids
>>> dataset.stations()
>>> # get data of selected stations
>>> dataset.fetch_stations_attributes(['912101A', '912105A', '915011A'],
...     as_dataframe=True)
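The return-type rules described above (a dictionary when both kinds of features are requested, a single object when only one is, and a ValueError when neither is) can be summarised in a short sketch. assemble_result is a hypothetical name used only for illustration, and strings stand in for the real DataFrame or Dataset objects.

```python
from typing import Any, Optional


def assemble_result(dynamic: Optional[Any] = None, static: Optional[Any] = None) -> Any:
    """Combine already-fetched dynamic and static data following the documented rules."""
    if dynamic is None and static is None:
        # Nothing was requested at all
        raise ValueError("dynamic_features and static_features cannot both be None")
    if dynamic is not None and static is not None:
        # Both requested: dictionary with 'dynamic' and 'static' keys
        return {"dynamic": dynamic, "static": static}
    # Only one requested: return it directly
    return dynamic if dynamic is not None else static


# Placeholders instead of real DataFrame/Dataset objects
both = assemble_result(dynamic="dynamic-data", static="static-data")
only_static = assemble_result(static="static-data")
print(sorted(both))
print(only_static)
```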
Inherits from the Camels class. Reads the CAMELS-AUS dataset of
Fowler et al., 2020.
Examples
>>> from ai4water.datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is multi-indexed, so we have to unstack it
>>> df.shape
(21184, 26)
>>> # get names of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
222
>>> # get data by station id
>>> df = dataset.fetch(stations='224214A', as_dataframe=True).unstack()
>>> df.shape
(21184, 26)
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> data = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['tmax_AWAP', 'precipitation_AWAP', 'et_morton_actual_SILO', 'streamflow_MLd']).unstack()
>>> data.shape
(21184, 4)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape  # remember this is a multi-indexed dataframe
(21184, 260)
path – path where the CAMELS-AUS dataset has been downloaded. This path
must contain five zip files and one xlsx file. If None, then the
data will be downloaded.
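Before passing a local path, it can help to check that the folder holds what the description above asks for: five zip files and one xlsx file. The sketch below only counts file extensions; looks_complete and the file names are hypothetical, and the library may well perform stricter checks of its own.

```python
import tempfile
from collections import Counter
from pathlib import Path
from typing import Dict, Optional


def count_by_suffix(path: str) -> Counter:
    """Count the files directly inside path, grouped by lower-cased extension."""
    return Counter(p.suffix.lower() for p in Path(path).iterdir() if p.is_file())


def looks_complete(path: str, expected: Optional[Dict[str, int]] = None) -> bool:
    """Return True if path holds at least the expected number of files per extension."""
    if expected is None:
        # Counts taken from the parameter description: five zip files, one xlsx file
        expected = {".zip": 5, ".xlsx": 1}
    counts = count_by_suffix(path)
    return all(counts.get(suffix, 0) >= n for suffix, n in expected.items())


# Demonstration with empty placeholder files (the names are made up)
with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        (Path(tmp) / f"part_{i}.zip").touch()
    without_xlsx = looks_complete(tmp)
    (Path(tmp) / "attributes.xlsx").touch()
    with_xlsx = looks_complete(tmp)

print(without_xlsx, with_xlsx)
```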
Fetches static attributes of one or more stations as a dataframe.
Parameters:
stn_id (str) – name/id of station of which to extract the data
features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available
static features are returned.
Examples
>>> from ai4water.datasets import CAMELS_AUS
>>> dataset = CAMELS_AUS()
>>> # get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
222
>>> # get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(222, 110)
>>> # get static data of one station only
>>> static_data = dataset.fetch_static_features('305202')
>>> static_data.shape
(1, 110)
>>> # get the names of static features
>>> dataset.static_features
>>> # get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['catchment_di', 'elev_mean'])
>>> static_data.shape
(222, 2)
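The features parameter accepts the string "all", a single name, or a list of names. One way such an argument could be normalised into a list is sketched below; select_features and the short feature list are hypothetical, and raising ValueError for an unknown name is an assumption rather than documented library behaviour.

```python
from typing import List, Sequence, Union


def select_features(
    features: Union[str, Sequence[str]], available: Sequence[str]
) -> List[str]:
    """Normalise a features argument into an explicit list of feature names."""
    if isinstance(features, str):
        if features == "all":
            # Default: every available static feature
            return list(available)
        # A single name is wrapped into a list
        features = [features]
    unknown = [f for f in features if f not in available]
    if unknown:
        # Assumption: unknown names are rejected instead of silently skipped
        raise ValueError(f"unknown features: {unknown}")
    return list(features)


available = ["area", "elev_mean", "slope_mean"]  # made-up subset of names
print(select_features("all", available))
print(select_features("area", available))
print(select_features(["slope_mean", "area"], available))
```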
Fetches static attributes of one or more stations for one or
more categories as a dataframe.
Parameters:
stn_id (str) – name/id of station of which to extract the data
features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available
static features are returned.
Examples
>>> from ai4water.datasets import CAMELS_GB
>>> dataset = CAMELS_GB()
>>> # get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
671
>>> # get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(671, 290)
>>> # get static data of one station only
>>> static_data = dataset.fetch_static_features('85004')
>>> static_data.shape
(1, 290)
>>> # get the names of static features
>>> dataset.static_features
>>> # get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['area', 'elev_mean'])
>>> static_data.shape
(671, 2)
>>> from ai4water.datasets import CAMELS_BR
>>> dataset = CAMELS_BR(path=r'F:\data\CAMELS\CAMELS_BR')
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is multi-indexed, so we have to unstack it
>>> df.shape
(14245, 12)
>>> # get names of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
593
>>> # get data by station id
>>> df = dataset.fetch(stations='46035000', as_dataframe=True).unstack()
>>> df.shape
(14245, 12)
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['precipitation_cpc', 'evapotransp_mgb', 'temperature_mean', 'streamflow_m3s']).unstack()
>>> df.shape
(14245, 4)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape  # remember this is a multi-indexed DataFrame
(170940, 10)
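The shapes in the CAMELS_BR example above are consistent with a long layout in which every (time, feature) pair is one row and every station is one column: 14245 time steps and 12 dynamic features give 170940 rows. The sketch below reproduces that arithmetic with the standard library only. The helper names are hypothetical, and the layout is inferred from the reported shapes, not taken from the library's code.

```python
from itertools import product
from typing import List, Sequence, Tuple


def long_index(times: Sequence[str], features: Sequence[str]) -> List[Tuple[str, str]]:
    """Build the (time, feature) row labels of a long, multi-indexed table."""
    return list(product(times, features))


def long_shape(n_times: int, n_features: int, n_stations: int) -> Tuple[int, int]:
    """Shape of the long table: one row per (time, feature), one column per station."""
    return (n_times * n_features, n_stations)


# A tiny illustration with made-up labels
rows = long_index(["2000-01-01", "2000-01-02"], ["pr", "tmax", "q"])
print(len(rows))  # 6

# Numbers taken from the CAMELS_BR example above
print(long_shape(14245, 12, 10))  # (170940, 10)
```

Calling unstack() on such a frame moves the feature level into the columns, which is why a single station unstacks to (14245, 12).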
stn_id (int/list) – station id whose attribute to fetch
features (str/list) – name of attribute to fetch. Default is None, which will return all the
attributes for a particular station of the specified category.
Example
>>> dataset = Camels('CAMELS-BR')
>>> df = dataset.fetch_static_features('11500000', 'climate')
>>> # read all static features of all stations
>>> data = dataset.fetch_static_features(dataset.stations(), dataset.static_features)
>>> data.shape
(597, 67)
>>> from ai4water.datasets import CAMELS_US
>>> dataset = CAMELS_US(path=r'F:\data\CAMELS\CAMELS_US')
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is multi-indexed, so we have to unstack it
>>> df.shape
(12784, 8)
>>> # get names of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
671
>>> # get data by station id
>>> df = dataset.fetch(stations='11478500', as_dataframe=True).unstack()
>>> df.shape
(12784, 8)
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['prcp(mm/day)', 'srad(W/m2)', 'tmax(C)', 'tmin(C)', 'Flow']).unstack()
>>> df.shape
(12784, 5)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape  # remember this is a multi-indexed DataFrame
(102272, 10)
Gets one or more static features of one or more stations.
Parameters:
stn_id (str) – name/id of station of which to extract the data
features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available
static features are returned.
Examples
>>> from ai4water.datasets import CAMELS_US
>>> camels = CAMELS_US()
>>> st_data = camels.fetch_static_features('11532500')
>>> st_data.shape
(1, 59)
>>> # get names of available static features
>>> camels.static_features
>>> # get specific features of one station
>>> static_data = camels.fetch_static_features('11528700',
...     features=['area_gages2', 'geol_porostiy', 'soil_conductivity', 'elev_mean'])
>>> static_data.shape
(1, 4)
>>> # get names of all stations
>>> all_stns = camels.stations()
>>> len(all_stns)
671
>>> all_static_data = camels.fetch_static_features(all_stns)
>>> all_static_data.shape
(671, 59)
Downloads and processes the CAMELS dataset of Chile following the work of
Alvarez-Garreton et al., 2018.
Examples
>>> from ai4water.datasets import CAMELS_CL
>>> dataset = CAMELS_CL()
>>> df = dataset.fetch(stations=1, as_dataframe=True)
>>> df = df.unstack()  # the returned dataframe is multi-indexed, so we have to unstack it
>>> df.shape
(38374, 12)
>>> # get names of all stations as list
>>> stns = dataset.stations()
>>> len(stns)
516
>>> # get data by station id
>>> df = dataset.fetch(stations='11130001', as_dataframe=True).unstack()
>>> df.shape
(38374, 12)
>>> # get names of available dynamic features
>>> dataset.dynamic_features
>>> # get only selected dynamic features
>>> df = dataset.fetch(1, as_dataframe=True,
...     dynamic_features=['pet_hargreaves', 'precip_tmpa', 'tmean_cr2met', 'streamflow_m3s']).unstack()
>>> df.shape
(38374, 4)
>>> # get names of available static features
>>> dataset.static_features
>>> # get data of 10 random stations
>>> df = dataset.fetch(10, as_dataframe=True)
>>> df.shape
(460488, 10)
stn_id (str) – name/id of station of which to extract the data
features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available
static features are returned.
Examples
>>> from ai4water.datasets import CAMELS_CL
>>> dataset = CAMELS_CL()
>>> # get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
516
>>> # get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(516, 104)
>>> # get static data of one station only
>>> static_data = dataset.fetch_static_features('11315001')
>>> static_data.shape
(1, 104)
>>> # get the names of static features
>>> dataset.static_features
>>> # get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slope_mean', 'area'])
>>> static_data.shape
(516, 2)
>>> data = dataset.fetch_static_features('2110002', features=['slope_mean', 'area'])
>>> data.shape
(1, 2)
Rainfall run-off dataset for Iowa (US) following the work of
Demir et al., 2022
Examples
>>> from ai4water.datasets import WaterBenchIowa
>>> ds = WaterBenchIowa()
>>> # fetch dynamic features of 5 stations
>>> data = ds.fetch(5, as_dataframe=True)
>>> data.shape  # it is a multi-indexed DataFrame
(184032, 5)
>>> # fetch both static and dynamic features of 5 stations
>>> data = ds.fetch(5, static_features="all", as_dataframe=True)
>>> data.keys()
dict_keys(['dynamic', 'static'])
>>> data['static'].shape
(5, 7)
>>> data['dynamic']  # returns a xarray DataSet
>>> # using another method
>>> data = ds.fetch_dynamic_features('644', as_dataframe=True)
>>> data.unstack().shape
(61344, 3)
stn_id (str) – name/id of station of which to extract the data
features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available
static features are returned.
Examples
>>> from ai4water.datasets import WaterBenchIowa
>>> dataset = WaterBenchIowa()
>>> # get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
125
>>> # get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(125, 7)
>>> # get static data of one station only
>>> static_data = dataset.fetch_static_features('592')
>>> static_data.shape
(1, 7)
>>> # get the names of static features
>>> dataset.static_features
>>> # get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['slope', 'area'])
>>> static_data.shape
(125, 2)
>>> data = dataset.fetch_static_features('592', features=['slope', 'area'])
>>> data.shape
(1, 2)
stn_id (str) – name/id of station of which to extract the data
features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available
static features are returned.
Examples
>>> from ai4water.datasets import LamaH
>>> dataset = LamaH(time_step='daily', data_type='total_upstrm')
>>> df = dataset.fetch_static_features('99')  # (1, 61)
>>> # get list of all static features
>>> dataset.static_features
>>> dataset.fetch_static_features('99',
...     features=['area_calc', 'elev_mean', 'agr_fra', 'sand_fra'])  # (1, 4)
Database for hydrometeorological modeling of 14,425 North American watersheds
from 1950-2018 following the work of Arsenault et al., 2020.
The user must manually download the files, unpack them and provide
the path where these files are saved.
This data comes from multiple sources, each providing one or more dynamic_features.
The following data_source values are available.
==============  ==============================
data_source     dynamic_features
==============  ==============================
SNODAS_SWE      discharge, swe
SCDNA           discharge, pr, tasmin, tasmax
nonQC_stations  discharge, pr, tasmin, tasmax
Livneh          discharge, pr, tasmin, tasmax
ERA5            discharge, pr, tasmax, tasmin
ERA5Land_SWE    discharge, swe
ERA5Land        discharge, pr, tasmax, tasmin
==============  ==============================
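Because each data source offers a different set of dynamic features, it can be useful to look up which sources provide the features you need. The sketch below hard-codes the mapping listed above and filters it; SOURCES and sources_providing are hypothetical names, the library is not queried, and the spellings discharge and ERA5Land_SWE are assumed.

```python
from typing import Dict, Iterable, List

# Mapping copied from the list of data sources above.
# Assumption: 'discharge' and 'ERA5Land_SWE' are the intended spellings.
SOURCES: Dict[str, List[str]] = {
    "SNODAS_SWE": ["discharge", "swe"],
    "SCDNA": ["discharge", "pr", "tasmin", "tasmax"],
    "nonQC_stations": ["discharge", "pr", "tasmin", "tasmax"],
    "Livneh": ["discharge", "pr", "tasmin", "tasmax"],
    "ERA5": ["discharge", "pr", "tasmax", "tasmin"],
    "ERA5Land_SWE": ["discharge", "swe"],
    "ERA5Land": ["discharge", "pr", "tasmax", "tasmin"],
}


def sources_providing(features: Iterable[str]) -> List[str]:
    """Return the sources (in listing order) that offer every requested feature."""
    wanted = set(features)
    return [name for name, offered in SOURCES.items() if wanted <= set(offered)]


print(sources_providing(["swe"]))
print(sources_providing(["pr", "tasmax"]))
print(sources_providing(["swe", "pr"]))  # no single source has both
```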
All sources contain one or more of the following dynamic_features,
with the following shapes.
==========================  ==============
dynamic_features            shape
==========================  ==============
time                        (25202,)
watershedID                 (14425,)
drainage_area               (14425,)
drainage_area_GSIM          (14425,)
flag_GSIM_boundaries        (14425,)
flag_artificial_boundaries  (14425,)
centroid_lat                (14425,)
centroid_lon                (14425,)
elevation                   (14425,)
slope                       (14425,)
discharge                   (14425, 25202)
pr                          (14425, 25202)
tasmax                      (14425, 25202)
tasmin                      (14425, 25202)
==========================  ==============
Examples
>>> from ai4water.datasets import HYSETS
>>> dataset = HYSETS(path="path/to/HYSETS")
>>> # fetch data of a random station
>>> df = dataset.fetch(1, as_dataframe=True)
>>> df.shape
(25202, 5)
>>> stations = dataset.stations()
>>> len(stations)
14425
>>> df = dataset.fetch('999', as_dataframe=True)
>>> df.unstack().shape
(25202, 5)
Returns static attributes of one or multiple stations.
Parameters:
stn_id (str) – name/id of station of which to extract the data
features (list/str, optional (default="all")) – The name/names of features to fetch. By default, all available
static features are returned.
Examples
>>> from ai4water.datasets import HYSETS
>>> dataset = HYSETS()
>>> # get the names of stations
>>> stns = dataset.stations()
>>> len(stns)
14425
>>> # get all static data of all stations
>>> static_data = dataset.fetch_static_features(stns)
>>> static_data.shape
(14425, 28)
>>> # get static data of one station only
>>> static_data = dataset.fetch_static_features('991')
>>> static_data.shape
(1, 28)
>>> # get the names of static features
>>> dataset.static_features
>>> # get only selected features of all stations
>>> static_data = dataset.fetch_static_features(stns, ['Drainage_Area_km2', 'Elevation_m'])
>>> static_data.shape
(14425, 2)
Downloads and preprocesses the HYPE [1] dataset from Lindstroem et al., 2010 [2].
This is a rainfall-runoff dataset of 564 stations in Sweden, covering 1985 to
2019 at daily, monthly and yearly time steps.
Examples
>>> from ai4water.datasets import HYPE
>>> dataset = HYPE()
>>> # get data of 5% of stations
>>> df = dataset.fetch(stations=0.05, as_dataframe=True)  # returns a multiindex dataframe
>>> df.shape
(115047, 28)
>>> # fetch data of 5 (randomly selected) stations
>>> df = dataset.fetch(stations=5, as_dataframe=True)
>>> df.shape
(115047, 5)
>>> # fetch data of 3 selected stations
>>> df = dataset.fetch(stations=['564', '563', '562'], as_dataframe=True)
>>> df.shape
(115047, 3)
>>> # fetch data of a single station
>>> df = dataset.fetch(stations='500', as_dataframe=True)
>>> df.shape
(115047, 1)
>>> # get only selected dynamic features
>>> df = dataset.fetch(stations='501',
...     dynamic_features=['AET_mm', 'Prec_mm', 'Streamflow_mm'], as_dataframe=True)
>>> # fetch data between selected periods
>>> df = dataset.fetch(stations='225', st="20010101", en="20101231", as_dataframe=True)
>>> df.shape
(32868, 1)
>>> # get data at monthly time step
>>> dataset = HYPE(time_step="month")
>>> df = dataset.fetch(stations='500', as_dataframe=True)
>>> df.shape
(3780, 1)
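The row counts in the HYPE example above can be cross-checked with simple date arithmetic. The sketch assumes that the daily record runs from 1985-01-01 to 2019-12-31 inclusive and that there are 9 dynamic features. Both are inferences from the reported shapes, not facts stated in the text, and the helper names are hypothetical.

```python
from datetime import date

# Assumption inferred from the shapes above (115047 rows / 12783 days = 9)
N_FEATURES = 9


def n_days(st: date, en: date) -> int:
    """Number of daily steps between st and en, both inclusive."""
    return (en - st).days + 1


def n_months(st: date, en: date) -> int:
    """Number of monthly steps between st and en, both inclusive."""
    return (en.year - st.year) * 12 + (en.month - st.month) + 1


# Assumed extent of the record: 1985-01-01 to 2019-12-31
full_daily = n_days(date(1985, 1, 1), date(2019, 12, 31)) * N_FEATURES
# Period selected with st="20010101", en="20101231"
sliced_daily = n_days(date(2001, 1, 1), date(2010, 12, 31)) * N_FEATURES
full_monthly = n_months(date(1985, 1, 1), date(2019, 12, 31)) * N_FEATURES

print(full_daily)    # 115047
print(sliced_daily)  # 32868
print(full_monthly)  # 3780
```

All three values match the first dimension of the shapes reported in the example, which supports reading the returned frame as one row per (time, feature) pair.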