GAMCR.dataset.dataset module

class GAMCR.dataset.dataset.Dataset(max_lag=1440, features={}, n_splines=10, lam=10)

Bases: object

A class used to preprocess, load and save data and models

max_lag

maximum lag time considered for the transfer functions

Type:

int

features

dictionary of the different features used in the model

Type:

dict

n_splines

number of splines considered for a GAM

Type:

int

lam

regularization parameter related to the smoothing penalty in the GAM

Type:

positive float

load_model(path_model, lam=None)

Load the model saved in the file with path: path_model

save_model_parameters(save_folder, name='', add_dic={})

Save the model parameters

save_batch_common_GAM(allGISID, save_folder, ntest=0, nstart=0, nfiles=40)

Preprocess and save the data. It makes sure to use the same knots for the GAMs for the different sites.

save_batch(save_folder, datafile, nstart=0, nfiles=100)

Preprocess and save the data.

get_fluxes(datafile, nstart=0, ntest=0, size_time_window=None)

Load the data and get the PET, precipitation, dates and streamflow time series. This method is called by the ‘save_batch’ type methods.

get_design(pet, x, y, dates)

Compute the design matrix of the GAMs. This method is called by the ‘save_batch’ type methods.

get_GAMdesign(X, J)

Compute the matrix that is used in the convolution to get the streamflow values.

load_data(save_folder, max_files=100, test_mode=False)

Load the data that has already been preprocessed and saved using one of the ‘save_batch’ type methods.

compute_spline_basis(show_splines=False)

Compute the basis functions (B-splines) on which we decompose the transfer functions.

load_model(path_model, lam=None)

Load the model saved in the file with path: path_model

Parameters:
path_model str

The location of the file where the model has been saved

lam positive float, optional

Regularization parameter for the smoothing penalty used when fitting the GAM

save_model_parameters(save_folder, name='', add_dic={})

Save the model parameters.

save_batch_common_GAM(allGISID, save_folder, ntest=0, nstart=0, nfiles=40)

Preprocess and save the data for all sites in allGISID.

The same knots are used for the GAMs at every site. This was needed in our paper to fit a model that predicts the GAM coefficients from catchment features: without shared knots, coefficients learned at different sites would not correspond to the same quantities.

Parameters:
allGISID list

List of the names of the sites

save_folder str

Path containing one folder ‘{site}’ for each ‘{site}’ in allGISID. Each folder should contain the .txt file with the data for that site.

ntest int, optional

Number of most recent time points that should be discarded.

nstart int, optional

Number of oldest time points that should be discarded.

nfiles int, optional

Number of files across which the preprocessed data is split. Splitting can speed up loading when only a fraction of the total dataset is needed.
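The roles of nstart, ntest and nfiles can be illustrated with a minimal sketch. It assumes the time series is ordered from oldest to most recent and that the data is split into contiguous, nearly equal chunks; how GAMCR actually partitions the data is not stated here, and the helper name trim_and_split is hypothetical.

```python
def trim_and_split(series, nstart=0, ntest=0, nfiles=40):
    """Drop the nstart oldest and ntest most recent points, then split.

    Hypothetical helper, not part of GAMCR. Assumes `series` is ordered
    from oldest to most recent.
    """
    end = len(series) - ntest
    trimmed = series[nstart:end]
    # Contiguous chunks whose sizes differ by at most one element.
    base, extra = divmod(len(trimmed), nfiles)
    chunks, pos = [], 0
    for i in range(nfiles):
        size = base + (1 if i < extra else 0)
        chunks.append(trimmed[pos:pos + size])
        pos += size
    return chunks


# Ten time points; drop the 2 oldest and the most recent one, keep 3 chunks.
chunks = trim_and_split(list(range(10)), nstart=2, ntest=1, nfiles=3)
# chunks == [[2, 3, 4], [5, 6], [7, 8]]
```

With such a layout, loading only the first few chunks gives the oldest part of the record without reading the rest.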

save_batch(save_folder, datafile, nstart=0, nfiles=100)

Preprocess and save the data for a single site.

Parameters:
save_folder str

Path containing the folder ‘{site}’ for the site under study. This folder should contain the .txt file with the data for that site.

nstart int, optional

Number of oldest time points that should be discarded.

nfiles int, optional

Number of files across which the preprocessed data is split. Splitting can speed up loading when only a fraction of the total dataset is needed.

get_fluxes(datafile, nstart=0, ntest=0, size_time_window=None)

Load the data and get the PET, precipitation, dates and streamflow time series. This method is called by the ‘save_batch’ type methods.

Parameters:
datafile str

Path of the .txt file with the data at the corresponding site.

nstart int, optional

Number of oldest time points that should be discarded.

get_design(pet, x, y, dates)

Compute the design matrix of the GAMs. This method is called by the ‘save_batch’ type methods.

Parameters:
pet array

Potential evapotranspiration.

x array

Precipitation time series.

y array

Streamflow time series.

dates array

Time series with dates.

get_GAMdesign(X, J)

Compute the matrix that is used in the convolution to get the streamflow values.

Parameters:
X array

Design matrix of the GAM computed by the method ‘get_design’. X has dimension: number of time points x number of features

J array

Precipitation time series.
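The convolution mentioned above can be pictured with a generic discrete example: the streamflow at time t is a weighted sum of the precipitation over the preceding lags, with the weights given by a transfer function. The sketch below is a simplified illustration only. It uses a single fixed transfer function h, ignores the feature matrix X entirely, and its function names are hypothetical rather than part of GAMCR.

```python
def lagged_matrix(J, max_lag):
    """Rows [J[t], J[t-1], ..., J[t-max_lag+1]] for every t with a full history."""
    return [
        [J[t - k] for k in range(max_lag)]
        for t in range(max_lag - 1, len(J))
    ]


def convolve_with_transfer_function(J, h):
    """Streamflow sketch: y[t] = sum_k h[k] * J[t - k] (assumed form)."""
    rows = lagged_matrix(J, len(h))
    return [sum(j * w for j, w in zip(row, h)) for row in rows]


# Precipitation series and a two-lag transfer function (illustrative values).
J = [1.0, 0.0, 0.0, 2.0]
h = [0.5, 0.25]
y = convolve_with_transfer_function(J, h)
# y == [0.25, 0.0, 1.0]
```

Writing the convolution as a product between a lagged precipitation matrix and a weight vector is what makes it possible to estimate the weights with a linear model.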

load_data(save_folder, max_files=100, test_mode=False)

Load the data that has already been preprocessed and saved using one of the ‘save_batch’ type methods.

Parameters:
save_folder str

Path of the folder where the data is stored.

max_files int

Maximum number of files to load among those saved by the ‘save_batch’ type methods.

test_mode bool

If False, the oldest data is loaded. Otherwise, the most recent data is loaded.
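The interplay between max_files and test_mode can be sketched as a file-selection rule. This is an assumption about the behaviour described above, not the GAMCR implementation: it supposes the saved files sort chronologically by name, and both the helper select_files and the file names are hypothetical.

```python
def select_files(files, max_files=100, test_mode=False):
    """Pick the oldest files by default, or the most recent ones in test mode.

    Hypothetical helper; assumes file names sort in chronological order.
    """
    if max_files <= 0:
        return []
    ordered = sorted(files)
    if test_mode:
        return ordered[-max_files:]
    return ordered[:max_files]


# Five illustrative file names with zero-padded indices.
files = [f"batch_{i:02d}.npy" for i in range(5)]
train_files = select_files(files, max_files=2)
test_files = select_files(files, max_files=2, test_mode=True)
# train_files == ['batch_00.npy', 'batch_01.npy']
# test_files == ['batch_03.npy', 'batch_04.npy']
```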

compute_spline_basis(show_splines=False)

Compute the basis functions (B-splines) on which we decompose the transfer functions.

Parameters:
show_splines bool, optional

If True, a figure showing the basis functions will be produced.
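To make the idea of decomposing a transfer function on a B-spline basis concrete, here is a minimal pure-Python sketch based on the standard Cox-de Boor recursion. The cubic degree, the clamped uniform knot placement and all function names are assumptions made for illustration; the basis built by compute_spline_basis may differ.

```python
def bspline_basis(i, k, t, x):
    """Value at x of the i-th B-spline of degree k on knots t (Cox-de Boor)."""
    if k == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = right = 0.0
    if t[i + k] != t[i]:  # skip terms with a zero-width support
        left = (x - t[i]) / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    if t[i + k + 1] != t[i + 1]:
        right = ((t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1])
                 * bspline_basis(i + 1, k - 1, t, x))
    return left + right


def clamped_knots(n_splines, degree, upper):
    """Uniform knots on [0, upper], with end knots repeated (assumed layout)."""
    n_inner = n_splines - degree + 1
    step = upper / (n_inner - 1)
    inner = [j * step for j in range(n_inner)]
    return [0.0] * degree + inner + [float(upper)] * degree


def transfer_function(coefs, knots, degree, x):
    """Weighted sum of basis functions: h(x) = sum_i coefs[i] * B_i(x)."""
    return sum(c * bspline_basis(i, degree, knots, x)
               for i, c in enumerate(coefs))


# Six cubic splines over lags 0..24 (illustrative values).
knots = clamped_knots(n_splines=6, degree=3, upper=24)
coefs = [1.0] * 6
value = transfer_function(coefs, knots, 3, 12.5)
```

With all coefficients equal to one, the weighted sum is constant inside the knot range because the basis functions sum to one there. Fitting a transfer function then amounts to estimating one coefficient per basis function.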