Running the model#

This note summarizes how your data should be formatted and which preprocessing steps are needed to run the GAMCR model.

1. Check That Your Dataset Has the Right Format#

➡️ Run the Notebook: check_data.ipynb#

Your dataset should be a file named data_{site_name}.txt with the following required column names:

  • ``timeyear``: Represents the year in decimal format (e.g., 2022.45).

  • ``p``: Precipitation.

  • ``pet``: Potential evapotranspiration.

  • ``q``: Streamflow.

  • ``date``: A datetime object representing the date.
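As a sketch of what check_data.ipynb verifies, the snippet below (an illustration, not the notebook's actual code) loads a data_{site_name}.txt file with pandas, checks for the five required columns, and shows one way to compute the decimal ``timeyear``. The tab separator and the ``check_dataset``/``to_decimal_year`` helper names are assumptions for this example.

```python
from datetime import datetime

import pandas as pd

REQUIRED_COLUMNS = ["timeyear", "p", "pet", "q", "date"]

def check_dataset(path):
    """Verify that a data_{site_name}.txt file has the required columns."""
    # Assumption: the file is delimited text; sep=None lets pandas sniff the
    # separator. Adjust `sep` explicitly if your file uses a fixed delimiter.
    df = pd.read_csv(path, sep=None, engine="python")
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    # Make sure `date` parses as a datetime object.
    df["date"] = pd.to_datetime(df["date"])
    return df

def to_decimal_year(ts):
    """Convert a datetime to a decimal year, e.g. 2022-06-13 -> ~2022.45."""
    start = datetime(ts.year, 1, 1)
    end = datetime(ts.year + 1, 1, 1)
    return ts.year + (ts - start).total_seconds() / (end - start).total_seconds()
```

This only checks column names and that ``date`` parses; any additional checks in check_data.ipynb (units, missing values, time step) still apply.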


Folder Structure for Using GAMCR#

To properly use the GAMCR package, create a folder for your site named {site_name}. This folder should follow the structure below:

  • Place the data_{site_name}.txt file in this folder.

  • GAMCR will save the models you train for this site in the same folder.

  • Two subfolders will be automatically created and used by GAMCR:

    • ``data/``: Stores the preprocessed data; created when a save_batch-type method is called.

    • ``results/``: Stores statistics on the results of a trained model; created when the compute_statistics method is called.
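The layout above can be sketched with pathlib. ``setup_site_folder`` is a hypothetical helper for this note, not part of GAMCR; the ``data/`` and ``results/`` subfolders are normally created by GAMCR itself and are only listed here to illustrate the final structure.

```python
from pathlib import Path

def setup_site_folder(root, site_name):
    """Create the folder layout GAMCR expects for one site."""
    site = Path(root) / site_name
    site.mkdir(parents=True, exist_ok=True)
    # The input file you must provide in this folder:
    data_file = site / f"data_{site_name}.txt"
    # Subfolders created automatically by GAMCR (shown for illustration):
    #   site / "data"     -> preprocessed batches (save_batch-type methods)
    #   site / "results"  -> statistics from compute_statistics
    return site, data_file
```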

2. Data Preprocessing#

➡️ Run the Script: save_data_batch.py#

To make training GAMCR more efficient, some computations should be performed offline before starting the training process.

To preprocess the data, run the script: save_data_batch.py

3.A Training a Model with Predefined Hyperparameters#

➡️ Run the Script: train_models.py#

As explained in our paper, GAMCR uses two regularization parameters.

If you choose to use the default values for these parameters (recommended), simply run the script: train_models.py

3.B Training a Model Selecting Hyperparameters with Cross-Validation#

If you would like to optimize the selection of the hyperparameters, you can run the script CV_model.py, which trains the model for different values of the hyperparameters located on a 2D grid.
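As an illustration of the 2D grid swept by CV_model.py, the snippet below enumerates candidate pairs of the two regularization parameters. The names ``lambda_grid``/``gamma_grid`` and the value ranges are assumptions for this sketch, not the script's actual settings.

```python
import itertools

# Hypothetical value ranges for GAMCR's two regularization parameters;
# the actual names and ranges are defined in CV_model.py.
lambda_grid = [0.01, 0.1, 1.0, 10.0]
gamma_grid = [0.01, 0.1, 1.0, 10.0]

# Each (lambda, gamma) pair on the 2D grid corresponds to one trained model.
grid = list(itertools.product(lambda_grid, gamma_grid))
```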

Once all models are trained, you can inspect the results yourself to find the best one, or use the script find_best_model_CV.py to select the best model with an automated procedure.

You can find examples of these files in the folder ./experiments/real_data/data_and_visualization/CV/.
