Running the model#

This note summarizes how your data should be formatted and which preprocessing steps are needed to run the GAMCR model.

1. Check That Your Dataset Has the Right Format#

➡️ Run the Notebook: check_data.ipynb#

Your dataset should be a file named data_{site_name}.txt with the following required column names:

  • ``timeyear``: Represents the year in decimal format (e.g., 2022.45).

  • ``p``: Precipitation.

  • ``pet``: Potential evapotranspiration.

  • ``q``: Streamflow.

  • ``date``: A datetime object representing the date.
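As a sketch of what check_data.ipynb verifies, the snippet below (an illustration, not the notebook's actual code) loads a data_{site_name}.txt file with pandas, checks for the five required columns, and shows one way to compute the decimal ``timeyear``. The tab separator and the ``check_dataset``/``to_decimal_year`` helper names are assumptions for this example.

```python
from datetime import datetime

import pandas as pd

REQUIRED_COLUMNS = ["timeyear", "p", "pet", "q", "date"]

def check_dataset(path):
    """Verify that a data_{site_name}.txt file has the required columns."""
    # Assumption: the file is delimited text; sep=None lets pandas sniff the
    # separator. Adjust `sep` explicitly if your file uses a fixed delimiter.
    df = pd.read_csv(path, sep=None, engine="python")
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    # Make sure `date` parses as a datetime object.
    df["date"] = pd.to_datetime(df["date"])
    return df

def to_decimal_year(ts):
    """Convert a datetime to a decimal year, e.g. 2022-06-13 -> ~2022.45."""
    start = datetime(ts.year, 1, 1)
    end = datetime(ts.year + 1, 1, 1)
    return ts.year + (ts - start).total_seconds() / (end - start).total_seconds()
```

This only checks column names and that ``date`` parses; any additional checks in check_data.ipynb (units, missing values, time step) still apply.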


Folder Structure for Using GAMCR#

To properly use the GAMCR package, create a folder for your site named {site_name}. This folder should follow the structure below:

  • Place the data_{site_name}.txt file in this folder.

  • GAMCR will save the models you train for this site in the same folder.

  • Two subfolders will be automatically created and used by GAMCR:

    • ``data/``: Stores the preprocessed data; created when a save_batch-type method is called.

    • ``results/``: Stores statistics on the results of a trained model; created when the compute_statistics method is called.
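The layout above can be sketched with pathlib. ``setup_site_folder`` is a hypothetical helper for this note, not part of GAMCR; the ``data/`` and ``results/`` subfolders are normally created by GAMCR itself and are only listed here to illustrate the final structure.

```python
from pathlib import Path

def setup_site_folder(root, site_name):
    """Create the folder layout GAMCR expects for one site."""
    site = Path(root) / site_name
    site.mkdir(parents=True, exist_ok=True)
    # The input file you must provide in this folder:
    data_file = site / f"data_{site_name}.txt"
    # Subfolders created automatically by GAMCR (shown for illustration):
    #   site / "data"     -> preprocessed batches (save_batch-type methods)
    #   site / "results"  -> statistics from compute_statistics
    return site, data_file
```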

2. Data Preprocessing#

➡️ Run the Script: save_data_batch.py#

To make training GAMCR more efficient, some computations should be performed offline before starting the training process.

To preprocess the data, run the script: save_data_batch.py

3.A Training a Model with Predefined Hyperparameters#

➡️ Run the Script: train_models.py#

As explained in our paper, GAMCR uses two regularization parameters.

If you choose to use the default values for these parameters (recommended), simply run the script: train_models.py

3.B Training a Model Selecting Hyperparameters with Cross-Validation#

If you would like to optimize the selection of the hyperparameters, you can run the script CV_model.py, which trains the model for different values of the hyperparameters located on a 2D grid.
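As an illustration of the 2D grid swept by CV_model.py, the snippet below enumerates candidate pairs of the two regularization parameters. The names ``lambda_grid``/``gamma_grid`` and the value ranges are assumptions for this sketch, not the script's actual settings.

```python
import itertools

# Hypothetical value ranges for GAMCR's two regularization parameters;
# the actual names and ranges are defined in CV_model.py.
lambda_grid = [0.01, 0.1, 1.0, 10.0]
gamma_grid = [0.01, 0.1, 1.0, 10.0]

# Each (lambda, gamma) pair on the 2D grid corresponds to one trained model.
grid = list(itertools.product(lambda_grid, gamma_grid))
```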

Once all models are trained, you can inspect the results yourself to find the best one, or use the script find_best_model_CV.py to select the best model with an automated procedure.

You can find examples of these files in the folder ./experiments/real_data/data_and_visualization/CV/.
