Running the model
This note summarizes how the data should be formatted and which preprocessing steps are needed to run the GAMCR model.
1. Check That Your Dataset Has the Right Format
➡️ Run the Notebook: check_data.ipynb
Your dataset should be a file named data_{site_name}.txt with the following required column names:
``timeyear``: Represents the year in decimal format (e.g., 2022.45).
``p``: Precipitation.
``pet``: Potential evapotranspiration.
``q``: Streamflow.
``date``: A datetime object representing the date.
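The column requirements above can be checked with a few lines of standard-library Python. The sketch below is not taken from check_data.ipynb: the helper names `missing_columns` and `decimal_year` are hypothetical, and the decimal-year convention (year plus the elapsed fraction of that year) is an assumption, since the note only gives the example value 2022.45.

```python
from datetime import datetime

# Column names required by the note above.
REQUIRED_COLUMNS = {"timeyear", "p", "pet", "q", "date"}


def missing_columns(header):
    """Return the required column names that are absent from `header`, sorted."""
    return sorted(REQUIRED_COLUMNS - set(header))


def decimal_year(d):
    """Convert a datetime to a decimal year.

    Assumed convention: year + fraction of the year elapsed at `d`.
    """
    start = datetime(d.year, 1, 1)
    end = datetime(d.year + 1, 1, 1)
    return d.year + (d - start) / (end - start)


# Example: a header that lacks two of the required columns.
print(missing_columns(["date", "p", "q"]))
print(decimal_year(datetime(2022, 7, 2, 12)))
```

If `missing_columns` returns an empty list, the header contains every required name; the values in each column still need to be checked separately.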
Folder Structure for Using GAMCR
To properly use the GAMCR package, create a folder for your site named {site_name}. This folder should follow the structure below:
Place the data_{site_name}.txt file in this folder.
GAMCR will save the models you train for this site in the same folder.
Two subfolders will be automatically created and used by GAMCR:
``data/``: This subfolder stores the preprocessed data, created when calling a save_batch type method.
``results/``: This subfolder saves statistics on the results of a trained model, created when calling the compute_statistics method.
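To make the layout concrete, the following sketch builds the paths described above for one site. It does not call GAMCR and creates nothing on disk; the function name `expected_layout`, the root folder `my_sites`, and the site name `example_site` are all hypothetical.

```python
from pathlib import Path


def expected_layout(root, site_name):
    """Return the paths described above for one site (nothing is created)."""
    site_dir = Path(root) / site_name
    return {
        "site_dir": site_dir,
        # Input file that you place in the site folder yourself.
        "data_file": site_dir / f"data_{site_name}.txt",
        # Subfolders that GAMCR creates automatically, per the note above.
        "preprocessed_dir": site_dir / "data",
        "results_dir": site_dir / "results",
    }


layout = expected_layout("my_sites", "example_site")
for name, path in layout.items():
    print(f"{name}: {path}")
```

Only the site folder and the data file need to exist before preprocessing; the two subfolders appear once the corresponding GAMCR methods have been called.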
2. Data Preprocessing
➡️ Run the Script: save_data_batch.py
To make training GAMCR more efficient, some computations should be performed offline before starting the training process.
To preprocess the data, run the script: save_data_batch.py
3.A Training a Model with Predefined Hyperparameters
➡️ Run the Script: train_models.py
As explained in our paper, GAMCR uses two regularization parameters.
If you choose to use the default values for these parameters (recommended), simply run the script: train_models.py
3.B Training a Model and Selecting Hyperparameters with Cross-Validation
If you would like to optimize the hyperparameters, run the script CV_model.py, which trains the model for different hyperparameter values located on a 2D grid.
Once all models are trained, you can inspect the results yourself to find the best one, or use the script find_best_model_CV.py to find it with an automated procedure.
You can find examples of these files in the folder ./experiments/real_data/data_and_visualization/CV/.
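The idea of a 2D hyperparameter grid followed by model selection can be sketched as below. This is only an illustration and does not reproduce CV_model.py or find_best_model_CV.py: the helper names, the candidate values, the placeholder scores, and the selection rule (lowest validation error wins) are all assumptions.

```python
from itertools import product


def build_grid(values_first, values_second):
    """Return every (first, second) pair of candidate regularization values."""
    return list(product(values_first, values_second))


def best_pair(scores):
    """Return the grid point with the lowest score (assumed: lower is better)."""
    return min(scores, key=scores.get)


# Hypothetical candidate values for the two regularization parameters.
grid = build_grid([0.01, 0.1, 1.0], [0.1, 1.0])

# Placeholder scores standing in for validation errors of trained models.
scores = {pair: abs(pair[0] - 0.1) + abs(pair[1] - 1.0) for pair in grid}

print(len(grid), "grid points; selected:", best_pair(scores))
```

In practice each score would come from training and validating one model per grid point, which is why the grid search is considerably more expensive than training with the default values.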