{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Running the model\n", "================\n", "\n", "\n", "This note explains in a nutshell how the data should be formatted and what preprocessing steps are needed to run the GAMCR model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1. Check That Your Dataset Has the Right Format\n", "\n", "### ➡️ Run the Notebook: `check_data.ipynb`\n", "\n", "Your dataset should be a file named `data_{site_name}.txt` with the following required column names:\n", "\n", "- **`timeyear`**: Represents the year in decimal format (e.g., 2022.45). \n", "- **`p`**: Precipitation. \n", "- **`pet`**: Potential evapotranspiration. \n", "- **`q`**: Streamflow. \n", "- **`date`**: A datetime object representing the date. \n", "\n", "---\n", "\n", "### Folder Structure for Using GAMCR\n", "\n", "To properly use the GAMCR package, create a folder for your site named `{site_name}`. This folder should follow the structure below:\n", "\n", "- Place the `data_{site_name}.txt` file in this folder.\n", "- GAMCR will save the models you train for this site in the same folder.\n", "- Two subfolders will be automatically created and used by GAMCR:\n", " - **`data/`**: \n", " This subfolder stores the preprocessed data, created when calling a `save_batch` type method. \n", " - **`results/`**: \n", " This subfolder saves statistics on the results of a trained model, created when calling the `compute_statistics` method.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. Data Preprocessing\n", "\n", "### ➡️ Run the Script: `save_data_batch.py`\n", "\n", "To make training GAMCR more efficient, some computations should be performed offline before starting the training process. 
\n", "\n", "To preprocess the data, run the script: `save_data_batch.py`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3.A Training a Model with Predefined Hyperparameters\n", "\n", "### ➡️ Run the Script: `train_models.py`\n", "\n", "As explained in our paper, GAMCR uses two regularization parameters. \n", "\n", "If you choose to use the default values for these parameters (recommended), simply run the script: `train_models.py`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3.B Training a Model with Hyperparameters Selected by Cross-Validation\n", "\n", "If you would like to tune the hyperparameters, run the script `CV_model.py`, which trains the model for different hyperparameter values located on a 2D grid.\n", "\n", "Once all models are trained, you can inspect the results yourself to find the best one, or use the script `find_best_model_CV.py` to select the best model automatically.\n", "\n", "You can find examples of these files in the folder `./experiments/real_data/data_and_visualization/CV/`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 4 }