{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Running the model\n",
    "================\n",
    "\n",
    "\n",
    "This note explains in a nutshell how the data should be formatted and what preprocessing steps are needed to run the GAMCR model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1. Check That Your Dataset Has the Right Format\n",
    "\n",
    "### ➡️ Run the Notebook: `check_data.ipynb`\n",
    "\n",
    "Your dataset should be a file named `data_{site_name}.txt` with the following required column names:\n",
    "\n",
    "- **`timeyear`**: Time expressed as a decimal year (e.g., 2022.45).  \n",
    "- **`p`**: Precipitation.  \n",
    "- **`pet`**: Potential evapotranspiration.  \n",
    "- **`q`**: Streamflow.  \n",
    "- **`date`**: The date of each record, in a form that can be parsed as a datetime.  \n",
    "\n",
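    "As a rough illustration of the format above, the sketch below checks a list of header names for the required columns and converts a date to a decimal year. It uses only the Python standard library. The helper names `missing_columns` and `decimal_year` are hypothetical and not part of GAMCR, and the decimal-year convention shown (fraction of the calendar year elapsed) is an assumption that may differ from the one used in `check_data.ipynb`.\n",
    "\n",
    "```python\n",
    "from datetime import datetime\n",
    "\n",
    "# Column names required by this note.\n",
    "REQUIRED_COLUMNS = {'timeyear', 'p', 'pet', 'q', 'date'}\n",
    "\n",
    "def missing_columns(header):\n",
    "    '''Return the required column names absent from a list of header names.'''\n",
    "    return sorted(REQUIRED_COLUMNS - set(header))\n",
    "\n",
    "def decimal_year(dt):\n",
    "    '''Convert a datetime to a decimal year (assumed convention, see above).'''\n",
    "    start = datetime(dt.year, 1, 1)\n",
    "    end = datetime(dt.year + 1, 1, 1)\n",
    "    return dt.year + (dt - start) / (end - start)\n",
    "\n",
    "print(missing_columns(['date', 'p', 'q']))     # ['pet', 'timeyear']\n",
    "print(decimal_year(datetime(2022, 7, 2, 12)))  # 2022.5\n",
    "```\n",
    "\n",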
    "---\n",
    "\n",
    "### Folder Structure for Using GAMCR\n",
    "\n",
    "To properly use the GAMCR package, create a folder for your site named `{site_name}`. This folder should follow the structure below:\n",
    "\n",
    "- Place the `data_{site_name}.txt` file in this folder.\n",
    "- GAMCR will save the models you train for this site in the same folder.\n",
    "- Two subfolders will be automatically created and used by GAMCR:\n",
    "  - **`data/`**:  \n",
    "    This subfolder stores the preprocessed data. It is created when a `save_batch`-type method is called.  \n",
    "  - **`results/`**:  \n",
    "    This subfolder stores statistics on the results of a trained model. It is created when the `compute_statistics` method is called.\n",
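    "\n",
    "The layout above can be checked before training with a few lines of standard-library Python. This is only a sketch: `expected_data_file` is a hypothetical helper that is not part of GAMCR, `my_site` is a placeholder site name, and the site folder is assumed to sit in the current working directory.\n",
    "\n",
    "```python\n",
    "from pathlib import Path\n",
    "\n",
    "def expected_data_file(site_name, root='.'):\n",
    "    '''Path where this note expects the raw data file for a site.'''\n",
    "    return Path(root) / site_name / f'data_{site_name}.txt'\n",
    "\n",
    "path = expected_data_file('my_site')\n",
    "print(path.as_posix())  # my_site/data_my_site.txt\n",
    "print(path.is_file())   # True only if the file is already in place\n",
    "```\n",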
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2. Data Preprocessing\n",
    "\n",
    "### ➡️ Run the Script: `save_data_batch.py`\n",
    "\n",
    "To make training GAMCR more efficient, some computations should be performed offline before starting the training process.  \n",
    "\n",
    "To preprocess the data, run the script:  `save_data_batch.py`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3.A Training a Model with Predefined Hyperparameters\n",
    "\n",
    "### ➡️ Run the Script: `train_models.py`\n",
    "\n",
    "As explained in our paper, GAMCR uses two regularization parameters.  \n",
    "\n",
    "If you choose to use the default values for these parameters (recommended), simply run the script: `train_models.py`\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3.B Training a Model with Hyperparameters Selected by Cross-Validation\n",
    "\n",
    "If you would like to tune the hyperparameters, run the script `CV_model.py`, which trains the model for different hyperparameter values laid out on a 2D grid.\n",
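    "\n",
    "To make the 2D grid concrete, here is a minimal sketch that enumerates every pair of candidate values for two hyperparameters, assuming they are the two regularization parameters mentioned in section 3.A. The names `lam_1_values` and `lam_2_values` and the numbers are placeholders chosen for illustration; they are not taken from `CV_model.py`.\n",
    "\n",
    "```python\n",
    "from itertools import product\n",
    "\n",
    "# Hypothetical candidate values for the two regularization parameters.\n",
    "lam_1_values = [0.01, 0.1, 1.0]\n",
    "lam_2_values = [0.001, 0.01]\n",
    "\n",
    "# Each grid point is one (lam_1, lam_2) pair, i.e. one model to train.\n",
    "grid = list(product(lam_1_values, lam_2_values))\n",
    "print(len(grid))  # 6\n",
    "```\n",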
    "\n",
    "Once all models are trained, you can inspect the results yourself to find the best one, or use the script `find_best_model_CV.py` to select the best model automatically.\n",
    "\n",
    "You can find examples of these files in the folder `./experiments/real_data/data_and_visualization/CV/`."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}