---
name: r.learn.train.py
description: Supervised classification and regression of GRASS rasters using the Python scikit-learn package.
keywords: [ raster, classification, regression, machine learning, scikit-learn, training, parallel ]
---

# r.learn.train.py

Supervised classification and regression of GRASS rasters using the Python scikit-learn package.

=== "Command line"

    **r.learn.train.py**
    [**-fsb**]
    **group**=*name*
    [**training_map**=*name*]
    [**training_points**=*name*]
    [**field**=*name*]
    **save_model**=*name*
    [**model_name**=*string*]
    [**penalty**=*string* [,*string*,...]]
    [**alpha**=*float* [,*float*,...]]
    [**l1_ratio**=*float* [,*float*,...]]
    [**c**=*float* [,*float*,...]]
    [**epsilon**=*float* [,*float*,...]]
    [**max_features**=*integer* [,*integer*,...]]
    [**max_depth**=*integer* [,*integer*,...]]
    [**min_samples_leaf**=*integer* [,*integer*,...]]
    [**n_estimators**=*integer* [,*integer*,...]]
    [**learning_rate**=*float* [,*float*,...]]
    [**subsample**=*float* [,*float*,...]]
    [**n_neighbors**=*integer* [,*integer*,...]]
    [**hidden_units**=*string* [,*string*,...]]
    [**weights**=*string* [,*string*,...]]
    [**group_raster**=*name*]
    [**cv**=*integer*]
    [**preds_file**=*name*]
    [**classif_file**=*name*]
    [**category_maps**=*name* [,*name*,...]]
    [**fimp_file**=*name*]
    [**param_file**=*name*]
    [**random_state**=*integer*]
    [**n_jobs**=*integer*]
    [**save_training**=*name*]
    [**load_training**=*name*]
    [**--overwrite**]
    [**--verbose**]
    [**--quiet**]
    [**--qq**]
    [**--ui**]

    Example:

    ```sh
    r.learn.train.py group=name training_map=name save_model=name
    ```

=== "Python (grass.script)"

    *grass.script.run_command*("***r.learn.train.py***",
        **group**,
        **training_map**=*None*,
        **training_points**=*None*,
        **field**=*None*,
        **save_model**,
        **model_name**=*"RandomForestClassifier"*,
        **penalty**=*"l2"*,
        **alpha**=*0.0001*,
        **l1_ratio**=*0.15*,
        **c**=*1.0*,
        **epsilon**=*0.1*,
        **max_features**=*0*,
        **max_depth**=*0*,
        **min_samples_leaf**=*1*,
        **n_estimators**=*100*,
        **learning_rate**=*0.1*,
        **subsample**=*1.0*,
        **n_neighbors**=*5*,
        **hidden_units**=*"(100;100)"*,
        **weights**=*"uniform"*,
        **group_raster**=*None*,
        **cv**=*1*,
        **preds_file**=*None*,
        **classif_file**=*None*,
        **category_maps**=*None*,
        **fimp_file**=*None*,
        **param_file**=*None*,
        **random_state**=*1*,
        **n_jobs**=*-2*,
        **save_training**=*None*,
        **load_training**=*None*,
        **flags**=*None*,
        **overwrite**=*None*,
        **verbose**=*None*,
        **quiet**=*None*,
        **superquiet**=*None*)

    Example:

    ```python
    gs.run_command("r.learn.train.py", group="name", training_map="name", save_model="name")
    ```

=== "Python (grass.tools)"

    *grass.tools.Tools.r_learn_train_py*(**group**,
        **training_map**=*None*,
        **training_points**=*None*,
        **field**=*None*,
        **save_model**,
        **model_name**=*"RandomForestClassifier"*,
        **penalty**=*"l2"*,
        **alpha**=*0.0001*,
        **l1_ratio**=*0.15*,
        **c**=*1.0*,
        **epsilon**=*0.1*,
        **max_features**=*0*,
        **max_depth**=*0*,
        **min_samples_leaf**=*1*,
        **n_estimators**=*100*,
        **learning_rate**=*0.1*,
        **subsample**=*1.0*,
        **n_neighbors**=*5*,
        **hidden_units**=*"(100;100)"*,
        **weights**=*"uniform"*,
        **group_raster**=*None*,
        **cv**=*1*,
        **preds_file**=*None*,
        **classif_file**=*None*,
        **category_maps**=*None*,
        **fimp_file**=*None*,
        **param_file**=*None*,
        **random_state**=*1*,
        **n_jobs**=*-2*,
        **save_training**=*None*,
        **load_training**=*None*,
        **flags**=*None*,
        **overwrite**=*None*,
        **verbose**=*None*,
        **quiet**=*None*,
        **superquiet**=*None*)

    Example:

    ```python
    tools = Tools()
    tools.r_learn_train_py(group="name", training_map="name", save_model="name")
    ```

    This grass.tools API is experimental in version 8.5 and expected to be stable in version 8.6.
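
Several hyperparameter options in the synopsis accept more than one value, written as a comma-separated list, presumably so that the combinations can be compared in the hyperparameter search whose scores `param_file` stores. The sketch below shows how such a call could be assembled from Python values. `build_args` is a hypothetical helper written for this illustration and is not part of GRASS; the group, map, and file names are placeholders.

```python
def build_args(tool, flags="", **params):
    """Assemble a command-line argument list for a GRASS tool.

    Hypothetical helper for illustration only; it is not part of GRASS.
    """
    args = [tool]
    if flags:
        args.append("-" + flags)
    for key, value in params.items():
        if value is None:
            continue  # skip options that were not set
        if isinstance(value, (list, tuple)):
            # several values become one comma-separated option value
            value = ",".join(str(v) for v in value)
        args.append(f"{key}={value}")
    return args


args = build_args(
    "r.learn.train.py",
    flags="s",
    group="predictors",        # placeholder imagery group
    training_map="labels",     # placeholder raster with labelled pixels
    save_model="model.gz",     # placeholder output file
    model_name="RandomForestClassifier",
    n_estimators=[100, 250, 500],
    min_samples_leaf=[1, 5],
    cv=5,
    param_file="tuning_scores.csv",
)
print(" ".join(args))
```

Inside a GRASS session the same values could be passed as keyword arguments to `gs.run_command`, since the parameter types listed on this page include lists for these options.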

## Parameters
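
Two option values in this section use conventions that are easy to misread: `hidden_units` takes layer sizes as a string such as `(100;50)`, and a negative `n_jobs` counts back from the number of available cores. The sketch below illustrates both. The function names are hypothetical, the parsing is an assumption about how such a string could be interpreted rather than the tool's actual code, and the core-count rule follows the convention documented by joblib.

```python
import os


def parse_hidden_units(text):
    """Turn a string such as "(100;50)" into a tuple of layer sizes."""
    inner = text.strip().strip("()")
    return tuple(int(part) for part in inner.split(";") if part.strip())


def effective_jobs(n_jobs, n_cores=None):
    """Resolve n_jobs to a worker count using the joblib convention."""
    if n_cores is None:
        n_cores = os.cpu_count() or 1
    if n_jobs == 0:
        raise ValueError("n_jobs=0 has no meaning")
    if n_jobs < 0:
        # -1 means all cores, -2 all cores but one, and so on
        return max(1, n_cores + 1 + n_jobs)
    return n_jobs


print(parse_hidden_units("(100;100)"))  # (100, 100)
print(effective_jobs(-2, n_cores=8))    # 7
```

A tuple like the one returned here has the same shape as the `hidden_layer_sizes` argument of scikit-learn's MLP estimators.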

=== "Command line"

    **group**=*name* **[required]**  
    &nbsp;&nbsp;&nbsp;&nbsp;Group of raster layers to be classified  
    &nbsp;&nbsp;&nbsp;&nbsp;GRASS imagery group of raster maps representing predictor variables to be used in the machine learning model  
    **training_map**=*name*  
    &nbsp;&nbsp;&nbsp;&nbsp;Labelled pixels  
    &nbsp;&nbsp;&nbsp;&nbsp;Raster map with labelled pixels for training  
    **training_points**=*name*  
    &nbsp;&nbsp;&nbsp;&nbsp;Vector map with training samples  
    &nbsp;&nbsp;&nbsp;&nbsp;Vector points map where each point is used as training sample  
    **field**=*name*  
    &nbsp;&nbsp;&nbsp;&nbsp;Response attribute column  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of attribute column in training_points table containing response values  
    **save_model**=*name* **[required]**  
    &nbsp;&nbsp;&nbsp;&nbsp;Save model to file (for compression use e.g. '.gz' extension)  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of file to store model results using python joblib  
    **model_name**=*string*  
    &nbsp;&nbsp;&nbsp;&nbsp;Model name  
    &nbsp;&nbsp;&nbsp;&nbsp;Supervised learning model to use  
    &nbsp;&nbsp;&nbsp;&nbsp;Allowed values: *LogisticRegression, LinearRegression, SGDClassifier, SGDRegressor, LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis, KNeighborsClassifier, KNeighborsRegressor, GaussianNB, DecisionTreeClassifier, DecisionTreeRegressor, RandomForestClassifier, RandomForestRegressor, ExtraTreesClassifier, ExtraTreesRegressor, GradientBoostingClassifier, GradientBoostingRegressor, HistGradientBoostingClassifier, HistGradientBoostingRegressor, SVC, SVR, MLPClassifier, MLPRegressor*  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *RandomForestClassifier*  
    **penalty**=*string* [,*string*,...]  
    &nbsp;&nbsp;&nbsp;&nbsp;The regularization method  
    &nbsp;&nbsp;&nbsp;&nbsp;The regularization method to be used for the SGDClassifier and SGDRegressor  
    &nbsp;&nbsp;&nbsp;&nbsp;Allowed values: *l1, l2, elasticnet*  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *l2*  
    **alpha**=*float* [,*float*,...]  
    &nbsp;&nbsp;&nbsp;&nbsp;Constant that multiplies the regularization term  
    &nbsp;&nbsp;&nbsp;&nbsp;Constant that multiplies the regularization term for SGDClassifier/SGDRegressor/MLPClassifier/MLPRegressor  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0.0001*  
    **l1_ratio**=*float* [,*float*,...]  
    &nbsp;&nbsp;&nbsp;&nbsp;The Elastic Net mixing parameter  
    &nbsp;&nbsp;&nbsp;&nbsp;The Elastic Net mixing parameter for SGDClassifier/SGDRegressor  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0.15*  
    **c**=*float* [,*float*,...]  
    &nbsp;&nbsp;&nbsp;&nbsp;Inverse of regularization strength  
    &nbsp;&nbsp;&nbsp;&nbsp;Inverse of regularization strength (LogisticRegression and SVC/SVR)  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1.0*  
    **epsilon**=*float* [,*float*,...]  
    &nbsp;&nbsp;&nbsp;&nbsp;Epsilon in the SVR model  
    &nbsp;&nbsp;&nbsp;&nbsp;Epsilon in the SVR model  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0.1*  
    **max_features**=*integer* [,*integer*,...]  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of features available during node splitting; zero uses estimator defaults  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of features available during node splitting (tree-based classifiers and regressors)  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0*  
    **max_depth**=*integer* [,*integer*,...]  
    &nbsp;&nbsp;&nbsp;&nbsp;Maximum tree depth; zero uses estimator defaults  
    &nbsp;&nbsp;&nbsp;&nbsp;Maximum tree depth for tree-based methods; zero uses estimator defaults (fully grown trees for decision trees and random forests, 3 for gradient boosting)  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0*  
    **min_samples_leaf**=*integer* [,*integer*,...]  
    &nbsp;&nbsp;&nbsp;&nbsp;The minimum number of samples required to form a leaf node  
    &nbsp;&nbsp;&nbsp;&nbsp;The minimum number of samples required to form a leaf node in tree-based estimators  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1*  
    **n_estimators**=*integer* [,*integer*,...]  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of estimators  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of estimators (trees) in ensemble tree-based estimators  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *100*  
    **learning_rate**=*float* [,*float*,...]  
    &nbsp;&nbsp;&nbsp;&nbsp;Learning rate  
    &nbsp;&nbsp;&nbsp;&nbsp;Learning rate (also known as shrinkage) for gradient boosting methods  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0.1*  
    **subsample**=*float* [,*float*,...]  
    &nbsp;&nbsp;&nbsp;&nbsp;The fraction of samples to be used for fitting  
    &nbsp;&nbsp;&nbsp;&nbsp;The fraction of samples to be used for fitting, controls stochastic behaviour of gradient boosting methods  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1.0*  
    **n_neighbors**=*integer* [,*integer*,...]  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of neighbors to use  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of neighbors to use  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *5*  
    **hidden_units**=*string* [,*string*,...]  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of neurons to use in the hidden layers  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of neurons to use in each layer, e.g. (100;50) for two layers  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *(100;100)*  
    **weights**=*string* [,*string*,...]  
    &nbsp;&nbsp;&nbsp;&nbsp;Weight function  
    &nbsp;&nbsp;&nbsp;&nbsp;Distance weight function for k-nearest neighbours model prediction  
    &nbsp;&nbsp;&nbsp;&nbsp;Allowed values: *uniform, distance*  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *uniform*  
    **group_raster**=*name*  
    &nbsp;&nbsp;&nbsp;&nbsp;Custom group ids for training samples from GRASS raster  
    &nbsp;&nbsp;&nbsp;&nbsp;GRASS raster containing group ids for training samples. Samples with the same group id will not be split between training and test cross-validation folds  
    **cv**=*integer*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of cross-validation folds  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of cross-validation folds  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1*  
    **preds_file**=*name*  
    &nbsp;&nbsp;&nbsp;&nbsp;Save cross-validation predictions to csv  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of output file in which to save the cross-validation predictions  
    **classif_file**=*name*  
    &nbsp;&nbsp;&nbsp;&nbsp;Save classification report to csv  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of output file to save the classification report  
    **category_maps**=*name* [,*name*,...]  
    &nbsp;&nbsp;&nbsp;&nbsp;Names of categorical rasters within the imagery group  
    &nbsp;&nbsp;&nbsp;&nbsp;Names of categorical rasters within the imagery group that will be one-hot encoded. Leave empty if none.  
    **fimp_file**=*name*  
    &nbsp;&nbsp;&nbsp;&nbsp;Save feature importances to csv  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of output file to save the feature importances  
    **param_file**=*name*  
    &nbsp;&nbsp;&nbsp;&nbsp;Save hyperparameter search scores to csv  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of file to save the hyperparameter tuning results  
    **random_state**=*integer*  
    &nbsp;&nbsp;&nbsp;&nbsp;Seed to use for random state  
    &nbsp;&nbsp;&nbsp;&nbsp;Seed to use for random state to enable reproducible results for estimators that have stochastic components  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1*  
    **n_jobs**=*integer*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of cores for multiprocessing  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of cores for multiprocessing; -2 uses all cores but one (n_cores - 1)  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *-2*  
    **save_training**=*name*  
    &nbsp;&nbsp;&nbsp;&nbsp;Save training data to csv  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of output file to save training data in comma-delimited format  
    **load_training**=*name*  
    &nbsp;&nbsp;&nbsp;&nbsp;Load training data from csv  
    &nbsp;&nbsp;&nbsp;&nbsp;Load previously extracted training data from a csv file  
    **-f**  
    &nbsp;&nbsp;&nbsp;&nbsp;Compute feature importances  
    &nbsp;&nbsp;&nbsp;&nbsp;Compute feature importances using permutation  
    **-s**  
    &nbsp;&nbsp;&nbsp;&nbsp;Standardization preprocessing  
    &nbsp;&nbsp;&nbsp;&nbsp;Standardize feature variables (convert values to get zero mean and unit variance)  
    **-b**  
    &nbsp;&nbsp;&nbsp;&nbsp;Balance training data using class weights  
    &nbsp;&nbsp;&nbsp;&nbsp;Automatically adjust weights inversely proportional to class frequencies  
    **--overwrite**  
    &nbsp;&nbsp;&nbsp;&nbsp;Allow output files to overwrite existing files  
    **--help**  
    &nbsp;&nbsp;&nbsp;&nbsp;Print usage summary  
    **--verbose**  
    &nbsp;&nbsp;&nbsp;&nbsp;Verbose module output  
    **--quiet**  
    &nbsp;&nbsp;&nbsp;&nbsp;Quiet module output  
    **--qq**  
    &nbsp;&nbsp;&nbsp;&nbsp;Very quiet module output  
    **--ui**  
    &nbsp;&nbsp;&nbsp;&nbsp;Force launching GUI dialog

=== "Python (grass.script)"

    **group** : str, *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;Group of raster layers to be classified  
    &nbsp;&nbsp;&nbsp;&nbsp;GRASS imagery group of raster maps representing predictor variables to be used in the machine learning model  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, group, *name*  
    **training_map** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Labelled pixels  
    &nbsp;&nbsp;&nbsp;&nbsp;Raster map with labelled pixels for training  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, raster, *name*  
    **training_points** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Vector map with training samples  
    &nbsp;&nbsp;&nbsp;&nbsp;Vector points map where each point is used as training sample  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, vector, *name*  
    **field** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Response attribute column  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of attribute column in training_points table containing response values  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, dbcolumn, *name*  
    **save_model** : str, *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;Save model to file (for compression use e.g. '.gz' extension)  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of file to store model results using python joblib  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: output, file, *name*  
    **model_name** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Model name  
    &nbsp;&nbsp;&nbsp;&nbsp;Supervised learning model to use  
    &nbsp;&nbsp;&nbsp;&nbsp;Allowed values: *LogisticRegression, LinearRegression, SGDClassifier, SGDRegressor, LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis, KNeighborsClassifier, KNeighborsRegressor, GaussianNB, DecisionTreeClassifier, DecisionTreeRegressor, RandomForestClassifier, RandomForestRegressor, ExtraTreesClassifier, ExtraTreesRegressor, GradientBoostingClassifier, GradientBoostingRegressor, HistGradientBoostingClassifier, HistGradientBoostingRegressor, SVC, SVR, MLPClassifier, MLPRegressor*  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *RandomForestClassifier*  
    **penalty** : str | list[str], *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;The regularization method  
    &nbsp;&nbsp;&nbsp;&nbsp;The regularization method to be used for the SGDClassifier and SGDRegressor  
    &nbsp;&nbsp;&nbsp;&nbsp;Allowed values: *l1, l2, elasticnet*  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *l2*  
    **alpha** : float | list[float] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Constant that multiplies the regularization term  
    &nbsp;&nbsp;&nbsp;&nbsp;Constant that multiplies the regularization term for SGDClassifier/SGDRegressor/MLPClassifier/MLPRegressor  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0.0001*  
    **l1_ratio** : float | list[float] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;The Elastic Net mixing parameter  
    &nbsp;&nbsp;&nbsp;&nbsp;The Elastic Net mixing parameter for SGDClassifier/SGDRegressor  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0.15*  
    **c** : float | list[float] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Inverse of regularization strength  
    &nbsp;&nbsp;&nbsp;&nbsp;Inverse of regularization strength (LogisticRegression and SVC/SVR)  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1.0*  
    **epsilon** : float | list[float] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Epsilon in the SVR model  
    &nbsp;&nbsp;&nbsp;&nbsp;Epsilon in the SVR model  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0.1*  
    **max_features** : int | list[int] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of features available during node splitting; zero uses estimator defaults  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of features available during node splitting (tree-based classifiers and regressors)  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0*  
    **max_depth** : int | list[int] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Maximum tree depth; zero uses estimator defaults  
    &nbsp;&nbsp;&nbsp;&nbsp;Maximum tree depth for tree-based methods; zero uses estimator defaults (fully grown trees for decision trees and random forests, 3 for gradient boosting)  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0*  
    **min_samples_leaf** : int | list[int] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;The minimum number of samples required to form a leaf node  
    &nbsp;&nbsp;&nbsp;&nbsp;The minimum number of samples required to form a leaf node in tree-based estimators  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1*  
    **n_estimators** : int | list[int] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of estimators  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of estimators (trees) in ensemble tree-based estimators  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *100*  
    **learning_rate** : float | list[float] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Learning rate  
    &nbsp;&nbsp;&nbsp;&nbsp;Learning rate (also known as shrinkage) for gradient boosting methods  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0.1*  
    **subsample** : float | list[float] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;The fraction of samples to be used for fitting  
    &nbsp;&nbsp;&nbsp;&nbsp;The fraction of samples to be used for fitting, controls stochastic behaviour of gradient boosting methods  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1.0*  
    **n_neighbors** : int | list[int] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of neighbors to use  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of neighbors to use  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *5*  
    **hidden_units** : str | list[str], *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of neurons to use in the hidden layers  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of neurons to use in each layer, e.g. (100;50) for two layers  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *(100;100)*  
    **weights** : str | list[str], *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Weight function  
    &nbsp;&nbsp;&nbsp;&nbsp;Distance weight function for k-nearest neighbours model prediction  
    &nbsp;&nbsp;&nbsp;&nbsp;Allowed values: *uniform, distance*  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *uniform*  
    **group_raster** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Custom group ids for training samples from GRASS raster  
    &nbsp;&nbsp;&nbsp;&nbsp;GRASS raster containing group ids for training samples. Samples with the same group id will not be split between training and test cross-validation folds  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, raster, *name*  
    **cv** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of cross-validation folds  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of cross-validation folds  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1*  
    **preds_file** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Save cross-validation predictions to csv  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of output file in which to save the cross-validation predictions  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: output, file, *name*  
    **classif_file** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Save classification report to csv  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of output file to save the classification report  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: output, file, *name*  
    **category_maps** : str | list[str], *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Names of categorical rasters within the imagery group  
    &nbsp;&nbsp;&nbsp;&nbsp;Names of categorical rasters within the imagery group that will be one-hot encoded. Leave empty if none.  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, raster, *name*  
    **fimp_file** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Save feature importances to csv  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of output file to save the feature importances  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: output, file, *name*  
    **param_file** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Save hyperparameter search scores to csv  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of file to save the hyperparameter tuning results  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: output, file, *name*  
    **random_state** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Seed to use for random state  
    &nbsp;&nbsp;&nbsp;&nbsp;Seed to use for random state to enable reproducible results for estimators that have stochastic components  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1*  
    **n_jobs** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of cores for multiprocessing  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of cores for multiprocessing; -2 uses all cores but one (n_cores - 1)  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *-2*  
    **save_training** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Save training data to csv  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of output file to save training data in comma-delimited format  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: output, file, *name*  
    **load_training** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Load training data from csv  
    &nbsp;&nbsp;&nbsp;&nbsp;Load previously extracted training data from a csv file  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, file, *name*  
    **flags** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Allowed values: *f*, *s*, *b*  
    &nbsp;&nbsp;&nbsp;&nbsp;**f**  
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Compute feature importances  
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Compute feature importances using permutation  
    &nbsp;&nbsp;&nbsp;&nbsp;**s**  
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Standardization preprocessing  
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Standardize feature variables (convert values to get zero mean and unit variance)  
    &nbsp;&nbsp;&nbsp;&nbsp;**b**  
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Balance training data using class weights  
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Automatically adjust weights inversely proportional to class frequencies  
    **overwrite** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Allow output files to overwrite existing files  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **verbose** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Verbose module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **quiet** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Quiet module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **superquiet** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Very quiet module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  

=== "Python (grass.tools)"

    **group** : str, *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;Group of raster layers to be classified  
    &nbsp;&nbsp;&nbsp;&nbsp;GRASS imagery group of raster maps representing predictor variables to be used in the machine learning model  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, group, *name*  
    **training_map** : str | np.ndarray, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Labelled pixels  
    &nbsp;&nbsp;&nbsp;&nbsp;Raster map with labelled pixels for training  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, raster, *name*  
    **training_points** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Vector map with training samples  
    &nbsp;&nbsp;&nbsp;&nbsp;Vector points map where each point is used as training sample  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, vector, *name*  
    **field** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Response attribute column  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of attribute column in training_points table containing response values  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, dbcolumn, *name*  
    **save_model** : str, *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;Save model to file (for compression use e.g. '.gz' extension)  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of file to store model results using python joblib  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: output, file, *name*  
    **model_name** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Model name  
    &nbsp;&nbsp;&nbsp;&nbsp;Supervised learning model to use  
    &nbsp;&nbsp;&nbsp;&nbsp;Allowed values: *LogisticRegression, LinearRegression, SGDClassifier, SGDRegressor, LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis, KNeighborsClassifier, KNeighborsRegressor, GaussianNB, DecisionTreeClassifier, DecisionTreeRegressor, RandomForestClassifier, RandomForestRegressor, ExtraTreesClassifier, ExtraTreesRegressor, GradientBoostingClassifier, GradientBoostingRegressor, HistGradientBoostingClassifier, HistGradientBoostingRegressor, SVC, SVR, MLPClassifier, MLPRegressor*  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *RandomForestClassifier*  
    **penalty** : str | list[str], *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;The regularization method  
    &nbsp;&nbsp;&nbsp;&nbsp;The regularization method to be used for the SGDClassifier and SGDRegressor  
    &nbsp;&nbsp;&nbsp;&nbsp;Allowed values: *l1, l2, elasticnet*  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *l2*  
    **alpha** : float | list[float] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Constant that multiplies the regularization term  
    &nbsp;&nbsp;&nbsp;&nbsp;Constant that multiplies the regularization term for SGDClassifier/SGDRegressor/MLPClassifier/MLPRegressor  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0.0001*  
    **l1_ratio** : float | list[float] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;The Elastic Net mixing parameter  
    &nbsp;&nbsp;&nbsp;&nbsp;The Elastic Net mixing parameter for SGDClassifier/SGDRegressor  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0.15*  
    **c** : float | list[float] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Inverse of regularization strength  
    &nbsp;&nbsp;&nbsp;&nbsp;Inverse of regularization strength (LogisticRegression and SVC/SVR)  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1.0*  
    **epsilon** : float | list[float] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Epsilon in the SVR model  
    &nbsp;&nbsp;&nbsp;&nbsp;Epsilon in the SVR model  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0.1*  
    **max_features** : int | list[int] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of features available during node splitting; zero uses estimator defaults  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of features available during node splitting (tree-based classifiers and regressors)  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0*  
    **max_depth** : int | list[int] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Maximum tree depth; zero uses estimator defaults  
    &nbsp;&nbsp;&nbsp;&nbsp;Maximum tree depth for tree-based methods; zero uses estimator defaults (fully grown trees for decision trees and random forests, 3 for gradient boosting)  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0*  
    **min_samples_leaf** : int | list[int] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;The minimum number of samples required to form a leaf node  
    &nbsp;&nbsp;&nbsp;&nbsp;The minimum number of samples required to form a leaf node in tree-based estimators  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1*  
    **n_estimators** : int | list[int] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of estimators  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of estimators (trees) in ensemble tree-based estimators  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *100*  
    **learning_rate** : float | list[float] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Learning rate  
    &nbsp;&nbsp;&nbsp;&nbsp;Learning rate (also known as shrinkage) for gradient boosting methods  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0.1*  
    **subsample** : float | list[float] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;The fraction of samples to be used for fitting  
    &nbsp;&nbsp;&nbsp;&nbsp;The fraction of samples to be used for fitting, controls stochastic behaviour of gradient boosting methods  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1.0*  
    **n_neighbors** : int | list[int] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of neighbors to use  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of neighbors to use  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *5*  
    **hidden_units** : str | list[str], *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of neurons to use in the hidden layers  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of neurons to use in each layer, e.g. (100;50) for two layers  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *(100;100)*  
    **weights** : str | list[str], *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Weight function  
    &nbsp;&nbsp;&nbsp;&nbsp;Distance weight function for k-nearest neighbours model prediction  
    &nbsp;&nbsp;&nbsp;&nbsp;Allowed values: *uniform, distance*  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *uniform*  
    **group_raster** : str | np.ndarray, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Custom group ids for training samples from GRASS raster  
    &nbsp;&nbsp;&nbsp;&nbsp;GRASS raster containing group ids for training samples. Samples with the same group id will not be split between training and test cross-validation folds  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, raster, *name*  
    **cv** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of cross-validation folds  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of cross-validation folds  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1*  
    **preds_file** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Save cross-validation predictions to csv  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of output file in which to save the cross-validation predictions  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: output, file, *name*  
    **classif_file** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Save classification report to csv  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of output file to save the classification report  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: output, file, *name*  
    **category_maps** : str | list[str], *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Names of categorical rasters within the imagery group  
    &nbsp;&nbsp;&nbsp;&nbsp;Names of categorical rasters within the imagery group that will be one-hot encoded. Leave empty if none.  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, raster, *name*  
    **fimp_file** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Save feature importances to csv  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of output file to save the feature importances  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: output, file, *name*  
    **param_file** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Save hyperparameter search scores to csv  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of file to save the hyperparameter tuning results  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: output, file, *name*  
    **random_state** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Seed to use for random state  
    &nbsp;&nbsp;&nbsp;&nbsp;Seed to use for random state to enable reproducible results for estimators that have stochastic components  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1*  
    **n_jobs** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of cores for multiprocessing  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of cores for multiprocessing; -2 uses all cores but one (n_cores - 1)  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *-2*  
    **save_training** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Save training data to csv  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of output file to save training data in comma-delimited format  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: output, file, *name*  
    **load_training** : str | io.StringIO, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Load training data from csv  
    &nbsp;&nbsp;&nbsp;&nbsp;Load previously extracted training data from a csv file  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, file, *name*  
    **flags** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Allowed values: *f*, *s*, *b*  
    &nbsp;&nbsp;&nbsp;&nbsp;**f**  
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Compute Feature importances  
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Compute feature importances using permutation  
    &nbsp;&nbsp;&nbsp;&nbsp;**s**  
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Standardization preprocessing  
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Standardize feature variables (convert values to get zero mean and unit variance)  
    &nbsp;&nbsp;&nbsp;&nbsp;**b**  
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Balance training data using class weights  
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Automatically adjust weights inversely proportional to class frequencies  
    **overwrite** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Allow output files to overwrite existing files  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **verbose** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Verbose module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **quiet** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Quiet module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **superquiet** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Very quiet module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  

    Returns:

    **result** : grass.tools.support.ToolResult | None  
    If the tool produces text as standard output, a *ToolResult* object will be returned. Otherwise, `None` will be returned.

    Raises:

    *grass.tools.ToolError*: When the tool ended with an error.

## DESCRIPTION

*r.learn.train* performs training data extraction, supervised machine
learning and cross-validation using the Python package *scikit-learn*.
The machine learning algorithm is selected with the *model\_name*
parameter. For more details on the estimators, refer to the
[scikit-learn documentation](https://scikit-learn.org/stable/). The
training data can be provided either as a GRASS raster map containing
labelled pixels using the *training\_map* parameter, or as a GRASS vector
map containing point geometries using the *training\_points*
parameter. If a vector map is used, the *field* parameter must also
indicate which column in the vector attribute table contains the
labels/values for training.

For regression models the *field* parameter must contain only numeric
values. For classification models the field can contain integer-encoded
labels, or it can represent text categories that will automatically be
encoded as integer values (in alphabetical order). These text labels
will also be applied as categories to the classification output when
using **r.learn.predict**. The vector map should also not contain
multiple geometries per attribute.
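
As a rough illustration of the alphabetical encoding described above, the following sketch maps text labels to integer codes. The function name `encode_labels` and the starting code of 0 are assumptions made for this example; the codes actually assigned by the module may differ.

```python
def encode_labels(labels, start=0):
    """Map text labels to integer codes in alphabetical order.

    The starting code (0) is an assumption for illustration only.
    """
    classes = sorted(set(labels))
    mapping = {name: code for code, name in enumerate(classes, start=start)}
    return [mapping[name] for name in labels], mapping


# Hypothetical attribute values read from the training points
labels = ["forest", "water", "urban", "forest", "agriculture"]
codes, mapping = encode_labels(labels)
print(mapping)
print(codes)
```

Because the order is alphabetical, adding or renaming a class in the attribute table can change the integer code assigned to the other classes.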

### Supervised Learning Algorithms

The following classification and regression methods are available:

| Model                                                                                                                | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| -------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| LogisticRegression, LinearRegression                                                                                 | Linear models for classification and regression                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| SGDClassifier, SGDRegressor                                                                                          | Linear models for classification and regression using stochastic gradient descent optimization suitable for large datasets. Supports l1, l2 and elastic net regularization                                                                                                                                                                                                                                                                                                                                                                                |
| LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis                                                            | Classifiers with linear and quadratic decision surfaces                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
| KNeighborsClassifier, KNeighborsRegressor                                                                            | Local approximation methods for classification/regression that assign predictions to new observations based on the values assigned to the k-nearest observations in the training data feature space                                                                                                                                                                                                                                                                                                                                                       |
| GaussianNB | Gaussian Naive Bayes algorithm for classification |
| DecisionTreeClassifier, DecisionTreeRegressor | Classification and regression tree models that map observations to a response variable using a hierarchy of splits and branches. The termini of these branches, termed leaves, represent the predictions of the response variable. Decision trees are non-parametric and can model non-linear relationships between a response and predictor variables, and are insensitive to the scaling of the predictors |
| RandomForestClassifier, RandomForestRegressor, ExtraTreesClassifier, ExtraTreesRegressor | Ensemble classification and regression tree methods. Each tree in the ensemble is based on a random subsample of the training data. Also, only a randomly selected subset of the predictors is available during each node split. Each tree produces a prediction and the final result is obtained by averaging across all of the trees. The ExtraTreesClassifier and ExtraTreesRegressor are variants of random forests in which, during each node split, the splitting rule is selected as the best of several randomly generated thresholds |
| GradientBoostingClassifier, GradientBoostingRegressor, HistGradientBoostingClassifier, HistGradientBoostingRegressor | Ensemble tree models in which learning occurs in an additive, forward step-wise fashion: each additional tree is fitted to the model residuals to gradually improve the model fit. HistGradientBoostingClassifier and HistGradientBoostingRegressor are the newer multithreaded scikit-learn implementations. |
| SVC, SVR                                                                                                             | Support Vector Machine classifiers and regressors. Only a linear kernel is enabled in r.learn.ml2 because non-linear kernels are too slow for most remote sensing and spatial datasets                                                                                                                                                                                                                                                                                                                                                                    |
| MLPClassifier, MLPRegressor                                                                                          | Multi-layer perceptron algorithm for classification or regression                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |

### Hyperparameters

The estimator settings tab provides access to the most pertinent
parameters that affect the previously described algorithms. The
scikit-learn estimator defaults are generally supplied, and these
parameters can be tuned using a grid search by supplying multiple
comma-separated values. The grid search is performed using 3-fold
cross-validation. This tuning can also be combined
with nested cross-validation by setting the *cv* option to \> 1.
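
To give a feel for how comma-separated values grow into a search grid, here is a minimal sketch that expands option strings into every parameter combination. The helper `expand_grid` is hypothetical and is not the module's own code; it only illustrates that the number of candidates is the product of the number of values supplied for each option.

```python
from itertools import product


def expand_grid(options, cast=float):
    """Build every parameter combination from comma-separated option strings."""
    names = list(options)
    value_lists = [[cast(v) for v in options[name].split(",")] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Equivalent to passing n_estimators=100,500 max_features=2,4,6
grid = expand_grid({"n_estimators": "100,500", "max_features": "2,4,6"}, cast=int)
for params in grid:
    print(params)

# Each candidate is evaluated with 3-fold cross-validation
n_fits = len(grid) * 3
print(len(grid), "candidates,", n_fits, "cross-validated fits")
```

Two values for one option and three for another give six candidates, so the 3-fold search fits 18 models; with nested cross-validation this cost is repeated for every outer fold.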

The following table summarizes the hyperparameters and the models they
apply to:

| Hyperparameter     | Description                                                                                                                                                                      | Method                                                                                                                                                                                                         |
| ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| alpha | The constant used to multiply the regularization term | SGDClassifier, SGDRegressor, MLPClassifier, MLPRegressor |
| l1\_ratio | The elastic net mixing ratio between l1 and l2 regularization | SGDClassifier, SGDRegressor |
| c                  | Inverse of the regularization strength                                                                                                                                           | LogisticRegression, SVC, SVR                                                                                                                                                                                   |
| epsilon            | Width of the margin used to maximize the number of fitted observations                                                                                                           | SVR                                                                                                                                                                                                            |
| n\_estimators      | The number of trees                                                                                                                                                              | RandomForestClassifier, RandomForestRegressor, ExtraTreesClassifier, ExtraTreesRegressor, GradientBoostingClassifier, GradientBoostingRegressor, HistGradientBoostingClassifier, HistGradientBoostingRegressor |
| max\_features      | The number of predictor variables that are randomly selected to be available at each node split                                                                                  | RandomForestClassifier, RandomForestRegressor, ExtraTreesClassifier, ExtraTreesRegressor, GradientBoostingClassifier, GradientBoostingRegressor, HistGradientBoostingClassifier, HistGradientBoostingRegressor |
| min\_samples\_leaf | The minimum number of samples required at a leaf node | RandomForestClassifier, RandomForestRegressor, ExtraTreesClassifier, ExtraTreesRegressor, GradientBoostingClassifier, GradientBoostingRegressor, HistGradientBoostingClassifier, HistGradientBoostingRegressor |
| learning\_rate     | Shrinkage parameter to control the contribution of each tree                                                                                                                     | GradientBoostingClassifier, GradientBoostingRegressor, HistGradientBoostingClassifier, HistGradientBoostingRegressor                                                                                           |
| hidden\_units | The number of neurons in each hidden layer, e.g. (100;100) for 100 neurons in two hidden layers. Tuning can be performed using comma-separated values, e.g. (100;100),(200;200). | MLPClassifier, MLPRegressor |

### Preprocessing

Although tree-based classifiers are insensitive to the scaling of the
input data, other classifiers such as linear models may not perform
optimally if some predictors have variances that are orders of magnitude
larger than others. The *-s* flag adds a standardization preprocessing
step to the classification and prediction to reduce this effect.
Additionally, most of the classifiers do not perform well if there is a
large class imbalance in the training data. The *-b* flag balances
the training data by weighting the minority classes relative to the
majority class. This does not apply to the Naive Bayes or
LinearDiscriminantAnalysis classifiers.
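
The two preprocessing steps can be sketched in plain Python. This is an illustration only, not the module's implementation: `standardize` rescales one predictor to zero mean and unit variance, and `balanced_class_weights` uses the heuristic `n_samples / (n_classes * class_count)`, which is how scikit-learn defines `class_weight="balanced"`. Both function names are hypothetical.

```python
from collections import Counter
from statistics import fmean, pstdev


def standardize(values):
    """Rescale one predictor to zero mean and unit (population) variance."""
    mean = fmean(values)
    std = pstdev(values) or 1.0  # leave constant predictors unscaled
    return [(v - mean) / std for v in values]


def balanced_class_weights(labels):
    """Weights inversely proportional to class frequencies."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


z = standardize([2.0, 4.0, 6.0])
print(z)

# Two 'water' samples against eight 'forest' samples
weights = balanced_class_weights(["water"] * 2 + ["forest"] * 8)
print(weights)
```

In this toy case each of the two water samples is weighted four times as heavily as a forest sample, so both classes contribute equally to the fit.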

Scikit-learn does not specifically recognize raster predictors that
represent non-ordinal, categorical values, for example when using a
landcover map as a predictor. Predictive performance may be improved if
the categories in these maps are one-hot encoded before training. The
*category\_maps* parameter can be used to select rasters contained
within the imagery group to which one-hot encoding is applied before
training.
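
As a small illustration of what one-hot encoding does to a categorical predictor, the sketch below turns a list of landcover category values into one binary column per category. The function name `one_hot` and the sample values are made up for this example.

```python
def one_hot(values):
    """Return the sorted categories and one 0/1 column per category."""
    categories = sorted(set(values))
    columns = {cat: [1 if v == cat else 0 for v in values] for cat in categories}
    return categories, columns


# Hypothetical landcover categories sampled at five training pixels
landcover = [1, 3, 3, 2, 1]
categories, columns = one_hot(landcover)
for cat in categories:
    print(cat, columns[cat])
```

Each pixel has exactly one column set to 1, so the estimator no longer treats category 3 as "larger" than category 1.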

### Feature Importances

In addition to model fitting and prediction, feature importances can be
generated using the **-f** flag. The feature importances are calculated
with a permutation-based method that can be applied to all the
estimators. The importance of each variable is the average decrease in
model performance when that variable is permuted. For binary
classifications, the AUC is used as the metric. Multiclass
classifications use accuracy, and regressions use R².
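
The idea behind permutation importance can be sketched with a toy multiclass problem scored by accuracy. This is a simplified illustration, not the module's code: the `predict` function stands in for a fitted estimator, and the data, the number of repeats and the helper names are assumptions.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=1):
    """Average drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only feature j permuted
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: feature 0 determines the class (three classes), feature 1 is noise
X = [[i, (i * 7) % 5] for i in range(30)]
y = [row[0] // 10 for row in X]


def predict(row):
    """Stand-in for a fitted classifier that only looks at feature 0."""
    return row[0] // 10


importances = permutation_importance(predict, X, y)
print(importances)
```

Shuffling the noise feature leaves the predictions unchanged, so its importance is zero, while shuffling the informative feature lowers the accuracy.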

### Cross-Validation

Cross-validation can be performed by setting the *cv* parameter to \> 1.
Cross-validation is performed using stratified k-folds for
classification and k-folds for regression. Several global and per-class
accuracy measures are produced depending on whether the response
variable is binary or multiclass, and on whether the estimator is a
classifier or a regressor. Cross-validation can also be performed in
groups by supplying a raster containing the group ids of the partitions
using the *group\_raster* option. In this case, training samples with
the same group id, as set by the group raster, will never be split
between training and test partitions during cross-validation. This can
reduce problems with overly optimistic cross-validation scores if the
training data are strongly spatially correlated, for example when the
training data represent rasterized polygons.
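
Grouped cross-validation can be sketched as follows. The helper `group_folds` is hypothetical and assigns whole groups to folds in a simple round-robin order, which is cruder than scikit-learn's `GroupKFold` (which also balances fold sizes), but it shows the key property that no group id appears on both sides of a split.

```python
def group_folds(groups, n_splits):
    """Assign whole groups to folds so that no group is split across folds."""
    unique_groups = sorted(set(groups))
    fold_of_group = {g: i % n_splits for i, g in enumerate(unique_groups)}
    folds = []
    for k in range(n_splits):
        test_idx = [i for i, g in enumerate(groups) if fold_of_group[g] == k]
        train_idx = [i for i, g in enumerate(groups) if fold_of_group[g] != k]
        folds.append((train_idx, test_idx))
    return folds


# Hypothetical group ids (e.g. polygon ids) sampled at eight training pixels
groups = [10, 10, 20, 20, 20, 30, 40, 40]
folds = group_folds(groups, n_splits=2)
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Pixels from the same polygon therefore always fall together in either the training or the test partition.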

## NOTES

Many of the estimators involve a random process which can cause a small
amount of variation in the classification/regression results and
feature importances. To enable reproducible results, a seed is supplied
to the estimator. This can be changed using the *random\_state* parameter.

For convenience when repeatedly training models on the same data, the
training data can be saved to a csv file using the *save\_training*
option. This data can then be imported into subsequent classification
runs using the *load\_training* option, saving time by avoiding the need
to repeatedly query the predictors.

## EXAMPLE

Here we are going to use the GRASS GIS sample North Carolina data set as
a basis to perform a Landsat classification. We are going to classify a
Landsat 7 scene from 2000, using training information from an older
(1996) land cover dataset.

Landsat 7 (2000) bands 7,4,2 color composite example:

![Landsat 7 (2000) bands 7,4,2 color composite](lsat7_2000_b742.png)

Note that this example must be run in the "landsat" mapset of the North
Carolina sample data set location.

First, we are going to generate some training pixels from an older
(1996) land cover classification:

```sh
g.region raster=landclass96 -p
r.random input=landclass96 npoints=1000 raster=training_pixels
```

Then we can use these training pixels to perform a classification on the
more recently obtained Landsat 7 image:

```sh
# train a random forest classification model using r.learn.train
r.learn.train group=lsat7_2000 training_map=training_pixels \
    model_name=RandomForestClassifier n_estimators=500 save_model=rf_model.gz

# perform prediction using r.learn.predict
r.learn.predict group=lsat7_2000 load_model=rf_model.gz output=rf_classification

# check raster categories - they are automatically applied to the classification output
r.category rf_classification

# copy color scheme from landclass training map to result
r.colors rf_classification raster=training_pixels
```

Random forest classification result:

![Random forest classification result](rfclassification.png)

## SEE ALSO

[r.learn.ml2](r.learn.ml2.md) (overview),
[r.learn.predict](r.learn.predict.md)

## REFERENCES

Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp.
2825-2830, 2011.

## AUTHOR

Steven Pawley

## SOURCE CODE

Available at: [r.learn.train source code](https://github.com/OSGeo/grass-addons/tree/grass8/raster/r.learn.ml2/r.learn.train)
([history](https://github.com/OSGeo/grass-addons/commits/grass8/raster/r.learn.ml2/r.learn.train))  
Latest change: Friday Feb 21 10:10:05 2025 in commit [7d78fe3](https://github.com/OSGeo/grass-addons/commit/7d78fe34868674c3b6050ba1924e1c5675d155c9)
