---
name: r.object.activelearning.py
description: Active learning for classifying raster objects
keywords: [  ]
---

# r.object.activelearning.py

Active learning for classifying raster objects

=== "Command line"

    **r.object.activelearning.py**
    **training_set**=*name*
    **test_set**=*name*
    **unlabeled_set**=*name*
    [**learning_steps**=*integer*]
    [**nbr_uncertainty**=*integer*]
    [**diversity_lambda**=*float*]
    [**c_svm**=*float*]
    [**gamma_parameter**=*float*]
    [**search_iter**=*integer*]
    [**update**=*name*]
    [**predictions**=*name*]
    [**training_updated**=*name*]
    [**unlabeled_updated**=*name*]
    [**--overwrite**]
    [**--verbose**]
    [**--quiet**]
    [**--qq**]
    [**--ui**]

    Example:

    ```sh
    r.object.activelearning.py training_set=name test_set=name unlabeled_set=name
    ```

=== "Python (grass.script)"

    *grass.script.run_command*("***r.object.activelearning.py***",
        **training_set**,
        **test_set**,
        **unlabeled_set**,
        **learning_steps**=*5*,
        **nbr_uncertainty**=*15*,
        **diversity_lambda**=*0.25*,
        **c_svm**=*None*,
        **gamma_parameter**=*None*,
        **search_iter**=*15*,
        **update**=*None*,
        **predictions**=*None*,
        **training_updated**=*None*,
        **unlabeled_updated**=*None*,
        **overwrite**=*None*,
        **verbose**=*None*,
        **quiet**=*None*,
        **superquiet**=*None*)

    Example:

    ```python
    import grass.script as gs
    gs.run_command("r.object.activelearning.py", training_set="name", test_set="name", unlabeled_set="name")
    ```

=== "Python (grass.tools)"

    *grass.tools.Tools.r_object_activelearning_py*(**training_set**,
        **test_set**,
        **unlabeled_set**,
        **learning_steps**=*5*,
        **nbr_uncertainty**=*15*,
        **diversity_lambda**=*0.25*,
        **c_svm**=*None*,
        **gamma_parameter**=*None*,
        **search_iter**=*15*,
        **update**=*None*,
        **predictions**=*None*,
        **training_updated**=*None*,
        **unlabeled_updated**=*None*,
        **overwrite**=*None*,
        **verbose**=*None*,
        **quiet**=*None*,
        **superquiet**=*None*)

    Example:

    ```python
    from grass.tools import Tools

    tools = Tools()
    tools.r_object_activelearning_py(training_set="name", test_set="name", unlabeled_set="name")
    ```

    This grass.tools API is experimental in version 8.5 and expected to be stable in version 8.6.

## Parameters

=== "Command line"

    **training_set**=*name* **[required]**  
    &nbsp;&nbsp;&nbsp;&nbsp;Training set (csv format)  
    **test_set**=*name* **[required]**  
    &nbsp;&nbsp;&nbsp;&nbsp;Test set (csv format)  
    **unlabeled_set**=*name* **[required]**  
    &nbsp;&nbsp;&nbsp;&nbsp;Unlabeled samples (csv format)  
    **learning_steps**=*integer*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of samples to label at each iteration  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *5*  
    **nbr_uncertainty**=*integer*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of samples to select (based on uncertainty criterion) before applying the diversity criterion.  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *15*  
    **diversity_lambda**=*float*  
    &nbsp;&nbsp;&nbsp;&nbsp;Lambda parameter used in the diversity heuristic  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0.25*  
    **c_svm**=*float*  
    &nbsp;&nbsp;&nbsp;&nbsp;Penalty parameter C of the error term  
    **gamma_parameter**=*float*  
    &nbsp;&nbsp;&nbsp;&nbsp;Kernel coefficient  
    **search_iter**=*integer*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of parameter settings that are sampled in the automatic parameter search (C, gamma).  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *15*  
    **update**=*name*  
    &nbsp;&nbsp;&nbsp;&nbsp;Training set update file  
    **predictions**=*name*  
    &nbsp;&nbsp;&nbsp;&nbsp;Output file for class predictions  
    **training_updated**=*name*  
    &nbsp;&nbsp;&nbsp;&nbsp;Output file for the updated training file  
    **unlabeled_updated**=*name*  
    &nbsp;&nbsp;&nbsp;&nbsp;Output file for the updated unlabeled file  
    **--overwrite**  
    &nbsp;&nbsp;&nbsp;&nbsp;Allow output files to overwrite existing files  
    **--help**  
    &nbsp;&nbsp;&nbsp;&nbsp;Print usage summary  
    **--verbose**  
    &nbsp;&nbsp;&nbsp;&nbsp;Verbose module output  
    **--quiet**  
    &nbsp;&nbsp;&nbsp;&nbsp;Quiet module output  
    **--qq**  
    &nbsp;&nbsp;&nbsp;&nbsp;Very quiet module output  
    **--ui**  
    &nbsp;&nbsp;&nbsp;&nbsp;Force launching GUI dialog

=== "Python (grass.script)"

    **training_set** : str, *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;Training set (csv format)  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, file, *name*  
    **test_set** : str, *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;Test set (csv format)  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, file, *name*  
    **unlabeled_set** : str, *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;Unlabeled samples (csv format)  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, file, *name*  
    **learning_steps** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of samples to label at each iteration  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *5*  
    **nbr_uncertainty** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of samples to select (based on uncertainty criterion) before applying the diversity criterion.  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *15*  
    **diversity_lambda** : float, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Lambda parameter used in the diversity heuristic  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0.25*  
    **c_svm** : float, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Penalty parameter C of the error term  
    **gamma_parameter** : float, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Kernel coefficient  
    **search_iter** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of parameter settings that are sampled in the automatic parameter search (C, gamma).  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *15*  
    **update** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Training set update file  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, file, *name*  
    **predictions** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Output file for class predictions  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: output, file, *name*  
    **training_updated** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Output file for the updated training file  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: output, file, *name*  
    **unlabeled_updated** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Output file for the updated unlabeled file  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: output, file, *name*  
    **overwrite** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Allow output files to overwrite existing files  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **verbose** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Verbose module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **quiet** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Quiet module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **superquiet** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Very quiet module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  

=== "Python (grass.tools)"

    **training_set** : str | io.StringIO, *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;Training set (csv format)  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, file, *name*  
    **test_set** : str | io.StringIO, *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;Test set (csv format)  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, file, *name*  
    **unlabeled_set** : str | io.StringIO, *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;Unlabeled samples (csv format)  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, file, *name*  
    **learning_steps** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of samples to label at each iteration  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *5*  
    **nbr_uncertainty** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of samples to select (based on uncertainty criterion) before applying the diversity criterion.  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *15*  
    **diversity_lambda** : float, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Lambda parameter used in the diversity heuristic  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *0.25*  
    **c_svm** : float, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Penalty parameter C of the error term  
    **gamma_parameter** : float, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Kernel coefficient  
    **search_iter** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of parameter settings that are sampled in the automatic parameter search (C, gamma).  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *15*  
    **update** : str | io.StringIO, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Training set update file  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, file, *name*  
    **predictions** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Output file for class predictions  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: output, file, *name*  
    **training_updated** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Output file for the updated training file  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: output, file, *name*  
    **unlabeled_updated** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Output file for the updated unlabeled file  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: output, file, *name*  
    **overwrite** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Allow output files to overwrite existing files  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **verbose** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Verbose module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **quiet** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Quiet module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **superquiet** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Very quiet module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  

    Returns:

    **result** : grass.tools.support.ToolResult | None  
    If the tool produces text as standard output, a *ToolResult* object will be returned. Otherwise, `None` will be returned.

    Raises:

    *grass.tools.ToolError*: When the tool ended with an error.

## DESCRIPTION

This module uses SVM from the *scikit-learn* python package to perform
classification on regions of raster maps. These regions can be the
output of
*[i.segment](https://grass.osgeo.org/grass-stable/manuals/i.segment.html)*
or
*[r.clump](https://grass.osgeo.org/grass-stable/manuals/r.clump.html)*.

The module enables learning with only a small initial labeled data set
via *active learning*. This semi-supervised learning algorithm
interactively queries the user to label the regions that are most useful
for improving the overall classification score. With this technique, the
number of examples needed to learn the classification is often much
lower than the number needed by standard supervised algorithms. You
should start the classification with a small training set and run the
module multiple times, labeling new informative samples at each run, to
improve the classification score. The score metric is the number of
correctly predicted labels divided by the total number of samples in the
test set.
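
The score described above is plain accuracy. A minimal sketch of that computation follows; the function name and the label values are illustrative assumptions, not taken from the module's source code.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of test samples whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("the test set is empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical labels for a five-sample test set: three predictions are correct.
true_labels = [4, 6, 2, 2, 6]
predicted_labels = [4, 6, 2, 6, 2]
score = accuracy(true_labels, predicted_labels)
print(score)
```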

The samples chosen for labeling are the ones where the class prediction
is the most uncertain \[2\]. Moreover, from the most uncertain samples,
only the most different samples are kept \[1\]. For each uncertain
sample, this diversity heuristic takes into account the distance to its
closest neighbour and the average distance to all other samples. This
ensures that newly labeled samples are not redundant with each other.
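
The two-stage selection can be sketched as follows. This is an illustration under stated assumptions, not the module's actual code: the uncertainty values are passed in directly (how they are derived from the SVM is not shown), the function name `select_samples` is hypothetical, and the weighted sum used to combine the two distances is one plausible reading of the *diversity\_lambda* description.

```python
import math


def select_samples(features, uncertainty, nbr_uncertainty=15,
                   learning_steps=5, diversity_lambda=0.25):
    """Return the indices of the samples to label.

    features: list of feature vectors, one per unlabeled sample
    uncertainty: list of uncertainty values (higher means less certain)
    """
    # Stage 1: keep the nbr_uncertainty most uncertain samples.
    ranked = sorted(range(len(features)),
                    key=lambda i: uncertainty[i], reverse=True)
    candidates = ranked[:nbr_uncertainty]

    # Stage 2: score each candidate by its distance to the other candidates.
    def diversity(i):
        distances = [math.dist(features[i], features[j])
                     for j in candidates if j != i]
        if not distances:
            return 0.0
        nearest = min(distances)
        average = sum(distances) / len(distances)
        # Assumed combination: lambda weights the nearest-neighbour distance,
        # (1 - lambda) weights the average distance.
        return diversity_lambda * nearest + (1 - diversity_lambda) * average

    # Keep the learning_steps most diverse candidates.
    most_diverse = sorted(candidates, key=diversity, reverse=True)
    return most_diverse[:learning_steps]


# Hypothetical data: samples 0 and 1 are near-duplicates, sample 4 is certain.
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [10.0, 0.0], [0.0, 0.2]]
uncertainty = [0.9, 0.8, 0.7, 0.6, 0.1]
selected = select_samples(features, uncertainty,
                          nbr_uncertainty=4, learning_steps=2)
print(selected)
```

In this toy data the two isolated samples are preferred over the pair of near-duplicates, even though the near-duplicates are more uncertain.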

The learning data should be composed of features extracted from the
regions, for example with the *[i.segment.stats](i.segment.stats.md)*
module. The features of the training set, the test set and the unlabeled
set should be in three different files in csv format. The first line of
each file must be a header containing the feature names. Every region
must be uniquely identified by the first attribute. The classes for the
training and test examples must be the second attribute.

Example of a training or test file:

```csv
cat,Class_num,attr1,attr2,attr3
167485,4,3.546,456.76,6.76
183234,6,5.76,1285.54,9.45
173457,2,5.65,468.76,6.78
```

Example of an unlabeled file:

```csv
cat,attr1,attr2,attr3
167485,3.546,456.76,6.76
183234,5.76,1285.54,9.45
173457,5.65,468.76,6.78
```
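
To make the column layout concrete, here is a small sketch that reads both kinds of file with Python's standard `csv` module. The helper name `read_samples` and the dictionary layout are assumptions made for this illustration; the module's own reader may differ.

```python
import csv
import io


def read_samples(csv_file, labeled):
    """Read samples from an open csv file.

    Labeled files: ID, class, features. Unlabeled files: ID, features.
    """
    reader = csv.reader(csv_file)
    header = next(reader)  # first line holds the column names
    first_feature = 2 if labeled else 1
    samples = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        samples.append({
            "id": row[0],
            "label": row[1] if labeled else None,
            "features": [float(value) for value in row[first_feature:]],
        })
    return header[first_feature:], samples


# In-memory stand-ins for the files shown above.
training_csv = io.StringIO(
    "cat,Class_num,attr1,attr2,attr3\n"
    "167485,4,3.546,456.76,6.76\n"
    "183234,6,5.76,1285.54,9.45\n"
)
unlabeled_csv = io.StringIO(
    "cat,attr1,attr2,attr3\n"
    "173457,5.65,468.76,6.78\n"
)
feature_names, training = read_samples(training_csv, labeled=True)
_, unlabeled = read_samples(unlabeled_csv, labeled=False)
print(feature_names, training[0], unlabeled[0])
```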

The training set can be easily updated once you have labeled new
samples. Create a file that specifies which label you give to which
sample. This file, in csv format, should have a header and two
attributes per line: the ID of the sample you have labeled and the label
itself. The module will transfer the newly labeled samples from the
unlabeled set to the training set, adding the class you have provided.
This is done internally and does not modify your original files.

If you want to save the result of the update, the module can write new
files in which the newly labeled samples are added to the training set
and removed from the unlabeled set. Just specify the paths of those
output files with the *training\_updated* and *unlabeled\_updated*
parameters.

Example of an update file:

```csv
cat,Class_num
194762,2
153659,6
178350,2
```
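
The effect of an update file can be sketched as moving entries between two in-memory collections while leaving the inputs untouched. The function name, the dictionary-based data layout and the choice to ignore unknown IDs are assumptions of this sketch, not documented behaviour of the module.

```python
def apply_update(training, unlabeled, update):
    """Move newly labeled samples from the unlabeled set to the training set.

    training: dict mapping sample ID -> (label, features)
    unlabeled: dict mapping sample ID -> features
    update: dict mapping sample ID -> label
    Returns new dicts; the input dicts are left unchanged.
    """
    new_training = dict(training)
    new_unlabeled = dict(unlabeled)
    for sample_id, label in update.items():
        if sample_id not in new_unlabeled:
            # Assumed behaviour for this sketch: skip IDs that are not unlabeled.
            continue
        new_training[sample_id] = (label, new_unlabeled.pop(sample_id))
    return new_training, new_unlabeled


# Hypothetical data: one labeled sample, two unlabeled ones, one new label.
training = {"167485": ("4", [3.546, 456.76, 6.76])}
unlabeled = {
    "194762": [5.76, 1285.54, 9.45],
    "153659": [5.65, 468.76, 6.78],
}
update = {"194762": "2"}

new_training, new_unlabeled = apply_update(training, unlabeled, update)
print(len(new_training), len(new_unlabeled))
```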

Here are more details on a few parameters:

- **learning\_steps**: The number of samples that the module will ask
    you to label at each run.
- **nbr\_uncertainty**: Number of uncertain samples to choose before
    applying the diversity filter. This number should be higher than
    *learning\_steps*.
- **diversity\_lambda**: Parameter used in the diversity heuristic.
    If close to 0, only the average distance to all other samples is
    taken into account. If close to 1, only the distance to the closest
    neighbour is taken into account.
- **c\_svm**: Penalty parameter C of the error term. If it is too
    large, there is a risk of overfitting the training data. If it is
    too small, the model may underfit.
- **gamma\_parameter**: Kernel coefficient. 1/\#features is often a
    good value to start with.
- **search\_iter**: Number of parameter settings that are sampled in
    the automatic parameter search (C, gamma). *search\_iter* trades off
    runtime against quality of the solution.
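
To illustrate the runtime/quality trade-off of a sampled parameter search, here is a generic random-search sketch in plain Python. It is not the module's implementation: the sampling ranges, the log-uniform distribution, the function names and the toy scoring function (a stand-in for a cross-validated SVM score) are all assumptions.

```python
import math
import random


def random_search(score, search_iter=15, c_range=(1e-2, 1e3),
                  gamma_range=(1e-4, 1e1), seed=0):
    """Sample search_iter (C, gamma) pairs and return the best one found.

    Returns a tuple (best_score, C, gamma).
    """
    rng = random.Random(seed)

    def log_uniform(low, high):
        # Sample uniformly on a logarithmic scale between low and high.
        return math.exp(rng.uniform(math.log(low), math.log(high)))

    best = None
    for _ in range(search_iter):
        c = log_uniform(*c_range)
        gamma = log_uniform(*gamma_range)
        value = score(c, gamma)
        if best is None or value > best[0]:
            best = (value, c, gamma)
    return best


def toy_score(c, gamma):
    """Toy stand-in for a validation score, peaking at C=100, gamma=0.01."""
    return -((math.log10(c) - 2) ** 2 + (math.log10(gamma) + 2) ** 2)


few = random_search(toy_score, search_iter=5)
many = random_search(toy_score, search_iter=50)
print(few, many)
```

Because both runs use the same seed, the longer search evaluates a superset of the settings tried by the shorter one, so its best score can only be equal or better, at the cost of ten times as many evaluations.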

## EXAMPLES

The following examples are based on the data files found in this
module's repository.

### Simple run without an update file

```sh
r.object.activelearning training_set=/path/to/training_set.csv \
    test_set=/path/to/test_set.csv \
    unlabeled_set=/path/to/unlabeled_set.csv

Parameters used : C=146.398423284, gamma=0.0645567086567, lambda=0.25
12527959
9892568
13731120
15445003
13767630
Class predictions written to predictions.csv
Training set : 70
Test set : 585
Unlabeled set : 792
Score : 0.321367521368
```

### With an update file

The five samples output by the previous example have been labeled and
added to the update file.

```sh
r.object.activelearning training_set=/path/to/training_set.csv \
    test_set=/path/to/test_set.csv \
    unlabeled_set=/path/to/unlabeled_set.csv \
    update=/path/to/update.csv

Parameters used : C=101.580687073, gamma=0.00075388337475, lambda=0.25
Class predictions written to predictions.csv
Training set : 75
Test set : 585
Unlabeled set : 787
Score : 0.454700854701
8691475
9321017
14254774
14954255
15838185
```
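
When scripting repeated runs, the sample IDs to label and the score can be pulled out of the text output. The sketch below assumes the output layout shown in the examples above (one ID per line, and a `Score : value` line); the function name is hypothetical and the format is not a documented interface, so it may change between versions.

```python
import re

# Output copied from the first example above.
OUTPUT = """\
Parameters used : C=146.398423284, gamma=0.0645567086567, lambda=0.25
12527959
9892568
13731120
15445003
13767630
Class predictions written to predictions.csv
Training set : 70
Test set : 585
Unlabeled set : 792
Score : 0.321367521368
"""


def parse_output(text):
    """Return (sample_ids, score) extracted from the module's text output."""
    sample_ids = []
    score = None
    for line in text.splitlines():
        line = line.strip()
        if line.isdigit():
            # A line made only of digits is taken to be a sample ID.
            sample_ids.append(line)
            continue
        match = re.fullmatch(r"Score\s*:\s*([0-9.]+)", line)
        if match:
            score = float(match.group(1))
    return sample_ids, score


sample_ids, score = parse_output(OUTPUT)
print(sample_ids, score)
```

The extracted IDs are the rows to fill in, together with their labels, in the update file for the next run.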

## NOTES

This module requires the *scikit-learn* python package. This package
needs to be installed in your GRASS GIS Python environment. Please refer
to [*r.learn.ml*](r.learn.ml.html)'s notes on how to install this
package.

The memory usage for about 1450 samples of 52 features each is around
650 kB. This number can vary due to the unpredictability of the garbage
collector's behaviour. Everything is computed in memory; therefore the
size of the data is limited by the amount of RAM available.
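
As a rough plausibility check of that figure, the raw feature values alone can be estimated as below. The 8 bytes per value is an assumption (64-bit floats); the rest of the reported usage would then be overhead such as IDs, labels and Python objects.

```python
samples = 1450
features = 52
bytes_per_value = 8  # assumption: each feature is stored as a 64-bit float

raw_bytes = samples * features * bytes_per_value
print(raw_bytes)         # 603200 bytes
print(raw_bytes / 1000)  # 603.2 kB
```

Under the same assumption, memory grows linearly with both the number of samples and the number of features.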

## REFERENCES

\[1\] Bruzzone, L. and Persello, C. (2009). Active learning for
classification of remote sensing images. 2009 IEEE International
Geoscience and Remote Sensing Symposium.
doi:10.1109/igarss.2009.5417857  
\[2\] Tuia, D. et al. (2011). A Survey of Active Learning Algorithms for
Supervised Remote Sensing Image Classification. IEEE Journal of Selected
Topics in Signal Processing, 5(3), 606-617.
doi:10.1109/jstsp.2011.2139193

## AUTHOR

Lucas Lefèvre (ULB, Brussels, Belgium)

## SOURCE CODE

Available at: [r.object.activelearning source code](https://github.com/OSGeo/grass-addons/tree/grass8/src/raster/r.object.activelearning)
([history](https://github.com/OSGeo/grass-addons/commits/grass8/src/raster/r.object.activelearning))  
Latest change: Thursday Mar 20 21:36:57 2025 in commit [7286ecf](https://github.com/OSGeo/grass-addons/commit/7286ecf7af235bfd089fb9b1b82fb383cf95f3fc)
