---
name: i.ann.maskrcnn.train.py
description: Train your Mask R-CNN network
keywords: [ ann, vector, raster ]
---

# i.ann.maskrcnn.train.py

Train your Mask R-CNN network

=== "Command line"

    **i.ann.maskrcnn.train.py**
    [**-esbn**]
    **training_dataset**=*name*
    [**model**=*name*]
    **classes**=*string* [,*string*,...]
    **logs**=*name*
    **name**=*string*
    [**epochs**=*integer*]
    [**steps_per_epoch**=*integer*]
    [**rois_per_image**=*integer*]
    [**images_per_gpu**=*integer*]
    [**gpu_count**=*integer*]
    [**mini_mask_size**=*integer* [,*integer*,...]]
    [**validation_steps**=*integer*]
    [**images_min_dim**=*integer*]
    [**images_max_dim**=*integer*]
    [**backbone**=*string*]
    [**--verbose**]
    [**--quiet**]
    [**--qq**]
    [**--ui**]

    Example:

    ```sh
    i.ann.maskrcnn.train.py training_dataset=name classes=string logs=name name=string
    ```

=== "Python (grass.script)"

    *grass.script.run_command*("***i.ann.maskrcnn.train.py***",
        **training_dataset**,
        **model**=*None*,
        **classes**,
        **logs**,
        **name**,
        **epochs**=*200*,
        **steps_per_epoch**=*3000*,
        **rois_per_image**=*64*,
        **images_per_gpu**=*1*,
        **gpu_count**=*1*,
        **mini_mask_size**=*None*,
        **validation_steps**=*100*,
        **images_min_dim**=*256*,
        **images_max_dim**=*1280*,
        **backbone**=*"resnet101"*,
        **flags**=*None*,
        **verbose**=*None*,
        **quiet**=*None*,
        **superquiet**=*None*)

    Example:

    ```python
    import grass.script as gs

    gs.run_command("i.ann.maskrcnn.train.py", training_dataset="name", classes="string", logs="name", name="string")
    ```

=== "Python (grass.tools)"

    *grass.tools.Tools.i_ann_maskrcnn_train_py*(**training_dataset**,
        **model**=*None*,
        **classes**,
        **logs**,
        **name**,
        **epochs**=*200*,
        **steps_per_epoch**=*3000*,
        **rois_per_image**=*64*,
        **images_per_gpu**=*1*,
        **gpu_count**=*1*,
        **mini_mask_size**=*None*,
        **validation_steps**=*100*,
        **images_min_dim**=*256*,
        **images_max_dim**=*1280*,
        **backbone**=*"resnet101"*,
        **flags**=*None*,
        **verbose**=*None*,
        **quiet**=*None*,
        **superquiet**=*None*)

    Example:

    ```python
    from grass.tools import Tools

    tools = Tools()
    tools.i_ann_maskrcnn_train_py(training_dataset="name", classes="string", logs="name", name="string")
    ```

    This grass.tools API is experimental in version 8.5 and expected to be stable in version 8.6.

## Parameters

=== "Command line"

    **training_dataset**=*name* **[required]**  
    &nbsp;&nbsp;&nbsp;&nbsp;Path to the dataset with images and masks  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of input directory  
    **model**=*name*  
    &nbsp;&nbsp;&nbsp;&nbsp;Path to the .h5 file to use as initial values  
    &nbsp;&nbsp;&nbsp;&nbsp;Leave empty to train from scratch  
    **classes**=*string* [,*string*,...] **[required]**  
    &nbsp;&nbsp;&nbsp;&nbsp;Names of classes, separated by commas  
    **logs**=*name* **[required]**  
    &nbsp;&nbsp;&nbsp;&nbsp;Path to the directory in which the models will be saved  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of input directory  
    **name**=*string* **[required]**  
    &nbsp;&nbsp;&nbsp;&nbsp;Name for output models  
    **epochs**=*integer*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of epochs  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *200*  
    **steps_per_epoch**=*integer*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of steps per epoch  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *3000*  
    **rois_per_image**=*integer*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of ROIs to train per image  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *64*  
    **images_per_gpu**=*integer*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of images per GPU  
    &nbsp;&nbsp;&nbsp;&nbsp;A larger number means faster training but requires more GPU memory  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1*  
    **gpu_count**=*integer*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of GPUs to be used  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1*  
    **mini_mask_size**=*integer* [,*integer*,...]  
    &nbsp;&nbsp;&nbsp;&nbsp;Size of the mini mask, with values separated by commas  
    &nbsp;&nbsp;&nbsp;&nbsp;Leave empty to use full-sized masks. A mini mask saves memory at the expense of precision  
    **validation_steps**=*integer*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of validation steps  
    &nbsp;&nbsp;&nbsp;&nbsp;A larger number gives a more accurate estimate of the model precision  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *100*  
    **images_min_dim**=*integer*  
    &nbsp;&nbsp;&nbsp;&nbsp;Minimum length of image sides  
    &nbsp;&nbsp;&nbsp;&nbsp;Images will be resized so that their shortest side is at least this value (must be a multiple of 64)  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *256*  
    **images_max_dim**=*integer*  
    &nbsp;&nbsp;&nbsp;&nbsp;Maximum length of image sides  
    &nbsp;&nbsp;&nbsp;&nbsp;Images will be resized so that their longest side equals this value (must be a multiple of 64)  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1280*  
    **backbone**=*string*  
    &nbsp;&nbsp;&nbsp;&nbsp;Backbone architecture  
    &nbsp;&nbsp;&nbsp;&nbsp;Allowed values: *resnet50, resnet101*  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *resnet101*  
    **-e**  
    &nbsp;&nbsp;&nbsp;&nbsp;Pretrained weights were trained on other classes / resolution / sizes  
    **-s**  
    &nbsp;&nbsp;&nbsp;&nbsp;Exclude 10 % of images from training and save their list to the logs directory  
    **-b**  
    &nbsp;&nbsp;&nbsp;&nbsp;Also train batch normalization layers (not recommended for small batches)  
    **-n**  
    &nbsp;&nbsp;&nbsp;&nbsp;No resizing or padding of images (images must be of the same size)  
    **--help**  
    &nbsp;&nbsp;&nbsp;&nbsp;Print usage summary  
    **--verbose**  
    &nbsp;&nbsp;&nbsp;&nbsp;Verbose module output  
    **--quiet**  
    &nbsp;&nbsp;&nbsp;&nbsp;Quiet module output  
    **--qq**  
    &nbsp;&nbsp;&nbsp;&nbsp;Very quiet module output  
    **--ui**  
    &nbsp;&nbsp;&nbsp;&nbsp;Force launching GUI dialog

=== "Python (grass.script)"

    **training_dataset** : str, *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;Path to the dataset with images and masks  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of input directory  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, dir, *name*  
    **model** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Path to the .h5 file to use as initial values  
    &nbsp;&nbsp;&nbsp;&nbsp;Leave empty to train from scratch  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, file, *name*  
    **classes** : str | list[str], *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;Names of classes, separated by commas  
    **logs** : str, *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;Path to the directory in which the models will be saved  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of input directory  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, dir, *name*  
    **name** : str, *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;Name for output models  
    **epochs** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of epochs  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *200*  
    **steps_per_epoch** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of steps per epoch  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *3000*  
    **rois_per_image** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of ROIs to train per image  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *64*  
    **images_per_gpu** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of images per GPU  
    &nbsp;&nbsp;&nbsp;&nbsp;A larger number means faster training but requires more GPU memory  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1*  
    **gpu_count** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of GPUs to be used  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1*  
    **mini_mask_size** : int | list[int] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Size of the mini mask, with values separated by commas  
    &nbsp;&nbsp;&nbsp;&nbsp;Leave empty to use full-sized masks. A mini mask saves memory at the expense of precision  
    **validation_steps** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of validation steps  
    &nbsp;&nbsp;&nbsp;&nbsp;A larger number gives a more accurate estimate of the model precision  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *100*  
    **images_min_dim** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Minimum length of image sides  
    &nbsp;&nbsp;&nbsp;&nbsp;Images will be resized so that their shortest side is at least this value (must be a multiple of 64)  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *256*  
    **images_max_dim** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Maximum length of image sides  
    &nbsp;&nbsp;&nbsp;&nbsp;Images will be resized so that their longest side equals this value (must be a multiple of 64)  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1280*  
    **backbone** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Backbone architecture  
    &nbsp;&nbsp;&nbsp;&nbsp;Allowed values: *resnet50, resnet101*  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *resnet101*  
    **flags** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Allowed values: *e*, *s*, *b*, *n*  
    &nbsp;&nbsp;&nbsp;&nbsp;**e**  
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Pretrained weights were trained on other classes / resolution / sizes  
    &nbsp;&nbsp;&nbsp;&nbsp;**s**  
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Exclude 10 % of images from training and save their list to the logs directory  
    &nbsp;&nbsp;&nbsp;&nbsp;**b**  
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Also train batch normalization layers (not recommended for small batches)  
    &nbsp;&nbsp;&nbsp;&nbsp;**n**  
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;No resizing or padding of images (images must be of the same size)  
    **verbose** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Verbose module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **quiet** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Quiet module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **superquiet** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Very quiet module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  

=== "Python (grass.tools)"

    **training_dataset** : str, *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;Path to the dataset with images and masks  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of input directory  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, dir, *name*  
    **model** : str | io.StringIO, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Path to the .h5 file to use as initial values  
    &nbsp;&nbsp;&nbsp;&nbsp;Leave empty to train from scratch  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, file, *name*  
    **classes** : str | list[str], *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;Names of classes, separated by commas  
    **logs** : str, *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;Path to the directory in which the models will be saved  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of input directory  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, dir, *name*  
    **name** : str, *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;Name for output models  
    **epochs** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of epochs  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *200*  
    **steps_per_epoch** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of steps per epoch  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *3000*  
    **rois_per_image** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of ROIs to train per image  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *64*  
    **images_per_gpu** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of images per GPU  
    &nbsp;&nbsp;&nbsp;&nbsp;A larger number means faster training but requires more GPU memory  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1*  
    **gpu_count** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of GPUs to be used  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1*  
    **mini_mask_size** : int | list[int] | str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Size of the mini mask, with values separated by commas  
    &nbsp;&nbsp;&nbsp;&nbsp;Leave empty to use full-sized masks. A mini mask saves memory at the expense of precision  
    **validation_steps** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of validation steps  
    &nbsp;&nbsp;&nbsp;&nbsp;A larger number gives a more accurate estimate of the model precision  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *100*  
    **images_min_dim** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Minimum length of image sides  
    &nbsp;&nbsp;&nbsp;&nbsp;Images will be resized so that their shortest side is at least this value (must be a multiple of 64)  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *256*  
    **images_max_dim** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Maximum length of image sides  
    &nbsp;&nbsp;&nbsp;&nbsp;Images will be resized so that their longest side equals this value (must be a multiple of 64)  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1280*  
    **backbone** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Backbone architecture  
    &nbsp;&nbsp;&nbsp;&nbsp;Allowed values: *resnet50, resnet101*  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *resnet101*  
    **flags** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Allowed values: *e*, *s*, *b*, *n*  
    &nbsp;&nbsp;&nbsp;&nbsp;**e**  
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Pretrained weights were trained on other classes / resolution / sizes  
    &nbsp;&nbsp;&nbsp;&nbsp;**s**  
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Exclude 10 % of images from training and save their list to the logs directory  
    &nbsp;&nbsp;&nbsp;&nbsp;**b**  
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Also train batch normalization layers (not recommended for small batches)  
    &nbsp;&nbsp;&nbsp;&nbsp;**n**  
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;No resizing or padding of images (images must be of the same size)  
    **verbose** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Verbose module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **quiet** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Quiet module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **superquiet** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Very quiet module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  

    Returns:

    **result** : grass.tools.support.ToolResult | None  
    If the tool produces text as standard output, a *ToolResult* object will be returned. Otherwise, `None` will be returned.

    Raises:

    *grass.tools.ToolError*: When the tool ended with an error.

## DESCRIPTION

*i.ann.maskrcnn.train* allows the user to train a Mask R-CNN model on
their own dataset. The dataset must be prepared in the predefined
structure described below.

### DATASET STRUCTURE

The training dataset must have the following structure:

dataset-directory

- imagenumber
  - imagenumber.jpg (training image)
  - imagenumber-class1-number.png (mask for one instance of class1)
  - imagenumber-class1-number.png (mask for another instance of class1)
  - ...
- imagenumber2
  - imagenumber2.jpg
  - imagenumber2-class1-number.png (mask for one instance of class1)
  - imagenumber2-class2-number.png (mask for another class instance)
  - ...

This directory structure is required. Images must be \*.jpg files with
3 channels (for example RGB); masks must be \*.png files containing
values between 1 and 255 where the object instance is and 0 elsewhere.
Each instance of an object must be provided as a separate mask file,
distinguished by its numeric suffix.
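The layout rules above can be checked before starting a long training
run. The following is a minimal sketch using only the Python standard
library; `check_dataset` is a hypothetical helper, not part of the
module. It only inspects file names and does not read the pixel values
of the masks.

```python
from collections import Counter
from pathlib import Path


def check_dataset(dataset_dir, classes):
    """Check the documented directory layout (hypothetical helper).

    Returns a Counter of mask instances per class and a list of problems.
    Only file names are inspected; mask pixel values are not read.
    """
    counts = Counter()
    problems = []
    for image_dir in sorted(p for p in Path(dataset_dir).iterdir() if p.is_dir()):
        image_id = image_dir.name
        # Each image directory must contain <imagenumber>.jpg
        if not (image_dir / f"{image_id}.jpg").is_file():
            problems.append(f"{image_id}: missing {image_id}.jpg")
        # Masks are expected as <imagenumber>-<class>-<number>.png
        for mask in sorted(image_dir.glob(f"{image_id}-*.png")):
            class_name, _, number = mask.stem[len(image_id) + 1 :].rpartition("-")
            if class_name in classes and number.isdigit():
                counts[class_name] += 1
            else:
                problems.append(f"{image_id}: unexpected mask name {mask.name}")
    return counts, problems
```

For the dataset used in the examples below, a call such as
`check_dataset("/home/user/Documents/crops", ["corn", "rice"])` would
return the number of masks per class and any file names that do not
follow the convention.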

## NOTES

If you use initial weights (the *model* parameter), the epochs are
divided into three segments: first, layers 5+ are trained; then, layers
4+ are fine-tuned; finally, the whole architecture is fine-tuned. The
final epoch number displayed refers to the current segment, not to the
whole training.
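To make the segment-relative epoch counts concrete, the sketch below
splits a total number of epochs into the three segments and reports
progress both within each segment and overall. The even split
(`fractions`) and the helper names are hypothetical; this page does not
state how the module actually divides the epochs between the segments.

```python
def split_epochs(total_epochs, fractions=(1 / 3, 1 / 3)):
    """Split total_epochs into the three training segments.

    `fractions` gives the shares of the first two segments; the last
    segment gets the remainder. The default even split is hypothetical.
    """
    first_two = [int(total_epochs * f) for f in fractions]
    counts = first_two + [total_epochs - sum(first_two)]
    return list(zip(("5+", "4+", "all"), counts))


def segment_progress(total_epochs):
    """Yield (layers, epoch within segment, segment length, overall epoch)."""
    overall = 0
    for layers, length in split_epochs(total_epochs):
        for epoch in range(1, length + 1):
            overall += 1
            yield layers, epoch, length, overall


# The second segment restarts its own counter although training continues
progress = list(segment_progress(200))
print(progress[66])  # ('4+', 1, 66, 67)
```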

The *-b* flag enables training of the batch normalization layers. It is
off by default, because training these layers is not recommended with
small batches (the batch size is defined by the *images\_per\_gpu*
parameter).

If all images in the dataset have the same size, the user may use the
*-n* flag to avoid resizing or padding of images. When the flag is not
used, images are resized to have their longer side equal to the value of
the *images\_max\_dim* parameter and their shorter side greater than or
equal to the value of the *images\_min\_dim* parameter, and are then
zero-padded to a square of shape *images\_max\_dim* x *images\_max\_dim*.
As a result, images of different sizes can be used.
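The resize-and-pad step described above can be sketched as follows. This
is a minimal sketch that assumes the module follows the "square" resize
mode of the Matterport Mask R-CNN implementation: the image is scaled up
so that its shorter side reaches *images\_min\_dim*, the scale is capped
so that the longer side does not exceed *images\_max\_dim*, and the
result is zero-padded to an *images\_max\_dim* x *images\_max\_dim*
square. The module's exact rounding and handling of images that already
fit may differ; `square_resize_plan` is a hypothetical helper.

```python
def square_resize_plan(height, width, min_dim=256, max_dim=1280):
    """Return the scale, resized size and zero padding for one image.

    Sketch modeled on the "square" resize mode of the Matterport Mask R-CNN
    code (an assumption about what this module uses internally).
    """
    for value in (min_dim, max_dim):
        if value % 64:
            raise ValueError(f"{value} is not a multiple of 64")
    # Scale up (never down) so that the shorter side reaches min_dim
    scale = max(1.0, min_dim / min(height, width))
    # Cap the scale so that the longer side does not exceed max_dim
    if round(max(height, width) * scale) > max_dim:
        scale = max_dim / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    # Zero-pad symmetrically to a max_dim x max_dim square
    top, left = (max_dim - new_h) // 2, (max_dim - new_w) // 2
    padding = ((top, max_dim - new_h - top), (left, max_dim - new_w - left))
    return scale, (new_h, new_w), padding


# A small image is scaled up to 256 x 512, then padded to 1280 x 1280
print(square_resize_plan(100, 200))
# A large image is scaled down so that its longer side becomes 1280
print(square_resize_plan(1000, 2000))
```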

The current model is saved after each epoch. This allows the user to stop
the training once the loss functions are satisfactory, and to test models
while the training is still running (and, again, to stop it before the
last epoch).

## EXAMPLES

Dataset used in the examples:

crops

- 000000
  - 000000.jpg
  - 000000-corn-0.png
  - 000000-corn-1.png
  - ...
- 000001
  - 000001.jpg
  - 000001-corn-0.png
  - 000001-rice-0.png
  - ...

### Training from scratch

```sh
i.ann.maskrcnn.train training_dataset=/home/user/Documents/crops classes=corn,rice logs=/home/user/Documents/logs name=crops
```

After the default number of epochs, we get a model in which the first
class is trained to detect corn fields and the second one to detect rice
fields.

If we run the command with the order of the classes reversed, we get a
model in which the first class is trained to detect rice fields and the
second one to detect corn fields.

```sh
i.ann.maskrcnn.train training_dataset=/home/user/Documents/crops classes=rice,corn logs=/home/user/Documents/logs name=crops
```
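Because the class IDs follow the order of the *classes* parameter, the
mapping can be sketched as below. That ID 0 is reserved for the
background and the named classes start at 1 is an assumption based on
the usual Mask R-CNN convention; `class_ids` is a hypothetical helper.

```python
def class_ids(classes):
    """Map class IDs to names in the order given to the classes parameter.

    Accepts the comma-separated string or a list, as the Python APIs do.
    ID 0 is assumed to be reserved for the background.
    """
    names = classes.split(",") if isinstance(classes, str) else list(classes)
    return {i: name.strip() for i, name in enumerate(names, start=1)}


print(class_ids("corn,rice"))  # {1: 'corn', 2: 'rice'}
print(class_ids(["rice", "corn"]))  # {1: 'rice', 2: 'corn'}
```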

The name of the model does not have to match the name of the dataset
folder, but it should refer to the task of the dataset. A good name for
this model, which also reflects the order of the classes, could be:

```sh
i.ann.maskrcnn.train training_dataset=/home/user/Documents/crops classes=rice,corn logs=/home/user/Documents/logs name=rice_corn
```

### Training from a pretrained model

We can use a pretrained model to make the training faster. The model
must have been trained on the same channels and on similar features,
though not necessarily the same ones (e.g. a model trained on swimming
pools in maps can be used for training on buildings in maps).

Using a model trained on different classes (use the *-e* flag to exclude
the head weights):

```sh
i.ann.maskrcnn.train training_dataset=/home/user/Documents/crops classes=corn,rice logs=/home/user/Documents/logs name=crops model=/home/user/Documents/models/buildings.h5 -e
```

Using a model trained on the same classes:

```sh
i.ann.maskrcnn.train training_dataset=/home/user/Documents/crops classes=corn,rice logs=/home/user/Documents/logs name=crops model=/home/user/Documents/models/corn_rice.h5
```
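The same two runs can be scripted with *grass.script*. This is a minimal
sketch: `pretrained_training_args` and `train` are hypothetical helpers,
the tool name is taken from the Python synopsis at the top of this page,
and the actual call needs a running GRASS session with the addon
installed.

```python
def pretrained_training_args(dataset, classes, logs, name, model, other_classes):
    """Build keyword arguments for i.ann.maskrcnn.train.py (hypothetical helper)."""
    args = {
        "training_dataset": dataset,
        "classes": ",".join(classes),
        "logs": logs,
        "name": name,
        "model": model,
    }
    if other_classes:
        # -e: the weights come from other classes, so the head weights are excluded
        args["flags"] = "e"
    return args


def train(args):
    # Imported here so the helper above also works outside a GRASS session
    import grass.script as gs

    gs.run_command("i.ann.maskrcnn.train.py", **args)


args = pretrained_training_args(
    "/home/user/Documents/crops",
    ["corn", "rice"],
    "/home/user/Documents/logs",
    "crops",
    "/home/user/Documents/models/buildings.h5",
    other_classes=True,
)
# train(args)  # run inside a GRASS session
```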

### Fine-tuning a model

It is also possible to stop the training and continue it later. To
continue the training, use the model saved after the last epoch as the
pretrained model.

```sh
i.ann.maskrcnn.train training_dataset=/home/user/Documents/crops classes=corn,rice logs=/home/user/Documents/logs name=crops model=/home/user/Documents/models/mask_rcnn_crops_0005.h5
```
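To pick the last saved epoch automatically, the saved files can be
searched by name. This sketch assumes that checkpoints are named like
the file in the example above (`mask_rcnn_<name>_<epoch>.h5`) and may be
stored in subdirectories of the *logs* directory; both are assumptions,
and `latest_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path


def latest_checkpoint(logs_dir, name):
    """Return the path of the highest-epoch checkpoint for `name`, or None."""
    # Assumed naming pattern, e.g. mask_rcnn_crops_0005.h5
    pattern = re.compile(rf"mask_rcnn_{re.escape(name)}_(\d+)\.h5")
    best_epoch, best_path = -1, None
    # Search the logs directory and all of its subdirectories
    for path in Path(logs_dir).rglob("*.h5"):
        match = pattern.fullmatch(path.name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path
```

For example, `latest_checkpoint("/home/user/Documents/logs", "crops")`
returns the path to pass as the *model* parameter, or `None` if no
matching file exists.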

## SEE ALSO

*[Mask R-CNN in GRASS GIS](i.ann.maskrcnn.md),
[i.ann.maskrcnn.detect](i.ann.maskrcnn.detect.md)*

## AUTHOR

Ondrej Pesek

## SOURCE CODE

Available at: [i.ann.maskrcnn.train source code](https://github.com/OSGeo/grass-addons/tree/grass8/imagery/i.ann.maskrcnn/i.ann.maskrcnn.train)
([history](https://github.com/OSGeo/grass-addons/commits/grass8/imagery/i.ann.maskrcnn/i.ann.maskrcnn.train))  
Latest change: Friday Feb 21 10:10:05 2025 in commit [7d78fe3](https://github.com/OSGeo/grass-addons/commit/7d78fe34868674c3b6050ba1924e1c5675d155c9)
