---
name: m.crawl.thredds.py
description: List dataset URLs from a Thredds Data Server (TDS) catalog.
keywords: [ temporal, import, download, data, metadata, netcdf, thredds, opendap ]
---

# m.crawl.thredds.py

List dataset URLs from a Thredds Data Server (TDS) catalog.

=== "Command line"

    **m.crawl.thredds.py**
    **input**=*string*
    [**print**=*string* [,*string*,...]]
    **services**=*string* [,*string*,...]
    [**filter**=*string*]
    [**skip**=*string* [,*string*,...]]
    [**output**=*name*]
    [**separator**=*character*]
    [**modified_before**=*string*]
    [**modified_after**=*string*]
    [**authentication**=*name*]
    [**nprocs**=*Number of cores*]
    [**--overwrite**]
    [**--verbose**]
    [**--quiet**]
    [**--qq**]
    [**--ui**]

    Example:

    ```sh
    m.crawl.thredds.py input=string services=httpserver
    ```

=== "Python (grass.script)"

    *grass.script.run_command*("***m.crawl.thredds.py***",
        **input**,
        **print**=*None*,
        **services**=*"httpserver"*,
        **filter**=*".\*"*,
        **skip**=*None*,
        **output**=*"-"*,
        **separator**=*"pipe"*,
        **modified_before**=*None*,
        **modified_after**=*None*,
        **authentication**=*None*,
        **nprocs**=*1*,
        **overwrite**=*None*,
        **verbose**=*None*,
        **quiet**=*None*,
        **superquiet**=*None*)

    Example:

    ```python
    gs.run_command("m.crawl.thredds.py", input="string", services="httpserver")
    ```

=== "Python (grass.tools)"

    *grass.tools.Tools.m_crawl_thredds_py*(**input**,
        **print**=*None*,
        **services**=*"httpserver"*,
        **filter**=*".\*"*,
        **skip**=*None*,
        **output**=*"-"*,
        **separator**=*"pipe"*,
        **modified_before**=*None*,
        **modified_after**=*None*,
        **authentication**=*None*,
        **nprocs**=*1*,
        **overwrite**=*None*,
        **verbose**=*None*,
        **quiet**=*None*,
        **superquiet**=*None*)

    Example:

    ```python
    tools = Tools()
    tools.m_crawl_thredds_py(input="string", services="httpserver")
    ```

    This grass.tools API is experimental in version 8.5 and expected to be stable in version 8.6.

## Parameters

=== "Command line"

    **input**=*string* **[required]**  
    &nbsp;&nbsp;&nbsp;&nbsp;URL of a catalog on a thredds server  
    **print**=*string* [,*string*,...]  
    &nbsp;&nbsp;&nbsp;&nbsp;Additional information to print  
    &nbsp;&nbsp;&nbsp;&nbsp;Allowed values: *service, dataset_size*  
    **services**=*string* [,*string*,...] **[required]**  
    &nbsp;&nbsp;&nbsp;&nbsp;Services of thredds server to crawl  
    &nbsp;&nbsp;&nbsp;&nbsp;Comma-separated list of service names (lower case) of the thredds server to crawl; typical services are: httpserver, netcdfsubset, opendap, wms  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *httpserver*  
    **filter**=*string*  
    &nbsp;&nbsp;&nbsp;&nbsp;Regular expression for filtering dataset and catalog URLs  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *.\**  
    **skip**=*string* [,*string*,...]  
    &nbsp;&nbsp;&nbsp;&nbsp;Regular expression(s) for skipping sub-catalogs / URLs (e.g. ".\*jpeg.\*,.\*metadata.\*")  
    **output**=*name*  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of the output file (stdout if omitted)  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *-*  
    **separator**=*character*  
    &nbsp;&nbsp;&nbsp;&nbsp;Field separator  
    &nbsp;&nbsp;&nbsp;&nbsp;Special characters: pipe, comma, space, tab, newline  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *pipe*  
    **modified_before**=*string*  
    &nbsp;&nbsp;&nbsp;&nbsp;Latest modification timestamp of datasets to include in the output  
    &nbsp;&nbsp;&nbsp;&nbsp;ISO-formatted date or timestamp (e.g. "2000-01-01T12:12:55.03456Z" or "2000-01-01")  
    **modified_after**=*string*  
    &nbsp;&nbsp;&nbsp;&nbsp;Earliest modification timestamp of datasets to include in the output  
    &nbsp;&nbsp;&nbsp;&nbsp;ISO-formatted date or timestamp (e.g. "2000-01-01T12:12:55.03456Z" or "2000-01-01")  
    **authentication**=*name*  
    &nbsp;&nbsp;&nbsp;&nbsp;Authentication for thredds server  
    &nbsp;&nbsp;&nbsp;&nbsp;File with authentication information (username and password) for thredds server  
    **nprocs**=*Number of cores*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of cores to use for crawling thredds server  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1*  
    **--overwrite**  
    &nbsp;&nbsp;&nbsp;&nbsp;Allow output files to overwrite existing files  
    **--help**  
    &nbsp;&nbsp;&nbsp;&nbsp;Print usage summary  
    **--verbose**  
    &nbsp;&nbsp;&nbsp;&nbsp;Verbose module output  
    **--quiet**  
    &nbsp;&nbsp;&nbsp;&nbsp;Quiet module output  
    **--qq**  
    &nbsp;&nbsp;&nbsp;&nbsp;Very quiet module output  
    **--ui**  
    &nbsp;&nbsp;&nbsp;&nbsp;Force launching GUI dialog

=== "Python (grass.script)"

    **input** : str, *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;URL of a catalog on a thredds server  
    **print** : str | list[str], *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Additional information to print  
    &nbsp;&nbsp;&nbsp;&nbsp;Allowed values: *service, dataset_size*  
    **services** : str | list[str], *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;Services of thredds server to crawl  
    &nbsp;&nbsp;&nbsp;&nbsp;Comma-separated list of service names (lower case) of the thredds server to crawl; typical services are: httpserver, netcdfsubset, opendap, wms  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *httpserver*  
    **filter** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Regular expression for filtering dataset and catalog URLs  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *.\**  
    **skip** : str | list[str], *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Regular expression(s) for skipping sub-catalogs / URLs (e.g. ".\*jpeg.\*,.\*metadata.\*")  
    **output** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of the output file (stdout if omitted)  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: output, file, *name*  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *-*  
    **separator** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Field separator  
    &nbsp;&nbsp;&nbsp;&nbsp;Special characters: pipe, comma, space, tab, newline  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, separator, *character*  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *pipe*  
    **modified_before** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Latest modification timestamp of datasets to include in the output  
    &nbsp;&nbsp;&nbsp;&nbsp;ISO-formatted date or timestamp (e.g. "2000-01-01T12:12:55.03456Z" or "2000-01-01")  
    **modified_after** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Earliest modification timestamp of datasets to include in the output  
    &nbsp;&nbsp;&nbsp;&nbsp;ISO-formatted date or timestamp (e.g. "2000-01-01T12:12:55.03456Z" or "2000-01-01")  
    **authentication** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Authentication for thredds server  
    &nbsp;&nbsp;&nbsp;&nbsp;File with authentication information (username and password) for thredds server  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, file, *name*  
    **nprocs** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of cores to use for crawling thredds server  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: *Number of cores*  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1*  
    **overwrite** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Allow output files to overwrite existing files  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **verbose** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Verbose module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **quiet** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Quiet module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **superquiet** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Very quiet module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  

=== "Python (grass.tools)"

    **input** : str, *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;URL of a catalog on a thredds server  
    **print** : str | list[str], *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Additional information to print  
    &nbsp;&nbsp;&nbsp;&nbsp;Allowed values: *service, dataset_size*  
    **services** : str | list[str], *required*  
    &nbsp;&nbsp;&nbsp;&nbsp;Services of thredds server to crawl  
    &nbsp;&nbsp;&nbsp;&nbsp;Comma-separated list of service names (lower case) of the thredds server to crawl; typical services are: httpserver, netcdfsubset, opendap, wms  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *httpserver*  
    **filter** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Regular expression for filtering dataset and catalog URLs  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *.\**  
    **skip** : str | list[str], *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Regular expression(s) for skipping sub-catalogs / URLs (e.g. ".\*jpeg.\*,.\*metadata.\*")  
    **output** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Name of the output file (stdout if omitted)  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: output, file, *name*  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *-*  
    **separator** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Field separator  
    &nbsp;&nbsp;&nbsp;&nbsp;Special characters: pipe, comma, space, tab, newline  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, separator, *character*  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *pipe*  
    **modified_before** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Latest modification timestamp of datasets to include in the output  
    &nbsp;&nbsp;&nbsp;&nbsp;ISO-formatted date or timestamp (e.g. "2000-01-01T12:12:55.03456Z" or "2000-01-01")  
    **modified_after** : str, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Earliest modification timestamp of datasets to include in the output  
    &nbsp;&nbsp;&nbsp;&nbsp;ISO-formatted date or timestamp (e.g. "2000-01-01T12:12:55.03456Z" or "2000-01-01")  
    **authentication** : str | io.StringIO, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Authentication for thredds server  
    &nbsp;&nbsp;&nbsp;&nbsp;File with authentication information (username and password) for thredds server  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: input, file, *name*  
    **nprocs** : int, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Number of cores to use for crawling thredds server  
    &nbsp;&nbsp;&nbsp;&nbsp;Used as: *Number of cores*  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *1*  
    **overwrite** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Allow output files to overwrite existing files  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **verbose** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Verbose module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **quiet** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Quiet module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  
    **superquiet** : bool, *optional*  
    &nbsp;&nbsp;&nbsp;&nbsp;Very quiet module output  
    &nbsp;&nbsp;&nbsp;&nbsp;Default: *None*  

    Returns:

    **result** : grass.tools.support.ToolResult | None  
    If the tool produces text as standard output, a *ToolResult* object will be returned. Otherwise, `None` will be returned.

    Raises:

    *grass.tools.ToolError*: When the tool ended with an error.

## DESCRIPTION

An increasing amount of spatio-temporal data, such as climate observations,
forecast data, or satellite imagery, is provided through [Thredds Data
Servers (TDS)](https://www.unidata.ucar.edu/software/tds/).

*m.crawl.thredds* crawls the catalog of a Thredds Data Server (TDS)
starting from the catalog-URL provided in the **input**. It is a wrapper
module around the Python library
[thredds\_crawler](https://github.com/ioos/thredds_crawler).
*m.crawl.thredds* returns a list of dataset URLs, optionally with
additional information on the service type and data size. Depending on
the format of the crawled datasets, the output of *m.crawl.thredds* may
be used as input to *t.rast.import.netcdf*.

The returned list of datasets can be filtered:

- based on the modification time of the dataset using a range of
    relevant timestamps defined by the **modified\_before** and
    **modified\_after** option(s)
- based on the file name using a regular expression in the **filter**
    option.
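
The way the two filters combine can be illustrated with a minimal Python sketch. It is not the module's implementation: the helper names, the sample paths, and the modification times are hypothetical, the bounds are treated as inclusive, and patterns are assumed to be matched from the start of the string (which is what the leading `.*` in the documented examples suggests).

```python
import re
from datetime import datetime, timezone


def parse_timestamp(text):
    """Parse an ISO date or timestamp; assume UTC when no zone is given."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat()
    parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def keep_dataset(url, modified, pattern=".*", modified_after=None, modified_before=None):
    """Return True if a dataset passes the name filter and the time window."""
    if re.match(pattern, url) is None:
        return False
    if modified_after is not None and modified < parse_timestamp(modified_after):
        return False
    if modified_before is not None and modified > parse_timestamp(modified_before):
        return False
    return True


# Hypothetical catalog entries: (URL path, last-modified time)
datasets = [
    ("senorge/seNorge2018_2021.nc", parse_timestamp("2021-03-01T06:00:00Z")),
    ("senorge/seNorge2018_2020.nc", parse_timestamp("2021-02-15T06:00:00Z")),
    ("senorge/seNorge2018_1957.nc", parse_timestamp("2019-01-10T06:00:00Z")),
]

selected = [
    url
    for url, modified in datasets
    if keep_dataset(url, modified, pattern=".*2018_202.*", modified_after="2021-02-01")
]
print(selected)
```

In this sketch a dataset is kept only if it passes both the name filter and the time window, so the 1957 file is dropped by the regular expression alone.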

When crawling larger Thredds installations, skipping irrelevant branches
of the server's tree of datasets can greatly speed up the process. The
**skip** option excludes branches (and also leaf datasets) from the
search using a comma-separated list of regular expressions. For example,
".\*metadata.\*" directs the module not to look for datasets inside a
"metadata" directory.
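
As a rough illustration of how a comma-separated **skip** value behaves, the following Python sketch splits the option into individual regular expressions and tests a few names against them. The helper functions and the sample names are hypothetical, and matching from the start of the name is an assumption rather than a statement about the module's internals.

```python
import re


def compile_skip(skip_option):
    """Split a comma-separated skip option into compiled regular expressions."""
    return [re.compile(part) for part in skip_option.split(",") if part]


def is_skipped(name, skip_patterns):
    """Return True if any skip pattern matches the catalog or dataset name."""
    return any(pattern.match(name) for pattern in skip_patterns)


skip_patterns = compile_skip(".*jpeg.*,.*metadata.*")

# Hypothetical catalog branch and dataset names
for name in ["Archive", "metadata", "quicklook_jpeg", "seNorge2018_2021.nc"]:
    print(name, "skipped" if is_skipped(name, skip_patterns) else "crawled")
```

Because the list is split on commas, an individual expression in this sketch cannot itself contain a comma (for example a quantifier such as `{1,3}`).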

Authentication to the Thredds Server (if required) can be provided
either through a text file, where the first line contains the username
and the second line the password, or by interactive user input (if
*authentication=-*). Alternatively, username and password can be passed
through the environment variables *THREDDS\_USER* and *THREDDS\_PASSWORD*.
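
The two-line credentials file layout can be sketched in Python as follows. This only illustrates the file format and the environment variable names given above; the function name, the file name, the placeholder credentials, and the order in which file and environment are consulted are assumptions, not the module's actual logic.

```python
import os
import tempfile
from pathlib import Path


def read_credentials(path=None, environ=os.environ):
    """Return (username, password) from a two-line file or the environment."""
    if path is not None:
        lines = Path(path).read_text(encoding="utf-8").splitlines()
        if len(lines) < 2:
            raise ValueError("expected username on line 1 and password on line 2")
        return lines[0].strip(), lines[1].strip()
    # Fall back to the environment variables named in this manual
    user = environ.get("THREDDS_USER")
    password = environ.get("THREDDS_PASSWORD")
    if user is None or password is None:
        raise ValueError("no credentials file and no THREDDS_USER/THREDDS_PASSWORD")
    return user, password


# Write a throwaway credentials file with placeholder values
with tempfile.TemporaryDirectory() as tmpdir:
    auth_file = Path(tmpdir) / "thredds_auth.txt"
    auth_file.write_text("my_user\nmy_password\n", encoding="utf-8")
    auth_file.chmod(0o600)  # restrict access to the owner where supported
    file_credentials = read_credentials(auth_file)

env_credentials = read_credentials(
    environ={"THREDDS_USER": "my_user", "THREDDS_PASSWORD": "my_password"}
)
print(file_credentials == env_credentials)
```

A file prepared this way would then be passed to the module through the **authentication** option.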

## NOTES

The Thredds data catalog is crawled recursively. Providing the URL to
the root of a catalog on a Thredds server with many hierarchies and
datasets can therefore be time-consuming, even if executed in
parallel (**nprocs** \> 1).

## EXAMPLES

List modelled climate observation datasets from the Norwegian
Meteorological Institute (met.no):

```sh
# Get a list of all data for "seNorge"
m.crawl.thredds input="https://thredds.met.no/thredds/catalog/senorge/seNorge_2018/Archive/catalog.xml"
https://thredds.met.no/thredds/fileServer/senorge/seNorge_2018/Archive/seNorge2018_2021.nc
(...)
https://thredds.met.no/thredds/fileServer/senorge/seNorge_2018/Archive/seNorge2018_1957.nc

# Get a list of the most recent data for "seNorge"
m.crawl.thredds input="https://thredds.met.no/thredds/catalog/senorge/seNorge_2018/Archive/catalog.xml" modified_after="2021-02-01"
https://thredds.met.no/thredds/fileServer/senorge/seNorge_2018/Archive/seNorge2018_2021.nc
https://thredds.met.no/thredds/fileServer/senorge/seNorge_2018/Archive/seNorge2018_2020.nc

# Get a list of the most recent data for "seNorge" that match a regular expression
# Note the "." before the "*"
m.crawl.thredds input="https://thredds.met.no/thredds/catalog/senorge/seNorge_2018/Archive/catalog.xml" \
modified_after="2021-02-01" filter=".*2018_202.*"
https://thredds.met.no/thredds/fileServer/senorge/seNorge_2018/Archive/seNorge2018_2021.nc
https://thredds.met.no/thredds/fileServer/senorge/seNorge_2018/Archive/seNorge2018_2020.nc
```

List Sentinel-2A data from the Norwegian Ground Segment (NBS) for
28 Feb 2021:

```sh
# Get a list of all Sentinel-2A data for 28 Feb 2021 with dataset size
m.crawl.thredds input="https://nbstds.met.no/thredds/catalog/NBS/S2A/2021/02/28/catalog.xml" print="dataset_size"
https://nbstds.met.no/thredds/fileServer/NBS/S2A/2021/02/28/S2A_MSIL1C_20210228T103021_N0202_R108_T35WPU_20210228T201033_DTERRENGDATA.nc|107.6
(...)
https://nbstds.met.no/thredds/fileServer/NBS/S2A/2021/02/28/S2A_MSIL1C_20210228T103021_N0202_R108_T32VNL_20210228T201033_DTERRENGDATA.nc|166.1

# Get a list of WMS end-points to all Sentinel-2A data for 28 Feb 2021
m.crawl.thredds input="https://nbstds.met.no/thredds/catalog/NBS/S2A/2021/02/28/catalog.xml" services="wms"
https://nbstds.met.no/thredds/wms/NBS/S2A/2021/02/28/S2A_MSIL1C_20210228T103021_N0202_R108_T35WPU_20210228T201033_DTERRENGDATA.nc
(...)
https://nbstds.met.no/thredds/wms/NBS/S2A/2021/02/28/S2A_MSIL1C_20210228T103021_N0202_R108_T32VNL_20210228T201033_DTERRENGDATA.nc
```
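
The listing printed with an additional column can be post-processed with a few lines of Python. The sketch below parses two of the output lines shown above, assuming the default pipe separator and a single extra column as produced by `print="dataset_size"`; the function name and the record layout are hypothetical, and the unit of the size values is not stated in this manual.

```python
def parse_listing(text, separator="|"):
    """Split each output line into the dataset URL and any extra columns."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        url, *extra = line.split(separator)
        records.append({"url": url, "extra": extra})
    return records


# Two lines copied from the example output above
sample_output = (
    "https://nbstds.met.no/thredds/fileServer/NBS/S2A/2021/02/28/"
    "S2A_MSIL1C_20210228T103021_N0202_R108_T35WPU_20210228T201033_DTERRENGDATA.nc|107.6\n"
    "https://nbstds.met.no/thredds/fileServer/NBS/S2A/2021/02/28/"
    "S2A_MSIL1C_20210228T103021_N0202_R108_T32VNL_20210228T201033_DTERRENGDATA.nc|166.1\n"
)

records = parse_listing(sample_output)

# Sum the reported sizes (unit as reported by the tool)
total_size = sum(float(record["extra"][0]) for record in records)
print(len(records), "datasets, total size", round(total_size, 1))
```

The same function works for a listing without extra columns, in which case `extra` is an empty list for every record.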

## REQUIREMENTS

*m.crawl.thredds* is a wrapper around the
[thredds\_crawler](https://github.com/ioos/thredds_crawler) Python
library.

## SEE ALSO

*[i.sentinel.download](https://grass.osgeo.org/grass-stable/manuals/addons/i.sentinel.download.html),
[t.rast.import.netcdf](https://grass.osgeo.org/grass-stable/manuals/addons/t.rast.import.netcdf.html)*

## AUTHORS

Stefan Blumentrath, [Norwegian Institute for Nature Research (NINA),
Oslo](https://www.nina.no/Kontakt/Ansatte/Ansattinformasjon.aspx?AnsattID=14230)

## SOURCE CODE

Available at: [m.crawl.thredds source code](https://github.com/OSGeo/grass-addons/tree/grass8/src/misc/m.crawl.thredds)
([history](https://github.com/OSGeo/grass-addons/commits/grass8/src/misc/m.crawl.thredds))  
Latest change: Friday Feb 21 10:10:05 2025 in commit [7d78fe3](https://github.com/OSGeo/grass-addons/commit/7d78fe34868674c3b6050ba1924e1c5675d155c9)
