.. _gdal_pipeline:

================================================================================
``gdal pipeline``
================================================================================

.. versionadded:: 3.12

.. only:: html

    Process a dataset applying several steps.

.. Index:: gdal pipeline

Description
-----------

:program:`gdal pipeline` execute a pipeline, taking a raster or input dataset,
execute steps and finally writing a raster or vector dataset.

Most steps proceed in on-demand evaluation of raster blocks or features,
unless otherwise stated in their documentation, without "materializing" the
resulting dataset of the operation of each step. It may be desirable sometimes
for performance purposes to proceed to materializing an intermediate dataset
to disk using :ref:`gdal_raster_materialize` or :ref:`gdal_vector_materialize`.

Synopsis
--------

.. program-output:: gdal pipeline --help-doc=main

A pipeline chains several steps, separated with the `!` (exclamation mark) character.
The first step must be ``read``, ``calc``, ``concat``, ``mosaic`` or ``stack``,
and the last one ``info``, ``tile`` or ``write``.
Each step has its own positional or non-positional arguments.
Apart from ``read``, ``calc``, ``concat``, ``mosaic``, ``stack``, ``info``, ``tile``, ``partition`` and ``write``,
all other steps can potentially be used several times in a pipeline.

.. example::
   :title: Compute the footprint of a raster and apply a buffer on the footprint

   .. code-block:: bash

        $ gdal pipeline ! read in.tif ! footprint ! buffer 20 ! write out.gpkg --overwrite

For steps that have both *raster* data type as input and output, consult :ref:`gdal_raster_pipeline`.
For steps that have both *vector* data type as input and output, consult :ref:`gdal_vector_pipeline`.

The following steps accept raster input and generate vector output:

* contour

.. program-output:: gdal pipeline --help-doc=contour

Details for options can be found in :ref:`gdal_raster_contour`.

* footprint

.. program-output:: gdal pipeline --help-doc=footprint

Details for options can be found in :ref:`gdal_raster_footprint`.

* polygonize

.. program-output:: gdal pipeline --help-doc=polygonize

Details for options can be found in :ref:`gdal_raster_polygonize`.

The following steps accept raster vector and generate raster output:

* grid

.. program-output:: gdal pipeline --help-doc=grid

Details for options can be found in :ref:`gdal_vector_grid`.

* rasterize

.. program-output:: gdal pipeline --help-doc=rasterize

Details for options can be found in :ref:`gdal_vector_rasterize`.

* tee

.. program-output:: gdal pipeline --help-doc=tee-raster

Details for options can be found in :ref:`gdal_output_nested_pipeline`.

GDALG output (on-the-fly / streamed dataset)
--------------------------------------------

A pipeline can be serialized as a JSON file using the ``GDALG`` output format.
The resulting file can then be opened as a dataset using the
:ref:`raster.gdalg` or :ref:`vector.gdalg` driver, and apply the specified pipeline in a on-the-fly /
streamed way.

The ``command_line`` member of the JSON file should nominally be the whole command
line without the final ``write`` step, and is what is generated by
``gdal pipeline ! .... ! write out.gdalg.json``.

.. code-block:: json

    {
        "type": "gdal_streamed_alg",
        "command_line": "gdal pipeline ! read in.tif ! footprint ! buffer 20"
    }

The final ``write`` step can be added but if so it must explicitly specify the
``stream`` output format and a non-significant output dataset name.

.. code-block:: json

    {
        "type": "gdal_streamed_alg",
        "command_line": "gdal pipeline ! read in.tif ! footprint ! buffer 20 ! write --output-format=streamed streamed_dataset"
    }

.. _gdal_pipeline_substitutions:

Substitutions
-------------

It is also possible to use :program:`gdal pipeline` to use a pipeline already
serialized in a ``.gdalg.json`` file, and customize its existing steps, typically
changing an input filename, specifying an output filename, or adding/modifying arguments
of steps.

The syntax is:

::

    gdal pipeline <filename.gdalg.json> --<step-name>.<arg-name>=value


When specifying an existing argument of a step of a pipeline, the value from the
pipeline is overridden by the one specified on the :program:`gdal pipeline` command line.

Let's imagine we have a :file:`raster_reproject.gdalg.json` with the following content:

.. code-block:: json

    {
        "type": "gdal_streamed_alg",
        "command_line": "gdal pipeline ! read in.tif ! reproject --dst-crs=EPSG:4326 ! edit --metadata=CHANGES=reprojected"
    }

It is possible to run it with the following command line, overriding the
``input`` argument of the ``read`` step, and implicitly adding a final ``write``
step with an ``output`` argument.

.. code-block:: bash

    $ gdal pipeline raster_reproject.gdalg.json --read.input=other_input.tif --write.output=out.tif


When there is no ambiguity, it is also possible to omit the step name, and just
specify the argument name (if there is an ambiguity, :program:`gdal pipeline`
will emit an error, so this is safe to do):

.. code-block:: bash

    $ gdal pipeline raster_reproject.gdalg.json --input=other_input.tif --output=out.tif --co COMPRESS=LZW --overwrite


When a step appears several times in the pipeline, it must specified as
``<step-name>[<idx>]``, where ``<idx>`` is a zero-based index.

For example, given:

.. code-block:: json

    {
        "type": "gdal_streamed_alg",
        "command_line": "gdal pipeline ! read in.tif ! edit --metadata=before=value ! reproject --dst-crs=EPSG:4326 ! edit --metadata=CHANGES=reprojected"
    }

the following command line may be used:

.. code-block:: bash

    $ gdal pipeline raster_reproject.gdalg.json --edit[0].metadata=before=modified --output=out.tif


Execution of pipelines and argument substitutions can also be done in Python with:

.. code-block:: python

    gdal.Run("pipeline", pipeline="raster_reproject.gdalg.json", output="out.tif", arguments={"edit[0].metadata": "before=modified"})


.. _gdal_nested_pipeline:

Nested pipeline
---------------

.. include:: gdal_cli_include/gdal_nested_pipeline_intro.rst

Input nested pipeline
*********************

Wherever an input dataset is expected as an auxiliary dataset, it is possible
to specify it as the result of a nested pipeline. The content of an input
nested pipeline is identical to the outer pipeline, except it must not end with
an output-generating step like ``info``, ``tile`` or ``write``

.. example::
   :title: Combine the output of shaded relief map and hypsometric rendering on a DEM to create a colorized shaded relief map.

   .. code-block:: bash

        $ gdal pipeline read n43.tif ! \
                        color-map --color-map color_file.txt ! \
                        color-merge --grayscale \
                            [ read n43.tif ! hillshade -z 30 ] ! \
                        write out.tif --overwrite

In the above example, the value of the ``grayscale`` argument of the ``color-merge``
step is set as the output of the nested pipeline ``read n43.tif ! hillshade -z 30``.

.. _gdal_output_nested_pipeline:

Output nested pipeline
**********************

The ``tee`` step in a pipeline forwards the input dataset as its output,
and additionally executes one or several nested pipelines that take this input
dataset as input and do other processing to eventually write the output of that
processing. The first step of a ``tee`` output nested pipeline *must* not be
``read``, ``calc``, ``concat``, ``mosaic`` or ``stack``, and its last step
*must* be ``write`` or ``tile``. The ``tee`` operator can be either used in
the middle of a pipeline or as its last step.

The below example shows an example where the ``tee`` operator executes two
output nested pipelines.

.. example::
   :title: Split the content of a "cities" layer according to whether its population is below or above 1 million.

   .. code-block:: bash

      $ gdal pipeline read cities.gpkg ! \
              tee [ filter --where "pop < 1e6" ! write small_cities.gpkg ] \
                  [ filter --where "pop >= 1e6" ! write big_cities.gpkg ]


The below example shows a more complicated use case, including two occurrences of ``tee``,
with one of them being an output nested pipeline inside an input nested pipeline.

.. example::
   :title: Combine the output of shaded relief map and hypsometric rendering on
           a DEM to create a colorized shaded relief map, and write intermediate
           hillshade and colorized dataset

   .. code-block:: bash

        $ gdal pipeline read n43.tif ! \
                        color-map --color-map color_file.txt ! \
                        tee [ write colored.tif --overwrite ] ! \
                        color-merge --grayscale \
                            [ read n43.tif ! hillshade -z 30  ! tee [ write hillshade.tif --overwrite ] ] ! \
                        write colored-hillshade.tif --overwrite

Examples
--------

.. example::
   :title: Compute the footprint of a raster and apply a buffer on the footprint

   .. code-block:: bash

        $ gdal pipeline ! read in.tif ! footprint ! buffer 20 ! write out.gpkg --overwrite

.. example::
   :title: Rasterize and reproject

   .. code-block:: bash

        $ gdal pipeline ! read in.gpkg ! rasterize --size 1000,1000 ! reproject --dst-crs EPSG:4326 ! write out.tif --overwrite

.. example::
   :title: Use an existing pipeline that rasterizes and reprojects, but change its input file and target CRS, and specify the output file

   .. code-block:: bash

        $ gdal pipeline raster_reproject.gdalg.json --input=my.gpkg --output=out.tif --dst-crs=EPSG:32631
