.. _gdal_vector_pipeline:

================================================================================
``gdal vector pipeline``
================================================================================

.. versionadded:: 3.11

.. only:: html

    Process a vector dataset by applying several steps.

.. Index:: gdal vector pipeline

Description
-----------

:program:`gdal vector pipeline` can be used to process a vector dataset through
a series of processing steps, each of which takes vector data as input and
produces vector data as output.

For pipelines mixing raster and vector, consult :ref:`gdal_pipeline`.

Unless otherwise stated in their documentation, most steps evaluate features on
demand, without "materializing" the dataset resulting from each step. For
performance reasons, it may sometimes be desirable to materialize an
intermediate dataset to disk using :ref:`gdal_vector_materialize`.
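
A minimal sketch of an intermediate materialization, assuming a hypothetical
``in.gpkg`` input with a hypothetical ``pop`` field: ``materialize`` (used here
with its default options) writes the reprojected features to disk, so that the
following steps read them from that copy instead of re-running the reprojection
on demand. Whether this actually improves performance depends on the downstream
steps; with a single ``filter`` as below, it only illustrates the syntax.

.. code-block:: bash

    # Reproject, materialize the result to disk, then continue from the materialized copy.
    # in.gpkg and the "pop" field are hypothetical.
    $ gdal vector pipeline ! read in.gpkg ! reproject --dst-crs=EPSG:32632 ! materialize ! filter --where "pop > 1e6" ! write out.gpkg --overwrite
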

Synopsis
--------

.. program-output:: gdal vector pipeline --help-doc=main

A pipeline chains several steps, separated by the ``!`` (exclamation mark) character.
The first step must be ``read`` or ``concat``, and the last one must be ``info``,
``partition`` or ``write``. Each step has its own positional or non-positional arguments.
Apart from ``read``, ``concat``, ``info``, ``partition`` and ``write``,
all other steps can be used several times in a pipeline.
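
As a sketch of these rules, the following pipeline (assuming a hypothetical
``in.gpkg`` with hypothetical ``pop`` and ``country`` fields) starts with
``read``, uses ``filter`` twice purely to show that a step may be repeated, and
ends with the ``info`` step (available since GDAL 3.12) instead of writing an
output dataset.

.. code-block:: bash

    # First step: read. Last step: info (GDAL >= 3.12). "filter" appears twice.
    # in.gpkg and the "pop"/"country" fields are hypothetical.
    $ gdal vector pipeline ! read in.gpkg ! filter --where "pop > 1e6" ! reproject --dst-crs=EPSG:32632 ! filter --where "country = 'FR'" ! info
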

The available steps are:

* read

.. program-output:: gdal vector pipeline --help-doc=read

* buffer

.. program-output:: gdal vector pipeline --help-doc=buffer

Details for options can be found in :ref:`gdal_vector_buffer`.

* concat

.. program-output:: gdal vector pipeline --help-doc=concat

Details for options can be found in :ref:`gdal_vector_concat`.

* clip

.. program-output:: gdal vector pipeline --help-doc=clip

Details for options can be found in :ref:`gdal_vector_clip`.

* edit

.. program-output:: gdal vector pipeline --help-doc=edit

Details for options can be found in :ref:`gdal_vector_edit`.

* explode-collections

.. program-output:: gdal vector pipeline --help-doc=explode-collections

Details for options can be found in :ref:`gdal_vector_explode_collections`.

* filter

.. program-output:: gdal vector pipeline --help-doc=filter

Details for options can be found in :ref:`gdal_vector_filter`.

* limit

.. program-output:: gdal vector pipeline --help-doc=limit

* make-valid

.. program-output:: gdal vector pipeline --help-doc=make-valid

Details for options can be found in :ref:`gdal_vector_make_valid`.

* materialize

.. program-output:: gdal vector pipeline --help-doc=materialize

Details for options can be found in :ref:`gdal_vector_materialize`.

* reproject

.. program-output:: gdal vector pipeline --help-doc=reproject

Details for options can be found in :ref:`gdal_vector_reproject`.

* segmentize

.. program-output:: gdal vector pipeline --help-doc=segmentize

Details for options can be found in :ref:`gdal_vector_segmentize`.

* select

.. program-output:: gdal vector pipeline --help-doc=select

Details for options can be found in :ref:`gdal_vector_select`.

* set-field-type

.. program-output:: gdal vector pipeline --help-doc=set-field-type

Details for options can be found in :ref:`gdal_vector_set_field_type`.

* set-geom-type

.. program-output:: gdal vector pipeline --help-doc=set-geom-type

Details for options can be found in :ref:`gdal_vector_set_geom_type`.

* simplify

.. program-output:: gdal vector pipeline --help-doc=simplify

Details for options can be found in :ref:`gdal_vector_simplify`.

* simplify-coverage

.. program-output:: gdal vector pipeline --help-doc=simplify-coverage

Details for options can be found in :ref:`gdal_vector_simplify_coverage`.

* sql

.. program-output:: gdal vector pipeline --help-doc=sql

Details for options can be found in :ref:`gdal_vector_sql`.

* swap-xy

.. program-output:: gdal vector pipeline --help-doc=swap-xy

Details for options can be found in :ref:`gdal_vector_swap_xy`.

* info

.. versionadded:: 3.12

.. program-output:: gdal vector pipeline --help-doc=info

Details for options can be found in :ref:`gdal_vector_info`.

* partition

.. versionadded:: 3.12

.. program-output:: gdal vector pipeline --help-doc=partition

Details for options can be found in :ref:`gdal_vector_partition`.

* write

.. program-output:: gdal vector pipeline --help-doc=write
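
Steps without required arguments can be chained directly. For instance, the
following sketch (assuming a hypothetical ``in.gpkg`` input) repairs invalid
geometries with ``make-valid``, then splits multi-part and collection geometries
into one feature per part with ``explode-collections``, both with their default
options.

.. code-block:: bash

    # make-valid and explode-collections are used with their default options.
    # in.gpkg is a hypothetical input file.
    $ gdal vector pipeline ! read in.gpkg ! make-valid ! explode-collections ! write out.gpkg --overwrite
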

GDALG output (on-the-fly / streamed dataset)
--------------------------------------------

A pipeline can be serialized as a JSON file using the ``GDALG`` output format.
The resulting file can then be opened as a vector dataset using the
:ref:`vector.gdalg` driver, which applies the specified pipeline in an
on-the-fly / streamed way.

The ``command_line`` member of the JSON file should nominally be the whole command
line without the final ``write`` step, and is what is generated by
``gdal vector pipeline ! .... ! write out.gdalg.json``.

.. code-block:: json

    {
        "type": "gdal_streamed_alg",
        "command_line": "gdal vector pipeline ! read in.gpkg ! reproject --dst-crs=EPSG:32632"
    }

The final ``write`` step can be added, but if so, it must explicitly specify the
``stream`` output format and a non-significant output dataset name.

.. code-block:: json

    {
        "type": "gdal_streamed_alg",
        "command_line": "gdal vector pipeline ! read in.gpkg ! reproject --dst-crs=EPSG:32632 ! write --output-format=stream streamed_dataset"
    }
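
Such a file can then be used wherever GDAL expects a vector dataset, and the
pipeline is only evaluated when the file is read. As a sketch, assuming the
first JSON document above was saved under the hypothetical name
``reprojected.gdalg.json``, converting it with :program:`gdal vector convert`
runs the reprojection on the fly and writes the result to a GeoPackage.

.. code-block:: bash

    # Reading the .gdalg.json file runs the serialized pipeline on the fly.
    # reprojected.gdalg.json and reprojected.gpkg are hypothetical file names.
    $ gdal vector convert reprojected.gdalg.json reprojected.gpkg --overwrite
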



Substitutions
-------------

.. versionadded:: 3.12

:program:`gdal pipeline` can be used to run a pipeline already serialized in a
``.gdalg.json`` file and to customize its existing steps, typically by changing
an input filename, specifying an output filename, or adding or modifying step
arguments.

See :ref:`gdal_pipeline_substitutions`.


Nested pipeline
---------------

.. versionadded:: 3.12

.. include:: gdal_cli_include/gdal_nested_pipeline_intro.rst

See :ref:`gdal_nested_pipeline`.


Examples
--------

.. example::
   :title: Reproject a GeoPackage file to CRS EPSG:32632 ("WGS 84 / UTM zone 32N")

   .. code-block:: bash

        $ gdal vector pipeline ! read in.gpkg ! reproject --dst-crs=EPSG:32632 ! write out.gpkg --overwrite

.. example::
   :title: Serialize the reprojection of a GeoPackage file into a GDALG file, and later read it

   .. code-block:: bash

        $ gdal vector pipeline ! read in.gpkg ! reproject --dst-crs=EPSG:32632 ! write in_epsg_32632.gdalg.json --overwrite
        $ gdal vector info in_epsg_32632.gdalg.json

.. example::
   :title: Concatenate two source shapefiles (with similar structure), reproject them to EPSG:32632, keep only cities with more than 1 million inhabitants and write the result to a GeoPackage

   .. code-block:: bash

        $ gdal vector pipeline ! concat --single --dst-crs=EPSG:32632 france.shp belgium.shp ! filter --where "pop > 1e6" ! write out.gpkg --overwrite
