Building Indexes

Indexes are what make using a spatial database for large databases possible. Without indexing, any search for a feature would require a "sequential scan" of every record in the database. Indexing speeds up searching by organizing the data into a search tree which can be quickly traversed to find a particular record. PostgreSQL supports three kinds of indexes by default: B-Tree indexes, R-Tree indexes, and GiST indexes.

GiST Indexes

GiST stands for "Generalized Search Tree" and is a generalized form of R-Tree indexing. In addition to GIS indexing, GiST is used to speed up searches on all kinds of irregular data structures (integer arrays, spectral data, etc) which are not amenable to normal B-Tree indexing.

Once a GIS data table exceeds a few thousand rows, you will want to build an index to speed up spatial searches of the data (unless all your searches are based on attributes, in which case you'll want to build a normal index on the attribute fields).

The syntax for building a GiST index on a "geometry" column is as follows:

CREATE INDEX [indexname] ON [tablename] USING GIST (
 [geometryfield] GIST_GEOMETRY_OPS ) WITH ( ISLOSSY ); 

Building a spatial index is a computationally intensive exercise: on tables of around 1 million rows, on a 300MHz Solaris machine, we have found building a GiST index takes about 1 hour.

GiST indexes have two advantages over R-Tree indexes in PostgreSQL. Firstly, GiST indexes build much faster than R-Trees; we have found that it takes about 4 times more time to build an R-Tree than a GiST index on an identical table. Secondly, GiST indexes support the concept of "lossiness" which is important when dealing with GIS objects larger than the PostgreSQL 8K page size. Lossiness allows PostgreSQL to store only the "important" part of an object in an index -- in the case of GIS objects, just the bounding box. GIS objects larger than 8K will cause R-Tree indexes to fail in the build phase.

Using Indexes

Ordinarily, indexes invisibly speed up data access: once the index is built, the query planner tranparently decides when to use index information to speed up a query plan. Unfortunately, the PostgreSQL query planner does not optimize the use of GiST indexes well, so sometimes searches which should use a spatial index instead default to a sequence scan of the whole table.

If you find your spatial indexes are not being used (or your attribute indexes, for that matter) there are a couple things you can do: