How Greenplum plans a JOIN

What is a “JOIN”

INNER JOIN is an AND, OUTER JOIN is an OR.

id name       id  name
-- ----       --  ----
1  Pirate     1   Rutabaga
2  Monkey     2   Pirate
3  Ninja      3   Darth Vader
4  Spaghetti  4   Ninja

INNER JOIN’s results are those tuples in table A AND B.

# SELECT * FROM TableA
INNER JOIN TableB
ON TableA.name = TableB.name;

 id |  name  | id |  name
----+--------+----+--------
  3 | Ninja  |  4 | Ninja
  1 | Pirate |  2 | Pirate
(2 rows)

FULL [OUTER] JOIN’s results are those tuples in table A OR B.

# SELECT * FROM TableA
FULL OUTER JOIN TableB
ON TableA.name = TableB.name;
 id |   name    | id |    name
----+-----------+----+-------------
  3 | Ninja     |  4 | Ninja
    |           |  3 | Darth Vader
  2 | Monkey    |    |
  1 | Pirate    |  2 | Pirate
    |           |  1 | Rutabaga
  4 | Spaghetti |    |
(6 rows)

LEFT [OUTER] JOIN’s results are those tuples in table A OR (A AND B).

# SELECT * FROM TableA
LEFT OUTER JOIN TableB
ON TableA.name = TableB.name;

 id |   name    | id |  name
----+-----------+----+--------
  1 | Pirate    |  2 | Pirate
  3 | Ninja     |  4 | Ninja
  2 | Monkey    |    |
  4 | Spaghetti |    |
(4 rows)

JOIN is not simple in Greenplum

Greenplum is (was ?) a shared-nothing distributed data warehouse, data transferring (motion) between nodes costs a lot.

To have a better performance, Greenplum needs to avoid motions and loops as possible.

How does Greenplum choose the path

build_simple_rel()

This stage is to get the info how data are distributed, GpPolicy.

The data to store are partitioned onto segment databases, the other data, for internal use like catalog, are not partitioned.

Entry database, I take it as the main backend, which holds none or all, but not a part of the data when processing.

/*
 * GpPolicyType represents a type of policy under which a relation's
 * tuples may be assigned to a component database.
 */
typedef enum GpPolicyType
{
    POLICYTYPE_PARTITIONED,     /* Tuples partitioned onto segment database. */
    POLICYTYPE_ENTRY            /* Tuples stored on entry database. */
} GpPolicyType;

set_base_rel_pathlists()

Finds all paths available for scanning each base-relation entry, sequential scan and any available indices are considered.

RTE_FUNCTION: create_functionscan_path()
RTE_RELATION: create_external_path() / create_aocs_path() / create_seqscan_path() / create_index_paths() …
RTE_VALUES: create_valuesscan_path()

create_xxx_path()

Decide Locus (location, describing the distribution of a base relation).

The difference between entry policy and entry locus is that, entry database is a concept of how data store, entry locus is where data come from.

For instances, catalog’s locus is General and its policy is Entry; The value to insert, its locus is Entry, data go into master firstly; generate_series()’s locus is SingleQE, it could be planned onto any db.

typedef enum CdbLocusType
{
    CdbLocusType_Null,
    CdbLocusType_Entry,         /* a single backend process on the entry db:
                                 * usually the qDisp itself, but could be a
                                 * qExec started by the entry postmaster.
                                 */
    CdbLocusType_SingleQE,      /* a single backend process on any db: the
                                 * qDisp itself, or a qExec started by a
                                 * segment postmaster or the entry postmaster.
                                 */
    CdbLocusType_General,       /* compatible with any locus (data is
                                 * self-contained in the query plan or
                                 * generally available in any qExec or qDisp) */
    CdbLocusType_Replicated,    /* replicated over all qExecs of an N-gang */
    CdbLocusType_Hashed,        /* hash partitioned over all qExecs of N-gang */
    CdbLocusType_HashedOJ,      /* result of hash partitioned outer join */
    CdbLocusType_Strewn,        /* partitioned on no known function */
    CdbLocusType_End            /* = last valid CdbLocusType + 1 */
} CdbLocusType;

make_rel_from_joinlist()

Build access paths using a “joinlist” to guide the join path search, nestloop, merge, hash?

standard_join_search() -> set_cheapest()

for_each_cell(p, lnext(list_head(parent_rel->pathlist)))
{
	Path       *path = (Path *) lfirst(p);
	compare_path_costs(..., path, ...)
	if (BETTER)
		cheapest_total_path = path;
}

Quizzes

What and where are the motions? Entry, SingleQE, General, Hashed and Strewn locuses, join and left join, each other?