Documentation
 
 
 

6.8. Hierarchical Queries

6.8.1. Hierarchical Queries

A hierarchical query is a type of query that returns the rows of the result set in a hierarchical order based upon data forming a parent-child relationship. A hierarchy is typically represented by an inverted tree structure. The tree is comprised of interconnected nodes. Each node may be connected to none, one, or multiple child nodes. Each node is connected to one parent node except for the top node which has no parent. This node is the root node. Each tree has exactly one root node. Nodes that don't have any children are called leaf nodes. A tree always has at least one leaf node.

In a hierarchical query the rows of the result set represent the nodes of one or more trees.

Note: It is possible that a single, given row may appear in more than one tree and thus appear more than once in the result set.

The hierarchical relationship in a query is described by the CONNECT BY clause which forms the basis of the order in which rows are returned in the result set. The context of where the CONNECT BY clause and its associated optional clauses appear in the SELECT command is shown below.

SELECT select_list
FROM table_expression
[ WHERE ... ]
[ START WITH start_expression ]
CONNECT BY { PRIOR parent_expr = child_expr | child_expr = PRIOR
parent_expr }
[ ORDER SIBLINGS BY column1 [ ASC | DESC ] [, column2 [ ASC | DESC ] ] ...
[ GROUP BY ... ]
[ HAVING ... ]
[ other ... ]

select_list and table_expression are as described in the prior query topics. other are any additional legal SELECT command clauses such as LIMIT. The clauses pertinent to hierarchical queries, START WITH, CONNECT BY, and ORDER SIBLINGS BY are described in the following sections.

6.8.2. Defining the Parent/Child Relationship

For any given row, its parent and its children are determined by the CONNECT BY clause. The CONNECT BY clause must consist of two expressions compared with the equals (=) operator. In addition, one of these two expressions must be preceded by the keyword, PRIOR.

For any given row, to determine its children:

  • Evaluate parent_expr on the given row

  • Evaluate child_expr on any other row resulting from the evaluation of table_expression

  • If parent_expr = child_expr, then this row is a child node of the given parent row

  • Repeat the process for all remaining rows in table_expression. All rows that satisfy the equation in step 3 are the children nodes of the given parent row.

Note: The evaluation process to determine if a row is a child node occurs on every row returned by table_expression before the WHERE clause is applied to table_expression.

By iteratively repeating this process treating each child node found in the prior steps as a parent, an inverted tree of nodes is constructed. The process is complete when the final set of child nodes have no children of their own - these are the leaf nodes.

A SELECT command that includes a CONNECT BY clause typically includes the START WITH clause. The START WITH clause determines the rows that are to be the root nodes - i.e., the rows that are the initial parent nodes upon which the algorithm described previously is to be applied. This is further explained in the following section.

6.8.3. Selecting the Root Nodes

The START WITH clause is used to determine the row(s) selected by table_expression that are to be used as the root nodes. All rows selected by table_expression where start_expression evaluates to true becomes a root node of a tree. Thus, the number of potential trees in the result set is equal to the number of root nodes. As a consequence, if the START WITH clause is omitted, then every row returned by table_expression is a root of its own tree.

6.8.4. Organization Tree in the Sample Application

Consider the emp table of the sample application. The rows of the emp table form a hierarchy based upon the mgr column which contains the employee number of the employee's manager. Each employee has at most, one manager. KING is the president of the company so he has no manager, therefore KING's mgr column is null. Also, it is possible for an employee to act as a manager for more than one employee. This relationship forms a typical, tree-structured, hierarchical organization chart as illustrated below.

To form a hierarchical query based upon this relationship, the SELECT command includes the clause, CONNECT BY PRIOR empno = mgr. For example, given the company president, KING, with employee number 7839, any employee whose mgr column is 7839 is a direct report of KING which is true for JONES, BLAKE, and CLARK (these are the child nodes of KING). Similarly, for employee, JONES, any other employee with mgr column equal to 7566 is a child node of JONES - these are SCOTT and FORD in this example.

The top of the organization chart is KING so there is one root node in this tree. The START WITH mgr IS NULL clause selects only KING as the initial root node.

The complete SELECT command is shown below.

SELECT ename, empno, mgr 
FROM emp
START WITH mgr IS NULL
CONNECT BY PRIOR empno = mgr;

The rows in the query output traverse each branch from the root to leaf moving in a top-to-bottom, left-to-right order Below is the output from this query.

ename  | empno | mgr
--------+-------+------
 KING   |  7839 |
 JONES  |  7566 | 7839
 SCOTT  |  7788 | 7566
 ADAMS  |  7876 | 7788
 FORD   |  7902 | 7566
 SMITH  |  7369 | 7902
 BLAKE  |  7698 | 7839
 ALLEN  |  7499 | 7698
 WARD   |  7521 | 7698
 MARTIN |  7654 | 7698
 TURNER |  7844 | 7698
 JAMES  |  7900 | 7698
 CLARK  |  7782 | 7839
 MILLER |  7934 | 7782
(14 rows)

6.8.5. Node Level

LEVEL is a pseudocolumn that can be used wherever a column can appear in the SELECT command. For each row in the result set, LEVEL returns a non-zero integer value designating the depth in the hierarchy of the node represented by this row. The LEVEL for root nodes is 1. The LEVEL for direct children of root nodes is 2, and so on.

The following query is a modification of the previous query with the addition of the LEVEL pseudocolumn. In addition, using the LEVEL value, the employee names are indented to further emphasize the depth in the hierarchy of each row.

SELECT LEVEL, LPAD (' ', 2 * (LEVEL - 1)) || ename "employee", empno, mgr
FROM emp START WITH mgr IS NULL
CONNECT BY PRIOR empno = mgr;

The output from this query follows.

level |  employee   | empno | mgr
-------+-------------+-------+------
     1 | KING        |  7839 |
     2 |   JONES     |  7566 | 7839
     3 |     SCOTT   |  7788 | 7566
     4 |       ADAMS |  7876 | 7788
     3 |     FORD    |  7902 | 7566
     4 |       SMITH |  7369 | 7902
     2 |   BLAKE     |  7698 | 7839
     3 |     ALLEN   |  7499 | 7698
     3 |     WARD    |  7521 | 7698
     3 |     MARTIN  |  7654 | 7698
     3 |     TURNER  |  7844 | 7698
     3 |     JAMES   |  7900 | 7698
     2 |   CLARK     |  7782 | 7839
     3 |     MILLER  |  7934 | 7782
(14 rows)

6.8.6. Ordering the Siblings

Nodes that share a common parent and are at the same level are called siblings. For example in the above output, employees Allen, Ward, Martin, Turner, and James are siblings since they are all at level three with parent, Blake. Jones, Blake, and Clark are siblings since they are at level two and King is their common parent.

The result set can be ordered so the siblings appear in ascending or descending order by selected column value(s) using the ORDER SIBLINGS BY clause. This a special case of the ORDER BY clause that can be used only with hierarchical queries.

The previous query is further modified with the addition of ORDER SIBLINGS BY ename ASC.

SELECT LEVEL, LPAD (' ', 2 * (LEVEL - 1)) || ename "employee", empno, mgr
FROM emp START WITH mgr IS NULL
CONNECT BY PRIOR empno = mgr
ORDER SIBLINGS BY ename ASC;

The output from the prior query is now modified so the siblings appear in ascending order by name. Siblings BLAKE, CLARK, and JONES are now alphabetically arranged under KING. Siblings ALLEN, JAMES, MARTIN, TURNER, and WARD are alphabetically arranged under BLAKE, and so on.

level |  employee   | empno | mgr
-------+-------------+-------+------
     1 | KING        |  7839 |
     2 |   BLAKE     |  7698 | 7839
     3 |     ALLEN   |  7499 | 7698
     3 |     JAMES   |  7900 | 7698
     3 |     MARTIN  |  7654 | 7698
     3 |     TURNER  |  7844 | 7698
     3 |     WARD    |  7521 | 7698
     2 |   CLARK     |  7782 | 7839
     3 |     MILLER  |  7934 | 7782
     2 |   JONES     |  7566 | 7839
     3 |     FORD    |  7902 | 7566
     4 |       SMITH |  7369 | 7902
     3 |     SCOTT   |  7788 | 7566
     4 |       ADAMS |  7876 | 7788
(14 rows)

This final example adds the WHERE clause and starts with three root nodes. After the node tree is constructed, the WHERE clause filters out rows in the tree to form the result set.

SELECT LEVEL, LPAD (' ', 2 * (LEVEL - 1)) || ename "employee", empno, mgr
FROM emp WHERE mgr IN (7839, 7782, 7902, 7788)
START WITH ename IN ('BLAKE','CLARK','JONES')
CONNECT BY PRIOR empno = mgr
ORDER SIBLINGS BY ename ASC;

The output from the query shows three root nodes (level one) - BLAKE, CLARK, and JONES. In addition, rows that do not satisfy the WHERE clause have been eliminated from the output.

 level | employee  | empno | mgr
-------+-----------+-------+------
     1 | BLAKE     |  7698 | 7839
     1 | CLARK     |  7782 | 7839
     2 |   MILLER  |  7934 | 7782
     1 | JONES     |  7566 | 7839
     3 |     SMITH |  7369 | 7902
     3 |     ADAMS |  7876 | 7788
(6 rows)

 
 ©2004-2007 EnterpriseDB All Rights Reserved