Multidimensional analysis v16

Multidimensional analysis refers to the process of examining data using various combinations of dimensions. Dimensions are categories used to classify data such as time, geography, a company’s departments, product lines, and so forth. This process is commonly used in data warehousing applications. The results associated with a particular set of dimensions are called facts. Facts are typically figures associated with product sales, profits, volumes, counts, and so on.

To obtain these facts according to a set of dimensions in a relational database system, you typically use SQL aggregation. SQL aggregation basically means data is grouped according to certain criteria (dimensions), and the result set consists of aggregates of facts, such as counts, sums, and averages of the data in each group.

Aggregating results

The GROUP BY clause of the SQL SELECT command supports the following extensions that simplify the process of producing aggregate results:

  • ROLLUP extension
  • CUBE extension
  • GROUPING SETS extension

In addition, you can use the GROUPING function and the GROUPING_ID function in the SELECT list or the HAVING clause to aid with the interpretation of the results when you use these extensions.

Note

The sample dept and emp tables are used extensively in this discussion to provide usage examples. The following changes were applied to these tables to provide more informative results:

UPDATE dept SET loc = 'BOSTON' WHERE deptno = 20;
INSERT INTO emp (empno,ename,job,deptno) VALUES (9001,'SMITH','CLERK',40);
INSERT INTO emp (empno,ename,job,deptno) VALUES (9002,'JONES','ANALYST',40);
INSERT INTO emp (empno,ename,job,deptno) VALUES (9003,'ROGERS','MANAGER',40);

The following rows from a join of the emp and dept tables are used:

SELECT loc, dname, job, empno FROM emp e, dept d
WHERE e.deptno = d.deptno
ORDER BY 1, 2, 3, 4;
Output
   loc    |   dname    |    job    | empno
----------+------------+-----------+-------
 BOSTON   | OPERATIONS | ANALYST   |  9002
 BOSTON   | OPERATIONS | CLERK     |  9001
 BOSTON   | OPERATIONS | MANAGER   |  9003
 BOSTON   | RESEARCH   | ANALYST   |  7788
 BOSTON   | RESEARCH   | ANALYST   |  7902
 BOSTON   | RESEARCH   | CLERK     |  7369
 BOSTON   | RESEARCH   | CLERK     |  7876
 BOSTON   | RESEARCH   | MANAGER   |  7566
 CHICAGO  | SALES      | CLERK     |  7900
 CHICAGO  | SALES      | MANAGER   |  7698
 CHICAGO  | SALES      | SALESMAN  |  7499
 CHICAGO  | SALES      | SALESMAN  |  7521
 CHICAGO  | SALES      | SALESMAN  |  7654
 CHICAGO  | SALES      | SALESMAN  |  7844
 NEW YORK | ACCOUNTING | CLERK     |  7934
 NEW YORK | ACCOUNTING | MANAGER   |  7782
 NEW YORK | ACCOUNTING | PRESIDENT |  7839
(17 rows)

The loc, dname, and job columns are used for the dimensions of the SQL aggregations used in the examples. The resulting facts of the aggregations are the number of employees obtained by using the COUNT(*) function.

Aggregation example

A basic query grouping the loc, dname, and job columns is given by the following:

SELECT loc, dname, job, COUNT(*) AS "employees" FROM emp e, dept d
WHERE e.deptno = d.deptno
GROUP BY loc, dname, job
ORDER BY 1, 2, 3;

The rows of this result set using the basic GROUP BY clause without extensions are referred to as the base aggregate rows.

Output
loc      |   dname    |    job    | employees
---------+------------+-----------+-----------
BOSTON   | OPERATIONS | ANALYST   |         1
BOSTON   | OPERATIONS | CLERK     |         1
BOSTON   | OPERATIONS | MANAGER   |         1
BOSTON   | RESEARCH   | ANALYST   |         2
BOSTON   | RESEARCH   | CLERK     |         2
BOSTON   | RESEARCH   | MANAGER   |         1
CHICAGO  | SALES      | CLERK     |         1
CHICAGO  | SALES      | MANAGER   |         1
CHICAGO  | SALES      | SALESMAN  |         4
NEW YORK | ACCOUNTING | CLERK     |         1
NEW YORK | ACCOUNTING | MANAGER   |         1
NEW YORK | ACCOUNTING | PRESIDENT |         1
(12 rows)

Useful extensions

The ROLLUP and CUBE extensions add to the base aggregate rows by providing additional levels of subtotals to the result set.

The GROUPING SETS extension lets you combine different types of groupings into a single result set.

The GROUPING and GROUPING_ID functions help you to interpret the result set.

rollup_extension cube_extension grouping_sets_extension grouping_function grouping_id_function