Aggregate Functions v12

Aggregate functions compute a single result value from a set of input values. The built-in aggregate functions are listed in the following tables.

FunctionArgument TypeReturn TypeDescription
AVG (expression)INTEGER, REAL, DOUBLE PRECISION, NUMBERNUMBER for any integer type, DOUBLE PRECISION for a floating-point argument, otherwise the same as the argument data typeThe average (arithmetic mean) of all input values
COUNT(*)BIGINTNumber of input rows
COUNT (expression)AnyBIGINTNumber of input rows for which the value of expression is not null
MAX (expression)Any numeric, string, date/time, or bytea typeSame as argument typeMaximum value of expression across all input values
MIN (expression)Any numeric, string, date/time, or bytea typeSame as argument typeMinimum value of expression across all input values
SUM (expression)INTEGER, REAL, DOUBLE PRECISION, NUMBERBIGINT for SMALLINT or INTEGER arguments, NUMBER for BIGINT arguments, DOUBLE PRECISION for floating-point arguments, otherwise the same as the argument data typeSum of expression across all input values

It should be noted that except for COUNT, these functions return a null value when no rows are selected. In particular, SUM of no rows returns null, not zero as one might expect. The COALESCE function may be used to substitute zero for null when necessary.

The following table shows the aggregate functions typically used in statistical analysis. (These are separated out merely to avoid cluttering the listing of more-commonly-used aggregates.) Where the description mentions N, it means the number of input rows for which all the input expressions are non-null. In all cases, null is returned if the computation is meaningless, for example when N is zero.

FunctionArgument TypeReturn TypeDescription
CORR(Y, X)DOUBLE PRECISIONDOUBLE PRECISIONCorrelation coefficient
COVAR_POP(Y, X)DOUBLE PRECISIONDOUBLE PRECISIONPopulation covariance
COVAR_SAMP(Y, X)DOUBLE PRECISIONDOUBLE PRECISIONSample covariance
REGR_AVGX(Y, X)DOUBLE PRECISIONDOUBLE PRECISIONAverage of the independent variable (sum(X) / N)
REGR_AVGY(Y, X)DOUBLE PRECISIONDOUBLE PRECISIONAverage of the dependent variable (sum(Y) / N)
REGR_COUNT(Y, X)DOUBLE PRECISIONDOUBLE PRECISIONNumber of input rows in which both expressions are nonnull
REGR_INTERCEPT(Y, X)DOUBLE PRECISIONDOUBLE PRECISIONy-intercept of the least-squares-fit linear equation determined by the (X, Y) pairs
REGR_R2(Y, X)DOUBLE PRECISIONDOUBLE PRECISIONSquare of the correlation coefficient
REGR_SLOPE(Y, X)DOUBLE PRECISIONDOUBLE PRECISIONSlope of the least-squares-fit linear equation determined by the (X, Y) pairs
REGR_SXX(Y, X)DOUBLE PRECISIONDOUBLE PRECISIONSum (X2) – sum (X) 2 / N (“sum of squares” of the independent variable)
REGR_SXY(Y, X)DOUBLE PRECISIONDOUBLE PRECISIONSum (X*Y) – sum (X) * sum (Y) / N (“sum of products” of independent times dependent variable)
REGR_SYY(Y, X)DOUBLE PRECISIONDOUBLE PRECISIONSum (Y2) – sum (Y) 2 / N (“sum of squares” of the dependent variable)
STDDEV(expression)INTEGER, REAL, DOUBLE PRECISION, NUMBERDOUBLE PRECISION for floating-point arguments, otherwise NUMBERHistoric alias for STDDEV_SAMP
STDDEV_POP(expression)INTEGER, REAL, DOUBLE PRECISION, NUMBERDOUBLE PRECISION for floating-point arguments, otherwise NUMBERPopulation standard deviation of the input values
STDDEV_SAMP(expression)INTEGER, REAL, DOUBLE PRECISION, NUMBERDOUBLE PRECISION for floating-point arguments, otherwise NUMBERSample standard deviation of the input values
VARIANCE(expression)INTEGER, REAL, DOUBLE PRECISION, NUMBERDOUBLE PRECISION for floating-point arguments, otherwise NUMBERHistorical alias for VAR_SAMP
VAR_POP(expression)INTEGER, REAL, DOUBLE PRECISION, NUMBERDOUBLE PRECISION for floating-point arguments, otherwise NUMBERPopulation variance of the input values (square of the population standard deviation)
VAR_SAMP(expression)INTEGER, REAL, DOUBLE PRECISION, NUMBERDOUBLE PRECISION for floating-point arguments, otherwise NUMBERSample variance of the input values (square of the sample standard deviation)

LISTAGG

Advanced Server has added the LISTAGG function to support string aggregation. LISTAGG is an aggregate function that concatenates data from multiple rows into a single row in an ordered manner. You can optionally include a custom delimiter for your data.

The LISTAGG function mandates the use of an ORDER BY clause under a WITHIN GROUP clause to concatenate values of the measure column, and then generate the ordered aggregated data.

Objective

  • LISTAGG can be used without any grouping. In this case, the LISTAGG function operates on all rows in a table and returns a single row.
  • LISTAGG can be used with the GROUP BY clause. In this case, the LISTAGG function operates on each group and returns an aggregated output for each group.
  • LISTAGG can be used with the OVER clause. In this case, the LISTAGG function partitions a query result set into groups based on the expression in the query_partition_by_clause and then aggregates data in each group.

Synopsis

LISTAGG( <measure_expr> [, <delimiter> ]) WITHIN GROUP( <order_by_clause> )
[ OVER <query_partition_by_clause> ]

Parameters

measure_expr

measure_expr (mandatory) specifies the column or expression that assigns a value to aggregate. NULL values are ignored.

delimiter

delimiter (optional) specifies a string that separates the concatenated values in the result row. The delimiter can be a NULL value, string, character literal, column name, or constant expression. If ignored, the LISTAGG function uses a NULL value by default.

order_by_clause

order_by_clause (mandatory) determines the sort order in which the concatenated values are returned.

query_partition_by_clause

query_partition_by_clause (optional) allows LISTAGG function to be used as an analytic function and sets the range of records for each group in the OVER clause.

Return Type

The LISTAGG function returns a string value.

Examples

The following example concatenates the values in the emp table and lists all the employees separated by a delimiter comma.

First, create a table named emp and then insert records into the emp table.

edb=# CREATE TABLE emp
edb-#        (EMPNO NUMBER(4) NOT NULL,
edb(#         ENAME VARCHAR2(10),
edb(#         JOB VARCHAR2(9),
edb(#         MGR NUMBER(4),
edb(#         HIREDATE DATE,
edb(#         SAL NUMBER(7, 2),
edb(#         COMM NUMBER(7, 2),
edb(#         DEPTNO NUMBER(2));
CREATE TABLE
edb=# INSERT INTO emp VALUES
edb-#         (7499, 'ALLEN',  'SALESMAN',  7698,
edb(#         TO_DATE('20-FEB-1981', 'DD-MON-YYYY'), 1600,  300, 30);
INSERT 0 1
edb=# INSERT INTO emp VALUES
edb-#         (7521, 'WARD',   'SALESMAN',  7698,
edb(#         TO_DATE('22-FEB-1981', 'DD-MON-YYYY'), 1250,  500, 30);
INSERT 0 1
edb=# INSERT INTO emp VALUES
edb-#         (7566, 'JONES',  'MANAGER',   7839,
edb(#         TO_DATE('2-APR-1981', 'DD-MON-YYYY'),  2975, NULL, 20);
INSERT 0 1
edb=# INSERT INTO emp VALUES
edb-#         (7654, 'MARTIN', 'SALESMAN',  7698,
edb(#         TO_DATE('28-SEP-1981', 'DD-MON-YYYY'), 1250, 1400, 30);
INSERT 0 1
edb=# INSERT INTO emp VALUES
edb-#         (7698, 'BLAKE',  'MANAGER',   7839,
edb(#         TO_DATE('1-MAY-1981', 'DD-MON-YYYY'),  2850, NULL, 30);
INSERT 0 1

edb=# SELECT LISTAGG(ENAME, ',') WITHIN GROUP (ORDER BY ENAME) FROM emp;
               listagg
-------------------------------------
 ALLEN,BLAKE,JONES,MARTIN,WARD
(1 row)

The following example uses PARTITION BY clause with LISTAGG in emp table and generates output based on a partition by deptno that applies to each partition and not on the entire table.

edb=# SELECT DISTINCT DEPTNO, LISTAGG(ENAME, ',') WITHIN GROUP (ORDER BY
ENAME) OVER(PARTITION BY DEPTNO) FROM emp;
 deptno |         listagg
--------+-------------------------
     30 | ALLEN,BLAKE,MARTIN,WARD
     20 | JONES
(2 rows)

The following example is identical to the previous example, except it includes the GROUP BY clause.

edb=# SELECT DEPTNO, LISTAGG(ENAME, ',') WITHIN GROUP (ORDER BY ENAME) FROM
emp GROUP BY DEPTNO;
 deptno |         listagg
--------+-------------------------
     20 | JONES
     30 | ALLEN,BLAKE,MARTIN,WARD
(2 rows)

MEDIAN

The MEDIAN function calculates the middle value of an expression from a given range of values; NULL values are ignored. The MEDIAN function returns an error if a query does not reference the user-defined table.

Objective

  • MEDIAN can be used without any grouping. In this case, the MEDIAN function operates on all rows in a table and returns a single row.
  • MEDIAN can be used with the OVER clause. In this case, the MEDIAN function partitions a query result set into groups based on the expression specified in the PARTITION BY clause and then aggregates data in each group.

Synopsis

MEDIAN( <median_expression> ) [ OVER ( [ PARTITION BY... ] ) ]

Parameters

median_expression

median_expression (mandatory) is a target column or expression that the MEDIAN function operates on and returns a median value. It can be a numeric, datetime, or interval data type.

PARTITION BY

PARTITION BY clause (optional) allows a MEDIAN function to be used as an analytic function and sets the range of records for each group in the OVER clause.

Return Types

The return type is determined by the input data type of expression. The following table illustrates the return type for each input type.

Input TypeReturn Type
NUMERICNUMERIC
FLOAT, DOUBLE PRECISIONDOUBLE PRECISION
REALREAL
INTERVALINTERVAL
DATEDATE
TIMESTAMPTIMESTAMP
TIMESTAMPTZTIMESTAMPTZ

Examples

In the following example, a query returns the median salary for each department in the emp table:

edb=# SELECT * FROM emp;
 empno| ename |   job   | mgr  |      hiredate     |   sal  | comm   | deptno
 -----+-------+---------+------+-------------------+--------+--------+-------
 7369 | SMITH | CLERK   | 7902 | 17-DEC-80 00:00:00|  800.00|        |   20
 7499 | ALLEN | SALESMAN| 7698 | 20-FEB-81 00:00:00| 1600.00| 300.0  |   30
 7521 | WARD  | SALESMAN| 7698 | 22-FEB-81 00:00:00| 1250.00| 500.00 |   30
 7566 | JONES | MANAGER | 7839 | 02-APR-81 00:00:00| 2975.00|        |   20
 7654 | MARTIN| SALESMAN| 7698 | 28-SEP-81 00:00:00| 1250.00| 1400.00|   30
(5 rows)

 edb=# SELECT MEDIAN (SAL) FROM emp;
  median
 --------
    1250
 (1 row)

The following example uses PARTITION BY clause with MEDIAN in emp table and returns the median salary based on a partition by deptno:

edb=# SELECT EMPNO, ENAME, DEPTNO, MEDIAN (SAL) OVER (PARTITION BY DEPTNO)
FROM emp;
 empno | ename  | deptno | median
-------+--------+--------+--------
  7369 | SMITH  |     20 | 1887.5
  7566 | JONES  |     20 | 1887.5
  7499 | ALLEN  |     30 |   1250
  7521 | WARD   |     30 |   1250
  7654 | MARTIN |     30 |   1250
(5 rows)

The MEDIAN function can be compared with PERCENTILE_CONT. In the following example, MEDIAN generates the same result as PERCENTILE_CONT:

edb=# SELECT MEDIAN (SAL), PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY SAL)
FROM emp;
 median | percentile_cont
--------+-----------------
   1250 |            1250
(1 row)