Aggregate Functions v13
Aggregate functions compute a single result value from a set of input values. The built-in aggregate functions are listed in the following tables.
| Function | Argument Type | Return Type | Description |
|---|---|---|---|
AVG (expression) | INTEGER, REAL, DOUBLE PRECISION, NUMBER | NUMBER for any integer type, DOUBLE PRECISION for a floating-point argument, otherwise the same as the argument data type | The average (arithmetic mean) of all input values |
COUNT(*) | BIGINT | Number of input rows | |
COUNT (expression) | Any | BIGINT | Number of input rows for which the value of expression is not null |
MAX (expression) | Any numeric, string, date/time, or bytea type | Same as argument type | Maximum value of expression across all input values |
MIN (expression) | Any numeric, string, date/time, or bytea type | Same as argument type | Minimum value of expression across all input values |
SUM (expression) | INTEGER, REAL, DOUBLE PRECISION, NUMBER | BIGINT for SMALLINT or INTEGER arguments, NUMBER for BIGINT arguments, DOUBLE PRECISION for floating-point arguments, otherwise the same as the argument data type | Sum of expression across all input values |
It should be noted that except for COUNT, these functions return a null value when no rows are selected. In particular, SUM of no rows returns null, not zero as one might expect. The COALESCE function may be used to substitute zero for null when necessary.
The following table shows the aggregate functions typically used in statistical analysis. (These are separated out merely to avoid cluttering the listing of more-commonly-used aggregates.) Where the description mentions N, it means the number of input rows for which all the input expressions are non-null. In all cases, null is returned if the computation is meaningless, for example when N is zero.
| Function | Argument Type | Return Type | Description |
|---|---|---|---|
CORR(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Correlation coefficient |
COVAR_POP(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Population covariance |
COVAR_SAMP(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Sample covariance |
REGR_AVGX(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Average of the independent variable (sum(X) / N) |
REGR_AVGY(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Average of the dependent variable (sum(Y) / N) |
REGR_COUNT(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Number of input rows in which both expressions are nonnull |
REGR_INTERCEPT(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | y-intercept of the least-squares-fit linear equation determined by the (X, Y) pairs |
REGR_R2(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Square of the correlation coefficient |
REGR_SLOPE(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Slope of the least-squares-fit linear equation determined by the (X, Y) pairs |
REGR_SXX(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Sum (X2) – sum (X) 2 / N (“sum of squares” of the independent variable) |
REGR_SXY(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Sum (X*Y) – sum (X) * sum (Y) / N (“sum of products” of independent times dependent variable) |
REGR_SYY(Y, X) | DOUBLE PRECISION | DOUBLE PRECISION | Sum (Y2) – sum (Y) 2 / N (“sum of squares” of the dependent variable) |
STDDEV(expression) | INTEGER, REAL, DOUBLE PRECISION, NUMBER | DOUBLE PRECISION for floating-point arguments, otherwise NUMBER | Historic alias for STDDEV_SAMP |
STDDEV_POP(expression) | INTEGER, REAL, DOUBLE PRECISION, NUMBER | DOUBLE PRECISION for floating-point arguments, otherwise NUMBER | Population standard deviation of the input values |
STDDEV_SAMP(expression) | INTEGER, REAL, DOUBLE PRECISION, NUMBER | DOUBLE PRECISION for floating-point arguments, otherwise NUMBER | Sample standard deviation of the input values |
VARIANCE(expression) | INTEGER, REAL, DOUBLE PRECISION, NUMBER | DOUBLE PRECISION for floating-point arguments, otherwise NUMBER | Historical alias for VAR_SAMP |
VAR_POP(expression) | INTEGER, REAL, DOUBLE PRECISION, NUMBER | DOUBLE PRECISION for floating-point arguments, otherwise NUMBER | Population variance of the input values (square of the population standard deviation) |
VAR_SAMP(expression) | INTEGER, REAL, DOUBLE PRECISION, NUMBER | DOUBLE PRECISION for floating-point arguments, otherwise NUMBER | Sample variance of the input values (square of the sample standard deviation) |
LISTAGG
Advanced Server has added the LISTAGG function to support string aggregation. LISTAGG is an aggregate function that concatenates data from multiple rows into a single row in an ordered manner. You can optionally include a custom delimiter for your data.
The LISTAGG function mandates the use of an ORDER BY clause under a WITHIN GROUP clause to concatenate values of the measure column, and then generate the ordered aggregated data.
Objective
LISTAGGcan be used without any grouping. In this case, theLISTAGGfunction operates on all rows in a table and returns a single row.LISTAGGcan be used with theGROUP BYclause. In this case, theLISTAGGfunction operates on each group and returns an aggregated output for each group.LISTAGGcan be used with theOVERclause. In this case, theLISTAGGfunction partitions a query result set into groups based on the expression in thequery_partition_by_clauseand then aggregates data in each group.
Synopsis
LISTAGG( <measure_expr> [, <delimiter> ]) WITHIN GROUP( <order_by_clause> ) [ OVER <query_partition_by_clause> ]
Parameters
measure_expr
measure_expr (mandatory) specifies the column or expression that assigns a value to aggregate. NULL values are ignored.
delimiter
delimiter (optional) specifies a string that separates the concatenated values in the result row. The delimiter can be a NULL value, string, character literal, column name, or constant expression. If ignored, the LISTAGG function uses a NULL value by default.
order_by_clause
order_by_clause (mandatory) determines the sort order in which the concatenated values are returned.
query_partition_by_clause
query_partition_by_clause (optional) allows LISTAGG function to be used as an analytic function and sets the range of records for each group in the OVER clause.
Return Type
The LISTAGG function returns a string value.
Examples
The following example concatenates the values in the emp table and lists all the employees separated by a delimiter comma.
First, create a table named emp and then insert records into the emp table.
edb=# CREATE TABLE emp edb-# (EMPNO NUMBER(4) NOT NULL, edb(# ENAME VARCHAR2(10), edb(# JOB VARCHAR2(9), edb(# MGR NUMBER(4), edb(# HIREDATE DATE, edb(# SAL NUMBER(7, 2), edb(# COMM NUMBER(7, 2), edb(# DEPTNO NUMBER(2)); CREATE TABLE
edb=# INSERT INTO emp VALUES
edb-# (7499, 'ALLEN', 'SALESMAN', 7698,
edb(# TO_DATE('20-FEB-1981', 'DD-MON-YYYY'), 1600, 300, 30);
INSERT 0 1
edb=# INSERT INTO emp VALUES
edb-# (7521, 'WARD', 'SALESMAN', 7698,
edb(# TO_DATE('22-FEB-1981', 'DD-MON-YYYY'), 1250, 500, 30);
INSERT 0 1
edb=# INSERT INTO emp VALUES
edb-# (7566, 'JONES', 'MANAGER', 7839,
edb(# TO_DATE('2-APR-1981', 'DD-MON-YYYY'), 2975, NULL, 20);
INSERT 0 1
edb=# INSERT INTO emp VALUES
edb-# (7654, 'MARTIN', 'SALESMAN', 7698,
edb(# TO_DATE('28-SEP-1981', 'DD-MON-YYYY'), 1250, 1400, 30);
INSERT 0 1
edb=# INSERT INTO emp VALUES
edb-# (7698, 'BLAKE', 'MANAGER', 7839,
edb(# TO_DATE('1-MAY-1981', 'DD-MON-YYYY'), 2850, NULL, 30);
INSERT 0 1
edb=# SELECT LISTAGG(ENAME, ',') WITHIN GROUP (ORDER BY ENAME) FROM emp;
listagg
-------------------------------------
ALLEN,BLAKE,JONES,MARTIN,WARD
(1 row)The following example uses PARTITION BY clause with LISTAGG in emp table and generates output based on a partition by deptno that applies to each partition and not on the entire table.
edb=# SELECT DISTINCT DEPTNO, LISTAGG(ENAME, ',') WITHIN GROUP (ORDER BY
ENAME) OVER(PARTITION BY DEPTNO) FROM emp;
deptno | listagg
--------+-------------------------
30 | ALLEN,BLAKE,MARTIN,WARD
20 | JONES
(2 rows)The following example is identical to the previous example, except it includes the GROUP BY clause.
edb=# SELECT DEPTNO, LISTAGG(ENAME, ',') WITHIN GROUP (ORDER BY ENAME) FROM
emp GROUP BY DEPTNO;
deptno | listagg
--------+-------------------------
20 | JONES
30 | ALLEN,BLAKE,MARTIN,WARD
(2 rows)MEDIAN
The MEDIAN function calculates the middle value of an expression from a given range of values; NULL values are ignored. The MEDIAN function returns an error if a query does not reference the user-defined table.
Objective
MEDIANcan be used without any grouping. In this case, theMEDIANfunction operates on all rows in a table and returns a single row.MEDIANcan be used with theOVERclause. In this case, theMEDIANfunction partitions a query result set into groups based on theexpressionspecified in thePARTITION BYclause and then aggregates data in each group.
Synopsis
MEDIAN( <median_expression> ) [ OVER ( [ PARTITION BY... ] ) ]
Parameters
median_expression
median_expression (mandatory) is a target column or expression that the MEDIAN function operates on and returns a median value. It can be a numeric, datetime, or interval data type.
PARTITION BY
PARTITION BY clause (optional) allows a MEDIAN function to be used as an analytic function and sets the range of records for each group in the OVER clause.
Return Types
The return type is determined by the input data type of expression. The following table illustrates the return type for each input type.
| Input Type | Return Type |
|---|---|
BIGINT | NUMERIC |
FLOAT, DOUBLE PRECISION | DOUBLE PRECISION |
INTEGER | NUMERIC |
INTERVAL | INTERVAL |
NUMERIC | NUMERIC |
REAL | REAL |
SMALLINT | NUMERIC |
TIMESTAMP | TIMESTAMP |
TIMESTAMPTZ | TIMESTAMPTZ |
Examples
In the following example, a query returns the median salary for each department in the emp table:
edb=# SELECT * FROM emp;
empno| ename | job | mgr | hiredate | sal | comm | deptno
-----+-------+---------+------+-------------------+--------+--------+-------
7369 | SMITH | CLERK | 7902 | 17-DEC-80 00:00:00| 800.00| | 20
7499 | ALLEN | SALESMAN| 7698 | 20-FEB-81 00:00:00| 1600.00| 300.0 | 30
7521 | WARD | SALESMAN| 7698 | 22-FEB-81 00:00:00| 1250.00| 500.00 | 30
7566 | JONES | MANAGER | 7839 | 02-APR-81 00:00:00| 2975.00| | 20
7654 | MARTIN| SALESMAN| 7698 | 28-SEP-81 00:00:00| 1250.00| 1400.00| 30
(5 rows)
edb=# SELECT MEDIAN (SAL) FROM emp;
median
--------
1250
(1 row)The following example uses PARTITION BY clause with MEDIAN in emp table and returns the median salary based on a partition by deptno:
edb=# SELECT EMPNO, ENAME, DEPTNO, MEDIAN (SAL) OVER (PARTITION BY DEPTNO) FROM emp; empno | ename | deptno | median -------+--------+--------+-------- 7369 | SMITH | 20 | 1887.5 7566 | JONES | 20 | 1887.5 7499 | ALLEN | 30 | 1250 7521 | WARD | 30 | 1250 7654 | MARTIN | 30 | 1250 (5 rows)
The MEDIAN function can be compared with PERCENTILE_CONT. In the following example, MEDIAN generates the same result as PERCENTILE_CONT:
edb=# SELECT MEDIAN (SAL), PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY SAL) FROM emp; median | percentile_cont --------+----------------- 1250 | 1250 (1 row)
STATS_MODE
The STATS_MODE function takes a set of values as an argument and returns the value that occurs with the highest frequency. If multiple values are appearing with the same frequency, the STATS_MODE function arbitrarily chooses the first value and returns only that one value.
Objective
STATS_MODEfunction can be used without any grouping. In this case, theSTATS_MODEfunction operates on all the rows in a table and returns a single value.STATS_MODEcan be used as an ordered-set aggregate function using theWITHIN GROUPclause. In this case, theSTATS_MODEfunction operates on the ordered data set.STATS_MODEcan be used with theGROUP BYclause. In this case, theSTATS_MODEfunction operates on each group and returns the most frequent and aggregated output for each group.
Synopsis
STATS_MODE( <expr> )
OR
STATS_MODE() WITHIN GROUP ( ORDER BY sort_expression )
Parameters
expr
An expression or value to assign to the column.
Return Type
The STATS_MODE function returns a value that appears frequently. However, if all the values of a column are NULL, the STATS_MODE returns NULL.
Examples
The following example returns the mode of salary in the emp table:
edb=# SELECT * FROM emp; empno| ename | job | mgr | hiredate | sal | comm | deptno ------+-------+---------+------+-------------------+--------+--------+------- 7369 | SMITH | CLERK | 7902 | 17-DEC-80 00:00:00| 800.00| | 20 7499 | ALLEN | SALESMAN| 7698 | 20-FEB-81 00:00:00| 1600.00| 300.0 | 30 7521 | WARD | SALESMAN| 7698 | 22-FEB-81 00:00:00| 1250.00| 500.00 | 30 7566 | JONES | MANAGER | 7839 | 02-APR-81 00:00:00| 2975.00| | 20 7654 | MARTIN| SALESMAN| 7698 | 28-SEP-81 00:00:00| 1250.00| 1400.00| 30 (5 rows) edb=# SELECT STATS_MODE(SAL) FROM emp; stats_mode ------------ 1250.00 (1 row)
The following example uses GROUP BY and ORDER BY clause with STATS_MODE in emp table and returns the salary based on a partition by deptno:
edb=# SELECT STATS_MODE(SAL) FROM emp GROUP BY DEPTNO ORDER BY DEPTNO; stats_mode ------------ 800.00 1250.00 (2 rows)
The following example uses the WITHIN GROUP clause with the STATS_MODE function to perform aggregation on the ordered data set.
SELECT STATS_MODE() WITHIN GROUP(ORDER BY SAL) FROM emp; stats_mode ------------ 1250.00 (1 row)
- On this page
- LISTAGG
- MEDIAN
- STATS_MODE