Thom Brown Engineering Project Manager Dec 6, 2019
SUMMARY: This article reviews the differences between stored procedures and functions in Postgres and the types of functionality they provide. Topics covered with examples include:
1. Stored procedures
1. pre-Postgres v.11
2. Postgres v.11 and after
2. Aggregate functions
3. Trigger functions
4. Functional stability
5. Function security
1. SECURITY DEFINER
2. SECURITY INVOKER
6. Other function features
In Postgres, the main functional difference between a function and a stored procedure is that a function returns a result, whereas a stored procedure does not. This is because the intention behind a stored procedure is to perform some sort of activity and then finish, which would then return control to the caller. Before PostgreSQL version 11, stored procedures were effectively functions that didn't return data. But now there is a way to explicitly declare stored procedures, which also has the advantage of being able to open a new transaction, and they are now called differently too.
Firstly, let's start with an example of both to get a feel of how they differ.
Before version 11, to declare a stored procedure we would just use a function with a return type of void, because we aren't looking to get any data back:
CREATE FUNCTION deactivate_unpaid_accounts() RETURNS void LANGUAGE SQL AS $$ UPDATE accounts SET active = false WHERE balance < 0; $$;
We could then call it using SELECT:
This would return a NULL void type result.
Beginning with version 11, we can say outright that we are declaring a PROCEDURE:
CREATE PROCEDURE deactivate_unpaid_accounts() LANGUAGE SQL AS $$ UPDATE accounts SET active = false WHERE balance < 0; $$;
This time, we call it using CALL:
This stored procedure does not return any result, although will show any errors that occurred during its invocation.
The syntax for declaring a function is the same in all PostgreSQL versions:
CREATE REPLACE FUNCTION account_type_count(account_type text) RETURNS integer LANGUAGE plpgsql AS $$ DECLARE account_count int; BEGIN SELECT count(*) INTO account_count FROM accounts WHERE accounts.account_type = $1; RETURN account_count; END; $$;
From these examples, you can see that you can use functions to update and retrieve data, or to just perform a procedure, but now we also have procedures, which are designed specifically with performing a set of actions in mind.
Functions come in even more flavors though. We have the regular kind, which are used in queries, but we also have some additional ones, which we'll get to shortly.
Function parameters can be referenced using either positional references or named references. Positional references work by using “$1” to reference the first parameter, then “$2” for the second, and so on. For named references, you can just use the name of the parameter. So if you had a parameter called “my_value,” you could just reference it using either method:
SELECT $1 + 1; SELECT my_value + 1;
This is true for functions written in SQL or PL/pgSQL languages. You can also call the function and specifically set the parameters in whichever order you like using named notation:
SELECT create_user(age => 50, country => 'France', username => 'Henri');
You can set the default value for a parameter if the user doesn't specify it using the DEFAULT keyword and a value of your choosing:
CREATE FUNCTION create_user(username text, country text, age int, active bool DEFAULT true)...
This means we could use named notation, not specifying the "active" parameter, and it would automatically set that value to "true."
There is also the OUT parameter, which is a way to produce an output that returns those fields in the result. And INOUT, which we use to define a parameter that accepts input, but will also be included in the output.
CREATE FUNCTION dog_years(INOUT age int, OUT dog_years int) LANGUAGE SQL AS $$ SELECT age, age * 7; $$; SELECT * FROM dog_years(8); age | dog_years -----+----------- 8 | 56 (1 row)
You may also have a function where you want the user to be able to specify multiple items, but you don't know how many to expect because the number can vary. This is when you can use the VARIADIC parameter, which must always be the last parameter defined, and you define it as an array type:
CREATE FUNCTION create_post(title text, content text, VARIADIC tags text) RETURNS integer LANGUAGE SQL AS $$ INSERT INTO posts (title, content, tags) VALUES (title, content, array_to_string(tags,',')) RETURNING id; $$;
You could then call this function like so:
SELECT create_post('My daily activity','I went to the cinema, got some lunch, caught up with friends and went to a party','cinema','movies','lunch','friends','reunion','party');
Note that the type of each value passed to the VARIADIC parameter has to match the function.
Aggregate functions are functions used by aggregates in order to reach the result required. In short, you would only need such functions if you're creating your own custom aggregates.
For example, if we wanted to create an aggregate to perform conditional counts, we'd need to create a function for our aggregate to use. This function maintains a count, which is passed to the function as the first parameter by the aggregate, and increments it based on the result of an expression passed to the function by the second parameter.
CREATE OR REPLACE FUNCTION countif_add(current_count int, expression bool) RETURNS int AS $BODY$ SELECT CASE expression WHEN true THEN current_count + 1 ELSE current_count END; $BODY$ LANGUAGE SQL IMMUTABLE;
This function isn't very useful by itself, but if we then create an AGGREGATE to use the function, it then becomes useful.
CREATE AGGREGATE count_if (boolean) ( sfunc = countif_add, stype = int, initcond = 0 );
Here we're telling it to use our countif_add function to build up the aggregate function. We can then use it like so:
SELECT count_if(age < 30) FROM users;
Note that we could easily use a FILTER clause with a COUNT() function here instead, but this is just for illustration.
Trigger functions don't contain any code themselves, but instead call functions when the trigger's conditions are met. Trigger functions only return a type of "trigger."
For example, we'll create a function that logs information when a table is changed.
We have a table of user information:
CREATE TABLE users ( id serial PRIMARY KEY, username text NOT NULL, display_name text NOT NULL, tag_line text NULL );
Then a second table to log the values in that table for when they change:
CREATE TABLE user_change_log (entry_time timestamp with time zone, entry users);
Now we'll create a function that inserts changed rows into the “user_change_log” table:
CREATE FUNCTION log_user_change() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN IF NEW <> OLD THEN INSERT INTO user_change_log(entry_time, entry) VALUES (now(), NEW); END IF; RETURN NEW; END; $$;
Finally, we’ll create a trigger on the “users” table to execute that function when updates are made to the table:
CREATE TRIGGER user_changes AFTER UPDATE ON users FOR EACH ROW EXECUTE PROCEDURE log_user_change();
There are two other types of trigger that use functions too: constraint triggers and event triggers. Constraint triggers work in much the same way as we've just demonstrated, but have additional properties. Event triggers must use functions that return a type of event_trigger, and are fired when certain commands are fired. These functions have additional values associated with the event available.
Functions can have different stabilities, meaning that calling them multiple times with the same inputs can result in different outputs, or that the result can be affected by parameters that can be changed at any point. You can declare the stability on the function by specifying either:
IMMUTABLE — given the same inputs, the result will never change no matter what
STABLE — given the same inputs, the result can change between statements
VOLATILE — given the same inputs, the result can change at any point, and may even make changes to the database to make subsequent call results change
Why would you care about specifying which one of these applies? Well, PostgreSQL's optimizer will assess whether it can cache the result of the function when the call is being made multiple times, which saves the function from being called unnecessarily.
If your function takes two parameters and adds them together, the result will be IMMUTABLE, because nothing can affect the operation, and the planner knows that if it has called the function before with the same parameters, it doesn't need to bother running it again; it just needs to remember what the result was from last time. This can be particularly helpful if a lot of calculations are performed, as it will save a lot of time and improve performance.
However, if your result depends on parameters such as timezone, or use functions that are themselves STABLE or VOLATILE, then your function has to match that same stability level.
You can declare the stability on your function like this:
CREATE FUNCTION add_numbers (first_number integer, second_number integer) RETURNS integer LANGUAGE SQL IMMUTABLE AS $$ SELECT $1 + $2; $$;
A function can be run either with the privileges of the user that is calling the function, or the privileges of the user that owns the function. This can be very useful if you want a user to be able to do something using a function, but only in a very controlled way.
Using the example above, you would just add SECURITY DEFINER after the word "IMMUTABLE" (or instead of, if you don't care about the stability). This would allow users to run the function, and the function would then perform actions as the owner of the function, who may have privileges the calling user doesn't have. But if you only want it to run as the user calling the function, you would instead use SECURITY INVOKER.
Other function features
Functions can have many more options, including whether it's a WINDOW function (only currently useful for C language functions), whether it's parallel-safe, the cost of running the function for the executor to bear in mind, the configuration parameters specifically for this function (using SET to set), and various others.
As the Engineering Project Manager, Thom coordinates and schedules the project work of the engineering team, manages any escalated issues related to community PostgreSQL and the EDB Postgres™ Advanced Server, and liaises with the support, release, test, and change management teams. His activities in the PostgreSQL community include responding to user questions relating to feature use, configuration and performance, testing new feature patches, and providing input to development discussions.