This article is the first in a series about migrating from Oracle to PostgreSQL. We will attempt to cover each of the topics of migration in some amount of depth and provide multiple options to the same issues and architectures.
For this first article, we’ll take a look at Oracles’ ROWNUM and ROWID.
In Oracle, the ROWNUM is assigned to each row of a set as the records are returned. It is an integer data type that starts at 1, with monotonically increasing numbers.
This pseudo column applies to the entire row, regardless of the origin of the row. That is, multiple tables joined together in a single result set will only provide a single ROWNUM. Changing the order of the data also changes the association of the ROWNUM with the data.
There are several cases where ROWNUM is used. Let’s start with a simple Oracle example.
SELECT * FROM mytable
WHERE ROWNUM <= 5
Which of course will return the first 5 rows in random order. PostgreSQL can solve this problem in several ways.
Consider using the LIMIT operator like so:
SELECT *
FROM mytable
LIMIT 5
This has an identical effect and a very similar EXPLAIN plan to the Oracle offering. This solution can also be used with the OFFSET keyword. Together these two terms provide efficient pagination through the rows.
You could also use the SQL standard way of doing things.
SELECT *
FROM mytable
FETCH FIRST 5 ROWS ONLY
This statement would work on any standards compatible SQL engine, and removes any implementation ambiguity.
That solved the trivial example, but let’s dig a bit deeper. Let’s assume that the ROWNUM value is actually meaningful, and find some ways of working with it.
We could add a rownum to an output by creating a sequence, and then selecting from it on the fly.
CREATE SEQUENCE monotonic_iterator;
SELECT nextval('monotonic_iterator') rownum, *
FROM mytable;
DROP SEQUENCE monotonic_iterator;
However, we run into several problems when trying to use this sequence. For starters, the rownum is not accessible to the WHERE clause.
SELECT nextval('monotonic_iterator') rownum, *
FROM mytable
WHERE rownum %2=0;
ERROR: column "rownum" does not exist
When we try to get something useful out of that, we run into a strange anomaly.
SELECT nextval('monotonic_iterator') rownum, *
FROM mytable
WHERE mod(nextval('monotonic_iterator'),1)=0;
rownum | x
--------+------
127 | 1000
129 | 1001
131 | 1002
133 | 1003
135 | 1004
137 | 1005
139 | 1006
141 | 1007
143 | 1008
145 | 1009
147 | 1010
149 | 1011
151 | 1012
153 | 1013
155 | 1014
157 | 1015
159 | 1016
161 | 1017
163 | 1018
165 | 1019
167 | 1020
So, when we call the nextval() function repeatedly, it does not refer to the first invocation. Instead, it polls the sequence value again. This is obviously not what we intended.
Also, the sequence might have already existed under that name, so creating and destroying it would require some naming gymnastics.
In addition to that, the ROWNUM can’t be used in a LIMIT clause.
So, this method does not prove to be very useful after all, or at least comes with some large caveats.
We can add the row number to a series with an aggregate function like this:
SELECT row_number() OVER () ROWNUM, *
FROM mytable
This gives us a simple way to add the desired rownum, without invoking some silly counter of all the previous rows, or drawing from a sequence.
This eliminates the first problem, but does not get rid of the issue that WHERE and LIMIT are not useful.
Suppose for a moment that you would like to retrieve the even numbered rows for a sampling. This would give the ROWNUM a bit more meaning than a simple row limiter.
We can gain access to the rownum by wrapping the query as a sub query:
SELECT rownum, x
FROM (SELECT row_number() OVER () rownum, *
FROM mytable ) b
WHERE rownum%2=0;
rownum | x
--------+------
2 | 1001
4 | 1003
6 | 1005
8 | 1007
10 | 1009
12 | 1011
14 | 1013
16 | 1015
18 | 1017
20 | 1019
(10 rows)
This gives the expected result. The LIMIT clause may then be placed in the sub query, or the outer query, depending on the desired results.
SELECT rownum, x
FROM (SELECT row_number() OVER () rownum, *
FROM mytable ) b
WHERE rownum < 5;
rownum | x
--------+------
1 | 1000
2 | 1001
3 | 1002
4 | 1003
In most cases, you will want to limit the results as quickly as possible. This has to do with PostgreSQL efficiency, but that my friends, is a story for another column.
Of course Oracle implements some version of the same aggregate functions, and you could do approximately the same thing there also.
But “what about ROWID?” you ask. Well, that’s different. ROWID is an indicator in Oracle of the order of rows on disk for a given table.
In PostgreSQL, this is implemented with a page and leaf identifier. The identifier is visible in a query as a pseudo column with the name of “ctid”. You can call this column in a query explicitly by name.
SELECT ctid, xmin, xmax, *
FROM mytable
This will give you the approximate location on disk of your data. But be forewarned, the MVCC system of PostgreSQL only guarantees this to be true in your transaction context. If someone modifies that data in a different transaction, the ctid that you received is now obsolete. The data has been modified and moved elsewhere. You should only use this feature for local transactions, for example as a tie breaker in deduplication.
The ctid is not really useful for row ordering, as the VACUUM process may also reposition the row in the table. If your objective is to sort the rows into data entry order, this detail must be maintained separately in the table as user data.
So, now you know the ins and outs of the ROWNUM and ROWID from the PostgreSQL perspective. Please ask any questions about it in the comment section below. The answers are likely to show up as a modification of the article in order for everyone to profit from the knowledge.