Oracle to PostgreSQL: ROWNUM and ROWID

May 10, 2020

This article is the first in a series about migrating from Oracle to PostgreSQL. We will attempt to cover each of the topics of migration in some amount of depth and provide multiple options to the same issues and architectures.

For this first article, we’ll take a look at Oracles’ ROWNUM and ROWID.

In Oracle, the ROWNUM is assigned to each row of a set as the records are returned. It is an integer data type that starts at 1, with monotonically increasing numbers.

This pseudo column applies to the entire row, regardless of the origin of the row. That is, multiple tables joined together in a single result set will only provide a single ROWNUM. Changing the order of the data also changes the association of the ROWNUM with the data.

There are several cases where ROWNUM is used. Let’s start with a simple Oracle example.

SELECT * FROM mytable
WHERE ROWNUM <= 5

Which of course will return the first 5 rows in random order. PostgreSQL can solve this problem in several ways.

Consider using the LIMIT operator like so:

SELECT *
FROM mytable
LIMIT 5

This has an identical effect and a very similar EXPLAIN plan to the Oracle offering. This solution can also be used with the OFFSET keyword. Together these two terms provide efficient pagination through the rows.

You could also use the SQL standard way of doing things.

SELECT *
FROM mytable
FETCH FIRST 5 ROWS ONLY

This statement would work on any standards compatible SQL engine, and removes any implementation ambiguity.

That solved the trivial example, but let’s dig a bit deeper. Let’s assume that the ROWNUM value is actually meaningful, and find some ways of working with it.

We could add a rownum to an output by creating a sequence, and then selecting from it on the fly.

CREATE SEQUENCE monotonic_iterator;

SELECT nextval('monotonic_iterator') rownum, *
FROM mytable;

DROP SEQUENCE monotonic_iterator;

However, we run into several problems when trying to use this sequence. For starters, the rownum is not accessible to the WHERE clause.

SELECT  nextval('monotonic_iterator') rownum, *
FROM mytable
WHERE rownum %2=0;

ERROR:  column "rownum" does not exist

When we try to get something useful out of that, we run into a strange anomaly.

SELECT  nextval('monotonic_iterator') rownum, *
FROM mytable
WHERE mod(nextval('monotonic_iterator'),1)=0;

 rownum |  x
--------+------
    127 | 1000
    129 | 1001
    131 | 1002
    133 | 1003
    135 | 1004
    137 | 1005
    139 | 1006
    141 | 1007
    143 | 1008
    145 | 1009
    147 | 1010
    149 | 1011
    151 | 1012
    153 | 1013
    155 | 1014
    157 | 1015
    159 | 1016
    161 | 1017
    163 | 1018
    165 | 1019
    167 | 1020

So, when we call the nextval() function repeatedly, it does not refer to the first invocation. Instead, it polls the sequence value again. This is obviously not what we intended.

Also, the sequence might have already existed under that name, so creating and destroying it would require some naming gymnastics.

In addition to that, the ROWNUM can’t be used in a LIMIT clause.

So, this method does not prove to be very useful after all, or at least comes with some large caveats.

We can add the row number to a series with an aggregate function like this:

SELECT row_number() OVER () ROWNUM, *
FROM mytable

This gives us a simple way to add the desired rownum, without invoking some silly counter of all the previous rows, or drawing from a sequence.

This eliminates the first problem, but does not get rid of the issue that WHERE and LIMIT are not useful.

Suppose for a moment that you would like to retrieve the even numbered rows for a sampling. This would give the ROWNUM a bit more meaning than a simple row limiter.

We can gain access to the rownum by wrapping the query as a sub query:

SELECT rownum, x
FROM (SELECT  row_number() OVER () rownum, *
    FROM mytable ) b
WHERE rownum%2=0;

 rownum |  x
--------+------
      2 | 1001
      4 | 1003
      6 | 1005
      8 | 1007
     10 | 1009
     12 | 1011
     14 | 1013
     16 | 1015
     18 | 1017
     20 | 1019
(10 rows)

This gives the expected result. The LIMIT clause may then be placed in the sub query, or the outer query, depending on the desired results.

SELECT rownum, x
FROM (SELECT  row_number() OVER () rownum, *
    FROM mytable ) b
WHERE rownum < 5;

 rownum |  x
--------+------
      1 | 1000
      2 | 1001
      3 | 1002
      4 | 1003

In most cases, you will want to limit the results as quickly as possible. This has to do with PostgreSQL efficiency, but that my friends, is a story for another column.

Of course Oracle implements some version of the same aggregate functions, and you could do approximately the same thing there also.

But “what about ROWID?” you ask. Well, that’s different. ROWID is an indicator in Oracle of the order of rows on disk for a given table.

In PostgreSQL, this is implemented with a page and leaf identifier. The identifier is visible in a query as a pseudo column with the name of “ctid”. You can call this column in a query explicitly by name.

SELECT ctid, xmin, xmax, *
FROM mytable

This will give you the approximate location on disk of your data. But be forewarned, the MVCC system of PostgreSQL only guarantees this to be true in your transaction context. If someone modifies that data in a different transaction, the ctid that you received is now obsolete. The data has been modified and moved elsewhere. You should only use this feature for local transactions, for example as a tie breaker in deduplication.

The ctid is not really useful for row ordering, as the VACUUM process may also reposition the row in the table. If your objective is to sort the rows into data entry order, this detail must be maintained separately in the table as user data.

So, now you know the ins and outs of the ROWNUM and ROWID from the PostgreSQL perspective. Please ask any questions about it in the comment section below. The answers are likely to show up as a modification of the article in order for everyone to profit from the knowledge.

Share this

More Blogs