I’m going to demonstrate how it is possible to use dblink in Greenplum 4.0.4.0
What’s dblink?
——————
dblink
is a PostgreSQL contrib module that allows to execute queries on another database.
Current PostgreSQL’s architeture requires users to connect to a specific database on a server. Therefore, it is not possible to natively perform an SQL query on a different database.
Normally we discourage using dblink
to query remote databases.
Databases are self-contained objects in PostgreSQL, and they should be designed with that in mind.
We always advice users to work with schemas. But sometimes, this might not be enough.
Anyway, I’m writing this article to show you how PostgreSQL and Greenplum are related.
Requisities
————–
You need a working copy of Greenplum 4.0.4 and a directory containing the PostgreSQL 8.2 source code, I’ve tested it on a Linux operating system (CentOS 5 to be exact).
For this example, I assume that Greenplum is installed in /usr/local/greenplum-db-4.0.4.0
and PostgreSQL source directory is /home/gpadmin/postgresql-8.2.22
.
I’m using PostgreSQL 8.2 because Greenplum is based on fork on that version.
You can obtain PostgreSQL 8.2 source from http://www.postgresql.org/ftp/source/v8.2.22/
or directly from git, please refer to http://wiki.postgresql.org/wiki/Working\_with\_git
for instruction on how to use git with PostgreSQL.
Installing dblink
——————-
Installing dblink is as easy as installing a contrib module in PostgreSQL.
* as gpadmin
user, source the greenplum_path.sh
file:
$ source /usr/local/greenplum-db-4.0.4.0/greenplum_path.sh
You should have this line in your .bashrc file after the installation process.
* Check that pg_config --pgxs
is the Greenplum one:
$ pg_config --pgxs
/usr/local/greenplum-db-4.0.4.0/lib/postgresql/pgxs/src/makefiles/pgxs.mk
* Enter the dblink directory:
$ cd /home/gpadmin/postgresql-8.2.22/contrib/dblink/
* Edit the Makefile
and add -w
to PG_CPPFLAGS
, so that line number four looks like this:
PG_CPPFLAGS = -I$(libpq_srcdir) -w
This option tells gcc
to ignore all warnings (that’s a _dirty_ hack but mandatory).
* Install the module directly in a database:
$ createdb my_db
$ make USE_PGXS=1 install
$ psql -f dblink.sql my_database
At this time, my_database
contains dblink
, so it is able to execute queries in remote databases.
Using dblink
—————
It is now time to test it, executing some queries.
* Create a database to query, with an example table:
$ createdb my_other_db
$ psql -c "CREATE TABLE t AS SELECT generate_series(1,1000) AS v" my_other_db
* Create the dblink connection to that database:
$ psql my_db
my_db=# SELECT dblink_connect('myconn', 'dbname=my_other_db');
dblink_connect
-------------------
OK
(1 row)
* Execute queries using that connection:
$ psql my_db
my_db=# SELECT * FROM dblink( 'myconn', 'SELECT v FROM t')
AS t1 ( v INTEGER ) LIMIT 5;
Please note that you must specify the expected set of columns in the calling query, because dblink
returns a set of record
(to be generic).
Conclusions
—————
Even though dblink
would not be my preferred choice (also because it is not directly supported by Greenplum – or for Greenplum), this article shows you a simple method to use it in those cases where you can’t really do without it.