Tushar Ahuja Sr. QA Manager Dec 20, 2019
SUMMARY: This article lists 10 open source tools to help developers working with PostgreSQL.
5. Foreign data wrappers (FDWs)
Every developer could use a little help. Here we are sharing a list of tools that will give developers more information from the PostgreSQL server database so they can make better decisions in their queries, procedures, and configurational changes to the database. We will focus on only those tools whose license type is open source.
pgAdmin is a tool that has been specifically designed for database management and has a GUI to manage (or administration) PostgreSQL databases. This also helps the developer to expedite PostgreSQL-specific debugging and analysis. A high level list of features includes:
- An SQL editor where we can write/execute SQL queries with color syntax highlighting.
- Graphical query plan display.
- Procedural language debugger for PL/pgSQL.
- Create, view, and edit all common PostgreSQL objects.
- Supports PostgreSQL server-side encodings—e.g., SQL_ASCII, EUC_CN.
- Backup, restore, vacuum, and analyze support.
- SQL/shell/batch job scheduling agent (pgAgent).
- User-friendly error messages.
- Graphs on the Dashboard tab, which give an active analysis of the usage statistics for the databases.
- Auto-detection and support for objects discovered at run-time.
The pgAdmin tool is available on Linux, Unix, Mac OS X, and Windows. The latest version is PgAdmin 4, which has been completely rewritten as a web app in Python, using Flask and Qt framework.
I must mention Dave Page, who is the founder of pgAdmin. I feel lucky that I am working with him now at the same company, Enterprise DB.
Since developers often need to break things in order to replicate an issue or diagnose an issue, pg_catcheck is a tool that can help ensure that they are not working with a corrupt copy of the database. As the name suggests, pg_catcheck is a catalog integrity checker for PostgreSQL.
This project has been created by the folks at EnterpriseDB and released under the same license as PostgreSQL itself. This is a very simple tool for identifying system catalog corruption.
If there is some corruption in system catalog—for instance, if some table OID is deleted from pg_class—when we run pg_catcheck it will show all the system catalog tables that are affected.
[tushar@localhost bin]$ ./pg_catcheck -p 5432 --postgres postgres notice: pg_type row has invalid typrelid "16384": no matching entry in pg_class row identity: oid="16386" notice: pg_attribute row has invalid attrelid "16384": no matching entry in pg_class row identity: attrelid="16384" attname="n" attnum="1" notice: pg_attribute row has invalid attrelid "16384": no matching entry in pg_class row identity: attrelid="16384" attname="ctid" attnum="-1" notice: pg_attribute row has invalid attrelid "16384": no matching entry in pg_class row identity: attrelid="16384" attname="xmin" attnum="-3" ----- progress: done (10 inconsistencies, 0 warnings, 0 errors)
pg_catcheck is not going to fix the problem, but will inform you about what parts are broken, if any. It is supported on Linux and Windows.
For more information (including documentation and support) and to download the source code, please visit https://github.com/EnterpriseDB/pg_catcheck.
PgBouncer is an open source, lightweight connection pooler for PostgreSQL. It maintains a pool of connections for each unique user-database pair. It gives the connection to the user and returns it back when the client disconnects. This can be very helpful in reducing connection overhead.
pgBouncer has a virtual database called “pgbouncer,” from which we can fire commands to control the server and view statistics.
To configure and work with pgBouncer, there are a few files that need to be set up first:
- The configuration file name is pgbouncer.ini. We can see the contents of pgbouncer.ini file at https://github.com/pgbouncer/pgbouncer/blob/master/etc/pgbouncer.ini.
Under [databases] in the code, we can specify the database which we want to manage—e.g., postgres = host=127.0.0.1 port=5432
- In “userlist.txt” file, we need to provide the database username and passwords that we want to allow to connect via pgBouncer port.
Once we are done with configuration and the userlist.txt file, we can start the service, either manually or with systemd service. After starting this service, we can connect to psql using default pgbouncer port=6432:
[centos@tusharcentos6 bin]$ ./psql pgbouncer -p 6432 -U postgres Password for user postgres: psql.bin (11.5.12, server 22.214.171.124/bouncer) Type "help" for help. pgbouncer=# show pools; database | user | cl_active | cl_waiting | sv_active | sv_idle | sv_used | sv_tested | sv_login | maxwait | maxwait_us | pool_mode -----------+-----------+-----------+------------+-----------+---------+---------+-----------+----------+---------+------------+----------- pgbouncer | pgbouncer | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | statement (1 row)
pgBouncer can be installed on Linux using source code, YUM, or APT command. For Windows installers are available.
For more information please visit http://www.pgbouncer.org/.
Pgpool-II is a middleware that works between servers and a database client.
Some of the features include:
- Connection pooling.
- Load balancing.
- High availability (Replication, automatic failover).
- Distribution of select queries between master and replicas.
The current Pgpool-II stable version is 4.0. Pgpool-II works on Linux, Solaris, FreeBSD, and most UNIX-like architectures. Windows is not supported.
To configure Pgpool-II, there are a few files that need to be set up first:
- pgpool.conf is the main configuration file, where we need to provide all the details, such as Master/Replica mode, connection_pooling information, load balancing mode, pid and log file location, authentication, ssl connections, etc.
We can see the contents of the pgpool.conf file at https://github.com/pgpool/pgpool2/blob/master/src/sample/pgpool.conf.sample.
For example, M=master database information:
backend_hostname0 = 'localhost' backend_port0 = 5432 backend_weight0 = 1 backend_data_directory0 = '/var/lib/pgsql-9.6//data'
- The pcp_conf file contains the user/password file for authentication.
The Pgpool-II default port is 9999 and pcp_port is 9898, but these can be changed in the pgpool.conf file.
After starting the Pgpool-II service:
[root@tusharcentos6 bin]# ./pgpool -d -D -f pgpool.conf -n > /tmp/pgpool.log 2019-11-15 09:05:50: pid 3697: WARNING: "pool_conn_dbname" is depreciated, use "health_check_database" to configure health check database pid 3697: DEBUG: initializing pool configuration pid 3697: DETAIL: num_backends: 1 total_weight: 1.000000 pid 3697: DEBUG: initializing pool configuration pid 3697: DETAIL: backend 0 weight: 2147483647.000000 flag: 0000
We can connect to psql via port=9999
[root@tusharcentos6 bin]# ./psql -U postgres postgres -p 9999 psql.bin (11.5.12) Type "help" for help. postgres=# show port; port ------ 5432 (1 row)
For more information, please visit:
Pgpooladmin is a GUI tool to monitor, start, stop, and change settings for Pgpool-II:
5. Foreign data wrappers (FDWs)
PostgreSQL can link to other systems to fetch data via foreign data wrappers (FDWs) . When we fire a query (e.g., SELECT) against a foreign table, the FDW will fetch the result from the external data source and print the output via foreign table.
Below are some popular FDWs:
a) postgres_fdw is used to perform DML operations (select, insert, update, delete) with external PostgreSQL servers in PostgreSQL. postgres_fdw is a contrib module and can be found in the contrib/ folder of PostgreSQL sources. For more information please visit https://www.postgresql.org/docs/11/postgres-fdw.html.
b) mysql_fdw is used to perform DML operations (select, insert, update, delete) with MySQL servers in PostgreSQL. We need to download the source code from GitHub and compile against the PostgreSQL server. For more information please visit https://github.com/EnterpriseDB/mysql_fdw.
c) oracle_fdw is used to perform DML operations (select, insert, update, delete) with Oracle servers in PostgreSQL. For more information please visit https://github.com/laurenz/oracle_fdw.
d) mongo_fdw is is used to perform DML operations (select, insert, update, delete) with MongoDB in PostgreSQL. For more information please visit https://github.com/EnterpriseDB/mongo_fdw.
e) hdfs_fdw is used for the Hadoop distributed file system. It can work with Hive or Spark. It provides the capability to read data from the Hadoop Distributed File System (HDFS) using SQL and use it for online transactional processing (OLTP) and analytical purposes. For more information please visit https://github.com/EnterpriseDB/hdfs_fdw.
Slony-I is a master-to-multiple-standbys replication system for PostgreSQL.
It’s one of the older options for replication available and operates as a trigger-based replication method that is a “master to multiple replicas” solution.
Some of the features include:
- Works with different versions of PostgreSQL.
- Minimal downtime upgrades (or high availability)
- Cascading replication
- Log shipping
- Multiple database data sharing
There are a few commands that we should be aware of:
- INIT CLUSTER: Defines the slony replication cluster.
- SLON: The daemon that manages replication. Each node has its own daemon
- SUBSCRIBER SET: Initiates replication for a replication set (i.e., a set of tables)
- SLONIK: A command line utility to create and update a Slony configuration.
For more information please visit http://slony.info/.
This tool allows us to write Oracle-style procedure/function in PostgreSQL.
It also supports Oracle-style:
- date/time function—e.g., Add_months,last_date,next_day, trunc, round,months_between,to_date, sysdate
- Oracle system supplied packages—e.g., Dbms_output, dbms_random, utl_file, dbms_alert, dbms_pipe
- DUAL Table
- Other functions like concat, nvl, nvl2, lnnvl, decode, and oracle.substr
- VARCHAR2 and NVARCHAR2 support
For more information please visit https://github.com/orafce/orafce.
SQLsmith is a SQL random query generator that helps developers find bugs at an early stage of feature development. This tool has found lots of critical issues, including security vulnerabilities in released versions of PostgreSQL. Some of the types of errors it has caught are noted on its GitHub page: https://github.com/anse1/sqlsmith/wiki#score-list.
SQLsmith connects to a postgresql database to retrieve the schema for query generation. All generated statements are rolled back. Note that it also fires queries like pg_terminate_backend(), so it’s better not to run with a user who has no such permission to execute that function. It plays around with system catalog functions and finds server crashes and other errors.
postgres=# select 70 as c0, pg_catalog.has_server_privilege( cast(ref_0.indexdef as text), cast(cast(coalesce((select name from pg_catalog.pg_settings limit 1 offset 16), null) as text) as text)) as c1, pg_catalog.pg_export_snapshot() as c2, ref_0.indexdef as c3, ref_0.indexname as c4 from pg_catalog.pg_indexes as ref_0 where (ref_0.tablespace = ref_0.tablespace) or (46 = 22) limit 103; TRAP: FailedAssertion("!(keylen < 64)", File: "hashfunc.c", Line: 139) server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request.
For more information please visit https://github.com/anse1/sqlsmith.
As the name suggests, Ora2Pg is used to migrate Oracle objects into PostgreSQL. This tool will connect to an Oracle database and generate SQL scripts that contain table structures and data that can be executed against PostgreSQL.
Some of the features include:
- Export full database schema (such as tables, views including materialized view, sequences, and indexes).
- Export specific tables.
- Export user-defined functions, triggers, procedures, and packages.
- Migrate Oracle user-defined types.
- Supports Oracle BLOB object as BYTEA,
- Supports DBLINK as Oracle FDW.
- Supports SYNONYMS as views.
A sample database migration report can be found here: http://ora2pg.darold.net/report.html.
For more information please visit https://ora2pg.darold.net/start.html.
pgBackrest is a backup and recovery tool for PostgreSQL. It is completely free and open source under MIT License.
Some of the key features include:
- Parallel backup and restore
- Full, incremental, and differential backups
- Backup integrity
- Backup rotation and archive expiration
- Backup resume
The current stable release version is 2.19. For more information please visit https://pgbackrest.org/.
Hope it helps!