Benchmarking PostgreSQL: Setting up a RAID Array from Scratch

April 15, 2021

As part of my work on the CTO Team here at EDB, I do daily benchmarking of the PostgreSQL development branch with 500 concurrent users in order to make sure performance does not go down and to identify when in case it does. This also helps us determine which features improve performance for this TPROC-C workload.

We’ve noticed that we get a wide variety of results, even though I’m using a bare metal instance on AWS. In order to get more control over the environment (and therefore hopefully more stability), I am moving my benchmarking of PostgreSQL off of AWS and onto a dedicated machine in one of EDB's datacenters.

I am not the best when it comes to system administration, so the goal of this blog post is to help others who don't normally do this, and also to remind myself of what I did.


Steps to set up a RAID array

The first thing I need to do is mount the disks I'll be using.  While the system drive is a RAID 10 array of four SSD drives on a hardware controller, I want to use RAID 0 arrays of NVMe drives for the write ahead log and data directory, and they require a software RAID setup.  I don't need mirroring because all of the data is recreatable should one of the disks fail.  If this were a production system, I would use RAID 10.  To do this, I need to install mdadm.

dnf install mdadm

The first mount point is for the wal files and the second is the data directory.

mkdir /pgwal
mkdir /pgdata

The wal files will be on two drives and the data directory will be on four.  Does it make any sense to put a continuous append-only stream on two disks, or would one be sufficient?  I don't know!  I'll put on my todo list to benchmark both ways, so expect a new blog post for that in the future.

mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
mdadm --create /dev/md1 --level=0 --raid-devices=4 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1
mkfs.xfs /dev/md0
mkfs.xfs /dev/md1

I decided to use XFS for these drives, for no particular reason.  That, too, might be varied in the future to see if there is something better.

In order to mount them, I need their UUIDs and I get that like this:

lsblk -o NAME,UUID /dev/md0
lsblk -o NAME,UUID /dev/md1

And now I can put them in my /etc/fstab:

UUID=c28bbe22-b9bb-4973-90b8-a0d11293e982 /pgwal    xfs   
defaults,noatime    1 1

UUID=a8d2a8ef-de0a-40e8-8d9f-daa47f21ff64 /pgdata    xfs   
defaults,noatime    1 1

Now I can mount them, load the data, and start benchmarking!

mount -a

I will give the full specs of this server next time when I have some results to show. Here are the results of my daily benchmarking of the PostgreSQL development tree for the month of March 2021, performed on the current AWS environment.

PostgreSQL March Benchmarks

As can be seen from the graph, the average performance of PostgreSQL HEAD has stayed fairly consistent throughout March. As usual, all of the raw data is available on GitHub.

Stay tuned for next month’s benchmarks!

Share this

Relevant Blogs

How to use pgAgent in pgAdmin 4

pgAgent is a tool used for scheduling jobs for PostgreSQL databases. Each job consists of steps and schedules. Creating a pgAgent job To create or manage a job, use the...
May 10, 2023

Setup pgAdmin development environment

pgAdmin is the most popular and feature rich Open Source administration and development platform for PostgreSQL. In this blog, I will walk you through how to set up the pgAdmin...
May 10, 2023

More Blogs

pgAdmin with Kerberos and Active Directory

pgAdmin supports Kerberos authentication for user logins as well as connecting to databases. Kerberos is a popular authentication method but many people find it difficult to set up especially with...
March 21, 2023

pgAgent Setup

pgAgent is a job scheduling agent for Postgres databases, capable of running multi-step batch or shell scripts and SQL tasks on complex schedules, which may be managed using pgAdmin. pgAgent...
March 21, 2023