Introduction

PostgreSQL ships with a handy little utility called pgbench. It will launch multiple processes that all execute scripts against a test database as fast as it can handle them. Most pgbench results you'll see focus on the gross transactions per second or TPS results. For many reasons, these can be very deceiving. A much more useful way to look at what happened is to graph TPS and its mirror image, latency, over time. Doing takes some customization to pgbench and a bit of scripting, and that's what I'm doing to lead you through here.

First, let's take a peek forward at the kind of results you get out of this exercise. When I ran the no-branch pgbench on my test system, it reported about 500 transactions per second with 4 clients. That's helpful, but the graph of transactions per second is a lot more informative: It's obvious that we're really getting close to 600 transactions per second most of the time, but when checkpoints kick in they're trashing performance for several seconds. A look at the latency graph shows exactly how long those pauses are: Here's easy to spot that the checkpoints are freezing all four clients for about 8 seconds each time they happen, and the checkpoints are happening about every 50 seconds. This is quite a different thing altogether from naively reading the pgbench results and expecting 500 transactions per second to fire.

Patching pgbench

pgbench can produce a log of latency results by using the -l command line switch. But without timestamping of each transactions, you can't get a nice view like the above. You need to update your copy of pgbench to add this capability. You can also obtain a copy the patch at http://www.westnet.com/~gsmith/content/postgresql/pgbenchpatch.txt Here's a sample of applying it:
# cd postgresql-8.2.3
# wget http://www.westnet.com/~gsmith/content/postgresql/pgbenchpatch.txt
# patch -p0 < pgbenchpatch.txt
patching file README.pgbench
patching file pgbench.c
# cd contrib/pgbench
# make
You can confirm the patch applied correctly by seeing if your output matches this:
# cat pgbench.c | grep "%d %d"
                        fprintf(LOGFILE, "%d %d %.0f %d %ld %ld\n",
                        fprintf(LOGFILE, "%d %d %.0f\n",
Copy the resulting pgbench to wherever you want to run it from; we'll assume /usr/bin/pgbench (where it will go if you make install for the following. Create a timing table in your database of choice like this:
postgres=# CREATE TABLE timing(ts timestamp,filenum int, latency numeric(9,3));
Create this script as log-to-csv.py :
import sys
import datetime
import csv

out=csv.writer(sys.stdout,delimiter=',')
for l in sys.stdin:
  l=l.strip()
  (client,trans,latency,filenum,sec,usec)=l.split(" ")
  latency=float(latency) / 1000
  timestamp=float(sec)+float(usec) / 1000000
  d=datetime.datetime.fromtimestamp(timestamp)
  r=[d.isoformat(" "),filenum,latency]
  out.writerow(r)
Create this script as benchwarmer:
#!/bin/sh
HOST=localhost
USER=postgres
DB=pgbench
tottrans=10000
c=1
t=`expr $tottrans / $c`
echo Cleaning up database $DB
psql -h $HOST -U $USER -c 'truncate table history' $DB
psql -h $HOST -U $USER -c 'vacuum' $DB
psql -h $HOST -U $USER -c 'vacuum full' $DB
psql -h $HOST -U $USER -c 'vacuum analyze' $DB
psql -h $HOST -U $USER -c 'checkpoint' $DB
echo $t transactions for each of $c concurrent users... 1>&2
/usr/bin/pgbench -l -x -N -n -U $USER -t $t -h $HOST -c $c $DB &
p=$!
wait $p
mv pgbench_log.${p} pgbench.log
cat pgbench.log | cut -f 3 -d " " | sort -n | tail
Now make it executable, run it, process the results, and import them into your results database:
# chmod +x benchwarmer
# ./benchwarmer
# python log-to-csv.py < pgbench.log > timing.csv
Run psql from your database, then do
psql -d pgbench -c " copy timing from stdin with csv" < timing.csv 
Alternately, you might do the import from psql like this:
postgres=# copy timing from '/fullpath/timing.csv' with csv;
Generate the TPS and latency data with:
# psql -d pgbench -A -t -F' ' -c "select extract(epoch from date_trunc('second',ts)),count(*) from timing group by date_trunc('second',ts) order by date_trunc('second',ts)" > tpsdata.txt
# psql -d pgbench -A -t -F' ' -c "select extract(epoch from ts),latency from timing" > latency.txt
Create a gnuplot script called tps.script:
set term png
set output "tps.png"
set title "TPS"
set grid xtics ytics
set xlabel "Time during test"
set ylabel "TPS"
set xdata time
set timefmt "%s"
plot "tpsdata.txt" using 1:2 with lines
And another called latency.script:
set term png
set output "latency.png"
set title "Latency"
set grid xtics ytics
set xlabel "Time during test"
set ylabel "Latency (ms)"
set xdata time
set timefmt "%s"
plot "latency.txt" using 1:2
Now you can generate the graphs you see above by running gnuplot:
gnuplot tps.script
gnuplot latency.script

TODO List

This page is a work in progress. The main expansion being prepared takes all of the above scripts and turns them into a single command you can run. Some other things being worked on include: