First, let's take a peek forward at the kind of results you get out of this exercise. When I ran the no-branch pgbench on my test system, it reported about 500 transactions per second with 4 clients. That's helpful, but the graph of transactions per second is a lot more informative:
It's obvious that we're really getting close to 600 transactions per second most of the time, but when checkpoints kick in they're trashing performance for several seconds. A look at the latency graph shows exactly how long those pauses are:
Here's easy to spot that the checkpoints are freezing all four clients for about 8 seconds each time they happen, and the checkpoints are happening about every 50 seconds. This is quite a different thing altogether from naively reading the pgbench results and expecting 500 transactions per second to fire.
# cd postgresql-8.2.3 # wget http://www.westnet.com/~gsmith/content/postgresql/pgbenchpatch.txt # patch -p0 < pgbenchpatch.txt patching file README.pgbench patching file pgbench.c # cd contrib/pgbench # makeYou can confirm the patch applied correctly by seeing if your output matches this:
# cat pgbench.c | grep "%d %d"
fprintf(LOGFILE, "%d %d %.0f %d %ld %ld\n",
fprintf(LOGFILE, "%d %d %.0f\n",
Copy the resulting pgbench to wherever you want to run it from;
we'll assume /usr/bin/pgbench (where it will go if you make install for the following.
Create a timing table in your database of choice like this:
postgres=# CREATE TABLE timing(ts timestamp,filenum int, latency numeric(9,3));Create this script as log-to-csv.py :
import sys
import datetime
import csv
out=csv.writer(sys.stdout,delimiter=',')
for l in sys.stdin:
l=l.strip()
(client,trans,latency,filenum,sec,usec)=l.split(" ")
latency=float(latency) / 1000
timestamp=float(sec)+float(usec) / 1000000
d=datetime.datetime.fromtimestamp(timestamp)
r=[d.isoformat(" "),filenum,latency]
out.writerow(r)
Create this script as benchwarmer:
#!/bin/sh
HOST=localhost
USER=postgres
DB=pgbench
tottrans=10000
c=1
t=`expr $tottrans / $c`
echo Cleaning up database $DB
psql -h $HOST -U $USER -c 'truncate table history' $DB
psql -h $HOST -U $USER -c 'vacuum' $DB
psql -h $HOST -U $USER -c 'vacuum full' $DB
psql -h $HOST -U $USER -c 'vacuum analyze' $DB
psql -h $HOST -U $USER -c 'checkpoint' $DB
echo $t transactions for each of $c concurrent users... 1>&2
/usr/bin/pgbench -l -x -N -n -U $USER -t $t -h $HOST -c $c $DB &
p=$!
wait $p
mv pgbench_log.${p} pgbench.log
cat pgbench.log | cut -f 3 -d " " | sort -n | tail
Now make it executable, run it, process the results, and import them into your results database:
# chmod +x benchwarmer # ./benchwarmer # python log-to-csv.py < pgbench.log > timing.csvRun psql from your database, then do
psql -d pgbench -c " copy timing from stdin with csv" < timing.csvAlternately, you might do the import from psql like this:
postgres=# copy timing from '/fullpath/timing.csv' with csv;Generate the TPS and latency data with:
# psql -d pgbench -A -t -F' ' -c "select extract(epoch from date_trunc('second',ts)),count(*) from timing group by date_trunc('second',ts) order by date_trunc('second',ts)" > tpsdata.txt
# psql -d pgbench -A -t -F' ' -c "select extract(epoch from ts),latency from timing" > latency.txt
Create a gnuplot script called tps.script:
set term png set output "tps.png" set title "TPS" set grid xtics ytics set xlabel "Time during test" set ylabel "TPS" set xdata time set timefmt "%s" plot "tpsdata.txt" using 1:2 with linesAnd another called latency.script:
set term png set output "latency.png" set title "Latency" set grid xtics ytics set xlabel "Time during test" set ylabel "Latency (ms)" set xdata time set timefmt "%s" plot "latency.txt" using 1:2Now you can generate the graphs you see above by running gnuplot:
gnuplot tps.script gnuplot latency.script