Determining the right pgbench database size scale

pgbench is a handy utility that's included with PostgreSQL. Its purpose is to measure how quickly some simple transactions can be executed against a database, and it will create that test database for you. The option that determines how big a database it creates is referred to as the database scale. Here's what the documentation has to say about this parameter:
-s scaling_factor: this should be used with -i (initialize) option. number of tuples generated will be multiple of the scaling factor. For example, -s 100 will imply 10M (10,000,000) tuples in the accounts table. default is 1. NOTE: scaling factor should be at least as large as the largest number of clients you intend to test; else you'll mostly be measuring update contention.
That seems straightforward enough: set scale to be larger than the maximum number of clients you want to test against and you're done, right? If only it were so easy. Finding the right setting for the database scale is much more complicated than this; the size of the database and its indexes will determine whether you're testing memory performance, simple disk I/O, or disk seeking behavior. The best way to learn how this works is to watch how the pgbench results change as the size of the database increases, and that's what I'm going to lead you through here.

Basic database parameters

We start by creating the database cluster using initdb. As documented in our example server info, this system has 1GB of RAM. Following the guidelines of the 5 minute guide, we pick a starting value for shared_buffers equal to 1/4 of the memory in this system, around 250MB. That's on the low side of normal for an 8.2 server, but we need to keep some space for the OS. The way pgbench results scale is based on the buffer cache size, so the exact value we set here isn't too important as long as it's in the right general range and we recognize when things can and can't fit in the cache space provided.

The basic pgbench tests don't care about effective_cache_size, because the sample database doesn't have any complicated indexes, so we can ignore that for now. The other initial change I make to the postgresql.conf, which I'm not going to go into much detail on yet, is increasing checkpoint_segments; we'll get to tuning that properly later. Here are the full changes to the postgresql.conf file to get started:

shared_buffers=30000 # approx 250 MB
checkpoint_segments=8

After setting the parameters, start the cluster up and build the database using something like this:

pg_ctl start
createdb pgbench
cd contrib/pgbench
make
./pgbench -i -s 1 pgbench
That's a database with a scale of 1, which means 100,000 rows in the accounts table. We'll get our first test result like this:
./pgbench -t 2000 -c 8 -S pgbench
This runs the select-only test with 8 clients. 4 clients per CPU/core is a reasonable value here to get enough clients to load up all your CPUs, which is how I got to 8 for the dual-core Athlon processor in the test server.

Here's a trio of results from running this three times. That's the minimum number of runs I'd suggest before considering the result significant; there is way too much variation in pgbench results (even if you run the test for a longer period) to take any single result as useful. We'll only be considering the TPS figure with the connection establishment overhead taken out.
10247, 10419, 10405
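If you'd rather not fish those numbers out of the full report by hand, a small loop like the following works; this is just a sketch, assuming the pgbench output line that reports the TPS figure "excluding connections establishing":

for run in 1 2 3; do
    ./pgbench -t 2000 -c 8 -S pgbench | grep "excluding connections"
done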
Now, how big of a database were we working with here? These sample queries will make sure all the statistics about your database are current, then look at the raw database size and the size of the various tables/indexes. I only include the size of the accounts table here, because the branches/tellers ones are insignificant in comparison and the history table is empty at the start of a test.

pgbench# vacuum analyze;
pgbench# checkpoint;
pgbench# SELECT relname, reltuples, pg_size_pretty(relpages * 8192) as size FROM pg_class ORDER BY relpages DESC;
              relname              | reltuples |    size
-----------------------------------+-----------+------------
 accounts                          |    100000 | 13 MB
 accounts_pkey                     |    100000 | 1768 kB

pgbench# select datname,pg_size_pretty(pg_database_size(oid)) from pg_database;
  datname  | pg_size_pretty
-----------+----------------
 pgbench   | 18 MB
Now, the thing to do here is to step up the scale while running this whole series:
  1. Rebuild the database using pgbench -i -s <scale>
  2. analyze/checkpoint to put the database in a consistent state
  3. Note starting size of database and tables
  4. Run test 3 times
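Here's a sketch of a shell script that automates that whole sequence; the scale list, client count, and transaction count are just the values used in this article, and it assumes you're running from the contrib/pgbench directory against a database named pgbench:

#!/bin/sh
# Sketch: rebuild the pgbench database at each scale, record its size,
# then run the select-only test three times
SCALES="1 5 10 20 25 30 35 40 45 50 75 100 200 500"
for SCALE in $SCALES; do
    ./pgbench -i -s $SCALE pgbench
    psql -d pgbench -c "VACUUM ANALYZE"
    psql -d pgbench -c "CHECKPOINT"
    psql -d pgbench -c "SELECT pg_size_pretty(pg_database_size('pgbench'))"
    for RUN in 1 2 3; do
        ./pgbench -t 2000 -c 8 -S pgbench | grep "excluding connections"
    done
done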
The normal sequence I run is looking at the scales of 1, 5, 10, 20, 30, 40, 50, 75, 100, 200, 500. Then I fill in some points in the middle of those based on what I see; in this case, I filled in 25, 35, and 45 because that was the most interesting part of the data. Here are the results:
 Scale | DB Size (MB) | accounts (MB) | accounts_pkey (MB) |  tps1 |  tps2 |  tps3
-------+--------------+---------------+--------------------+-------+-------+-------
     1 |           18 |            13 |                1.8 | 10247 | 10419 | 10405
     5 |           76 |            64 |                  9 |  9996 | 10085 |  9963
    10 |          149 |           128 |                 17 |  9832 |  9802 |  9809
    20 |          294 |           256 |                 34 |  9018 |  8881 |  9222
    25 |          367 |           320 |                 43 |  8813 |  8979 |  8960
    30 |          439 |           384 |                 51 |  8461 |  8787 |  8712
    35 |          512 |           448 |                 60 |  8826 |  8754 |  8765
    40 |          585 |           512 |                 68 |  4630 |  4661 |  5918
    45 |          657 |           576 |                 77 |  1133 |  1210 |  1275
    50 |          730 |           640 |                 86 |   673 |   800 |   848
    75 |         1093 |           961 |                128 |   368 |   374 |   365
   100 |         1456 |          1281 |                171 |   190 |   174 |   174
   200 |         2907 |          2623 |                350 |   165 |   137 |   130
   500 |         7263 |          6714 |                897 |    99 |    92 |    88
(Note that errors in the conversion into MB have crept into the top end of this table because the data collected exceeded the range of an integer, so I couldn't use pg_size_pretty anymore)
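If you want to avoid that overflow yourself, one workaround (a sketch of the same query as above, with the byte arithmetic pushed to 64-bit before it can overflow) is to cast relpages to bigint; pg_relation_size(), which returns a bigint directly, works just as well:

pgbench# SELECT relname, reltuples, pg_size_pretty(relpages::bigint * 8192) as size FROM pg_class ORDER BY relpages DESC;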

You can likely re-use much of this table for your own tests, because unless you're using a very different version of PostgreSQL, the first four columns listing sizes will be the same each time if collected correctly; you'll just have different numbers in the three TPS columns. Here's what that looks like as a graph:

This was generated by taking the middle of the three TPS values for each scaling factor and plotting those against the scale. This is generally a good approach for useful pgbench results: run three times, use the middle TPS number. To generate your own graph like this, you can use my sample data as a template and feed that into the gnuplot script that drew this graph.
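If you don't have that script handy, a minimal gnuplot sketch along these lines will draw a similar picture; it assumes a hypothetical two-column file named scaling.dat holding the scale and the middle TPS value for each scale:

gnuplot <<'EOF'
# scaling.dat (hypothetical): column 1 = scale, column 2 = middle TPS value
set terminal png size 640,480
set output "scaling.png"
set xlabel "Database scale"
set ylabel "TPS (select-only)"
plot "scaling.dat" using 1:2 with linespoints title "pgbench -S"
EOF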

What does it mean? Running a select-heavy test like this gives results that fit into three categories:

CPU/Memory bound: Scale 1 - 35, 18 - 512MB database. When the data is small enough to fit in the shared_buffers cache, your disk performance doesn't matter one bit to the results once everything is in there. And it's hard to even get the data out of memory; it's cached there when you do the writes to build the database, and even if you restart the server your operating system will likely have cached everything as well. You either need to reboot or do something drastic to clear out the caches in order to actually test disk performance again, and even then the data will get read back very quickly and then you're back to testing memory operations again.
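On Linux, one suitably drastic approach looks like this; it's just a sketch, it needs root and a 2.6.16 or later kernel, and the server has to be stopped first so its shared_buffers contents are discarded as well:

pg_ctl stop
sync
echo 3 > /proc/sys/vm/drop_caches    # drop the OS page cache plus dentries and inodes
pg_ctl start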

These results show we're memory bound right up to a scale of 35, and get 8500-10400 SELECTs per second in that range. The database size at that scale is 512MB. We have a 250MB shared_buffers cache, at least another 250MB sitting in the operating system cache, and the disk controller has its own 256MB of cache RAM. No wonder we can operate so fast on this data; the disk drives aren't even involved.

Disk I/O bound: Scale 40 - 75, 585 - 1095MB database. In this range, we can fit almost all of the data we need into one of our pools of memory, but not quite all of it. There is some disk shuffling going on, but not so much that the results are completely pulled down by it.

Disk seek bound: Scale 100+, 1456MB or larger database. These results are from the area where only a fraction of the database can fit into memory. The results are now in a similar range to the maximum seeks per second that the bonnie++ test results benchmarked for this disk.

The exact breakpoints here shift a bit if we use a different client load, but the basic curve doesn't change much. Where your breakpoints end up depends very much on exactly how big all your caches are, and that relationship between database size and cache size is the most important thing to walk away with here.

To demonstrate how common this curve is, here's another example. This one is from a quad-core Q6600 system with 8GB of RAM, an Areca disk controller, and a 3-disk RAID 0 array for the database with another for the WAL; 64-bit Linux kernel 2.6.22.19, PostgreSQL 8.4-devel, and shared_buffers='2048MB':

Note how the curve spikes downward at 8GB the same way the smaller system dove at 1GB. Raw data for that example:
 scale | dbsize (MB) |   tps
-------+-------------+-------
     1 |          21 | 21468
    10 |         155 | 20156
    25 |         380 | 19171
    50 |         754 | 18374
   100 |        1501 | 18333
   150 |        2249 | 18233
   200 |        2997 | 17900
   300 |        4492 | 17297
   350 |        5240 | 15088
   400 |        5987 |  3656
   500 |        7483 |  1490
   600 |        8978 |   949
   700 |       10473 |   773
   800 |       11969 |   668


Notes, references, and credits

Copyright 2009 Gregory Smith. Last update 3/17/2009.