Part of the release engineering process at Swarm64 is to run performance benchmarks that assure us that new versions of Swarm64 DA and PG Nitrous improve PostgreSQL performance and do not regress it.
We run TPC-H and TPC-DS benchmarks to measure query-intensive analytic (OLAP) performance. For mixed workloads such as hybrid analytic transaction processing (HTAP) or transaction-enhanced analytics (TEA), we’ve developed a new open source benchmark that I’ll describe below.
All of our benchmarks are open source and available in the Swarm64 DA benchmark toolkit on GitHub.
What is HTAP and why does it matter to PostgreSQL users?
An HTAP database system is able to support transaction processing (OLTP) while simultaneously answering complex queries over the data. HTAP is easier and less costly than running two copies of the database (e.g., a system of record vs. a system of insight) to support OLTP and OLAP workloads, respectively. HTAP also enables queries to return answers closer to real time by eliminating the time it takes to replicate or ETL data from one database to another.
The need for benchmarks specific to mixed workloads arises because OLTP and OLAP pose different demands on database architectures, and the combination of both is particularly challenging. A mixed workload is either primarily OLTP (but also running analytical queries), or primarily OLAP (but also executing OLTP updates to the data while maintaining ACID properties). These are called HTAP and TEA, respectively. Since Swarm64 DA users run both HTAP and TEA workloads, we need a way to benchmark Swarm64 DA-accelerated PostgreSQL for both scenarios.
Existing HTAP benchmarks
Two HTAP benchmark implementations stand out: CH-benCHmark and HTAPBench. Both, as well as the Swarm64 HTAP benchmark, have a transactional workload based on the TPC-C benchmark, and an analytical workload based on the TPC-H benchmark.
These HTAP benchmark implementations use the same hybrid schema, containing all TPC-C relations along with the addition of some TPC-H relations that have no counterpart in TPC-C. This allows them to run all TPC-C transactions (the transactional workload) without changes, and TPC-H queries with some changes over the same data (the analytical workload).
A key decision point when dealing with mixed workloads is choosing the “primary workload,” the workload whose performance must not degrade. Here, the main difference between the two aforementioned benchmarks is that HTAPBench considers the transactional workload primary and automatically determines how many OLAP queries can run at the same time without a performance penalty.
On the other hand, CH-benCHmark does not consider any workload primary. It simply runs all the configured transaction streams or query streams in parallel. Achieving the same result as HTAPBench is possible with CH-benCHmark but manual work is required. One would have to run the benchmark multiple times with different configurations. The extra effort has a valuable payoff, however, in that it also allows for constructing additional workloads that HTAPBench does not support, e.g., considering the OLAP workload primary. This flexibility in selecting the workload is one of the reasons we decided to follow the CH-benCHmark approach.
TPC-H uses a static data set that does not change during the run, and the timestamps used in query filters are drawn from a static distribution. Running the OLTP workload in the background, however, continuously adds new data. To ensure that this new data is queried, the time window a particular query looks at must be moved into the future at the same rate new data is added, so that the query processes a similar volume of data (e.g., the previous month) instead of a larger and larger volume. Without this, analytical query times would become very inconsistent, making comparisons difficult. This sliding window technique, implemented in HTAPBench but not available in CH-benCHmark, is incorporated in the S64 DA Benchmark Toolkit.
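To illustrate the idea, here is a minimal sketch of how a query's time window can be advanced in step with the rows the OLTP workload has inserted. The function name and parameters are hypothetical; this is not the toolkit's actual code:

```python
from datetime import datetime, timedelta

def shifted_window(initial_end: datetime, orders_added: int,
                   avg_order_gap_seconds: float, window_days: int = 30):
    """Return the (start, end) timestamps an analytical query should filter on.

    Each new order advances simulated time by the average gap between
    orders, so the window always covers a similar volume of data.
    """
    end = initial_end + timedelta(seconds=orders_added * avg_order_gap_seconds)
    start = end - timedelta(days=window_days)
    return start, end

# After 10,000 new orders, a query over "the previous month" shifts its
# window roughly 16 days into the future (assuming a ~138.5 s gap per order).
start, end = shifted_window(datetime(1998, 8, 2), 10_000, 138.5)
```

The key property is that the window's width stays constant while its position tracks data growth, keeping analytical query times comparable across the run.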
The HTAP benchmark in the Swarm64 DA benchmark toolkit
Our implementation uses the same schema, transactions, and queries as CH-benCHmark and HTAPBench. The transactional workload is a direct port of sysbench-tpcc (based on TPC-C), while the analytical workload part is a newly developed implementation that adopts some ideas from HTAPBench (based on TPC-H).
One aspect in which our implementation deviates from HTAPBench is how distances between timestamps are computed. HTAPBench contains a compiled-in linear scaling that has to be determined during a training run. This is then used during the initial data population phase to ensure comparable timestamp densities between the population and execution phases. Our implementation instead uses the fact that the data interval in TPC-H spans 2405 days (from 1992-01-01 to 1998-08-02), regardless of the scale factor. The number of orders in this interval allows us to compute the average time distance between transactions. We use this computed distance in both the population and the execution phase, thus simulating the timestamp distances found in TPC-H. This ensures repeatable and homogeneous results without the need for a training run. In fact, our implementation issues no analytical queries until there is enough data to cover the interval; this is part of the burn-in step of the benchmark. It is also important that this data growth happens via a mix of INSERTs, UPDATEs, and DELETEs, which creates a less uniform physical data distribution (more realistic for a transactional or hybrid system) than the linear load common in pure analytical benchmarks.
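As a back-of-the-envelope check of that computation, using the standard TPC-H cardinality of 1,500,000 orders at scale factor 1 (the toolkit's exact figures may differ):

```python
from datetime import date

# The TPC-H population interval is fixed regardless of scale factor.
interval_days = (date(1998, 8, 2) - date(1992, 1, 1)).days
orders_at_sf1 = 1_500_000  # TPC-H orders cardinality at scale factor 1

# Average simulated time between two consecutive orders.
avg_gap_seconds = interval_days * 86_400 / orders_at_sf1

print(interval_days)               # 2405
print(round(avg_gap_seconds, 1))   # 138.5
```

So at scale factor 1, consecutive orders are a bit over two simulated minutes apart; higher scale factors pack proportionally more orders into the same fixed interval, shrinking the gap.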
The Swarm64 HTAP benchmark makes it easier to obtain reproducible results when the transactional load is the primary workload by letting you set a specific target for the number of transactions per second (TPS) that the system should maintain. Otherwise, the benchmark could issue more TPS at the cost of reducing analytical performance. Of course, a reasonable TPS target should be determined by first running the benchmark without the analytical load for a few hours and observing the steady-state TPS rate achieved.
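One simple way to hold workers to a target TPS is fixed-interval pacing, sketched below. This is illustrative only; the toolkit's actual throttling logic may differ:

```python
import time

def run_paced(transaction, target_tps: float, duration_s: float) -> int:
    """Call `transaction` repeatedly, aiming for `target_tps`, for `duration_s` seconds."""
    interval = 1.0 / target_tps            # ideal gap between transactions
    next_deadline = time.monotonic()
    end = next_deadline + duration_s
    executed = 0
    while time.monotonic() < end:
        transaction()
        executed += 1
        next_deadline += interval
        delay = next_deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)              # ahead of schedule: wait
        else:
            next_deadline = time.monotonic()  # behind: catch up, don't burst
    return executed
```

If transactions run slower than the interval allows, the achieved rate simply falls below the target rather than bursting afterward, which matches the goal of a steady, reproducible load.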
The Swarm64 HTAP benchmark also allows you to monitor the performance metrics for both the transactional and the analytical workload live while a benchmark is running. For unattended use cases, these monitoring results can be stored in a database as well as in .CSV files.
And finally, our HTAP benchmark can also be used to benchmark systems composed of a transactional database and a replicated database for analytics, as you can specify different target databases for the transactional and the analytical queries. Keep in mind, however, that this likely means the analytical queries operate on data that lags behind the transactional database.
Running an HTAP benchmark
As an example, let’s see how to run a very small HTAP benchmark. First, get the Swarm64 DA benchmark toolkit and install the toolkit dependencies:
git clone https://github.com/swarm64/s64da-benchmark-toolkit.git
cd s64da-benchmark-toolkit
pip install -r requirements.txt
Then prepare a benchmark using the prepare_benchmark script and specify the desired scale factor:
./prepare_benchmark \
  --dsn postgresql://postgres@localhost:5432/htap \
  --benchmark htap \
  --scale-factor 1 \
  --schema s64da_native_enhanced
In TPC-H, the scale factor defines the total dataset size. In the HTAP benchmark, it roughly defines the initial size of the data in the tables and determines the number of warehouses (20 times the scale factor); during benchmark execution, the data size grows. If you want to run the HTAP benchmark against PostgreSQL without Swarm64 DA, just use the psql_native schema.
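The scale-factor-to-warehouse mapping described above is straightforward; `warehouses_for` below is a hypothetical helper for illustration, not part of the toolkit:

```python
def warehouses_for(scale_factor: int) -> int:
    """Number of TPC-C warehouses created for a given HTAP scale factor."""
    return 20 * scale_factor

print(warehouses_for(1))    # 20 warehouses for the small example run
print(warehouses_for(100))  # 2000 warehouses for a more realistic run
```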
The Swarm64 DA benchmark toolkit displays a message for every table it is loading:
Once all the data is ingested, it creates the relevant indexes and runs the ANALYZE command on all the tables in the database:
After the data ingestion and database preparation step, we are ready to run the benchmark using the run_benchmark script. For instance, to run an HTAP benchmark for 30 minutes with 10 workers doing the OLTP workload, 1 worker performing an OLAP workload, and a target TPS of 800, execute the following command:
./run_benchmark \
  --dsn postgresql://postgres@localhost:5432/htap \
  htap \
  --oltp-workers 10 \
  --olap-workers 1 \
  --duration 1800 \
  --target-tps 800
Note that these values are quite small, so that the tool can run on almost any hardware. More realistic parameters would be a scale factor of 100 in the preparation step, 64 OLTP workers here, and a target TPS of several thousand.
During the benchmark execution, you can monitor the up-to-date state of the transaction rates, number of transactions, latencies, and the status of the analytical queries for each of the analytical streams. This saves you a lot of time when dealing with the complex interaction patterns of mixed workloads. You can also see some information about the main tables and the status of the Swarm64 DA columnstore indexes, if any are present.
At the beginning, the benchmark is in its burn-in phase, during which no analytical queries are executed; you can see them in a ‘Waiting‘ status. They do not start until the required seven years of data are present (as described previously). From that point on, the sliding window technique kicks in. As we can see, at the beginning there are around five years of data.
Later on, the system reports to be on the HTAP phase of the benchmark, where analytical queries run in parallel to the transactional workload.
Once the benchmark has finished, a results summary is displayed with statistics about the transactional and analytical workloads, including how many times each analytical stream completed, i.e., ran the complete set of analytical queries, and how long each set took to complete:
With the Swarm64 HTAP benchmark, you can now measure your own PostgreSQL installation under a demanding HTAP workload and contrast it against Swarm64 DA. Check out our side-by-side HTAP benchmark video here.
Swarm64 DA and PG Nitrous both include instructions for using the Swarm64 DA Benchmark Toolkit. If you’d like to try Swarm64 DA or PG Nitrous, you can get them for free here.