In our popular post, "PostgreSQL parallelism can be like a chocolate box", Sebastian showed how to test your query's parallelism potential in a few simple steps.
This time, I want to revisit the main challenges of parallel query execution in PostgreSQL and show you how the Swarm64 DA drop-in PostgreSQL extension removes several of these limitations, unleashing parallel query execution.
Parallel query execution in native PostgreSQL
PostgreSQL's parallel query feature can devise query plans that leverage multiple CPUs to answer queries faster. You can check whether your query will be executed in parallel simply by running EXPLAIN or EXPLAIN ANALYZE:
EXPLAIN ANALYZE SELECT COUNT(*) FROM orders WHERE o_orderdate BETWEEN '1996-01-01' AND '1996-06-30';
Whenever there is parallelism in your query, you will see a Gather node in the plan (e.g. node #2 in the plan visualization on the left). All operations below the Gather node are executed in parallel: here, the parallel sequential scan and the partial aggregate, with 16 workers each. In contrast, the plan of the same query on the right has no workers planned; there is no parallelism, and it runs slower. The parallelized query is 10x faster than the non-parallelized one (3.18 sec vs. 31 sec).
Not all queries benefit from parallel execution. Low-latency, transactional queries will not benefit, due to the overhead of orchestrating parallel execution, while queries that touch a large amount of data (like the one above) will. Systems that must support transactional queries alongside occasional longer-running queries also benefit: faster execution of the longer-running queries frees up system resources to support more users.
If you want your query to be executed in parallel, first make sure that parallelism is enabled on your system. The max_parallel_workers_per_gather setting needs to be greater than zero, which is the case by default in PostgreSQL (the default value is 2). This setting caps the number of workers a single Gather node, and hence in most cases a single query, can use. We usually recommend that our clients set it to 75% of the number of virtual CPU cores deployed to the database.
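For example (a sketch; the 32-vCPU machine and the resulting value of 24 are illustrative), the 75% rule of thumb can be applied like this:

```sql
-- Check the current per-gather parallelism cap (PostgreSQL default: 2)
SHOW max_parallel_workers_per_gather;

-- On a hypothetical machine with 32 virtual CPU cores, 75% works out to 24
ALTER SYSTEM SET max_parallel_workers_per_gather = 24;

-- Reload the configuration so the new value takes effect
SELECT pg_reload_conf();
```

Keep in mind that max_parallel_workers and max_worker_processes must also be set at least this high, or the extra workers will never be launched.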
Second, check for situations that inhibit PostgreSQL parallelism, such as cursors (used by several BI tools by default) and functions not marked PARALLEL SAFE. See the complete list here.
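You can check how a given function is labeled by querying the system catalog (the function name below is a placeholder):

```sql
-- proparallel: 's' = safe, 'r' = restricted, 'u' = unsafe
SELECT proname, proparallel
FROM pg_proc
WHERE proname = 'my_function';  -- substitute your function's name
```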
Third, be aware of which operations support parallelism and write your queries accordingly to help the planner generate parallel plans. For example, DISTINCT, LATERAL joins, and correlated subqueries will cause your query plan to turn serial.
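As a quick sketch (table and column names are borrowed from the TPC-H schema used later in this post), you can see the effect with EXPLAIN: the plain aggregate gets a parallel plan, while the DISTINCT variant is planned serially by native PostgreSQL:

```sql
-- Parallel: partial aggregation over a Parallel Seq Scan
EXPLAIN SELECT count(ps_suppkey) FROM partsupp;

-- Serial: count(DISTINCT ...) cannot be parallelized by the native planner
EXPLAIN SELECT count(DISTINCT ps_suppkey) FROM partsupp;
```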
If you want to increase parallelism, you can tune the parallel query settings in your PostgreSQL configuration file. However, make sure you have a good understanding of how to best size the parameters according to your hardware resources, and how they relate to each other. For example, work_mem is applied individually to each worker, which means that total memory utilization depends on the degree of parallelism.
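A rough, hypothetical sizing example: with work_mem set to 256MB and 8 parallel workers, a single memory-hungry node such as a hash join may use up to roughly 256MB x 9 (8 workers plus the leader), i.e. around 2.3GB, and a plan with several such nodes uses correspondingly more:

```sql
-- Illustrative values only; size these to your own hardware
SET work_mem = '256MB';
SET max_parallel_workers_per_gather = 8;
-- Worst case per sort/hash node: 256MB * (8 workers + 1 leader), about 2.3GB
```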
As you get deeper into PostgreSQL parallel query execution, you may notice that adding more parallel workers does not further improve your query runtime. In many cases, PostgreSQL will not utilize all available workers, because the planner caps the number it assigns (in our experience, 12 or 16 parallel workers is a typical upper limit). Here is an excellent analysis of why this occurs.
How does Swarm64 DA scale PostgreSQL parallel query execution?
To answer this question, we will compare the query plan of native PostgreSQL against the plan of the same query when the Swarm64 DA extension is added to PostgreSQL.
We will use Query 16 (parts/supplier relationship) from the widely recognized TPC-H decision support benchmark. The goal of this query is to find how many unique suppliers can supply parts having certain attributes. For this exercise, we are using scale-factor 1000 (approximately 1.5TB of data) and a machine with 64 virtual CPU cores and 376 GB RAM.
--TPC-H Q16
select
    p_brand, p_type, p_size,
    count(distinct ps_suppkey) as supplier_cnt
from partsupp, part
where p_partkey = ps_partkey
  and p_brand <> ':1'
  and p_type not like ':2%'
  and p_size in (:3, :4, :5, :6, :7, :8, :9, :10)
  and ps_suppkey not in (
      select s_suppkey
      from supplier
      where s_comment like '%Customer%Complaints%'
  )
group by p_brand, p_type, p_size
order by supplier_cnt desc, p_brand, p_type, p_size;
The query above joins the partsupp and part tables, excluding rows with a specific brand and type and certain sizes, as well as suppliers that have had complaints made against them. It then counts the number of distinct suppliers per brand, type, and size, and sorts the result from highest to lowest. This is representative of a typical query you might encounter in an analytical workload.
Let’s take a bird’s eye view of the entire query plan – native PostgreSQL on the left and with Swarm64 DA on the right.
Notice that the picture on the right is much denser. The Swarm64 DA plan has much higher parallelism: 52 parallel workers versus the 14 workers planned by native PostgreSQL. Furthermore, looking at the plan in more detail, we see that Swarm64 DA adds a few steps marked "Parallel S64 DA…". All of this results in 17x faster query execution.
Now let's dive deeper into the nodes most relevant to achieving scalable parallel execution of this query.
Let's follow the query execution order and start with the initial fetch of data (at the bottom of the plan). Native PostgreSQL uses a Parallel Seq Scan and an Index Only Scan; with the Swarm64 DA extension loaded, these are replaced by a Parallel S64 DA Columnstore Scan.
Columnar storage is a key database feature for data analytics. By storing data in columnar format, a SELECT reads only the columns relevant to the query, greatly reducing I/O. Furthermore, columnar data compresses very well, reducing I/O even further.
The Swarm64 DA columnstore index brings the data selectivity and data throughput to the query processing pipeline needed to execute complex queries effectively. Here you can watch a cool video about our columnstore index.
Moving up the query plan, we can see that in native PostgreSQL the Gather node is introduced early (likely due to the DISTINCT clause in the query). This forces the aggregate and the sort to execute serially; the sort is the most expensive operation, consuming 7 of the 8 minutes required to run this query. This is where shuffling, a well-known technique in distributed computing, comes in handy.
When PostgreSQL chooses to use the Parallel S64 DA Hashed Shuffle, data is redistributed among the workers, allowing the plan to stay parallel. In this example, the plan stays parallel right until the end, executing the aggregate and the sort in parallel and reducing the time spent sorting by 50x.
Of course, redistributing data based on a key comes at a cost: a time penalty that becomes acute for short-running queries. For those cases, Swarm64 DA (since release 5.6.0) uses the S64 DA Duplicator, a data duplication node, instead of the shuffle node, keeping the query plan parallel for short-running queries operating on smaller datasets as well.
As I mentioned, parallelism in PostgreSQL is restricted in several cases. One of the most common is the presence of cursors at the interface between the database and BI tools or SQL clients. The S64 DA Output Buffer node helps here, allowing query parallelism in the presence of cursors, JDBC, and materialized views. When used, it appears at the top of the query plan, as in the right-hand picture above.
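For illustration (reusing the orders table from the earlier example, with a hypothetical cursor name), here is a query wrapped in a cursor that native PostgreSQL would plan serially:

```sql
BEGIN;
-- Native PostgreSQL plans cursor queries without parallel workers,
-- since rows may be fetched one at a time
DECLARE orders_cur CURSOR FOR
    SELECT count(*) FROM orders
    WHERE o_orderdate BETWEEN '1996-01-01' AND '1996-06-30';
FETCH ALL FROM orders_cur;
CLOSE orders_cur;
COMMIT;
```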
We are constantly adding cool new features to our extension to help PostgreSQL parallel query execution breeze through complex queries against your transactional database. Subscribe to our newsletter to stay tuned.