Backup and restore data in PostgreSQL foreign tables using pg_dump

Posted in How to, PostgreSQL

In this article, we describe a recent enhancement to pg_dump that Swarm64 contributed that more fully supports the backup and restoration of foreign tables. The contribution has been committed to PostgreSQL 13.

The PostgreSQL foreign-data wrapper (FDW)

The PostgreSQL foreign-data-wrapper interface is an extensibility feature that allows PostgreSQL to integrate data from another data source, which might be another PostgreSQL instance, an Oracle server, a CSV file, or more. Queries against a foreign table access the external data source and return results as if they had been accessed from a native PostgreSQL table.

The Swarm64 DA PostgreSQL accelerator extension speeds up PostgreSQL query performance in a number of ways, including the use of a compressed, columnar storage format that we store in a foreign table.

Foreign table backup and restore prior to PostgreSQL 13

Foreign tables extend PostgreSQL in many useful ways. Unfortunately, prior to PostgreSQL 13, pg_dump does not fully support foreign tables; pg_dump correctly dumps the DDL commands needed to define the foreign table again, but no rows will ever be dumped. If you restore that backup, the foreign tables will have been defined again, but there won’t be any data.

Foreign table backup and restore in PostgreSQL 13

In order to make database administration easier for our customers, as of Swarm64 DA 3.1.0, and PostgreSQL 13, we developed a patched version of pg_dump that allows you to dump the content of a foreign table using the --include-foreign-data option. This option instructs pg_dump to perform a COPY (SELECT * FROM foreign_table) TO on the tables that use the foreign server provided by the option.

For example, the following database test_db has a Swarm64 foreign table foo:

CREATE EXTENSION swarm64da;
CREATE FOREIGN TABLE foo(a int) server swarm64da_server;
INSERT INTO foo SELECT * FROM generate_series(1,10);

You can create the backup dump.txt file using the new option:

$ pg_dump -d test_db --include-foreign-data=swarm64da_server > dump.txt

The backup contains the necessary statements to recreate the table and insert the data that is in table foo, as seen in the following excerpt of dump.txt:

--
-- Name: foo; Type: FOREIGN TABLE; Schema: public; Owner: postgres
--

CREATE FOREIGN TABLE public.foo (
	a integer
)
SERVER swarm64da_server;

ALTER FOREIGN TABLE public.foo OWNER TO postgres;
--
-- Data for Name: foo; Type: TABLE DATA; Schema: public; Owner: postgres
--

COPY public.foo (a) FROM stdin;
1
2
3
4
5
6
7
8
9
10
\.

--
-- PostgreSQL database dump complete
--

To restore the dump in a new database named new_db, apply the dump using the psql command:

$ psql new_db < dump.txt

If your setup has a different foreign-data wrapper installed, you can issue the option multiple times to selectively dump the content of tables backed by different foreign servers:

$ pg_dump -d test_db --include-foreign-data=swarm64da_server --include-foreign-data=other_fdw_server

Some FDWs are read-only, such as the file_fdw provided by PostgreSQL. Do not use --include-foreign-data with a server that is read-only because it makes your backup unusable. To help avoid this situation, the --include-foreign-data option forces you to explicitly define the foreign servers that should be used. Unfortunately, this is not automatically preventable because pg_dump has no way of knowing if a foreign-data wrapper is read-only.

The --include-foreign-data option for pg_dump is available as of Swarm64 DA 3.1.0 and PostgreSQL 13. If you are using previous versions of PostgreSQL, you can apply the patch, which for PostgreSQL 12 and 11 can be easily adapted.

To download the patch, go to the Swarm64 Github repository or in the PostgreSQL hackers list.

I’d like to thank the PostgreSQL reviewers and my colleague Ursula Kallio for their contributions.