Latest News
Start your free PG Nitrous Trial Get started ›
Swarm64 & EDB partner to extend Postgres with faster queries Read more ›
Blog Post: Introducing the Swarm64 DA PostgreSQL Settings Advisor Read more ›

Using PG Nitrous as a fast read replica for Amazon RDS PostgreSQL

Posted in How to, PostgreSQL, Product

Introduction

Here is a step-by-step guide on how to set up a PG Nitrous database as a read replica for an Amazon RDS (Postgres) database. PG Nitrous is a PostgreSQL database instance, accelerated by the Swarm64 DA extension, that provides much faster query performance than Amazon RDS or self-hosted PostgreSQL. If you are setting up a replica to handle user queries, as with a multi-user reporting or dashboarding system, PG Nitrous is a simpler, more economic alternative to scaling your Amazon RDS resources or migrating to Amazon Redshift.

AWS Instances setup

  • Publisher: The RDS instance containing the database can also be called the publisher as it publishes data to the read replica. This has to be configured to provide access to the PG-Nitrous replica, so you should make sure that the replica’s IP is whitelisted or both instances are in the same network.
  • Subscriber: The PG Nitrous instance you’ve created by launching PG Nitrous from the AWS Marketplace. This will serve as the read replica, and we’ll refer to it in this guide as the subscriber.

Logical replication configuration

Logical replication is a method of replicating data objects and their changes, based upon their replication identity (usually a primary key). We use the term logical in contrast to physical replication, which uses exact block addresses and byte-by-byte replication.

The first step is to configure the publisher to allow logical replication. In logical replication, you only set up one database of your instance to be replicated while with physical replication you replicate the whole instance (every database on the instance) to the replica.

  • The ‘rds.logical_replication’ static parameter needs to be set to 1
  • This is done by creating a new parameter group from Amazon RDS Web Console (under the “parameter groups” section on the side menu)
  • Create a new Parameter group and set the rds.logical_replication to 1, then save.
  • Go back to the Databases Dashboard, select your publisher instance, and click ‘Modify’.
  • Under Additional Configuration > Database options > DB parameter group … you can now select the newly created parameter group.
  • Once this is done, you should reboot your instance and these changes will take place.
  • Additionally, these parameters (wal_level, max_wal_senders, max_replication_slots, and max_connections) will also be automatically adjusted accordingly to suit the replication needs.
show rds.logical_replication;
rds.logical_replication
--------------------
on

Steps to replicate the database

1. Check role privileges

The role of the user on each database has to have the correct privileges for replication otherwise some of the steps may fail.

In the RDS instance, the user should look like this:

\du postgres
List of roles
Role name | Attributes | Member of
-----------+-------------------------------+------------------
postgres | Create role, Create DB +| {rds_superuser}
| Password valid until infinity |

In the PG Nitrous instance, the subscriber role should look like this:

\du postgres
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------------------+------------
postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

2. Set up the publishing

Make sure you are connected to the database that you want to replicate on the RDS instance since with logical replication you can only replicate one database at a time and not all the databases on your server.

For tracking the progress you could create a test table to check the data on both sides at each step of the replication setup.

create table table_test (id1 int, id2 int);

Then add some data into it with:

insert into table_test values (1,2);

Run the following SQL command:

CREATE PUBLICATION alltables FOR ALL TABLES;

If the logical replication setup went fine this statement should execute successfully. This can be checked with the following queries:

select * from pg_publication_tables;
pubname | schemaname | tablename
-----------+------------+-----------
alltables | public | table_test
select * from pg_publication;
pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete
-----------+----------+--------------+-----------+-----------+-----------
alltables | 16389 | t | t | t | t

3. Configuring the subscription

Before subscribing to the publisher make sure that the schema is already created on the subscriber’s database. The schema can have slight differences (i.e. additional columns, bigint instead of int type) but make sure the data is compatible, i.e. all publisher columns must be present, and no mismatching types (e.g. DateTime and int).

create table table_test (id1 int, id2 int, extra_column char(5));

Once this all is set, then the following SQL statement can be run on the subscriber, with all the parameters set correctly (i.e. host, user, password):

CREATE SUBSCRIPTION mysub
CONNECTION 'dbname=postgres host=xx.xx user=postgres password=xxxx'
PUBLICATION alltables;

To see the replication slots created once the connection is made you can query them from the publisher with select * from pg_replication_slots; and there will be a slot named mysub.

If rds.force_ssl parameter is set to 1 (on) at the publisher instance then you need to update the CONNECTION string of CREATE SUBSCRIPTION command and include SSL parameters like sslmode, sslrootcert etc. For information about downloading certificates for RDS PostgreSQL, see Using SSL/TLS to Encrypt a Connection to a DB Instance.

To check the status of the replication on the publisher side you can run:

select * from pg_stat_replication;
-[ RECORD 1 ]---+-----------------------------
pid              | 4218
usesysid         | 16389
usename          | postgres
application_name | mysub
client_addr      | xxx.31.89.29
client_hostname  |
client_port      | 57078
backend_start    | 2020-xx-xx 07:xx:49.72425+00
backend_xmin     |
state            | streaming
sent_lsn         | 57/14000060
write_lsn        | 57/14000060
flush_lsn        | 57/14000060
replay_lsn       | 57/14000060
write_lag        |
flush_lag        |
replay_lag       |
sync_priority    | 0
sync_state       | async

And on the subscriber side you can run:

select srsubid, pg_filenode_relation(0,srrelid), srsublsn, srsubstate from pg_subscription_rel;
srsubid | pg_filenode_relation | srsublsn | srsubstate
---------+--------------------------+------------+------------
16391 | table_test | 57/80197D0 | r

where the srsubstate has codes representing: i – initialize, d – Data is being copied, s – synchronized, r – Ready (normal replication)

You can also query the table_test to see if the first row is already replicated in the subscriber side.

4. Creating new tables after replication

If new tables are created on the publisher’s database after the replication connection is made, it will not automatically be replicated to the subscriber.

To replicate a newly added table, the table will also have to be created in the subscribing PG Nitrous database. After defining the tables in the subscribing database, the subscription needs to be refreshed by executing the following:

alter subscription mysub REFRESH PUBLICATION;

Once this publication is refreshed the new table will be added to the tables being replicated. This can be checked by querying the state of the subscription again:

select srsubid, pg_filenode_relation(0,srrelid), srsublsn, srsubstate from pg_subscription_rel;

The newly added table will also be found in this result set.

5. Setting up updates and deletes after replication

So far only inserts have been tested and proven to work fine, but if you try to update or delete the data you may run into some issues, with error messages that look like:

update table_test set id1=1111 where id1=1;
ERROR: cannot update table "table_test" because it does not
have a replica identity and publishes updates
HINT: To enable updating the table, set REPLICA IDENTITY
using ALTER TABLE.

Once a table is published it needs to have the replica identity set before it can update or delete any rows. This setting changes information written in the write-ahead logs to identify rows which are updated or deleted.

The options for REPLICA IDENTITY are defined as follows:

  • DEFAULT: records old values of the columns of the primary key (if any)
  • USING INDEX: records old value of the columns covered by the named index, which must be unique, not partial, not deferrable, and include only columns marked NOT NULL.
  • FULL: records old values of all columns in the row
  • NOTHING: records no information about the old row

So the replica identity is used as a key to match the rows between the subscriber and the publisher hence by default the primary key of a table is used.

In other cases, a unique index can also be used as the key. If no uniquely identifying element exists then one can resort to the FULL setting where all columns are used as a key.

And if the REPLICA IDENTITY is not set to FULL on the publisher side then the subscriber side will also need to be set with a corresponding REPLICA IDENTITY on the respective table.

Example code:

alter table table_test REPLICA IDENTITY FULL;

FULL can be the easiest setting to go with, but it could become inefficient, storage-wise because it stores the old values of the entire row. I recommend using FULL as a last resort when no primary key or unique index is available.

After setting the replica identity, the update statement from the above should now run without any errors on the publisher. The changes will also be replicated to the subscriber accordingly.

6. Removing the replication

In the event you’d want to remove the replica, then the following steps would have to be taken.

alter table table_test REPLICA IDENTITY FULL;

First, the subscription will have to be dropped from the subscriber’s side:

drop subscription mysub;

As a consequence, the respective replication slots on the publisher’s side will also be dropped. This can be checked with the following statement:

select * from pg_replication_slots;

Then from the publisher’s side, the publication must also be dropped.

drop publication alltables;

After this is done, you can query the publication tables or the publications in general and the result should be empty.

select * from pg_publication_tables;
select * from pg_publication;

And now all the tables will continue to exist without any form of replication going on in between the publishing and subscribing databases.

Remarks & References

If you’d like to learn more about replicating data from Amazon RDS for PostgreSQL, I recommend these articles: Logical replication to replicate managed RDS to self-managed PostgreSQL and Logical Replication under PostgreSQL RDS documentation.

This setup is for replicating a single database on your PostgreSQL server. If you would like to replicate all the databases on your instance then you would have to repeat these steps for all databases or use physical replication.

Logical Replication does have a few limitations that you should be aware of depending with the setup of your database and how you plan to use your read-replica, these are all explained in this link to the PostgreSQL documentation.

For more in-depth information about logical replication in general, there is a chapter for it in the PostgreSQL documentation.