Setup of Gluesync CDC connector for PostgreSQL

Prerequisites

To have Gluesync working on your PostgreSQL instance you will need to have: * valid user credentials with permissions to read, create tables, schema and replication slots and write to the target tables and respective database.

To create a valid user for Gluesync on your PostgreSQL database you can run/adapt the following query:
CREATE USER gsuser SUPERUSER;
ALTER USER gsuser WITH PASSWORD ‘youdecide’;
Given user gsuser present in the example above is not a mandatory username for Gluesync, you can define whatever user name you’d like.

Setup steps

Enabling CDC on the source database

First of all, to enable CDC you will need to set the wal_level param to logical. To learn more about what wal, also known as write ahead logs, is you can refer to PostgreSQL’s official documentation by following this link.

After having configured the proper wal_level the other step will be to set the number of desired replication slots that are allowed to run on your PostgreSQL environment. From the docs, max_replication_slots, is the maximum number of simultaneously defined replication slots available for GlueSync to work correctly with CDC.

the minimum replications slots number should be equal to the number of entities in the GlueSync config.json. The value that is already set in PostgreSQL by default is 10.

The last step will be changing max_wal_senders which is the maximum number of simultaneously running WAL sender processes available for Gluesync to work correctly with PostgreSQL CDC. The minimum wal senders number should be equal to the number of replication slots set, by the way, we suggest setting this param slightly higher than the number of replication slots so that the database can better handle orphan connections. The value that is already set in PostgreSQL by default is 10.

There are two ways to achieve that and we have documented the needed steps in the next 2 chapters.

Via SQL commands

In order to set the wal_level param to logical via SQL command you have to issue the following statement:

alter SYSTEM set wal_level = logical;

Then, after having successfully applied that you can issue the following statement to set the number of desired replication slots:

alter SYSTEM set max_replication_slots = <number_of_desired_replication_slots>;
If this parameter is changed in the future setting a smaller value to the already active replication slots will cause the database not to start!

Lastly, it’s time to change the max_wal_senders value, to do that just hit the following:

alter SYSTEM set max_wal_senders = <number_of_desired_wal_senders>;

Via editing postgresql.conf file

To set the wal_level param to logical through editing the postgresql.conf file you are required to go inside your PostgreSQL directory, with a user that has read/write rights, look for the file named postgresql.conf and open it.

There inside you’ll find a param called wal_level where its given value needs to be changed to logical, if the line is commented please recommend it to let the configuration file parser understand your change.

You can learn more about modifying the postgresql.conf file by following this link;

After having applied that first change please set also max_replication_slots and max_wal_senders params in the same way.

If this parameter is changed in the future setting a smaller value to the already active replication slots will cause the database not to start!

Allowing connections to replica streams

After having applied the above-mentioned changes to your PostgreSQL configuration we need to allow Gluesync to connect over your instance’s replica streams. Gluesync requires to connect with replication privileges to a replication stream. You will need to modify the values in your pg_hba.conf, if not already set, like this:

local   replication   all                   trust
host    replication   all   127.0.0.1/32    md5
host    replication   all   ::1/128         md5

This will allow replication connections from localhost, by a user with the replication privilege.

The IP Address of the first host replication can differ based on your server configuration and how Gluesync will connect to your server. In the example case, Gluesync is running on the same machine of PostgreSQL database which is usually not recommended for a production environment.

Installing Wal2Json

Gluesync requires Wal2Json to work correctly with Logical Replication. Wal2Jon is a plugin for proving logical decoding as output steam to Gluesync that requires it to receive changed tables data in a readable format. This plugin can be installed as a last step, right after the CDC has been enabled.

Depending on your OS and PostgreSQL version you have to install the plugin by issuing one of these commands:

In Red Hat/CentOS:

sudo (if needed) yum install wal2jsonXX

In Debian/Ubuntu:

sudo (if needed) apt-get install postgresql-XX-wal2json

Where XX is one of the supported versions of PostgreSQL (16, 15, 14, 13, 12, 11, 10). Depending on your PostgreSQL version you have to replace XX chars with the correct version.

To learn more about Wal2Json you can refer to Wal2Json’s official documentation by following this link.

Basic configuration example

The SQL to NoSQL module can be customized by using a configuration file, in JSON format. The file name to use must be specified as a parameter when launching the app, with the -f or --file tokens. The file should be composed by the union of the common configuration file (see here Installation steps) and source/destination specific configuration:

{
  ...
  "postgres": {
      "tableKeys":{
        "dbo.SUPPLIERS": ["NAME", "SURNAME", "EMAIL"]
      },
    "maxPollingIntervalMilliseconds": 2
  }
}

PostgreSQL-specific configurations are listed under the postgres property:

  • tableKeys: this object is mandatory if you’re considering sourcing data from table(s) where there are NO primary key columns defined. You can learn more about the tableKeys object and how to leverage it by reading its deep dive at this link;

  • maxPollingIntervalMilliseconds: is used by Gluesync to wait for new messages incoming on the replication slot if none new. The default value is 2 seconds;

Troubleshooting

  • Counting replication slots count: If you are in need to perform a check about how many replication slots are currently in use within your instance you can run the following query:

select * from pg_replication_slots;

Looking for data modeling features or other options?

For more detailed configuration options, including the ability to perform data modeling, please have a look at the dedicated Data modeling section when sourcing from an RDBMS.