Gluesync for PostgreSQL

Gluesync SQL to NoSQL for PostgreSQL

Core principles

Gluesync consumes changes from PostgreSQL in 2 ways:

  • using CDC (Change data capture) technique to read transaction logs from PostgreSQL replication slots technology starting from version 10;

  • using the GDC (Gluesync data capture) technique to catch field-level changes of rows in tables based on our own proprietary trigger auditing structure.

Based on your use case and requirements you can decide whether to use Change data capture (CDC) or Gluesync data capture (GDC, based on triggers).

Triggers vs Change data capture

From the books:

Logical decoding is the process of extracting all persistent changes to a database’s tables into a coherent, easy-to-understand format that can be interpreted without detailed knowledge of the database’s internal state. In PostgreSQL, logical decoding is implemented by decoding the contents of the write-ahead log, which describes changes on a storage level, into an application-specific form such as a stream of tuples or SQL statements. In the context of logical replication, a slot represents a stream of changes that can be replayed to a client in the order they were made on the origin server. Each slot streams a sequence of changes from a single database.
— PostgreSQL
About CDC & logical replication through replication slots
A trigger is a specification that the database should automatically execute a particular function whenever a certain type of operation is performed. Triggers can be defined to execute either before or after any INSERT, UPDATE, or DELETE operation, either once per modified row, or once per SQL statement. If a trigger event occurs, the trigger’s function is called at the appropriate time to handle the event. PostgreSQL offers both per-row triggers and per-statement triggers. With a per-row trigger, the trigger function is invoked once for each row that is affected by the statement that fired the trigger. Gluesync’s GDC (Gluesync Data Capture) feature is based on a mix of per-row & per-statement triggers, more specifically Gluesync adopts an optimized triggers structure to catch events generated by operations that hit fields in your source PostgreSQL database.
— PostgreSQL & MOLO17
Triggers in PostgreSQL

While they both share basic principles of Gluesync, like the STATE_PRESERVATION table as well as the MIGRATION_CHECKPOINT one. The main difference resides instead in how changes are recorded from inside the source database:

  • CDC sees changes recorded and managed by the PostgreSQL logical records API;

  • GDC tracks record changes using the above-mentioned triggers that are storing minimal information (like the unique row identifier belonging to the row that got changed as well as the CRUD action) inside Gluesync’s GLUESYNC_ACTION_LOG table, only tracking changed rows IDs, storing data on that table where all changed row IDs are stored and their data type maintained unaltered. This technique is also known as change tracking pattern. This in the opposite of change data capture which does not require a copy of all the data present in all the columns involved in a commit, this ensures performance gain and less footprint per transaction on your source database since no data transformation or manipulation is being required by our design.

Our 2 cents here:

  • CDC is compatible with versions starting from version 10 and above while GDC has also backwards compatibility;

  • CDC permits workload isolation between the logical records tracking and the overhead required by the trigger-logics compute;

  • CDC requires additional setup steps and experience on PostgreSQL while the GDC lifecycle is being fully managed by Gluesync itself.

Architectural overview

Here in the following diagram represents an architectural overview of the environment you are going to have after having deployed Gluesync.

a diagram illustrating the architectural overview of Gluesync for PostgreSQL

Q&A

I have got PostgreSQL running under a managed DBaaS offered from AWS | CGP | Azure | OCI | …​, is it GDC supported? Since we opted for GDC to catch field-level changes from your PostgreSQL instance this gives you the ability to work with whatever deployment/flavor of PostgreSQL you might have.

For a list of supported versions have a look at our list of supported databases/versions available here.

Having further questions? For more information regarding Gluesync please reach us via email by pressing here: Contact MOLO17.