Setup of Gluesync data capture (GDC)

Prerequisites

  • Oracle database version 10g and above;

  • A user with a grant to alter a source table, create a table, create a sequence and create trigger statements.

To create a valid user for Gluesync on your Oracle database to run GDC you can run/adapt the following query:
CREATE USER GSUSER IDENTIFIED BY password;

ALTER USER GSUSER quota unlimited on USERS;

GRANT CREATE SESSION TO GSUSER;
GRANT CREATE TABLE TO GSUSER;
GRANT SELECT ANY TABLE TO GSUSER;
GRANT CREATE TRIGGER TO GSUSER;
GRANT CREATE ANY TRIGGER TO GSUSER;
GRANT CREATE ANY SEQUENCE TO GSUSER;

Looking for data modeling features or other options?

For more detailed configuration options, including the ability to perform data modeling, please have a look at the dedicated Data modeling section when sourcing from an RDBMS.

Basic configuration example

The SQL to NoSQL module can be customized by using a configuration file, in JSON format. The file name to use must be specified as a parameter when launching the app, with the -f or --file tokens. The file should be composed of the union of the common configuration files (see here Installation steps) and source/destination specific configuration:

{
  ...
  "oracle": {
    "connectionType": "SID",
    "usePartition": true,
    "triggerPageSize": 10000,
    "thinNetConnectTimeout": 30,
    "thinReadTimeout": 30,
    "downHostsTimeout": 30,
    "thinOutboundConnectTimeout": 30,
    "tableKeys": {
      "articles": ["id1", "id2"],
      "drivers": ["driver_unique_id"],
      "customers": [
          "customer_name",
          "customer_surname",
          "customer_age",
          "customer_id"
        ]
    }
  },

  ...
  "sourceChangeRetention": 5
  ...
}

Oracle-specific configurations are listed under the oracle property.

  • connectionType (optional): connection type used to connect to Oracle. Available types are: SID or SERVICE_NAME. Defaults to SID;

  • usePartition (optional): default to false. Enables partitioning at GLUESYNC_ACTION_LOG table, this ensures that no locks are being made while purging old records from that table as well as preserving the Oracle database well in shape. Partitioning is available only on Oracle Enterprise editions, check compatibility with your Oracle database instance before turning on this functionality;

  • triggerPageSize (optional): default to 10000. The max buffer size of rows within a single transaction that a trigger is capable of buffering before writing actions to GLUESYNC_ACTION_LOG table, works only starting from version 11.2g and above.

  • thinNetConnectTimeout (optional): the connect timeout controls how much time is allowed to connect the socket to the database.

  • thinReadTimeout (optional): read timeout while reading from the socket.

  • downHostsTimeout (optional): to specify the amount of time in seconds that information about the down state of server hosts is kept in the driver’s cache.

  • thinOutboundConnectTimeout (optional): the outbound connect timeout controls the time allowed to connect the socket, let the server accept the connection to the desired service, negotiate the NS protocol as well as complete the ASO negotiation.

  • tableKeys: is the object that represents the key-value pair list of tables and its columns that compose the clustered (compound) index or the column Gluesync will use as a primary key. To read more about this feature we suggest you have a look at the following link.

The source core parameter sourceChangeRetention can also be customized in that source connector. This will tell the automatic purge mechanism built-in in the Gluesync source connector for Oracle triggers to purge every record that is older than the given amount of days, 5 as per the example shown here above.

Addition to standard GLUESYNC_ACTION_LOG table

Transaction consumed column

Oracle GDC-based source connector comes with an additional column named TRANSACTION_CONSUMED, this column helps the Gluesync engine identify changes that are coming into the ACTION LOGS table as deferred commits (seen happening when also other CDC tools are using this data source as a target database).

Here following are explained the logical steps to better understand how this column is being used:

  1. Gluesync reads transactions from the action log table with a transaction ID greater than the last transaction ID read and with TRANSACTION_CONSUMED = 0;

  2. For each transaction read the column TRANSACTION_CONSUMED = -1 is set;

  3. The data is copied to the target and the writing confirmation is waited for;

  4. Gluesync updates the state preservation table with the latest synchronized transaction;

  5. Gluesync updates the column TRANSACTION_CONSUMED = 1 for all transactions with value TRANSACTION_CONSUMED = -1 and transaction ID less than/equal to the last transaction saved on state preservation;

  6. If Gluesync goes into error the flow is reset, all rows with TRANSACTION_CONSUMED = -1 are reset to TRANSACTION_CONSUMED = 0 and Gluesync starts reading again from the last confirmed transaction on the state preservation table.