Administration

Once Gluesync is running, it needs little attention: the log records all activity, and Gluesync monitors the source data set and mirrors any changes to the specified destination.

In the long run, you may want to update some configuration settings to better reflect changed requirements or data source refactoring.

Configuration data is loaded at boot, so after updating the config file you must restart Gluesync for the new settings to take effect. The restart takes only a couple of seconds and is safe: you won’t lose data.

Performing a new data refresh task

There are cases where you may want Gluesync to perform a full refresh of your data by copying the whole content from source to target.

When the data model is updated, you can decide whether to leave already-mirrored data as-is or perform a new synchronization of all source data. This very much depends on the type of changes you made to the configuration and data modeling settings.

If you need to re-synchronize a full table, the easiest option is to remove the relevant row from Gluesync’s state tracking table, GLUESYNC_STATE_PRESERVATION. Gluesync manages this table automatically inside the source database, using the Gluesync user specified in the settings.

The table has three columns:

  • The name of the table the state is being tracked for

  • The last transaction ID that has been successfully mirrored across

  • The timestamp of the last successful mirroring operation

To force a re-synchronization, locate the row in this table that refers to the table updated in the configuration (or the rows, if more than one table was updated) and remove it before starting Gluesync again.

At boot, Gluesync checks the state table and performs a full synchronization of any table that is described in the configuration file but has no row in GLUESYNC_STATE_PRESERVATION. Data already existing in the destination database will be updated with any new fields.

Be sure the copySourceEntitiesAtStartup flag is set to true. Otherwise, Gluesync will skip the initial snapshot and only perform CDC over your changes feed.
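The row removal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (TABLE_NAME, LAST_TRANSACTION_ID, LAST_SYNC_TIMESTAMP) and the helper clear_sync_state are hypothetical, and an in-memory SQLite database stands in for your real source database. Check the actual column names in your own GLUESYNC_STATE_PRESERVATION table before running anything similar.

```python
import sqlite3

STATE_TABLE = "GLUESYNC_STATE_PRESERVATION"


def clear_sync_state(conn, table_names, name_column="TABLE_NAME"):
    """Delete the state rows for the given tables and return how many were removed.

    name_column is a hypothetical column name; adjust it to your schema.
    """
    removed = 0
    for name in table_names:
        cur = conn.execute(
            f"DELETE FROM {STATE_TABLE} WHERE {name_column} = ?", (name,)
        )
        removed += cur.rowcount
    conn.commit()
    return removed


def demo():
    # In-memory stand-in for the source database (assumed layout).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE {STATE_TABLE} ("
        "TABLE_NAME TEXT, LAST_TRANSACTION_ID TEXT, LAST_SYNC_TIMESTAMP TEXT)"
    )
    conn.executemany(
        f"INSERT INTO {STATE_TABLE} VALUES (?, ?, ?)",
        [
            ("CUSTOMERS", "tx-100", "2023-05-01T10:00:00"),
            ("ORDERS", "tx-101", "2023-05-01T10:00:05"),
        ],
    )
    # Force a full re-synchronization of CUSTOMERS only.
    removed = clear_sync_state(conn, ["CUSTOMERS"])
    remaining = [
        row[0]
        for row in conn.execute(
            f"SELECT TABLE_NAME FROM {STATE_TABLE} ORDER BY TABLE_NAME"
        )
    ]
    conn.close()
    return removed, remaining
```

Run the equivalent statement while Gluesync is stopped, then start it again so that the missing row triggers the full synchronization at boot.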

Checking for data consistency

During the initial load from source to target, Gluesync writes to its console log how many rows it has found for each declared entity, in the format described in the Snapshot chapter. At the end of the process (that is, once it reaches 100% completion), the value is stored inside the GLUESYNC_MIGRATION_CHECKPOINT table and can be used to reconcile the records present in the target database. If both counts match, you can assume that all rows/documents were migrated successfully.
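The comparison itself is simple arithmetic, and a small script can make it repeatable. The sketch below assumes you have already collected the expected counts (from the log or the checkpoint table) and the actual counts (from queries against the target) into two dictionaries keyed by entity name. The function reconcile_counts and the entity names are hypothetical and not part of Gluesync.

```python
def reconcile_counts(expected, actual):
    """Compare per-entity row counts.

    Returns {entity: (expected, actual, status)} where status is
    "match", "lower" or "higher" from the point of view of the target.
    """
    report = {}
    for entity, expected_count in expected.items():
        # An entity missing from the target is treated as having zero rows.
        actual_count = actual.get(entity, 0)
        if actual_count == expected_count:
            status = "match"
        elif actual_count < expected_count:
            status = "lower"
        else:
            status = "higher"
        report[entity] = (expected_count, actual_count, status)
    return report


# Example values, invented for illustration only.
expected = {"CUSTOMERS": 1200, "ORDERS": 5400, "INVOICES": 300}
actual = {"CUSTOMERS": 1200, "ORDERS": 5390, "INVOICES": 310}
report = reconcile_counts(expected, actual)
```

Any entity whose status is not "match" deserves a closer look, following the two cases described next.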

What to do if the number is lower

If the number of records in the target database is lower than the number you intended to migrate, this indicates possible key collisions. We suggest you first check your data source to make sure each row is uniquely identified by its keys. If you don’t have primary key (PK) columns, you can tell Gluesync to combine multiple columns into a compound primary key. You can learn more about that process in the following chapter, or customize your document keys as described in Mapping target document IDs.
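To see why collisions reduce the count, consider rows that share the same key value: each one overwrites the previous document in the target. The sketch below counts how often each candidate key occurs in a sample of source rows, first with a single column and then with a compound key. The function find_key_collisions and the column names are hypothetical, and the rows are invented sample data.

```python
from collections import Counter


def find_key_collisions(rows, key_columns):
    """Return {key_tuple: occurrences} for every key that appears more than once."""
    counts = Counter(tuple(row[column] for column in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Sample order lines: order_id alone is not unique, order_id + line_no is.
rows = [
    {"order_id": 1, "line_no": 1, "sku": "A"},
    {"order_id": 1, "line_no": 2, "sku": "B"},
    {"order_id": 2, "line_no": 1, "sku": "A"},
]

single_key = find_key_collisions(rows, ["order_id"])
compound_key = find_key_collisions(rows, ["order_id", "line_no"])
```

Here the single-column key collides for order 1, so only two documents would reach the target, while the compound key is unique for all three rows.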

What to do if the number is higher

A higher number of records in the target database than you intended to move usually indicates that the snapshot wasn’t performed on a clean (truncated) destination datastore. To fix it, empty all existing records from the target datastore and repeat the process.

Data quality

Currently Gluesync doesn’t come with a built-in data quality tool. After performing a snapshot, we suggest running custom queries over both your source and target data to check data quality: verify that date values are in the proper format, that keys match your desired format, and that currency and other meaningful values look correct. These checks make it easy to spot any configuration issue that resulted in a poor-quality migration.
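As a starting point, checks like these can be scripted against documents fetched from the target. The sketch below is not a Gluesync feature: the field names (id, created, total), the key pattern and the ISO date format are assumptions chosen for illustration, so replace them with the formats you actually expect.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical key format, e.g. "order::42".
KEY_PATTERN = re.compile(r"^[a-z]+::\d+$")


def is_iso_date(value):
    """True if value is a string in YYYY-MM-DD format."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False


def is_valid_amount(value):
    """True if value parses as a finite decimal number."""
    try:
        return Decimal(str(value)).is_finite()
    except InvalidOperation:
        return False


def check_document(doc):
    """Return the names of the fields that fail their check."""
    problems = []
    if not KEY_PATTERN.match(str(doc.get("id", ""))):
        problems.append("id")
    if not is_iso_date(doc.get("created")):
        problems.append("created")
    if not is_valid_amount(doc.get("total")):
        problems.append("total")
    return problems


good = check_document({"id": "order::42", "created": "2023-05-01", "total": "19.90"})
bad = check_document({"id": "42", "created": "01/05/2023", "total": "n/a"})
```

Running the same checks over a sample of source rows and the corresponding target documents helps tell a mapping problem apart from data that was already malformed at the source.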