GlueSync NoSQL to SQL
Replicate data from Apache HBase
Core principles
GlueSync uses the scan feature provided by Apache HBase to capture changes that occur at the database level.
To perform all operations on HBase, GlueSync uses the latest SDKs provided directly by the Apache HBase community. This native approach, with no third-party dependencies or private APIs, avoids deprecation and incompatibility issues.
GlueSync needs to create its own table to store information such as sync checkpoints, document checksums, and other state. The table name is gs_state_preservation. Please avoid deleting or editing this table or its content.
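How GlueSync combines scans with its stored checkpoint is not detailed here, so the following is only a minimal sketch of one plausible pattern: a scan restricted to a time window that starts at the last checkpoint. It is a pure-Python simulation, not HBase client code. The half-open window mirrors the semantics of HBase's `Scan.setTimeRange` (minimum inclusive, maximum exclusive); the `Cell`, `scan_time_range`, and `poll_changes` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row_key: str
    value: str
    timestamp: int  # milliseconds, like an HBase cell version


def scan_time_range(cells, min_ts, max_ts):
    """Return cells with min_ts <= timestamp < max_ts, ordered by row key."""
    hits = [c for c in cells if min_ts <= c.timestamp < max_ts]
    return sorted(hits, key=lambda c: (c.row_key, c.timestamp))


def poll_changes(cells, checkpoint, now):
    """One polling pass: scan the window [checkpoint, now) and advance the checkpoint."""
    changed = scan_time_range(cells, checkpoint, now)
    return changed, now  # the new checkpoint is the upper bound just scanned


# In-memory stand-in for a table (assumption: illustrative data only).
table = [
    Cell("user#1", "alice", 1_000),
    Cell("user#2", "bob", 2_000),
    Cell("user#1", "alice-v2", 3_500),
]

first_pass, checkpoint = poll_changes(table, checkpoint=0, now=3_000)
second_pass, checkpoint = poll_changes(table, checkpoint, now=4_000)
```

Because the window is half-open, a cell written exactly at the upper bound is picked up by the next pass, so consecutive passes neither skip nor repeat a cell.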
Architectural overview
The following diagram gives an architectural overview of the environment you will have after deploying GlueSync.

Clustering
When moving large datasets, and especially when there’s a big data store like HBase involved, GlueSync needs to be configured as a cluster.
Clustering GlueSync means that multiple nodes will consume either the same table or multiple tables simultaneously, depending on your specific configuration and the data density per table. Using this feature significantly boosts throughput compared to a single-instance configuration.
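The two work-sharing modes described above can be sketched as follows. This is an illustration under stated assumptions, not GlueSync's actual scheduling logic: the function names, node names, and the round-robin and key-range strategies are all hypothetical. The only HBase convention borrowed is that an empty start or stop row means an unbounded scan edge.

```python
def assign_tables(tables, nodes):
    """Spread several tables across nodes in round-robin order."""
    plan = {node: [] for node in nodes}
    for i, table in enumerate(sorted(tables)):
        plan[nodes[i % len(nodes)]].append(table)
    return plan


def split_key_range(split_points, nodes):
    """Give each node one contiguous [start, stop) row-key range of a single table.

    split_points must be sorted and contain len(nodes) - 1 boundary keys.
    An empty string marks an open-ended start or stop row.
    """
    if len(split_points) != len(nodes) - 1:
        raise ValueError("need exactly len(nodes) - 1 split points")
    bounds = [""] + list(split_points) + [""]
    return {node: (bounds[i], bounds[i + 1]) for i, node in enumerate(nodes)}


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names

# Mode 1: multiple tables, each consumed by one node.
table_plan = assign_tables(["orders", "users", "events", "logs"], nodes)

# Mode 2: one dense table, split by row key among all nodes.
range_plan = split_key_range(["h", "p"], nodes)
```

Which mode fits depends on data density per table: many small tables suit per-table assignment, while a single very large table benefits from key-range splitting.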
Clustering configuration is currently available through our professional services; it will soon be generally available (GA) and manageable by end users.
Q&A
I have Phoenix running on top of my HBase deployment. Is it supported?
Sure. GlueSync doesn’t make use of any component other than plain HBase. You can still use Phoenix while GlueSync migrates and offloads your data from the datastore.
My company has more than a petabyte of data inside our deployment. Does it work with GlueSync? How many resources do I need to allocate to ensure it works well?
The amount of resources depends on a few key driving factors:
- Network bandwidth;
- Storage performance;
- CPU;
- RAM.
We will take care of gathering these KPIs and provide you with the proper sizing of the resources to allocate to the GlueSync instances. There is no amount of data that scares GlueSync: when properly sized, your data will flow smoothly to the other end, consistently and safely.
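To give a feel for why network bandwidth appears among the sizing factors, here is a rough back-of-the-envelope calculation. The figures are illustrative assumptions, not GlueSync measurements: a decimal petabyte (10^15 bytes), a 10 Gbit/s link, and an optional efficiency factor for protocol overhead. It considers the network alone and ignores storage, CPU, and RAM limits.

```python
PETABYTE = 10**15          # bytes (decimal petabyte, an assumption)
SECONDS_PER_DAY = 86_400


def transfer_days(data_bytes, bandwidth_bits_per_s, efficiency=1.0):
    """Lower bound on transfer time in days if the network is the only limit."""
    seconds = data_bytes * 8 / (bandwidth_bits_per_s * efficiency)
    return seconds / SECONDS_PER_DAY


# 1 PB over a fully used 10 Gbit/s link: 8e15 bits / 1e10 bit/s = 800,000 s.
ideal_days = transfer_days(PETABYTE, 10 * 10**9)

# Same link at an assumed 70% effective utilization.
realistic_days = transfer_days(PETABYTE, 10 * 10**9, efficiency=0.7)

print(f"ideal: {ideal_days:.1f} days, at 70%: {realistic_days:.1f} days")
```

Under these assumptions the initial load takes a bit more than nine days on an ideal link, and about thirteen at 70% utilization, which is why measured KPIs rather than nominal figures should drive the sizing.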
Have further questions? For more information regarding GlueSync, please reach us via email: Contact MOLO17.