Data modeling in Gluesync

Data modeling is a toolkit that lets developers and DBAs get the most out of their target database, especially when mapping and modeling data in JSON documents according to the application's business logic or presentation layer. With it you can implement different document design strategies that improve performance and productivity at the destination application level.

The options described below may require additional resources to be processed. We suggest engaging our professional services to fine-tune and properly size the containers and the number of nodes required for your production workloads.

Core principles

It is important to note that Gluesync does not introspect your data at the core modules level when modeling it. This follows from the security design approach that our engineers adopted when building the core replicator modules.

That said, every data modeling operation is performed against the source or destination database engine, leveraging its full set of features and getting the best compatibility from it in terms of SQL standards support, performance, and data types.

It is on these principles that we use the term on-the-fly data modeling for the Gluesync data modeling capabilities: they are designed for in-memory-only computation, without storing or persisting any data either while transferring it or while applying modeling-related business logic.

Data modeling 101

Gluesync currently supports the following on-the-fly data modeling features:

  • Field skipping, the ability to skip certain fields so that they are not replicated;

  • Field renaming, the ability to rename fields so that they are replicated under a new name;

  • SQL queries JSON modeling, the ability to apply SQL statements on your data source to output a tailored JSON document data model;

  • Advanced data modeling, the ability to describe how data has to be modeled using our meta-description language, reaching a high level of customization with nested JSON support (arrays, objects, and so on) up to 2 levels deep, with more coming soon.

Each of these is meant to support you when replicating data from a relational database to a NoSQL database or vice versa. They help you avoid workarounds such as post-commit document manipulation or UDFs (user-defined functions), which can increase your time-to-data and make real-time data replication challenging.

Every data modeling function that Gluesync supports works both while capturing changes through CDC or GDC and while the initial snapshot of data is performed. This means that you don't have to handle any data modeling tasks manually.

Field skipping

This functionality is available both for SQL-to-NoSQL replication and vice versa following the respective links:

Use this functionality to prevent unwanted fields from being replicated to the other side, which saves space. Keep in mind that each key in a JSON document is repeated in every document in your NoSQL database, so any column that you do not plan to use in NoSQL can be skipped with this feature to save space and resources.
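The effect of skipping fields can be illustrated with a minimal Python sketch. This is not Gluesync configuration syntax (see the linked pages for that); the row, the column names, and the `skip_fields` helper are all hypothetical and only show why dropping unused keys shrinks every replicated document.

```python
import json

# Hypothetical source row; column names are invented for illustration.
row = {
    "customer_id": 42,
    "full_name": "Ada Lovelace",
    "internal_audit_flag": "N",
    "legacy_import_batch": "2019-03-BATCH-7",
}

# Columns we assume the destination application never reads.
SKIPPED_FIELDS = {"internal_audit_flag", "legacy_import_batch"}


def skip_fields(source_row, skipped):
    """Return a copy of the row without the skipped columns."""
    return {key: value for key, value in source_row.items() if key not in skipped}


document = skip_fields(row, SKIPPED_FIELDS)

# Compare the serialized size with and without the skipped keys.
full_size = len(json.dumps(row))
slim_size = len(json.dumps(document))

print(document)
print(f"bytes per document: {full_size} -> {slim_size}")
```

Because the saving applies to every document, it grows linearly with the number of replicated rows.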

Field renaming

This functionality is available both for SQL-to-NoSQL replication and vice versa following the respective links:

Use this functionality to shorten long column names, or to make them more use-case-oriented, so that they are replicated to the other side under a new name. A common use case is saving space: each key in a JSON document is repeated in every document in your NoSQL database, so renaming columns that have long names saves space and resources.
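As a rough illustration of the idea, the following Python sketch applies a rename map to a row before it is serialized as JSON. It is not Gluesync configuration syntax; the column names, the `RENAMES` map, and the `rename_fields` helper are assumptions made up for this example.

```python
import json

# Hypothetical source row with verbose relational column names.
row = {
    "CUSTOMER_IDENTIFIER": 42,
    "CUSTOMER_FULL_LEGAL_NAME": "Ada Lovelace",
    "country": "UK",
}

# Assumed mapping from source column name to destination key.
RENAMES = {
    "CUSTOMER_IDENTIFIER": "id",
    "CUSTOMER_FULL_LEGAL_NAME": "name",
}


def rename_fields(source_row, renames):
    """Return a copy of the row with keys renamed; unmapped keys are kept as-is."""
    return {renames.get(key, key): value for key, value in source_row.items()}


renamed = rename_fields(row, RENAMES)

# Characters saved in each serialized document thanks to the shorter keys.
saved_per_document = len(json.dumps(row)) - len(json.dumps(renamed))

print(renamed)
print(f"characters saved per document: {saved_per_document}")
```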

SQL queries JSON modeling

an example of SQL query to JSON output result

This feature lets you define a SQL query statement, just like the ones you already run against your relational database, to aggregate, map, and design an output structure that the Gluesync data modeling engine then transforms into a JSON document. It works both during the initial move of your entire dataset (the tables and columns defined in the statement) and during CDC incremental replication.
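The general idea of shaping JSON documents with a SQL statement can be sketched in plain Python using an in-memory SQLite database. This only illustrates the concept and does not show how Gluesync is configured or how its engine works; the tables, columns, and query below are hypothetical.

```python
import json
import sqlite3

# In-memory database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada', 'London'), (2, 'Linus', 'Helsinki');
    INSERT INTO orders VALUES (10, 1, 25.5), (11, 1, 14.5), (12, 2, 99.0);
    """
)

# The SQL statement decides which fields end up in each document:
# it joins, aggregates, and renames columns through aliases.
QUERY = """
    SELECT c.id        AS customer_id,
           c.name      AS customer_name,
           COUNT(o.id) AS order_count,
           SUM(o.total) AS total_spent
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
"""

# Each result row becomes one JSON document keyed by the column aliases.
documents = [dict(result_row) for result_row in conn.execute(QUERY)]

for doc in documents:
    print(json.dumps(doc))
```

The column aliases in the `SELECT` clause double as the JSON keys, so the query alone controls the shape of the output.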

To learn more about this functionality, visit the related page: SQL queries JSON modelling.

For now, this functionality is only supported when replicating changes from a relational database to a NoSQL database. Extended support is on our development roadmap and will be made available soon.

Advanced data modeling

Version v1.4 introduces a feature we call Advanced data modeling. It lets users highly customize their data modeling output using an easy-to-learn meta-description language that defines how data should be joined and filtered and which columns should be skipped or renamed, while letting you freely choose the nesting depth at which your data is represented. JSON documents offer a wide range of possibilities in terms of application data model flexibility, and so do we.

Imagine how many ways you could choose to output the same query result from the table structure in the picture below:

an example of Advanced data modelling output result

In this e-commerce-like use case, you could decide to:

  • nest order rows in a single document together with all the relevant customer and order information needed by the delivery person;

  • split order header and rows information into different JSON documents, while keeping customer and shipping address info in the same document as the header;

  • split order header, rows, customer, and address information into different JSON documents;

  • …and so on: we could imagine different scenarios all day long, but what matters is that you decide based on your specific use case. It's a trade-off between convenience, performance, flexibility, and so on.
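To make the first two strategies concrete, here is a minimal Python sketch that builds both a nested document and a set of split documents from the same flat rows. It does not use the Gluesync meta-description language; the rows, field names, and the `nest_order` and `split_order` helpers are hypothetical and only illustrate the design choice.

```python
import json

# Hypothetical flat rows, as a join of orders, customers, and order rows
# would return them.
rows = [
    {"order_id": 100, "customer": "Ada", "address": "12 Main St", "sku": "A1", "qty": 2},
    {"order_id": 100, "customer": "Ada", "address": "12 Main St", "sku": "B7", "qty": 1},
]


def nest_order(flat_rows):
    """Strategy 1: one document per order, with order rows nested as an array."""
    first = flat_rows[0]
    return {
        "order_id": first["order_id"],
        "customer": {"name": first["customer"], "address": first["address"]},
        "rows": [{"sku": r["sku"], "qty": r["qty"]} for r in flat_rows],
    }


def split_order(flat_rows):
    """Strategy 2: one header document plus one document per order row."""
    first = flat_rows[0]
    header = {
        "type": "order",
        "order_id": first["order_id"],
        "customer": first["customer"],
        "address": first["address"],
    }
    lines = [
        {"type": "order_row", "order_id": r["order_id"], "sku": r["sku"], "qty": r["qty"]}
        for r in flat_rows
    ]
    return [header] + lines


nested = nest_order(rows)
split = split_order(rows)

print(json.dumps(nested, indent=2))
print(json.dumps(split, indent=2))
```

The nested form serves a reader that needs the whole order in one fetch, while the split form keeps documents small and lets order rows change independently of the header.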

To get started with Advanced data modeling when sourcing from an RDBMS, follow this link: Advanced data modeling from RDBMS.

Looking for Advanced data modeling when sourcing from NoSQL databases? Follow this link: Advanced data modeling from NoSQL.