Advanced data modeling from NoSQL

In this chapter you will understand the principles of the newly introduced feature available starting from version 1.4 specifically designed for a wide range of data modeling/data transformation applications: we named this feature Advanced Data Modeling. All of the operations involved in those features are performed on the fly without involving local caches or any form of persistence inside Gluesync, this makes the overall process faster, more secure and more consistent.

Data modeling in Gluesync makes use of what we call virtual entities, if you’re not familiar with it we suggest having a look at this page, it will take less than 1 minute to read.

Advanced data modeling

When we designed the new data modeling feature we took into consideration a few main aspects, like:

  • freedom to define relationships between tables;

  • Choose either data should be represented nested in different levels of deep or just flat;

  • compatibility across multiple sources: not only RDBMS but also NoSQL databases;

  • easiness to hand-write but also future-proof when it will be managed via an API / GUI;

  • fast to compute;

  • no new language to learn: SQL-like statements easy to recall;

  • easy to extend in the future;

Taking these as main principles we come up with a meta-description language that is now the core foundation of that functionality.

See it in action here in this video:

The meta description language

The description language helps you define what will be the content of your virtual entity that will then become tables-rows-columns objects.

Here following is the list of available commands and their description:

Key Description Accepted values

level

Level of deep of parenthood for JSON objects

1 or 2

name

Defines the name of the entity

string

tables

Array of tables that are involved in a JOIN-like statement, each table is described using these elements here on the left

Key Description Accepted values

table

Name of the table involved

string

fields

Object representing the list of columns that will selected and used for the output. The mapping works exactly as per the basic config of Gluesync: values are the resulting target’s names used as table’s columns

{
  "id": "id",
  "order_number": "order_number",
  "notes": "notes"
}

alias

(optional) Value used by the query engine to distinguish between repeated usages of the same table in the same context

string

where

(optional) Same per standard SQL language, specifies the clause applied to the filter

string, ex. status = 1

fromTable

Represent the master table name that is involved in your query

boolean (true or false), defaults to false

join

Recalls standard SQL-like JOIN statement, specifies the table that has to be joined together in this context. It automatically manages the traverse of relationships between tables' foreign keys (if any).

{
  "with": "orders_headers"
}

keys

Map of keys/values representing the keys and their aliases for the JSON

{
  "customerId": "id_customer"
}

Bring everything together

Following a virtual entity defined using the above keys:

"order": {
  "dataModeling": [
      {
        "level": 1,
        "name": "orders",
        "tables":
        [
          {
            "table": "orders_headers",
            "fields":
            {
              "id": "id",
              "order_number": "order_number",
              "notes": "notes"
            },
            "alias": "oh",
            "where": "status = 1",
            "fromTable": true
          },
          {
            "table": "customers",
            "fields":
            {
              "name": "name",
              "surname": "surname",
              "phone": "phone"
            },
            "alias": "c",
            "join":
            {
              "with": "orders_headers"
            }
          }
        ]
      },
      {
        "level": 1,
        "name": "address",
        "tables":
        [
          {
            "table": "addresses",
            "fields":
            {
              "street": "street",
              "street_number": "street_number",
              "city": "city",
              "postal_code": "postal_code"
            },
            "alias": "ad",
            "join":
            {
              "with": "orders_headers"
            }
          }
        ]
      },
      {
        "level": 2,
        "name": "order_rows",
        "tables":
        [
          {
            "table": "orders_rows",
            "fields":
            {
              "id": "row_id",
              "order_id": "order_id",
              "quantity": "quantity",
              "notes": "row_notes"
            },
            "alias": "o",
            "join":
            {
              "with": "orders_headers"
            }
          },
          {
            "table": "articles",
            "fields":
            {
              "article_name": "article_name",
              "description": "description"
            },
            "alias": "a",
            "join":
            {
              "with": "orders_rows"
            }
          }
        ]
      }
    ],
  "type": "order",
  "scope": "myscope"
}

As a result in your RDBMS database you’ll have:

an example of Advanced data modelling output result

Key fields involved in Advanced data modeling

  • dataModeling: the key that tells the engine how to treat the following object, specifying that key enables Advanced data modeling over the object;

  • type: every modeled document stored on the target database will have a key called type with the given custom value;

  • scope (optional): will be used as source/target scope in NoSQL databases supporting this kind of data tenancy, like Couchbase for example.

type and scope keys have been recently introduced with version 1.5, to learn more please take a look at our version changelog by following this link: Changes in Advanced data modeling.

Compatibility matrix

NoSQL Gluesync compatibility

Aerospike

⏱ launching as soon as CDC support for Aerospike will be introduced

Couchbase

MongoDB

As per our product roadmap NoSQL databases are being added every month so expect to have Advanced data modeling support on more databases soon after the initial launch.