Installation steps

Gluesync NoSQL to SQL for DynamoDB

Prerequisites

To run Gluesync against DynamoDB in your AWS account, you need:

  • valid user credentials with the arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess policy attached;

  • IAM access key obtained through AWS IAM console;

  • IAM secret key obtained through AWS IAM console;

  • name of the region where your DynamoDB tables reside.

Basic configuration example

This module is customized through a configuration file in JSON format. Specify the file name as a parameter when launching the app, using the -f or --file option. The file should combine the common configuration (see Installation steps) with the source/destination specific configuration, such as the DynamoDB settings described in the sections below.

Turning ON DynamoDB Streams on your table

This step is mandatory: it enables CDC (change data capture) on your DynamoDB source table(s).

By default, DynamoDB tables have CDC turned off. To enable it, follow these steps:

  1. Connect to your AWS console;

  2. Open the DynamoDB console;

  3. From the tables tab, select the table on which you want to enable CDC;

  4. In the table details, open the Exports and streams tab;

  5. In the Exports and streams tab, scroll down to DynamoDB stream details and press Enable;

  6. A new window asks which of the 4 available options to use; select New and old images;

  7. Once selected, press the Enable streams button.

Repeat this operation for each table that is meant to be replicated by Gluesync.

Additional charges by AWS may apply. Refer to the AWS documentation for billing details about DynamoDB Streams.
Future versions might support enabling CDC automatically via the DynamoDB SDK.
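Until then, the console steps above can also be scripted by hand. The following is a minimal sketch, not part of Gluesync: it assumes the third-party boto3 package is installed and can find your IAM credentials, and the function names are hypothetical. It relies on DynamoDB's UpdateTable operation, whose StreamViewType value NEW_AND_OLD_IMAGES corresponds to the New and old images option.

```python
def stream_spec_request(table_name):
    """Build the UpdateTable arguments that turn on a stream with both images."""
    return {
        "TableName": table_name,
        "StreamSpecification": {
            "StreamEnabled": True,
            # Corresponds to the "New and old images" console option.
            "StreamViewType": "NEW_AND_OLD_IMAGES",
        },
    }


def enable_stream(table_name, region_name):
    """Send the request with boto3 (assumed installed; imported lazily)."""
    import boto3  # third-party AWS SDK for Python

    client = boto3.client("dynamodb", region_name=region_name)
    return client.update_table(**stream_spec_request(table_name))
```

Calling enable_stream("addresses", "eu-central-1") once per replicated table would mirror the manual procedure; the table name and region here are placeholders.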

Source database in DynamoDB

Unlike other sources, DynamoDB doesn’t require you to specify a database name.

Here’s an example of a config file that supports DynamoDB:

{
  ...
  // (source section omitted)
  ...
  "sourceHost": "eu-central-1",
  "sourceUsername": "IAMCredentialAccessKeyID",
  "sourcePassword": "IAMCredentialSecret",
  ...
}

The source parameters describe the connection to DynamoDB:

  • sourceHost: the name of the region where your DynamoDB tables reside;

  • sourceUsername: IAM Credential access key ID created under your AWS IAM console;

  • sourcePassword: IAM Credential secret created under your AWS IAM console.
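As an illustration of how these three keys sit alongside the common configuration, here is a minimal sketch that merges them into one JSON document. The helper name with_dynamodb_source is hypothetical, the common configuration is left as an empty placeholder, and the credential strings are the same placeholders used in the example above.

```python
import json

REQUIRED_SOURCE_KEYS = ("sourceHost", "sourceUsername", "sourcePassword")


def with_dynamodb_source(common_config, region, access_key_id, secret_key):
    """Return a copy of the common configuration plus the DynamoDB source keys."""
    config = dict(common_config)
    config.update({
        "sourceHost": region,             # AWS region name
        "sourceUsername": access_key_id,  # IAM access key ID
        "sourcePassword": secret_key,     # IAM secret
    })
    # Reject empty values early instead of failing at connection time.
    missing = [key for key in REQUIRED_SOURCE_KEYS if not config[key]]
    if missing:
        raise ValueError("empty DynamoDB source settings: " + ", ".join(missing))
    return config


config = with_dynamodb_source(
    {},  # placeholder for the common configuration
    "eu-central-1",
    "IAMCredentialAccessKeyID",
    "IAMCredentialSecret",
)
print(json.dumps(config, indent=2))
```

The printed document can then be saved and passed to the app with -f or --file.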

Source change retention

The sourceChangeRetention parameter in Gluesync’s core module indicates how many days the source retains CDC data before clearing it. In DynamoDB this retention is fixed at the maximum value of 24 hours and cannot be changed. You can either remove the key or leave it empty, as it has no effect on this source configuration.

Source entities key mapping for DynamoDB

Source entities in DynamoDB require a specific mapping, since the partition key of each DynamoDB table can be highly customized by the user to optimize performance.

The example below shows the mapping for an addresses table whose key consists of the single field _id; for this use case both the separator and the prefix are the empty string "".

{
  ...
  "entitiesKeyMapping": {
    "addresses": {
      "separator": "",
      "fields": ["_id"],
      "prefix": ""
    }
    ...
  }
  ...
}

This object is used to rebuild the partition key of each table: the strings in the fields array are joined using the separator defined in your JSON object, and the defined prefix is prepended, as in this further example:

{
  ...
  "entitiesKeyMapping": {
    "addresses": {
      "separator": ":",
      "fields": ["_id", "name"],
      "prefix": "gs_"
    }
    ...
  }
  ...
}

The result will be gs__id:name as a partition key for your addresses table.
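The composition rule can be reproduced in a few lines. This is a minimal sketch of the string joining described above, not Gluesync's internal code; the function name partition_key_pattern is hypothetical.

```python
def partition_key_pattern(mapping):
    """Join the entries of "fields" with the separator, then prepend the prefix."""
    return mapping["prefix"] + mapping["separator"].join(mapping["fields"])


# The two "addresses" mappings shown in the examples above.
simple = {"separator": "", "fields": ["_id"], "prefix": ""}
composite = {"separator": ":", "fields": ["_id", "name"], "prefix": "gs_"}

print(partition_key_pattern(simple))     # _id
print(partition_key_pattern(composite))  # gs__id:name
```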

Provision table(s) throughput

As per the DynamoDB documentation, you are required to specify table throughput values in order to improve table performance. We suggest following this advice when your DynamoDB performance does not meet the requirements of your Gluesync replication use case.

This part of the configuration should look like the following example:

{
  ...
  "dynamodb": {
    "entitiesProvisionedThroughput": {
      "yourEntityNameHere": {
        "read": 10,
        "write": 10
      }
      ...
    }
  },
  ...
}

DynamoDB specific configurations are listed under the dynamodb property:

  • entitiesProvisionedThroughput: provisioned resources for each table created through Gluesync, as described in the official DynamoDB documentation. You are required to specify these parameters for each of your entities.

Tables automatically created by Gluesync, such as gs_state_preservation and gs_migrated_tables (already mentioned in the previous chapter), are by default managed with read and write throughput values both set to 10. You can override these values by specifying the table names inside the dynamodb object, as in the example above.
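To make the override concrete, the sketch below builds a dynamodb object that sets throughput for one entity and overrides one Gluesync-managed table. The helper name throughput_section, the entity name addresses and the value 20 are illustrative assumptions; only the key names come from the example above.

```python
import json


def throughput_section(entities, internal_overrides=None):
    """Build the "dynamodb" object; both arguments map table name -> (read, write)."""
    tables = {name: {"read": r, "write": w} for name, (r, w) in entities.items()}
    # Gluesync-managed tables use the same shape as user entities.
    for name, (r, w) in (internal_overrides or {}).items():
        tables[name] = {"read": r, "write": w}
    return {"dynamodb": {"entitiesProvisionedThroughput": tables}}


section = throughput_section(
    {"addresses": (10, 10)},
    internal_overrides={"gs_state_preservation": (20, 20)},
)
print(json.dumps(section, indent=2))
```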