Data Ingestion using Database Migration Service (DMS) and Lambda

Overview

The AWS Database Migration Service (DMS) is a managed service for migrating data into AWS. It can replicate data from operational databases and data warehouses (on premises or on AWS) to a variety of targets, including S3 datalakes. In this architecture, DMS is used to capture changed records from relational databases on RDS or EC2 and write them into S3. AWS Lambda, a serverless compute service, is used to transform and partition datasets based on their arrival time in S3 for better query performance.

Figure: Data Ingestion using DMS and Lambda (architecture diagram)

Architecture Component Walkthrough

  1. Create a relational database on EC2 or RDS within a VPC.

  2. Create a staging S3 location to store changes captured by DMS.

  3. Create a Replication Instance using the DMS APIs or the console.

  4. Create an IAM role for AWS Lambda with read access on the staging S3 bucket and write access on the target datalake location (a sketch of this policy appears after this list).

  5. Create a Lambda function that triggers custom code execution on s3:ObjectCreated:* events from the staging S3 bucket. The function writes the same objects to the target datalake location on S3, partitioned by the LastModified metadata attribute of the S3 objects (see the sample function below).

  6. Create a DMS Task to migrate data from your source system to the target location (a task-creation sketch appears after this list).

  7. The DMS Replication Instance will then connect to the source via an elastic network interface (ENI) and write to the S3 staging location. AWS Lambda will receive the PutObject events and use the S3 Copy API to reorganize the data into your datalake.
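
As a worked example of step 4, the following boto3 sketch creates such a role. The role name, policy name, and bucket names are placeholders for illustration, not values prescribed by this architecture.

```python
import json

import boto3

iam = boto3.client("iam")

# Placeholder bucket names; substitute your staging and datalake buckets.
STAGING_BUCKET = "my-dms-staging-bucket"
DATALAKE_BUCKET = "my-datalake-bucket"

# Trust policy allowing the Lambda service to assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Read access on the staging bucket, write access on the datalake location.
access_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{STAGING_BUCKET}",
                f"arn:aws:s3:::{STAGING_BUCKET}/*",
            ],
        },
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{DATALAKE_BUCKET}/*",
        },
    ],
}

iam.create_role(
    RoleName="dms-datalake-lambda-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
iam.put_role_policy(
    RoleName="dms-datalake-lambda-role",
    PolicyName="staging-read-datalake-write",
    PolicyDocument=json.dumps(access_policy),
)

# The function also needs CloudWatch Logs permissions for its own logging.
iam.attach_role_policy(
    RoleName="dms-datalake-lambda-role",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
)
```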
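
Step 6 can be scripted in a similar way. The sketch below assumes the replication instance and the source and S3 target endpoints already exist; every ARN and identifier shown is a placeholder.

```python
import json

import boto3

dms = boto3.client("dms")

# Placeholder ARNs for the resources created in the earlier steps.
REPLICATION_INSTANCE_ARN = "arn:aws:dms:us-east-1:123456789012:rep:EXAMPLE"
SOURCE_ENDPOINT_ARN = "arn:aws:dms:us-east-1:123456789012:endpoint:SRC"
TARGET_ENDPOINT_ARN = "arn:aws:dms:us-east-1:123456789012:endpoint:TGT"

# Include every schema and table; narrow these selection rules as needed.
table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-all",
        "object-locator": {"schema-name": "%", "table-name": "%"},
        "rule-action": "include",
    }]
}

task = dms.create_replication_task(
    ReplicationTaskIdentifier="rds-to-s3-staging",
    SourceEndpointArn=SOURCE_ENDPOINT_ARN,
    TargetEndpointArn=TARGET_ENDPOINT_ARN,
    ReplicationInstanceArn=REPLICATION_INSTANCE_ARN,
    # Full initial load followed by ongoing change data capture (CDC).
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps(table_mappings),
)

dms.start_replication_task(
    ReplicationTaskArn=task["ReplicationTask"]["ReplicationTaskArn"],
    StartReplicationTaskType="start-replication",
)
```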

Sample Lambda Function Using Python
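
A minimal sketch of such a function is shown below, assuming a Python 3 runtime with boto3 available and the target bucket and prefix supplied through hypothetical TARGET_BUCKET and TARGET_PREFIX environment variables. On each s3:ObjectCreated:* event it reads the staging object's LastModified attribute and performs a server-side copy into a Hive-style year/month/day partition in the datalake.

```python
import os
import urllib.parse

import boto3

s3 = boto3.client("s3")

# Hypothetical environment variables configured on the function.
TARGET_BUCKET = os.environ["TARGET_BUCKET"]
TARGET_PREFIX = os.environ.get("TARGET_PREFIX", "datalake")


def lambda_handler(event, context):
    """Copy newly staged objects into the datalake, partitioned by
    the LastModified timestamp of each S3 object."""
    for record in event["Records"]:
        src_bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        src_key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Read the LastModified metadata attribute of the staging object.
        last_modified = s3.head_object(Bucket=src_bucket, Key=src_key)["LastModified"]

        # Build a Hive-style partition path, e.g. year=2024/month=01/day=31.
        partition = (
            f"year={last_modified.year}"
            f"/month={last_modified.month:02d}"
            f"/day={last_modified.day:02d}"
        )
        dest_key = f"{TARGET_PREFIX}/{partition}/{os.path.basename(src_key)}"

        # Server-side copy; the object bytes never leave S3.
        s3.copy_object(
            Bucket=TARGET_BUCKET,
            Key=dest_key,
            CopySource={"Bucket": src_bucket, "Key": src_key},
        )

    return {"copied": len(event["Records"])}
```

Partitioning by arrival date in this way lets query engines such as Athena prune partitions instead of scanning the full staging history. Note that CopyObject handles objects up to 5 GB; larger objects would require a multipart copy.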

