Data Ingestion from On-Premises NFS Using AWS DataSync

Overview

AWS DataSync is a fully managed data transfer service that simplifies, automates, and accelerates moving and replicating data between on-premises storage systems and AWS storage services over the internet or AWS Direct Connect. In a data lake environment, AWS DataSync can automatically and securely sync files from on-premises storage servers, such as NFS servers, to an S3-based data lake.

In this architecture, we walk you through how to use AWS DataSync and the DataSync agent to migrate data to a data lake in Amazon S3.

Architecture Component Walkthrough

You create a network-attached storage server that exports files over NFS inside your data center.
You deploy an AWS DataSync agent as a virtual machine in a VMware ESXi hypervisor environment. The agent needs read access to the NFS server.
You configure AWS DataSync with the source (NFS) and destination (S3) locations required to perform the synchronization.
You create and then start an AWS DataSync task to synchronize files from NFS to S3.
You use an AWS Glue crawler to catalog the S3 location that receives files from AWS DataSync.
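
The configuration, task, and cataloging steps above can be sketched with the AWS SDK for Python (boto3). This is a minimal illustration rather than a reference implementation. It assumes the DataSync agent is already deployed and activated, and that the IAM roles already exist. The function names and every hostname, path, ARN, and name passed into them are hypothetical placeholders; the client calls (create_location_nfs, create_location_s3, create_task, start_task_execution, create_crawler, start_crawler) are standard boto3 operations.

```python
"""Sketch: register NFS and S3 locations, run a DataSync task, then crawl the result."""


def build_nfs_location(server_hostname, export_path, agent_arn):
    """Request parameters for the on-premises NFS source location."""
    return {
        "ServerHostname": server_hostname,
        "Subdirectory": export_path,
        # The agent is the component that reads from the NFS server.
        "OnPremConfig": {"AgentArns": [agent_arn]},
    }


def build_s3_location(bucket_arn, prefix, access_role_arn):
    """Request parameters for the S3 destination location."""
    return {
        "S3BucketArn": bucket_arn,
        "Subdirectory": prefix,
        # Role that DataSync assumes to write into the bucket.
        "S3Config": {"BucketAccessRoleArn": access_role_arn},
    }


def start_nfs_to_s3_sync(datasync, nfs_location, s3_location, task_name):
    """Create both locations and a task, start it, and return the execution ARN."""
    source = datasync.create_location_nfs(**nfs_location)
    destination = datasync.create_location_s3(**s3_location)
    task = datasync.create_task(
        SourceLocationArn=source["LocationArn"],
        DestinationLocationArn=destination["LocationArn"],
        Name=task_name,
    )
    execution = datasync.start_task_execution(TaskArn=task["TaskArn"])
    return execution["TaskExecutionArn"]


def crawl_landing_prefix(glue, crawler_name, role_arn, database_name, s3_path):
    """Create and start a Glue crawler over the S3 path that DataSync writes to."""
    glue.create_crawler(
        Name=crawler_name,
        Role=role_arn,
        DatabaseName=database_name,
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    glue.start_crawler(Name=crawler_name)
```

With real credentials, the clients would come from boto3.client("datasync") and boto3.client("glue"). A task execution runs asynchronously, so in practice you would likely wait for it to finish (for example, by polling describe_task_execution) before starting the crawler.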

References


Last updated 5 years ago

