Data Ingestion From On-Premises NFS Using AWS DataSync

Overview

AWS DataSync is a fully managed data transfer service that simplifies, automates, and accelerates moving and replicating data between on-premises storage systems and AWS storage services over the internet or AWS Direct Connect. In a data lake environment, AWS DataSync can automatically and securely sync files from on-premises storage servers, such as NFS servers, to an S3-based data lake.

In this architecture, we walk you through how to use AWS DataSync and the DataSync agent to migrate data to a data lake in Amazon S3.

[Architecture diagram: Data Ingestion Amazon Glue]

Architecture Component Walkthrough

  1. You set up a Network File System (NFS) file server inside your data center.

  2. You deploy an AWS DataSync agent as a virtual machine in a VMware ESXi hypervisor environment. The agent needs read access to the NFS server.

  3. You configure AWS DataSync with the source and destination locations required to perform the synchronization.

  4. You create and then start an AWS DataSync task to synchronize files from NFS to S3.

  5. You use an AWS Glue crawler to catalog the S3 location that receives files from AWS DataSync.
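
Steps 3 to 5 can also be scripted. The following is a minimal sketch using the boto3 DataSync and Glue clients, not a definitive implementation. It assumes the agent from step 2 is already deployed and activated, and that the Glue crawler already exists. The function names, the task name, and the "/raw" S3 prefix are hypothetical choices made for illustration; the ARNs, host name, and export path must come from your own environment.

```python
def create_locations(datasync, agent_arn, nfs_host, nfs_path, bucket_arn, role_arn):
    """Step 3: register the NFS source and the S3 destination with DataSync."""
    source = datasync.create_location_nfs(
        ServerHostname=nfs_host,                  # NFS server reachable by the agent
        Subdirectory=nfs_path,                    # exported path to read from
        OnPremConfig={"AgentArns": [agent_arn]},  # agent deployed in step 2
    )
    destination = datasync.create_location_s3(
        S3BucketArn=bucket_arn,
        Subdirectory="/raw",                      # hypothetical prefix in the data lake
        S3Config={"BucketAccessRoleArn": role_arn},
    )
    return source["LocationArn"], destination["LocationArn"]


def run_sync_task(datasync, source_arn, destination_arn, name="nfs-to-datalake"):
    """Step 4: create a task and start one execution of it."""
    task = datasync.create_task(
        SourceLocationArn=source_arn,
        DestinationLocationArn=destination_arn,
        Name=name,                                # hypothetical task name
    )
    execution = datasync.start_task_execution(TaskArn=task["TaskArn"])
    return execution["TaskExecutionArn"]


def catalog_landing_zone(glue, crawler_name):
    """Step 5: run an existing Glue crawler over the S3 landing location."""
    glue.start_crawler(Name=crawler_name)
```

To run this against AWS, pass boto3.client("datasync") and boto3.client("glue") as the client arguments. Note that start_task_execution returns before the transfer finishes, so a real pipeline would likely wait for the execution to complete (for example by polling describe_task_execution) before starting the crawler.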
