Data Ingestion From On-Premises NFS Using AWS DataSync
AWS DataSync is a fully managed data transfer service that simplifies, automates, and accelerates moving and replicating data between on-premises storage systems and AWS storage services over the internet or AWS Direct Connect. In a data lake environment, AWS DataSync can automatically and securely sync files from on-premises storage servers, such as NFS servers, to an S3-based data lake.
In this architecture, we walk you through how to use AWS DataSync and the DataSync agent to migrate data to a data lake in Amazon S3.
You create a network-attached file storage server that exposes a Network File System (NFS) share inside your data center.
You install an AWS DataSync agent in a VMware ESXi hypervisor-based environment. This agent has read access to the NFS server.
You configure AWS DataSync with the source and destination locations required to perform the synchronization.
You use an AWS Glue crawler to catalog the S3 location that receives files via AWS DataSync.
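The configuration steps above can also be scripted. The following is a minimal sketch, not the walkthrough's own procedure: it assumes the DataSync agent is already activated and that the S3 bucket, IAM roles, and Glue database already exist. The function names, the `/raw` prefix, and the task name are hypothetical placeholders. The client arguments are expected to be boto3 `datasync` and `glue` clients, or any objects that expose the same methods.

```python
"""Sketch: connect an on-premises NFS share to an S3 data lake with DataSync."""


def configure_datasync(datasync, agent_arn, nfs_host, nfs_path,
                       bucket_arn, bucket_role_arn, prefix="/raw"):
    """Create the NFS source, the S3 destination, and a task linking them.

    All argument values are supplied by the caller; nothing here is specific
    to a particular account. Returns the ARN of the new task.
    """
    # Source location: the NFS export, read through the on-premises agent.
    source = datasync.create_location_nfs(
        ServerHostname=nfs_host,
        Subdirectory=nfs_path,
        OnPremConfig={"AgentArns": [agent_arn]},
    )
    # Destination location: a prefix in the data lake bucket. The role must
    # allow DataSync to write to the bucket.
    destination = datasync.create_location_s3(
        S3BucketArn=bucket_arn,
        Subdirectory=prefix,
        S3Config={"BucketAccessRoleArn": bucket_role_arn},
    )
    # Task: ties the source to the destination.
    task = datasync.create_task(
        SourceLocationArn=source["LocationArn"],
        DestinationLocationArn=destination["LocationArn"],
        Name="nfs-to-datalake",  # hypothetical task name
    )
    return task["TaskArn"]


def catalog_landing_zone(glue, crawler_name, role_arn, database, s3_path):
    """Create and start a Glue crawler over the S3 path DataSync writes to."""
    glue.create_crawler(
        Name=crawler_name,
        Role=role_arn,
        DatabaseName=database,
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    glue.start_crawler(Name=crawler_name)
```

With boto3 installed and credentials configured, the clients would come from `boto3.client("datasync")` and `boto3.client("glue")`. A transfer is then started by calling `start_task_execution(TaskArn=...)` on the DataSync client with the ARN returned by `configure_datasync`. Passing the clients in as arguments keeps the functions easy to exercise against a stub before pointing them at a real account.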