Data Ingestion From On-Premise NFS using Amazon DataSync

Last updated 5 years ago

Overview

AWS DataSync is a fully managed data transfer service that simplifies, automates, and accelerates moving and replicating data between on-premises storage systems and AWS storage services over the internet or AWS Direct Connect. In a data lake environment, AWS DataSync can securely and automatically sync files from on-premises storage servers, such as NFS servers, to an S3-based data lake.

In this architecture, we walk you through how to use AWS DataSync and the DataSync Agent to migrate data to a data lake in Amazon S3.

Architecture Component Walkthrough

  1. You create a network-attached file storage server (NFS) inside your data center.
  2. You install an AWS DataSync Agent in a VMware ESXi hypervisor based environment. This Agent has read access to the NFS server.
  3. You configure AWS DataSync with the required locations to perform the synchronization.
  4. You create and then start an AWS DataSync task to synchronize files from NFS to S3.
  5. You use an AWS Glue Crawler to catalog the S3 location that receives files via AWS DataSync.
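The configuration, task, and crawler steps above can be sketched with the AWS SDK for Python. This is a minimal illustration, not the authors' implementation: it assumes the DataSync Agent is already installed and activated, so its ARN is known, and that the IAM roles already exist. The helper names (`nfs_location_params`, `s3_location_params`, `crawler_params`, `sync_nfs_to_s3`, `catalog_s3_location`) are hypothetical. The client operations they call (`create_location_nfs`, `create_location_s3`, `create_task`, `start_task_execution`, `create_crawler`, `start_crawler`) are boto3 DataSync and Glue client methods.

```python
"""Sketch: DataSync locations and task, followed by a Glue crawler."""


def nfs_location_params(server_hostname, subdirectory, agent_arn):
    """Parameters for the source NFS location, read through the on-premises agent."""
    return {
        "ServerHostname": server_hostname,
        "Subdirectory": subdirectory,
        "OnPremConfig": {"AgentArns": [agent_arn]},
    }


def s3_location_params(bucket_name, prefix, access_role_arn):
    """Parameters for the destination S3 location in the data lake."""
    return {
        "S3BucketArn": f"arn:aws:s3:::{bucket_name}",
        "Subdirectory": prefix,
        "S3Config": {"BucketAccessRoleArn": access_role_arn},
    }


def crawler_params(name, role_arn, database, bucket_name, prefix):
    """Parameters for a Glue crawler that targets the synced S3 prefix."""
    path = f"s3://{bucket_name}/{prefix.strip('/')}".rstrip("/") + "/"
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": path}]},
    }


def sync_nfs_to_s3(datasync, nfs_params, s3_params, task_name):
    """Create both locations, create the task, and start one execution."""
    source_arn = datasync.create_location_nfs(**nfs_params)["LocationArn"]
    destination_arn = datasync.create_location_s3(**s3_params)["LocationArn"]
    task_arn = datasync.create_task(
        SourceLocationArn=source_arn,
        DestinationLocationArn=destination_arn,
        Name=task_name,
    )["TaskArn"]
    return datasync.start_task_execution(TaskArn=task_arn)["TaskExecutionArn"]


def catalog_s3_location(glue, params):
    """Create the crawler and start it once."""
    glue.create_crawler(**params)
    glue.start_crawler(Name=params["Name"])
```

To run this against AWS, pass `boto3.client("datasync")` and `boto3.client("glue")` as the `datasync` and `glue` arguments, with your own host name, bucket, and ARNs. The parameter builders are kept separate from the API calls so the request payloads can be inspected before anything is created.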

References

  • Getting started with AWS DataSync
  • How AWS DataSync works

Have suggestions? Join our Slack channel to share feedback.