aws-reference-architectures/datalake

Data Security and Access Control Architecture


Last updated 5 years ago


A data lake platform comprises many components: data stores, job execution engines, orchestration tools, data consumption services, and so on. Security requirements vary by component type, and sometimes by individual component. Let's assume your data lake uses S3 as its storage platform. Here are some examples of the security controls to apply to components at the platform level:

  • Data catalog access and user roles - Which accounts have access to a particular dataset in the data catalog, and which roles they use.

  • Direct access to datasets - Objects stored in S3, as well as data used by programs running as part of your data lake system, should have restricted access. Any system with direct access to the datasets within a data lake should be subject to fine-grained access control.

  • Job execution - Permissions to execute jobs, YARN applications, or similar workloads.

  • Administration utilities - Permissions to access and manage the management utilities of the data platform’s components.
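To make the "direct access to datasets" point concrete, here is a minimal sketch of a read-only IAM policy document scoped to a single dataset prefix in S3. The bucket name, prefix, and the helper `build_dataset_read_policy` are hypothetical and exist only for illustration. The sketch builds the policy JSON and does not attach it to any principal.

```python
import json


def build_dataset_read_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy document granting read-only access to one S3 prefix.

    `bucket` and `prefix` are illustrative inputs; nothing here calls AWS.
    """
    prefix = prefix.strip("/")  # normalise so the ARNs below have single slashes
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so restrict it with the
                # s3:prefix condition key instead of the resource ARN.
                "Sid": "ListDatasetPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Object reads are limited to keys under the dataset prefix.
                "Sid": "ReadDatasetObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and dataset prefix.
    policy = build_dataset_read_policy("example-datalake-bucket", "curated/sales")
    print(json.dumps(policy, indent=2))
```

A role that carries only a policy like this can list and read a single dataset and nothing else in the bucket. Write access, other prefixes, and catalog permissions would each need their own explicit statements.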

Primarily, access control and data security in data lakes on AWS can be enforced by:

  • Access control using IAM

  • Fine-grained access control using AWS Lake Formation

Have suggestions? Join our Slack channel to share feedback.