Fine-grained Access Control With AWS LakeFormation

Last updated 5 years ago

Access Control Using Amazon LakeFormation

Overview

Data lakes are complex systems. They ingest data from many source systems, and many systems and users consume data from them. In large-scale implementations, it becomes complicated for customers to manage the thousands of IAM roles and policies that control access to their data lake. AWS Lake Formation simplifies data lake access management by providing fine-grained access control mechanisms on the data lake catalog. Please refer to the various AWS Lake Formation terminologies here.

In this model, an initial set of data stewards, represented by IAM roles and users and known as Datalake Administrators, can grant Lake Formation permissions on data locations and Data Catalog resources to any principal (including themselves). Once permissions on the data lake objects are in place, users (IAM users and roles) can access those objects through their preferred compute engine, such as EMR, Glue, or Athena, that is integrated with AWS Lake Formation. Lake Formation authorizes access to data lake resources when the integrated compute engines execute workloads against the Data Catalog.
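As a minimal sketch of the grant step described above, the following builds the arguments for the Lake Formation `GrantPermissions` API and sends them with boto3. The role ARN, database name, table name, and region are hypothetical placeholders, and the call is assumed to run under a Datalake Administrator identity.

```python
def build_table_grant(principal_arn, database, table, permissions):
    """Build keyword arguments for the Lake Formation GrantPermissions API."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


def grant(request, region="us-east-1"):
    """Send the grant; assumes the caller is a Datalake Administrator."""
    import boto3  # imported lazily so the builder works without AWS access

    client = boto3.client("lakeformation", region_name=region)
    return client.grant_permissions(**request)


# Hypothetical principal and catalog objects, for illustration only.
request = build_table_grant(
    "arn:aws:iam::111122223333:role/analyst",
    "sales_db",
    "orders",
    ["SELECT"],
)
# grant(request)  # uncomment to call AWS with real credentials
```

Keeping the request builder separate from the API call makes it possible to review or log a grant before it is applied.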

Access policy options with AWS LakeFormation

  • Metadata access control – Permissions on Data Catalog resources (Data Catalog permissions).

    These permissions enable principals to create, read, update, and delete metadata databases and tables in the Data Catalog.

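  • Underlying data access control – Permissions on locations in Amazon S3 (data access permissions and data location permissions).

    Data access permissions enable principals to read and write data in the underlying Amazon S3 locations. Data location permissions enable principals to create metadata databases and tables that point to specific Amazon S3 locations.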

As of 03/31/2020, AWS Lake Formation provides permissions at the following levels.

  1. Catalog level permissions

    1. CREATE_DATABASE

  2. Database level permissions

    1. CREATE_TABLE

    2. ALTER

    3. DROP

  3. Table level permissions

    1. ALTER

    2. DROP

    3. SELECT - Applicable to underlying data

    4. DELETE - Applicable to underlying data

    5. INSERT - Applicable to underlying data

  4. Column level permissions

    1. SELECT - Applicable to underlying data
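The levels above can be sketched as a lookup table that validates a permission before a grant is attempted. The permission names use the spellings of the Lake Formation API enum (for example, the database-level create permission is `CREATE_TABLE`). The helper names `validate` and `column_select_request` are hypothetical, and the column-level request assumes the `TableWithColumns` resource shape of the `GrantPermissions` API.

```python
# Permission names per resource level, following the list above.
PERMISSIONS_BY_LEVEL = {
    "catalog": {"CREATE_DATABASE"},
    "database": {"CREATE_TABLE", "ALTER", "DROP"},
    "table": {"ALTER", "DROP", "SELECT", "DELETE", "INSERT"},
    "column": {"SELECT"},
}


def validate(level, permissions):
    """Return the permissions sorted, or raise ValueError if any is invalid."""
    unknown = set(permissions) - PERMISSIONS_BY_LEVEL[level]
    if unknown:
        raise ValueError(f"not valid at {level} level: {sorted(unknown)}")
    return sorted(permissions)


def column_select_request(principal_arn, database, table, columns):
    """Build GrantPermissions arguments for a column-level SELECT."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": validate("column", ["SELECT"]),
    }


# Hypothetical principal, table, and columns, for illustration only.
column_request = column_select_request(
    "arn:aws:iam::111122223333:role/analyst",
    "sales_db",
    "orders",
    ["order_id", "amount"],
)
```

The resulting dictionary can be passed to the boto3 `lakeformation` client as `client.grant_permissions(**column_request)`.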

How it works under the hood?

When a user runs a workload against the Lake Formation catalog using an integrated compute service, the compute service requests access from Lake Formation. Based on the access level defined on the catalog objects, AWS Lake Formation vends short-term credentials to the compute service. The compute service then uses the temporary credentials to access the S3 objects directly and execute the workload. For column-level access control, the compute engine filters out the attributes that the user or role does not have access to after the objects have been downloaded from S3, as part of processing (as of 03/31/2020).
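The flow above can be illustrated with a toy, in-memory model. This is a conceptual sketch only, not how the service is implemented: the `GRANTS` table, the principal and table names, and the placeholder token are all hypothetical, and the list of rows stands in for objects read from S3.

```python
# Hypothetical grants: (principal, table) -> columns the principal may SELECT.
GRANTS = {("analyst", "orders"): {"order_id", "amount"}}


def vend_credentials(principal, table):
    """Stand-in for Lake Formation authorizing a request and vending credentials."""
    allowed = GRANTS.get((principal, table))
    if allowed is None:
        raise PermissionError(f"{principal} has no access to {table}")
    return {"token": "temporary-credentials-placeholder", "allowed_columns": allowed}


def run_query(principal, table, rows):
    """Stand-in for an integrated compute engine executing a workload."""
    creds = vend_credentials(principal, table)
    # The engine reads whole objects with the temporary credentials (here: rows),
    # then drops the columns the principal is not allowed to see.
    return [
        {col: val for col, val in row.items() if col in creds["allowed_columns"]}
        for row in rows
    ]


stored_rows = [{"order_id": 1, "amount": 9.5, "customer_email": "a@example.com"}]
result = run_query("analyst", "orders", stored_rows)
```

In this model the restricted column is still read by the engine and removed only during processing, which mirrors the column-level behaviour described above.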

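Access control in AWS Lake Formation is divided into two areas: metadata access control, which covers permissions on Data Catalog resources, and underlying data access control, which covers permissions on locations in Amazon Simple Storage Service (Amazon S3) (data access permissions and data location permissions).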

For both areas, Lake Formation uses a combination of Lake Formation permissions and AWS Identity and Access Management (IAM) permissions. The IAM permissions model consists of IAM policies. The Lake Formation permissions model is implemented as DBMS-style GRANT/REVOKE commands, such as GRANT SELECT ON tableName TO userName.
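A rough sketch of how the two models combine: a coarse-grained IAM policy lets the principal call Lake Formation and read the catalog, while the fine-grained decision is a Lake Formation grant or revoke. The policy below is an illustrative assumption, not a recommended production policy, and the principal, database, and table names are hypothetical.

```python
import json

# IAM side: coarse-grained permission to use Lake Formation and the Glue catalog.
IAM_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetDataAccess",
                "glue:GetDatabase",
                "glue:GetTable",
                "glue:GetPartitions",
            ],
            "Resource": "*",
        }
    ],
}


def table_permission_request(principal_arn, database, table, permissions):
    """Arguments shared by the GrantPermissions and RevokePermissions APIs."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


def apply(action, request, region="us-east-1"):
    """Run a GRANT or REVOKE through boto3; needs AWS credentials."""
    import boto3  # imported lazily so the rest of the sketch runs offline

    client = boto3.client("lakeformation", region_name=region)
    if action == "GRANT":
        return client.grant_permissions(**request)
    if action == "REVOKE":
        return client.revoke_permissions(**request)
    raise ValueError(f"unknown action: {action}")


# Roughly equivalent to: REVOKE SELECT ON orders FROM analyst
revoke_request = table_permission_request(
    "arn:aws:iam::111122223333:role/analyst", "sales_db", "orders", ["SELECT"]
)
policy_json = json.dumps(IAM_POLICY, indent=2)
```

With this split, the IAM policy rarely changes, and day-to-day access changes are made through Lake Formation grants and revokes.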

Have suggestions? Join our Slack channel to share feedback.
