Fine-grained Access Control With AWS Lake Formation
Data lakes are complex systems. They ingest data from many source systems, and many systems and users in turn consume data from them. In large-scale implementations, it becomes complicated for customers to manage the thousands of IAM roles and policies that control access to their data lake. AWS Lake Formation simplifies data lake access management by providing fine-grained access control mechanisms on the data lake catalog. For definitions of the terms used here, refer to the AWS Lake Formation terminology documentation.
In this model, a set of initial data stewards, represented by IAM roles and users and known as data lake administrators, can grant Lake Formation permissions on data locations and Data Catalog resources to any principal (including themselves). Once permissions on the data lake objects are in place, principals (IAM users and roles) can access those objects through their preferred compute engine, such as EMR, Glue, or Athena, that is integrated with AWS Lake Formation. Lake Formation authorizes access to data lake resources when the integrated compute engines execute workloads against the Data Catalog.
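As a rough illustration of how the initial data lake administrators might be designated, the sketch below builds the settings payload for the boto3 `put_data_lake_settings` call. The account ID and role name are placeholders, and the helper names `build_admin_settings` and `register_admins` are hypothetical, not part of any AWS library.

```python
def build_admin_settings(admin_arns):
    """Build a DataLakeSettings payload that names the data lake administrators."""
    return {
        "DataLakeAdmins": [
            {"DataLakePrincipalIdentifier": arn} for arn in admin_arns
        ]
    }


def register_admins(admin_arns):
    """Send the settings to Lake Formation (requires AWS credentials)."""
    import boto3  # imported lazily so the builder above works without AWS access

    client = boto3.client("lakeformation")
    client.put_data_lake_settings(DataLakeSettings=build_admin_settings(admin_arns))


# Placeholder ARN, for illustration only.
settings = build_admin_settings(["arn:aws:iam::111122223333:role/DataSteward"])
```

Because this call writes the settings as a whole, it is probably safest to read the current settings first and merge the new administrators into them rather than sending a fresh payload.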
Access control in AWS Lake Formation is divided into the following two areas:
Metadata access control – Permissions on Data Catalog resources (Data Catalog permissions).
These permissions enable principals to create, read, update, and delete metadata databases and tables in the Data Catalog.
Underlying data access control – Permissions on locations in Amazon Simple Storage Service (Amazon S3) (data access permissions and data location permissions).
Data access permissions enable principals to read and write data to underlying Amazon S3 locations. Data location permissions enable principals to create metadata databases and tables that point to specific Amazon S3 locations.
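A data location permission can be sketched as a `grant_permissions` request whose resource is an S3 location rather than a catalog object. The bucket, prefix, and principal below are placeholders, and `build_data_location_grant` is a hypothetical helper. The sketch assumes the S3 location has already been registered with Lake Formation.

```python
def build_data_location_grant(principal_arn, s3_arn):
    """Build a request that lets the principal create catalog objects pointing at s3_arn."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"DataLocation": {"ResourceArn": s3_arn}},
        "Permissions": ["DATA_LOCATION_ACCESS"],
    }


def grant_data_location(principal_arn, s3_arn):
    """Submit the grant to Lake Formation (requires AWS credentials)."""
    import boto3  # imported lazily so the builder above works without AWS access

    request = build_data_location_grant(principal_arn, s3_arn)
    boto3.client("lakeformation").grant_permissions(**request)


# Placeholder principal and location, for illustration only.
location_request = build_data_location_grant(
    "arn:aws:iam::111122223333:role/EtlRole",
    "arn:aws:s3:::example-datalake-bucket/raw",
)
```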
For both areas, Lake Formation uses a combination of Lake Formation permissions and AWS Identity and Access Management (IAM) permissions. The IAM permissions model consists of IAM policies. The Lake Formation permissions model is implemented as DBMS-style GRANT/REVOKE commands, such as GRANT SELECT ON tableName TO userName.
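The GRANT/REVOKE style maps fairly directly onto the boto3 `grant_permissions` and `revoke_permissions` calls, which take the same principal, resource, and permission list. The following is a minimal sketch of the example grant above. The account ID and the database name `salesdb` are placeholders, and the helper functions are hypothetical.

```python
def build_table_request(principal_arn, database, table, permissions):
    """Build the keyword arguments shared by grant_permissions and revoke_permissions."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


def grant(request):
    """Equivalent of GRANT: requires AWS credentials."""
    import boto3  # imported lazily so the builder above works without AWS access

    boto3.client("lakeformation").grant_permissions(**request)


def revoke(request):
    """Equivalent of REVOKE: requires AWS credentials."""
    import boto3

    boto3.client("lakeformation").revoke_permissions(**request)


# GRANT SELECT ON tableName TO userName, with placeholder account and database.
request = build_table_request(
    "arn:aws:iam::111122223333:user/userName", "salesdb", "tableName", ["SELECT"]
)
```

Calling `grant(request)` and later `revoke(request)` with the same dictionary would add and then remove the same permission.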
As of 03/31/2020, AWS Lake Formation provides permissions at the following levels.
Catalog level permissions
CREATE_DATABASE
Database level permissions
CREATE_TABLE
ALTER
DROP
Table level permissions
ALTER
DROP
SELECT - Applicable to underlying data
DELETE - Applicable to underlying data
INSERT - Applicable to underlying data
Column level permissions
SELECT - Applicable to underlying data
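The levels above can be sketched as a small lookup table that rejects a permission at a level where it does not apply before a request is built. The resource shapes follow the boto3 `grant_permissions` request format. The database-level create permission is spelled `CREATE_TABLE` here on the assumption that this is its name in the API. The helper names and the sample database, table, and column names are hypothetical.

```python
# Permissions allowed at each level, following the list above.
ALLOWED = {
    "catalog": {"CREATE_DATABASE"},
    "database": {"CREATE_TABLE", "ALTER", "DROP"},
    "table": {"ALTER", "DROP", "SELECT", "DELETE", "INSERT"},
    "column": {"SELECT"},
}


def build_resource(level, database=None, table=None, columns=None):
    """Return the Resource structure for the given level."""
    if level == "catalog":
        return {"Catalog": {}}
    if level == "database":
        return {"Database": {"Name": database}}
    if level == "table":
        return {"Table": {"DatabaseName": database, "Name": table}}
    # Remaining case is "column": a table restricted to the named columns.
    return {
        "TableWithColumns": {
            "DatabaseName": database,
            "Name": table,
            "ColumnNames": list(columns),
        }
    }


def build_grant(principal_arn, level, permissions, **target):
    """Validate permissions against the level, then build the grant request."""
    if level not in ALLOWED:
        raise ValueError(f"unknown level: {level}")
    unsupported = set(permissions) - ALLOWED[level]
    if unsupported:
        raise ValueError(f"{sorted(unsupported)} not valid at {level} level")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": build_resource(level, **target),
        "Permissions": list(permissions),
    }


# Column-level SELECT on two columns, with placeholder names.
column_grant = build_grant(
    "arn:aws:iam::111122223333:role/Analyst",
    "column",
    ["SELECT"],
    database="salesdb",
    table="customers",
    columns=["id", "country"],
)
```

The resulting dictionary could be passed to `boto3.client("lakeformation").grant_permissions(**column_grant)`. Asking for `INSERT` at the column level raises a `ValueError` instead of producing a request.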
When a user runs a workload against the Lake Formation catalog using an integrated compute service, the compute service requests access from Lake Formation. Based on the access level defined on the catalog objects, AWS Lake Formation vends short-term credentials to the compute service. The compute service then uses these temporary credentials to access the S3 objects directly and execute the workload. For column-level access control, the compute engine filters out the attributes that the user or role does not have access to after the objects have been downloaded from S3, as part of processing (as of 03/31/2020).
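The column-level behaviour described above can be pictured with a toy model in which the engine reads whole rows and only afterwards drops the columns the principal may not see. This is an illustrative simulation only, not the code of Lake Formation or of any compute engine. The grant table, principal names, and sample rows are invented for the example.

```python
def allowed_columns(grants, principal, table):
    """Return the set of columns the principal may SELECT on the table."""
    return set(grants.get((principal, table), ()))


def filter_rows(rows, columns):
    """Drop every attribute that is not in the permitted column set."""
    return [
        {name: value for name, value in row.items() if name in columns}
        for row in rows
    ]


# Hypothetical column-level grants keyed by (principal, table).
GRANTS = {("analyst", "salesdb.customers"): ["id", "country"]}

# Rows as they might look after the full objects were read from S3.
downloaded = [
    {"id": 1, "email": "a@example.com", "country": "DE"},
    {"id": 2, "email": "b@example.com", "country": "FR"},
]

visible = filter_rows(
    downloaded, allowed_columns(GRANTS, "analyst", "salesdb.customers")
)
# visible == [{"id": 1, "country": "DE"}, {"id": 2, "country": "FR"}]
```

The point of the model is that the full objects are still read by the engine, so the filtering is a processing step rather than a restriction on what is fetched from S3.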