
Data Ingestion using Kinesis Firehose and Kinesis Producer Library (KPL)


Last updated 5 years ago


Overview

Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. Amazon Kinesis offers key capabilities to cost-effectively process streaming data at any scale, along with the flexibility to choose the tools that best suit the requirements of your application.

Kinesis Data Firehose is a fully managed service that delivers data to target locations including Amazon S3, Redshift, and the Amazon Elasticsearch Service. The Kinesis Producer Library (KPL) simplifies producer application development, allowing developers to achieve high write throughput to a Kinesis Data Stream. In this example, the KPL is used to write data to a Kinesis Data Stream from the producer application. Kinesis Firehose then reads this stream, batches incoming records into files, and delivers them to S3 based on the file buffer size/time limit defined in the Firehose configuration.
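The producer side can be sketched in Python with boto3 (the KPL itself is a Java library; boto3's `put_records` is used here as a simpler stand-in, and the stream name and event shape are hypothetical):

```python
import json

STREAM_NAME = "example-ingest-stream"  # hypothetical stream name


def build_records(events, partition_key_field="user_id"):
    """Shape a batch of events into the Records structure expected by
    kinesis.put_records(). Records sharing a partition key land on the
    same shard, which preserves their relative ordering."""
    return [
        {
            "Data": json.dumps(event).encode("utf-8"),
            "PartitionKey": str(event[partition_key_field]),
        }
        for event in events
    ]


def send_batch(events):
    """Write a batch of events to the Kinesis Data Stream."""
    import boto3  # AWS SDK for Python; imported here so build_records stays testable offline

    kinesis = boto3.client("kinesis")
    response = kinesis.put_records(
        StreamName=STREAM_NAME, Records=build_records(events)
    )
    # put_records is not all-or-nothing: inspect FailedRecordCount and
    # retry the individual records that were throttled or rejected.
    return response["FailedRecordCount"]
```

Compared with this sketch, the KPL adds record aggregation, automatic batching, and retries, which is how it achieves higher per-shard write throughput.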

Architecture Component Walkthrough

  1. Your application uses the Kinesis Producer Library (KPL) to collect records and write them to a Kinesis Data Stream.
  2. If your application resides in a VPC, you can use an Internet Gateway or a Kinesis VPC Endpoint to access the Kinesis service.
  3. Kinesis Firehose is configured to read data from the Kinesis Data Stream.
  4. Firehose batches records based on count or target file size, and then compresses and encrypts files before delivering them to S3.

Have suggestions? Join our Slack channel to share feedback.
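The Firehose side of the walkthrough can be sketched as the parameters one would pass to boto3's `create_delivery_stream` (a sketch only; the delivery stream name and the stream, bucket, and role ARNs are hypothetical placeholders you would replace with your own):

```python
def firehose_delivery_params(stream_arn, bucket_arn, role_arn,
                             buffer_mb=64, buffer_seconds=300):
    """Build parameters for firehose.create_delivery_stream(): read from
    an existing Kinesis Data Stream and deliver batched, compressed
    files to S3. Firehose flushes a file when EITHER the buffer size or
    the buffer interval is reached, whichever comes first."""
    return {
        "DeliveryStreamName": "example-s3-delivery",  # hypothetical name
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": {
            "KinesisStreamARN": stream_arn,
            "RoleARN": role_arn,  # role allowed to read the stream
        },
        "ExtendedS3DestinationConfiguration": {
            "BucketARN": bucket_arn,
            "RoleARN": role_arn,  # role allowed to write to the bucket
            "BufferingHints": {
                "SizeInMBs": buffer_mb,               # target file size
                "IntervalInSeconds": buffer_seconds,  # max wait before flush
            },
            "CompressionFormat": "GZIP",  # compress files before delivery
        },
    }
```

Tuning the buffering hints is the main trade-off in this architecture: a larger size or interval produces fewer, larger S3 objects (cheaper to query later), while smaller values reduce end-to-end latency into the data lake.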