Data Consumption Architectures

Different ways to consume data from a data lake store.

An S3 data lake decouples storage from compute, which makes it easy to build analytics applications that scale out as demand increases. To analyze the data in your data lake easily and efficiently, AWS has developed several managed and serverless big data services. The services most commonly used to run analytics on S3 data are Amazon Athena, Amazon Redshift Spectrum, and Amazon EMR, along with other third-party and open source tools. Some common reference architectures are outlined below.

Querying Data lake using Athena
Querying Data lake using Redshift Spectrum
Querying Data lake using EMR and External Hive Catalog
Querying Data lake using EMR
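As a rough illustration of the Athena architecture, the sketch below submits a SQL query to Athena through the AWS SDK for Python (boto3), waits for it to finish, and returns the rows. It assumes an Athena client created with `boto3.client("athena")`, a Glue Data Catalog database defined over your S3 data, and an S3 location where Athena can write query results. The function name `run_athena_query` and the database, table, and bucket names in the usage comment are hypothetical placeholders, not part of any AWS API.

```python
import time
from typing import Any


def run_athena_query(client: Any, sql: str, database: str,
                     output_location: str, poll_seconds: float = 1.0) -> list:
    """Run a query on Athena and return one page of results as rows of strings.

    `client` is assumed to be a boto3 Athena client (or any object with the same methods).
    """
    # start_query_execution is asynchronous: it returns an execution ID right away.
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        state = status["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        reason = status.get("StateChangeReason", "")
        raise RuntimeError(f"Athena query {query_id} ended in state {state}: {reason}")

    # Only a single page of results is read here (larger results are paginated via NextToken).
    # For a SELECT, the first row returned holds the column names.
    results = client.get_query_results(QueryExecutionId=query_id)
    return [[cell.get("VarCharValue", "") for cell in row["Data"]]
            for row in results["ResultSet"]["Rows"]]


# Example usage (hypothetical database, table, and bucket names):
# import boto3
# athena = boto3.client("athena")
# rows = run_athena_query(athena, "SELECT * FROM my_table LIMIT 10",
#                         "my_database", "s3://my-bucket/athena-results/")
```

The client is passed in rather than created inside the function so the polling logic can be exercised with a stub and no AWS credentials. Production code would likely also follow `NextToken` to read every page and put a timeout on the polling loop.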

Last updated 5 years ago