Data Consumption Architectures

Different ways to consume data from a data lake store.

An Amazon S3 data lake decouples storage from compute, which makes it easy to build analytics applications that scale out as demand increases. To help you analyze the data in your data lake, AWS offers several managed and serverless big data services. The services most commonly used to run analytics on S3 data are Amazon Athena, Amazon Redshift Spectrum, and Amazon EMR, along with third-party and open-source tools. Some common reference architectures are outlined below.
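As a small illustration of what decoupled storage and compute looks like in practice, the sketch below submits a SQL query to Amazon Athena over data that stays in S3. It is a minimal example under stated assumptions, not a reference implementation: the helper names `build_athena_request` and `run_query`, the database name, and the bucket path are hypothetical, and it assumes the `boto3` SDK is installed and AWS credentials are configured.

```python
from typing import Any, Dict


def build_athena_request(sql: str, database: str, output_location: str) -> Dict[str, Any]:
    """Build the keyword arguments for Athena's StartQueryExecution call.

    Hypothetical helper: the data itself stays in S3; only the query text,
    the catalog database, and a results location are sent to Athena.
    """
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql: str, database: str, output_location: str) -> str:
    """Submit the query and return its execution ID (requires AWS credentials)."""
    # Imported lazily so build_athena_request works without the AWS SDK installed.
    import boto3

    client = boto3.client("athena")
    response = client.start_query_execution(
        **build_athena_request(sql, database, output_location)
    )
    return response["QueryExecutionId"]
```

A call such as `run_query("SELECT COUNT(*) FROM events", "my_datalake_db", "s3://my-bucket/athena-results/")` would start the query asynchronously; the table, database, and bucket names here are placeholders. No cluster is provisioned by the caller, which is the practical consequence of keeping storage in S3 and compute in a separate service.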

