Data Consumption Architectures
Different ways to consume data from a data lake store.
An S3 data lake decouples storage from compute, which makes it easy to build analytics applications that scale out as demand increases. To help you analyze data in your data lake, AWS offers several managed and serverless big data services. The services most commonly used to run analytics on S3 data are Amazon Athena, Amazon Redshift Spectrum, and Amazon EMR, along with third-party and open source tools. Some common reference architectures are outlined below.
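As a rough illustration of the serverless option, here is a minimal sketch of submitting a SQL query against S3 data through Amazon Athena and waiting for it to finish. It assumes a boto3 Athena client is passed in. The helper name `run_athena_query`, the database name, and the S3 output path are hypothetical placeholders, not part of any reference architecture described here.

```python
import time

# States in which Athena has stopped working on a query.
TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a query to Athena and poll until it reaches a terminal state.

    `client` is expected to behave like boto3.client("athena").
    Returns a (query_execution_id, final_state) tuple.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes result files to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    while True:
        response = client.get_query_execution(QueryExecutionId=query_id)
        state = response["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)


# Example usage (requires boto3 and AWS credentials; all names are placeholders):
#
#   import boto3
#   athena = boto3.client("athena")
#   query_id, state = run_athena_query(
#       athena,
#       "SELECT * FROM my_table LIMIT 10",
#       database="my_database",
#       output_location="s3://my-bucket/athena-results/",
#   )
```

Passing the client in as a parameter keeps the helper free of any credential setup, so it can be exercised with a stub client before pointing it at a real account.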
Have suggestions? Join our Slack channel to share feedback.