# Query an S3 Data Lake Using Athena and the Glue Data Catalog

## Overview

[Amazon Athena](https://aws.amazon.com/athena/) is a serverless, interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is integrated out of the box with the [AWS Glue Data Catalog](https://aws.amazon.com/glue/), so you can start running queries against your data lake quickly, with no infrastructure to manage. Because Athena reads S3 data directly through the catalog, this is one of the simplest data lake architectures. [Glue crawlers](https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html) can optionally be used to create and maintain the Data Catalog.
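As a rough illustration of that flow, the Python sketch below submits a SQL query to Athena and collects the results. It assumes a boto3 Athena client is passed in; the function name `run_athena_query` and its parameters are hypothetical, and the database, query, and S3 result location you pass are placeholders for your own. Athena executes queries asynchronously, so the sketch polls `get_query_execution` until the query reaches a terminal state before paging through `get_query_results`.

```python
import time
from typing import Any, Dict, List


def run_athena_query(
    athena: Any,
    sql: str,
    database: str,
    output_location: str,
    poll_seconds: float = 1.0,
) -> List[List[str]]:
    """Run `sql` against a Glue Data Catalog database; return rows as lists of strings.

    `athena` is assumed to be a boto3 Athena client. It is typed as Any so the
    sketch can also be exercised with a stand-in object.
    """
    # Start the query. Athena resolves table schemas from the Glue Data Catalog.
    # OutputLocation is where Athena writes result files (unless a workgroup sets one).
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Queries run asynchronously: poll until the state is terminal.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        reason = status["QueryExecution"]["Status"].get("StateChangeReason", "")
        raise RuntimeError(f"Athena query {query_id} ended in {state}: {reason}")

    # Page through results. For a SELECT, the first row holds the column names.
    # NULL cells have no VarCharValue key, so they come back as "" here.
    rows: List[List[str]] = []
    kwargs: Dict[str, Any] = {"QueryExecutionId": query_id}
    while True:
        page = athena.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            rows.append([cell.get("VarCharValue", "") for cell in row["Data"]])
        token = page.get("NextToken")
        if not token:
            return rows
        kwargs["NextToken"] = token
```

With boto3 installed and credentials configured, a call might look like `run_athena_query(boto3.client("athena"), "SELECT * FROM my_table LIMIT 10", "my_database", "s3://my-athena-results/")`, where every name is a placeholder. Taking the client as a parameter, rather than creating it inside the function, keeps the sketch testable without AWS access.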

![Query S3 Data lake using Athena](https://2553439727-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LXQF3JgpYb-IDUgkC6e%2F-LXUCd_m6SPY3a3y3Qyr%2F-LXUCf6grim1GYFI-svR%2Fanalytics-athena.png?generation=1548859390057584\&alt=media)

## Architecture Component Walkthrough

1. The AWS Glue Data Catalog stores the schema and partition metadata for the datasets residing in the S3 data lake.
2. AWS Glue crawlers can optionally be used to create and update the Data Catalog on a schedule. If you already know the schema of your data, you may prefer to define tables directly in the Glue Data Catalog from Athena, using Hive DDL syntax.
3. By default, Athena reads schema definitions from the Glue Data Catalog and uses them to interpret and query the data in S3. Wherever possible, use data partitioning, compression, and columnar storage formats (such as Apache Parquet or ORC) in S3 for better query performance.
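To see what the catalog actually records for a table, the Python sketch below reads a table's columns, partition keys, and registered partitions. It assumes a boto3 Glue client is passed in; `describe_glue_table` is a hypothetical helper name, and the database and table names you pass are placeholders.

```python
from typing import Any, Dict, List


def describe_glue_table(glue: Any, database: str, table: str) -> Dict[str, Any]:
    """Summarize the schema and partition metadata the Glue Data Catalog keeps for a table.

    `glue` is assumed to be a boto3 Glue client. It is typed as Any so the
    sketch can also be exercised with a stand-in object.
    """
    tbl = glue.get_table(DatabaseName=database, Name=table)["Table"]
    sd = tbl["StorageDescriptor"]

    # Page through every registered partition; each has its key values and S3 location.
    partitions: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {"DatabaseName": database, "TableName": table}
    while True:
        page = glue.get_partitions(**kwargs)
        for p in page["Partitions"]:
            partitions.append({
                "values": p["Values"],
                "location": p.get("StorageDescriptor", {}).get("Location"),
            })
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token

    return {
        "location": sd.get("Location"),
        # Regular columns live in the table's storage descriptor...
        "columns": [(c["Name"], c.get("Type", "")) for c in sd.get("Columns", [])],
        # ...while partition columns are kept separately on the table itself.
        "partition_keys": [(k["Name"], k.get("Type", "")) for k in tbl.get("PartitionKeys", [])],
        "partitions": partitions,
    }
```

Separating `partition_keys` from `columns` mirrors how the catalog stores them, which is what lets Athena skip S3 prefixes for partitions a query's `WHERE` clause excludes.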
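For the optional crawler, a minimal Python sketch of the request might look like the following. It assumes a boto3 Glue client and an existing IAM role that Glue can assume with read access to the S3 path; the helper names, crawler name, role, database, path, and the daily 02:00 UTC cron schedule are all illustrative assumptions rather than recommendations.

```python
from typing import Any, Dict


def crawler_request(
    name: str,
    role: str,
    database: str,
    s3_path: str,
    schedule: str = "cron(0 2 * * ? *)",
) -> Dict[str, Any]:
    """Build keyword arguments for the boto3 Glue `create_crawler` call."""
    return {
        "Name": name,
        "Role": role,  # IAM role name or ARN that the crawler assumes
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Glue schedules use cron syntax; this default runs once a day at 02:00 UTC.
        "Schedule": schedule,
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


def create_and_start_crawler(glue: Any, request: Dict[str, Any]) -> None:
    """Register the crawler, then run it once without waiting for the schedule."""
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])
```

The `SchemaChangePolicy` shown lets the crawler update table definitions in place and only log, rather than delete, tables whose data disappears; other settings may suit your data better.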
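If you define tables yourself instead, the statement is ordinary Hive DDL run through Athena. The Python sketch below only builds the text of a `CREATE EXTERNAL TABLE` statement for a partitioned Parquet dataset, combining the partitioning and columnar-format recommendations above; the helper name and the database, column, and S3 location values are hypothetical. Parquet records its compression codec inside each file, so no extra table property is needed to read compressed data.

```python
from typing import List, Tuple


def parquet_table_ddl(
    database: str,
    table: str,
    columns: List[Tuple[str, str]],
    partition_keys: List[Tuple[str, str]],
    location: str,
) -> str:
    """Build a Hive-style CREATE EXTERNAL TABLE statement for Parquet data in S3.

    Column and partition types are passed through verbatim (e.g. "bigint", "string").
    """
    # Identifiers are quoted with backticks, which Athena DDL accepts.
    cols = ",\n  ".join(f"`{name}` {typ}" for name, typ in columns)
    lines = [f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n  {cols}\n)"]
    if partition_keys:
        # Partition columns are declared separately from the regular columns and
        # correspond to key=value prefixes under the table location.
        parts = ", ".join(f"`{name}` {typ}" for name, typ in partition_keys)
        lines.append(f"PARTITIONED BY ({parts})")
    # Columnar storage: Athena reads only the columns a query references.
    lines.append("STORED AS PARQUET")
    lines.append(f"LOCATION '{location}'")
    return "\n".join(lines)
```

After creating a partitioned table over Hive-style paths such as `s3://my-bucket/events/dt=2019-01-01/`, the partitions still need to be registered in the catalog, for example by running `MSCK REPAIR TABLE` in Athena or by letting a crawler pick them up.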

## References

* [Athena Best Practices](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/)

## Have suggestions? Join our [Slack channel](https://join.slack.com/t/cat-cwp4274/shared_invite/zt-e2ztjpgw-Bugw46iXsLbZ~V54AljWsA) to share feedback.

