
Airflow S3 Download Operator

In Apache Airflow, downloading files from Amazon S3 is a cornerstone of many data engineering pipelines. There is no single "S3DownloadOperator". The work is handled either by a transfer operator such as S3ToLocalOperator, which copies data to a worker's local storage, or by the S3Hook, which gives granular control inside a Python function.

Key Operators for S3 Downloads

1. Using S3ToLocalOperator

This is the most direct way to move a file from an S3 bucket to the local file system of an Airflow worker. It suits smaller files that a downstream bash script or local tool needs to process.

from airflow.providers.amazon.aws.transfers.s3_to_local import S3ToLocalOperator

# Copy one S3 object to a fixed path on the worker's local disk.
download_task = S3ToLocalOperator(
    task_id='download_s3_file',
    bucket_name='my-source-bucket',
    s3_key='data/input_file.csv',
    local_full_path='/tmp/input_file.csv',
    aws_conn_id='aws_default',
)

Before you rely on this operator, confirm that your installed Amazon provider (apache-airflow-providers-amazon) includes the airflow.providers.amazon.aws.transfers.s3_to_local module. If the import fails, the S3Hook approach below does the same job.

2. Using S3Hook with PythonOperator

Use the S3Hook when you need to rename files dynamically or apply logic based on file content.

from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook


def download_from_s3(key, bucket_name, local_path):
    hook = S3Hook(aws_conn_id='aws_default')
    # download_file returns the local path of the downloaded file. By default the
    # file is saved under a generated temporary name inside local_path.
    file_name = hook.download_file(key=key, bucket_name=bucket_name, local_path=local_path)
    return file_name  # the return value is pushed to XCom


# Define this task inside a with DAG(...) block.
task_download = PythonOperator(
    task_id='hook_download',
    python_callable=download_from_s3,
    op_kwargs={
        'key': 'logs/daily_report.json',
        'bucket_name': 'my-analytics-bucket',
        'local_path': '/tmp/',
    },
)

Best Practices and Considerations

Connection type: Since version 6.0.0 of the Amazon provider, use the Amazon Web Services connection type in the Airflow UI. Do not use a dedicated "S3" connection type.

Security: Always use IAM roles or managed AWS connections. Never hardcode secrets in your DAG files.

Sensors: Before attempting a download, use an S3KeySensor to confirm that the file exists. This prevents task failures caused by data that has not arrived yet.
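An S3KeySensor can gate the download so that the task runs only once the object exists. The sketch below makes several assumptions:

- Airflow 2.4 or later, for the schedule argument.
- Airflow 2.x import paths.
- An aws_default connection.

The DAG id, bucket, key, and timing values are illustrative. The Airflow imports sit inside the functions only so the snippet can be loaded without Airflow installed.

```python
from datetime import datetime

# Illustrative values; replace with your own bucket and key.
BUCKET = "my-analytics-bucket"
KEY = "logs/daily_report.json"


def download_from_s3(key, bucket_name, local_path):
    """Download one S3 object and return its local path (pushed to XCom)."""
    from airflow.providers.amazon.aws.hooks.s3 import S3Hook

    hook = S3Hook(aws_conn_id="aws_default")
    return hook.download_file(key=key, bucket_name=bucket_name, local_path=local_path)


def build_dag():
    """Return a DAG in which an S3KeySensor must succeed before the download runs."""
    # Deferred imports: the module can be imported without Airflow installed.
    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

    with DAG(
        dag_id="s3_sensor_then_download",  # hypothetical DAG id
        start_date=datetime(2024, 1, 1),
        schedule=None,  # Airflow 2.4+; older releases use schedule_interval
        catchup=False,
    ) as dag:
        wait_for_file = S3KeySensor(
            task_id="wait_for_file",
            bucket_name=BUCKET,
            bucket_key=KEY,
            aws_conn_id="aws_default",
            mode="reschedule",      # release the worker slot between pokes
            poke_interval=300,      # check every 5 minutes
            timeout=6 * 60 * 60,    # fail if the key has not appeared after 6 hours
        )
        download = PythonOperator(
            task_id="hook_download",
            python_callable=download_from_s3,
            op_kwargs={"key": KEY, "bucket_name": BUCKET, "local_path": "/tmp/"},
        )
        # The download only starts after the sensor has found the key.
        wait_for_file >> download
    return dag
```

In an actual DAG file, assign the result at module level (for example dag = build_dag()) so the scheduler discovers it. Reschedule mode matters when files can arrive hours late, because a sensor in the default poke mode holds a worker slot the whole time it waits.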

Typical use case: A local download is often used when you need to fetch a file, run a transformation on it (for example, a shell script or a local binary), and then immediately upload the result to another S3 location.
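The download, transform, and upload pattern can be sketched as a plain function that receives an S3Hook. Several parts of the sketch are assumptions for illustration:

- The function names and the destination bucket are made up.
- gzip is used as the transformation. It stands in for whatever shell script or binary you actually run.

The hook is passed in as a parameter, so the logic can also be exercised with any object offering the same download_file and load_file methods. The sketch also renames the downloaded file, because download_file saves it under a temporary name by default.

```python
import gzip
import os
import posixpath
import shutil
import tempfile


def gzip_file(src_path):
    """Stand-in transformation: gzip src_path and return the new file's path."""
    dst_path = src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path


def download_transform_upload(hook, src_bucket, src_key, dst_bucket, dst_key, transform=gzip_file):
    """Download src_key, transform it locally, upload the result, then clean up.

    hook is expected to be an S3Hook (or an object with the same
    download_file and load_file methods).
    """
    workdir = tempfile.mkdtemp(prefix="s3_transfer_")
    try:
        # download_file saves the object under a generated temporary name in workdir...
        tmp_path = hook.download_file(key=src_key, bucket_name=src_bucket, local_path=workdir)
        # ...so rename it to the object's base name before transforming it.
        local_path = os.path.join(workdir, posixpath.basename(src_key))
        os.replace(tmp_path, local_path)
        out_path = transform(local_path)
        hook.load_file(filename=out_path, key=dst_key, bucket_name=dst_bucket, replace=True)
        return dst_key
    finally:
        # Remove the working directory so files do not pile up on the worker.
        shutil.rmtree(workdir, ignore_errors=True)
```

Inside a DAG, you might call this from a PythonOperator callable. For example: download_transform_upload(S3Hook(aws_conn_id="aws_default"), "my-source-bucket", "data/input_file.csv", "my-output-bucket", "data/input_file.csv.gz"). The function uses a throwaway directory that it deletes in finally. This keeps the worker's disk from filling up, since many task runs can share the same machine.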
