Integrating Airflow with AWS: Extracting Data with Apache Airflow and Storing It in S3

Goals and Achievements

  1. Install Apache Airflow on an EC2 instance.

  2. Create a virtual environment and install all dependencies.

  3. Establish a connection between the EC2 instance and Visual Studio Code.

  4. Log in to the Airflow server.

  5. Build a DAG that fetches data from RapidAPI and puts it into an S3 bucket.

Commands that need to be run inside the instance:

  1. sudo apt update

  2. sudo apt install python3-pip

  3. sudo apt install python3.10-venv

  4. python3 -m venv endtoendyoutube_venv

  5. source endtoendyoutube_venv/bin/activate

  6. pip install --upgrade awscli

  7. pip install apache-airflow

  8. airflow standalone
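Running airflow standalone initializes the metadata database, creates an admin user, and starts the webserver and scheduler in a single process. The auto-generated admin password is printed in the startup logs (recent Airflow versions also save it to a standalone_admin_password.txt file in the Airflow home directory). The UI listens on port 8080 by default, so that port must be open in the instance's security group before you can log in from your browser.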

EC2 Instance and Visual Studio Code Connection

  1. Install the Remote - SSH extension in Visual Studio Code.

  2. Configure your host connection.

  3. Select the SSH config file path.

  4. Edit the configuration according to your instance (a sample entry follows this list).

  5. Select your instance's operating system.

  6. Open your instance's folder inside Visual Studio Code.
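As a reference, a minimal entry in the SSH config file might look like the sketch below. The host alias, public DNS name, user, and key path are placeholder assumptions; substitute your own instance's values (the default user is ubuntu on Ubuntu AMIs, and the .pem key needs restrictive permissions, e.g. chmod 400).

```
# ~/.ssh/config -- placeholder values; replace with your instance's details
Host airflow-ec2
    HostName ec2-12-34-56-78.compute-1.amazonaws.com
    User ubuntu
    IdentityFile ~/.ssh/my-keypair.pem
```

With this entry in place, the Remote - SSH extension lists airflow-ec2 as a host you can connect to directly.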

Build a DAG that fetches data from RapidAPI and puts it into an S3 bucket

  1. Write the DAG file and place it in the Airflow dags folder (a sketch follows this list).

  2. Configure your S3 bucket and make sure the instance has permission to write to it (for example, through an IAM role).

  3. Run your DAG.

  4. After it completes, the data appears in the S3 bucket.
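As a reference, a minimal sketch of such a DAG is shown below. It is not the exact code from this walkthrough: the RapidAPI endpoint, headers, bucket name, and DAG/task names are placeholder assumptions, and it assumes the requests and boto3 libraries are installed in the virtual environment plus an IAM role (or credentials) that allows writing to the bucket.

```python
# dags/rapidapi_to_s3.py -- a minimal sketch with placeholder endpoint,
# headers, and bucket name; assumes Airflow 2.4+ for the `schedule` argument.
import json
from datetime import datetime

import boto3
import requests
from airflow.decorators import dag, task

RAPIDAPI_URL = "https://example-api.p.rapidapi.com/data"  # placeholder endpoint
RAPIDAPI_HEADERS = {
    "X-RapidAPI-Key": "YOUR_RAPIDAPI_KEY",                # placeholder key
    "X-RapidAPI-Host": "example-api.p.rapidapi.com",      # placeholder host
}
S3_BUCKET = "my-airflow-output-bucket"                    # placeholder bucket


@dag(
    dag_id="rapidapi_to_s3",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
)
def rapidapi_to_s3():
    @task
    def extract() -> dict:
        # Fetch a JSON payload from the RapidAPI endpoint.
        response = requests.get(RAPIDAPI_URL, headers=RAPIDAPI_HEADERS, timeout=30)
        response.raise_for_status()
        return response.json()

    @task
    def load_to_s3(payload: dict) -> None:
        # Write the payload to S3; credentials come from the instance's IAM
        # role (or any standard boto3 credential source).
        key = f"rapidapi/{datetime.utcnow():%Y-%m-%d}.json"
        boto3.client("s3").put_object(
            Bucket=S3_BUCKET,
            Key=key,
            Body=json.dumps(payload),
        )

    load_to_s3(extract())


rapidapi_to_s3()
```

Saved into the dags folder (by default ~/airflow/dags), the DAG appears in the UI after the scheduler's next scan; triggering it runs the extract and load tasks in sequence, and the output file lands in the bucket.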

    "Thank You for Reading!"

    We sincerely appreciate you taking the time to read our blog. We hope you found the information valuable and that it helped you in your journey. Your interest and engagement mean a lot to us.

    If you have any questions, or feedback, or if there's a specific topic you'd like us to cover in the future, please don't hesitate to reach out. We're here to assist and provide you with the necessary knowledge and insights.

    Remember, your support is what keeps us motivated to continue creating content. Stay curious, keep learning, and thank you for being a part of our community!

    Warm regards,

    Chetan Sharma