
MLB Cloud Data Warehouse Data Pipeline (AWS)
Python, Snowflake, R, Docker, dbt, AWS
Project Overview
This project showcases an MLB data engineering pipeline migrated from local infrastructure to AWS. An EventBridge schedule launches an ECS Fargate container once a day to run the ingestion job, which uses R's hoopR library. Raw MLB statistics are loaded into Snowflake, where dbt Core handles transformations and data quality checks. The containerized architecture scales with the workload and incurs compute cost only while a task is running, and it keeps the existing R-based extraction logic and dbt transformation layers intact. The migration demonstrates cloud infrastructure, data warehousing, and orchestration work using AWS-native services.
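As a rough illustration of the scheduling piece, the sketch below builds the request payloads for an EventBridge rule that starts one Fargate task per day. It is not the project's actual configuration: the rule name, the target id, the 10:00 UTC run time, the ARNs, the subnet id, and the public IP setting are all placeholder assumptions. The code only constructs dictionaries, so it runs without AWS credentials.

```python
from typing import Any


def build_daily_rule(name: str, hour_utc: int) -> dict[str, Any]:
    """Keyword arguments for EventBridge PutRule: fire once a day at hour_utc:00 UTC."""
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    return {
        "Name": name,
        # EventBridge cron fields: minute hour day-of-month month day-of-week year
        "ScheduleExpression": f"cron(0 {hour_utc} * * ? *)",
        "State": "ENABLED",
    }


def build_fargate_target(
    rule_name: str,
    cluster_arn: str,
    task_definition_arn: str,
    role_arn: str,
    subnets: list[str],
) -> dict[str, Any]:
    """Keyword arguments for EventBridge PutTargets: run one Fargate task per trigger."""
    return {
        "Rule": rule_name,
        "Targets": [
            {
                "Id": "daily-ingestion",  # hypothetical target id
                "Arn": cluster_arn,       # the target ARN is the ECS cluster
                "RoleArn": role_arn,      # role EventBridge assumes to call ecs:RunTask
                "EcsParameters": {
                    "TaskDefinitionArn": task_definition_arn,
                    "TaskCount": 1,
                    "LaunchType": "FARGATE",
                    "NetworkConfiguration": {
                        "awsvpcConfiguration": {
                            "Subnets": subnets,
                            # Assumption: the task pulls its image over a public subnet.
                            "AssignPublicIp": "ENABLED",
                        }
                    },
                },
            }
        ],
    }


# Placeholder values for illustration only.
rule = build_daily_rule("mlb-daily-ingestion", hour_utc=10)
target = build_fargate_target(
    rule_name=rule["Name"],
    cluster_arn="arn:aws:ecs:us-east-1:123456789012:cluster/example-cluster",
    task_definition_arn="arn:aws:ecs:us-east-1:123456789012:task-definition/example-task:1",
    role_arn="arn:aws:iam::123456789012:role/example-eventbridge-role",
    subnets=["subnet-0123456789abcdef0"],
)
```

With boto3 installed and credentials configured, these payloads could be applied with `boto3.client("events").put_rule(**rule)` followed by `put_targets(**target)`. Keeping the payload construction separate from the API calls makes the schedule easy to check without touching an AWS account.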
For a more detailed walkthrough of how this project was built, check out my blog post.
Technologies
Python
Snowflake
R
Docker
dbt
AWS