In this Uber Data Analytics project, I use modern data engineering tools on Google Cloud Platform (GCP): Mage.ai, BigQuery, Looker Studio, and Cloud Storage. The project builds a data pipeline and an analytics dashboard to extract, transform, and visualize key insights from Uber trip data.
- Programming Language - Python
- Query Language - SQL
- Google Cloud Platform
- BigQuery
- Cloud Storage
- Looker Studio
- Compute Instance
- Mage.ai (a modern data pipeline tool): https://www.mage.ai/
The Uber dataset provides a comprehensive view of ride-sharing activity, including details such as trip durations, distances, pickup and drop-off locations, and timestamps. This data is crucial for analyzing trends in user behavior, optimizing routes, and improving overall service efficiency.
- Dataset used: https://github.com/Kindoli/Uber-data-engineering-mage-project/blob/main/data/uber_data.csv
- Original dataset: https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page
- Data Dictionary – Yellow Taxi Trip Records: https://www.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf
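
As a rough illustration of the kind of transformation this data supports, the sketch below derives a trip duration from the pickup and drop-off timestamps. It is not the project's actual pipeline code. The column names `tpep_pickup_datetime`, `tpep_dropoff_datetime`, and `trip_distance` are taken from the Yellow Taxi data dictionary, while the two sample rows, the timestamp format, the `add_trip_duration` helper, and the `trip_duration_min` field are assumptions made up for this example.

```python
import csv
import io
from datetime import datetime

# Two made-up rows for illustration only; they are not taken from uber_data.csv.
SAMPLE_CSV = """tpep_pickup_datetime,tpep_dropoff_datetime,trip_distance
2016-03-01 00:00:00,2016-03-01 00:07:55,2.50
2016-03-01 00:00:00,2016-03-01 00:11:06,2.90
"""

# Assumed timestamp layout; adjust if the real file uses a different one.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def add_trip_duration(rows):
    """Return copies of the rows with a numeric distance and a duration in minutes."""
    enriched = []
    for row in rows:
        pickup = datetime.strptime(row["tpep_pickup_datetime"], TIMESTAMP_FORMAT)
        dropoff = datetime.strptime(row["tpep_dropoff_datetime"], TIMESTAMP_FORMAT)
        duration_min = (dropoff - pickup).total_seconds() / 60
        enriched.append(
            {
                **row,
                "trip_distance": float(row["trip_distance"]),
                "trip_duration_min": duration_min,  # hypothetical derived field
            }
        )
    return enriched


trips = add_trip_duration(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for trip in trips:
    print(trip["trip_distance"], round(trip["trip_duration_min"], 2))
```

To try this against the real file, replace `io.StringIO(SAMPLE_CSV)` with an open handle to a local copy of `uber_data.csv`, after checking that its columns and timestamp format match the assumptions above.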