This project demonstrates an end-to-end data pipeline and analysis workflow, from raw JSON data to interactive dashboards. The pipeline integrates cloud-based tools for data storage, processing, and visualization.
It reflects a real-world scenario: taking semi-structured JSON data, cleaning and transforming it, storing it in the cloud, and deriving actionable insights through business intelligence tools.
Tool | Purpose |
---|---|
JSON | Raw data source format |
Python | Data preprocessing, cleaning & transformation |
Amazon S3 | Cloud storage for structured & raw data |
Snowflake | Cloud data warehousing & querying |
Power BI | Data visualization & dashboarding |
- Extract JSON Data (Simulated Yelp Dataset)
- Clean & Transform Data using Python
- Upload Clean Data to Amazon S3
- Load Data from S3 into Snowflake
- Query & Analyze Data within Snowflake
- Visualize Insights using Power BI Dashboards
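
As a rough illustration of the "Clean & Transform" step above, here is a minimal sketch that reads newline-delimited JSON and writes a CSV suitable for upload. It is not the project's actual `transform_data.py`: the field names (`review_id`, `business_id`, `stars`, `text`, `date`), the validation rules, and the function names are assumptions chosen for the example.

```python
import csv
import io
import json

# Hypothetical column set; the simulated dataset may use different field names.
FIELDS = ["review_id", "business_id", "stars", "text", "date"]


def clean_record(raw):
    """Return a cleaned dict, or None if the record is unusable."""
    if not raw.get("review_id") or not raw.get("business_id"):
        return None
    try:
        stars = float(raw["stars"])
    except (KeyError, TypeError, ValueError):
        return None
    # Assumed rule: ratings outside 1..5 are treated as invalid.
    if not 1 <= stars <= 5:
        return None
    return {
        "review_id": raw["review_id"],
        "business_id": raw["business_id"],
        "stars": stars,
        # Collapse whitespace so embedded newlines do not split CSV rows downstream.
        "text": " ".join(str(raw.get("text", "")).split()),
        # Keep only the date part (YYYY-MM-DD) of a timestamp string.
        "date": str(raw.get("date", ""))[:10],
    }


def transform(json_lines, out):
    """Read JSON objects (one per line) and write cleaned rows as CSV to `out`.

    Returns the number of rows written. Malformed lines are skipped.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    written = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        try:
            raw = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(raw, dict):
            continue
        record = clean_record(raw)
        if record is not None:
            writer.writerow(record)
            written += 1
    return written
```

In practice `json_lines` could be an open file under `data/raw/` and `out` a file under `data/processed/`; those paths come from the project layout, while the file names themselves are not specified here.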
├── data/
│   ├── raw/                  # Original JSON data
│   └── processed/            # Cleaned and transformed data
│
├── scripts/
│   ├── extract_data.py       # Load & explore JSON data
│   ├── transform_data.py     # Clean & structure data
│   ├── upload_s3.py          # Upload to Amazon S3
│   └── snowflake_loader.py   # Load into Snowflake
│
├── powerbi/                  # Power BI dashboard files (.pbix)
│
├── requirements.txt          # Python dependencies
├── README.md                 # Project documentation
└── .gitignore
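
For the `snowflake_loader.py` step, loading from S3 typically comes down to issuing a `COPY INTO` statement against an external stage that points at the bucket. The sketch below only builds that statement as a string. The table name, stage name, prefix, and file-format options are hypothetical, and the real script may load data differently.

```python
import re

# Conservative patterns so that names interpolated into SQL cannot inject statements.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")
_PREFIX = re.compile(r"[A-Za-z0-9_./-]*")


def build_copy_statement(table, stage, prefix=""):
    """Build a Snowflake COPY INTO statement for CSV files under a stage prefix.

    `table` and `stage` are assumed to be plain, unquoted identifiers.
    """
    for name in (table, stage):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if not _PREFIX.fullmatch(prefix):
        raise ValueError(f"unsafe stage prefix: {prefix!r}")

    location = f"@{stage}/{prefix.lstrip('/')}" if prefix else f"@{stage}"
    return (
        f"COPY INTO {table}\n"
        f"FROM {location}\n"
        # Assumed file format: CSV with a header row and optionally quoted fields.
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '\"')\n"
        "ON_ERROR = 'ABORT_STATEMENT'"
    )
```

The resulting string would then be passed to a cursor's `execute` method from the Snowflake Python connector, assuming the stage has already been created with access to the S3 bucket.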
The Power BI dashboard showcases:
- Rating distribution
- Customer sentiment trends
- Top categories by review count
- Geographic distribution of reviews
- Time-series trends analysis
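
Two of the visuals above, the rating distribution and the time-series trend, rest on simple aggregations that can be sanity-checked in Python before the data reaches the dashboard. This is a minimal sketch under assumed field names (`stars`, and `date` as a `YYYY-MM-DD` string); the dashboard itself may compute these measures in Snowflake or Power BI instead.

```python
from collections import Counter


def rating_distribution(reviews):
    """Count reviews per whole-star rating, returned as {stars: count}."""
    return dict(Counter(int(r["stars"]) for r in reviews))


def monthly_review_counts(reviews):
    """Count reviews per calendar month, keyed by 'YYYY-MM' in ascending order."""
    counts = Counter(r["date"][:7] for r in reviews)
    return dict(sorted(counts.items()))
```

Comparing these counts with the totals shown in the dashboard is a cheap way to catch rows lost during loading.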
Name: Kindoli Edward
Role: Data Analyst | Data Engineer | BI Developer
GitHub: https://github.com/Kindoli
LinkedIn: https://www.linkedin.com/in/kindoli-edward-5058544a/