This project was developed for the UE20CS343 Database Technologies course. It builds a real-time data streaming pipeline using Apache Kafka and Spark Structured Streaming: it simulates ingesting San Francisco crime data into Kafka, processes the stream with Spark, and performs aggregations and stream-table joins.
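To make the two processing steps concrete, here is a minimal, dependency-free Python sketch of a tumbling-window count and a stream-table join. It is not the project's Spark code. The field names (`ts`, `category`, `district`), the sample events, the station lookup table, and the 10-minute window are all assumptions chosen for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"

# Hypothetical events standing in for messages consumed from a Kafka topic.
EVENTS = [
    {"ts": "2015-05-13 23:53:00", "category": "WARRANTS", "district": "NORTHERN"},
    {"ts": "2015-05-13 23:58:00", "category": "LARCENY/THEFT", "district": "PARK"},
    {"ts": "2015-05-14 00:04:00", "category": "LARCENY/THEFT", "district": "NORTHERN"},
]

# Hypothetical static lookup table: the "table" side of a stream-table join.
DISTRICT_INFO = {"NORTHERN": "Station A", "PARK": "Station B"}


def window_start(ts, minutes=10):
    """Floor a timestamp to the start of its tumbling window.

    Assumes `minutes` divides 60 evenly, so windows align to the hour.
    """
    t = datetime.strptime(ts, TS_FORMAT)
    floored = t - timedelta(minutes=t.minute % minutes, seconds=t.second)
    return floored.strftime(TS_FORMAT)


def count_by_window(events, minutes=10):
    """Count events per (window start, category) pair."""
    counts = Counter()
    for event in events:
        counts[(window_start(event["ts"], minutes), event["category"])] += 1
    return dict(counts)


def join_with_table(events, table):
    """Left-join each event with the lookup table on its district."""
    for event in events:
        # Unknown districts get station=None rather than being dropped.
        yield {**event, "station": table.get(event["district"])}


if __name__ == "__main__":
    for key, n in sorted(count_by_window(EVENTS).items()):
        print(key, n)
    for row in join_with_table(EVENTS, DISTRICT_INFO):
        print(row)
```

In Spark Structured Streaming the same ideas would typically be expressed with a windowed `groupBy` on the event-time column and a join between the streaming DataFrame and a static DataFrame; the sketch only mirrors the logic on an in-memory list.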
A sandbox environment designed to simulate a pseudo-distributed Hadoop cluster with integrated Apache Spark and Kafka components. It allows developers to prototype and experiment with big data workflows, test distributed computing patterns, and explore cluster behavior in a contained virtual setup.