spark-structured-streaming-kafka

Here are 2 public repositories matching this topic...

Aish-p / Data_Streaming

This project, developed as part of UE20CS343 - Database Technologies, builds a real-time data streaming pipeline with Apache Kafka and Spark Structured Streaming. It simulates the ingestion of San Francisco crime data into Kafka, processes the stream with Spark, and performs aggregations and stream-table joins.

kafka-spark-crime-analysis spark-structured-streaming-kafka sf-crime-data-pipeline

Updated Mar 5, 2025
Python

imjuliengaupin / sparkler

A sandbox environment designed to simulate a pseudo-distributed Hadoop cluster with integrated Apache Spark and Kafka components. It allows developers to prototype and experiment with big data workflows, test distributed computing patterns, and explore cluster behavior in a contained virtual setup.

java apache-spark hadoop sandbox apache-kafka pseudo-distributed-hadoop dataframes-api spark-structured-streaming-kafka

Updated Jun 10, 2025
Java

