big-data

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

python aws data-science machine-learning caffe theano big-data spark deep-learning hadoop tensorflow numpy scikit-learn keras pandas kaggle scipy matplotlib mapreduce

Updated Mar 20, 2024
Python

apache / flink

Apache Flink

python java scala sql big-data flink

Updated Sep 19, 2024
Java

amark / gun

An open source cybersecurity protocol for syncing decentralized graph data.

Updated Aug 10, 2024
JavaScript

prestodb / presto

The official home of the Presto distributed SQL query engine for big data

java data query sql big-data presto hive hadoop lakehouse

Updated Sep 19, 2024
Java

heibaiying / BigData-Notes

A beginner's guide to big data ⭐

phoenix scala kafka big-data spark yarn hive hadoop storm bigdata hbase zookeeper hdfs mapreduce flume azkaban sqoop

Updated Jan 5, 2024
Java

questdb / questdb

QuestDB is an open source time-series database for fast ingest and SQL queries

java iot postgres sql database big-data time-series analytics cpp grafana postgresql simd low-latency financial-analysis tsdb hacktoberfest time-series-database questdb

Updated Sep 19, 2024
Java

andkret / Cookbook

The Data Engineering Cookbook

big-data best-practices cookbook data-engineering data-engineer

Updated Aug 1, 2024

apache / predictionio

PredictionIO, a machine learning server for developers and ML engineers.

scala big-data predictionio

Updated Jan 9, 2021
Scala

yahoo / CMAK

CMAK is a tool for managing Apache Kafka clusters

scala kafka big-data cluster-management

Updated Aug 2, 2023
Scala

vesoft-inc / nebula

A distributed, fast open-source graph database featuring horizontal scalability and high availability

distributed-systems database big-data cpp graph raft scalability distributed graph-database graphdb hacktoberfest nebula nebula-graph nebulagraph

Updated Sep 19, 2024
C++

trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

java distributed-systems data-science sql database big-data presto hive hadoop analytics jdbc databases distributed-database query-engine iceberg datalake prestodb trino delta-lake

Updated Sep 19, 2024
Java

provectus / kafka-ui

Open-Source Web UI for Apache Kafka Management

opensource kafka big-data web-ui streams kafka-connect apache-kafka kafka-producer kafka-client kafka-streams hacktoberfest streaming-data kafka-manager kafka-cluster event-streaming cluster-management kafka-ui kafka-brokers

Updated Jul 26, 2024
Java

cython / cython

The most widely used Python to C compiler

python c performance big-data cpp cython cpython cpython-extensions

Updated Sep 19, 2024
Python

StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.

Updated Sep 19, 2024
Java

catboost / catboost

A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression, and other machine learning tasks for Python, R, Java, and C++. Supports computation on CPU and GPU.