Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
Updated Oct 1, 2024 - Scala
A curated list of open source tools used in analytical stacks and the data engineering ecosystem
Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, MinIO, Trino, and a Hive Metastore. Can be used for local testing.
DatAasee - A Metadata-Lake for Libraries
My M.Sc. dissertation: a modern data platform using DataOps, Kubernetes, and the cloud-native ecosystem to build a resilient Big Data platform based on the Data Lakehouse architecture, which is the base for Machine Learning (MLOps) and Artificial Intelligence (AIOps).
The project aims to process Formula 1 racing data, create an automated data pipeline, and make the data available for presentation and analysis purposes.
This repository is a place for the Data Warehousing course at the Information Systems & Analytics department, Santa Clara University.
🦖 Efficiently evolve your old fixed-length data files into more modern file formats, fully parallelized!
Data lakehouse at home with docker compose
STEDI project
Everything you need to know about DuckDB
This project overhauls a university's data infrastructure to improve efficiency, security, and scalability, resulting in a unified data management solution.
#Test - Create a Data Lakehouse in Kubernetes
This project implements an end-to-end tech stack for a data platform, intended for local development.
This is an example project showing how to build a serverless data lakehouse on AWS using Terraform, Apache Iceberg, and Spark.
Infrastructure for data engineers (S3)