MeadowData

MeadowData is a set of libraries and services that provides an integrated environment for data scientists and data engineers:

meadowdb: A columnar database designed to make experimentation effortless
meadowflow: A job scheduler that automatically manages your data dependencies
meadowgrid: A cluster manager specifically designed to allocate resources to batch jobs and distributed compute jobs that run your code in parallel

Why MeadowData

meadowflow and meadowdb work together to capture what data is read and written by which jobs. This effect system of sorts enables powerful scenarios. For example:

A job can be scheduled to automatically run whenever its dependencies are updated. In a traditional job scheduler, the job definition's dependencies and the actual code's data dependencies can get out of sync, which can result in jobs running before their dependencies are ready, reading stale data, and thus producing incorrect results.
Running a regression test on a job or reviewing it for model changes is almost no extra work. meadowflow/meadowdb can do a test run of a job with local code changes while redirecting all of its outputs to a userspace, where they can be compared against the original outputs.
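
The two scenarios above can be illustrated with a toy sketch. This is not the meadowflow or meadowdb API: `TrackedStore`, `jobs_reading`, and the two example jobs are hypothetical names invented for illustration, and the store is just an in-memory dictionary. It only shows the idea of deriving dependencies from observed reads and of redirecting a test run's writes to a separate userspace.

```python
from collections import defaultdict


class TrackedStore:
    """Hypothetical in-memory table store that records each job's reads and writes."""

    def __init__(self, base=None):
        self.base = base                 # optional store that this userspace overlays
        self.tables = {}                 # table name -> data written in this store
        self.reads = defaultdict(set)    # job name -> names of tables it read
        self.writes = defaultdict(set)   # job name -> names of tables it wrote

    def read(self, job, table):
        self.reads[job].add(table)
        if table in self.tables:
            return self.tables[table]
        if self.base is not None:
            return self.base.tables[table]   # fall back to the overlaid store (one level only)
        raise KeyError(table)

    def write(self, job, table, data):
        self.writes[job].add(table)
        self.tables[table] = data        # never touches self.base

    def jobs_reading(self, table):
        """Jobs that were observed reading `table`, i.e. candidates to rerun on update."""
        return sorted(job for job, tables in self.reads.items() if table in tables)


def load_prices(store):
    store.write("load_prices", "prices", [1.0, 2.0, 3.0])


def average_price(store, scale=1.0):
    prices = store.read("average_price", "prices")
    store.write("average_price", "average", scale * sum(prices) / len(prices))


prod = TrackedStore()
load_prices(prod)
average_price(prod)

# Dependencies come from observed reads, not from a hand-written job definition.
to_rerun = prod.jobs_reading("prices")

# Test run with a "local code change" (scale=2.0): outputs land in the userspace only.
userspace = TrackedStore(base=prod)
average_price(userspace, scale=2.0)
diff = userspace.tables["average"] - prod.tables["average"]
```

Because the userspace store reads through to `prod` but writes only to itself, the test run leaves the production tables untouched, and `diff` holds the change in output caused by the modified code.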

Getting started

See examples/meadowgrid.md for an introduction to meadowgrid.
See examples/covid for an introduction to meadowdb and meadowflow.
See examples/covid/regression_test.md for a deeper dive on regression tests/model change reviews.
See readerwriter_shared.py for an introduction to the meadowdb data layout.

hrichardlee/meadowflow

Folders and files

Latest commit

History

Repository files navigation

MeadowData

Why MeadowData

Getting started

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages