merge codebase-hydra-restructure into main #90

cameronraysmith · 2023-03-05T17:51:42Z

To be followed by #92.


* wip: dataloader first draft
* Fixing train, val, and test path
* Added initial project structure
  Added a bunch of directories with (mostly) empty/dummy .py files for now, so that everyone can see how the project will be structured. In addition to the present directories, there will also be a datasets directory and a logs directory, the latter being created dynamically at training or validation time.
* rename file, remove one-hot encode
* Revert "wip: dataloader first draft"
* Updating component loading section
* sequence dataloader baseline model
* fixing a couple typos
* Delete src/metrics directory
  Deleting the metrics directory, as it was decided we'll have only one file with all metrics.
* Added refactored DDPM and UNet from notebook V2
  Refactored Lucas's DDPM, UNet and units and added them as PL modules.
* Update diffusion.py
  Added "instantiate_from_config" import.
* Update ddpm.py
  Added nucleotides as a parameter with a default of 4 to the sample method.
* wip: separate train/val/test subclasses
* Delete codebase/src/data directory
* Updated PL dataloader
* placeholder test file
* Update unet_lucas.py
  Added default function import.
* Added matching dummy test files
* complete: initial dataloader
* Added config template
  Designed config template mainly for PL-related parameters. Keeping multiprocessing arguments for multi-GPU for the first test, which we'll change to multi-node. Diffusion and UNet parameters can easily vary.
* Delete dummy_config.yaml
* delete test_diffusion
* fix: fixed function naming convention
* feat: Add initial CI proposal
* feat: Add a simple pyproject config file
* wip: train.py + configs
* config folder structure update
* fix datapath param of datasets
* add additional sequence encoding schemes + separate transforms
* add tests for sequence dataloader
* add additional asserts for data batches
* check sequence lengths in datasets
* add more tests for invalid data
* style: run black
* feat: Refactor schedules and remove time_difference
* feat: Add type hints to schedule utility functions
* feat: Refactor noise schedule fn
* feat: refactor q_sample fn
* feat: add type hints to q_sample
* feat: drop bit_scale
* feat: run black and switch to torch.log
* feat: drop t_index
* feat: refactor p_sample fn
* feat: refactor p_sample_loop fn
* feat: refactor sample fn
* feat: refactor training_step fn
* feat(ci): Add `codebase` branch to CI
  Based on discussion with @mateibejan1, running the tests on the `codebase` branch is also essential. It's the branch which is under heavy development and we should ensure all tests pass before we merge into `codebase` as well.
* reqs: add `pandas` to requirements.txt
* reqs: add `torch` to requirements.txt
* reqs: bump torch to `1.11.0` for compatibility
* fix(ci): run pytest as a module
* reqs: add torchvision to `0.12.0`
* reqs: add `pytorch-lightning`
* fix: failing CI tests for dataloader across platforms
* fix: failing CI tests for dataloader - wrap transforms
* fix: failing CI tests for dataloader - no multiprocessing for transforms
* Add Lucas' conditioned UNet
* Update EMA with Lucas' version
* Added mean_flat util from P2 paper
* Added P2 weighting skeleton. Need to figure out how to use P2 weighting on DNA data.
* misc: create a PR template
  Fixes #51
* misc: add doc strings and type hints to the PR template
  cc: @mateibejan1
* Add files via upload
* Add files via upload
* Add files via upload
  Updated DDPM with Noah's refactored notebook version. Preemptively added p2_weighting, need to figure out if/how it works on bit sequences.
* Add files via upload
* Add files via upload
* style: run black
* feat: add type hints to `utils/misc.py`
* feat: add type hints to utils/metrics
* feat: add type hints to utils/schedules
* feat: add type hints to unet_bitdiffusion
* feat: add type hints to unet_lucas
* feat: add type hints to ddim
* feat: add type hints to seq dataloader
* feat: add type hints to unet_lucas_cond
* Delete ddim.py
  Deprecated.
* Delete unet_bitdiffusion.py
  Deprecated.
* Update unet_conditional.yaml
  Changed default number of timesteps from 1000 to 200.
* Update unet_conditional.yaml
  Moved unet_config params inside the diffusion model's params, so it mirrors the hierarchical relationship between the diffusion class and the unet class.
* Update misc.py
  Minor dict property name changes.
* Update diffusion.py
* Update diffusion.py
* Update default.yaml
* Update unet_lucas.py
* initial test lucas unet
* add test vq
* ddm
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* merge codebase-hydra-restructure into main (#90)
  * WIP new folder structure
  * ema parameter fix
  * Base dataloader instantiation with full hydraconfig successful, missing full params
  * Update sequence_dataloader.py
  * Remove outputs folder, update .gitignore
  * Update network.py
  * Update sequence_datamodule.py
  * Update sequence_datamodule.py

  ---------

  Co-authored-by: cmvcordova <cmvcordova@github.com>
  Co-authored-by: cmvcordova <cmvcordova@pm.me>
  Co-authored-by: Matei Bejan <24592776+mateibejan1@users.noreply.github.com>

---------

Co-authored-by: ssenan <simonsenan@gmail.com>
Co-authored-by: Matei Bejan <24592776+mateibejan1@users.noreply.github.com>
Co-authored-by: Bendidi Ihab <ihabnobendidi@gmail.com>
Co-authored-by: Saurav Maheshkar <sauravvmaheshkar@gmail.com>
Co-authored-by: Jan Sobotka <jsobotka1188@gmail.com>
Co-authored-by: ceziegler <cheyenneeziegler@gmail.com>
Co-authored-by: jamesthesnake <james.ryan.hennessy@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: cmvcordova <cmvcordova@github.com>
Co-authored-by: cmvcordova <cmvcordova@pm.me>

cameronraysmith changed the title ~~merge codebase hydra restructure into codebase~~ merge codebase-hydra-restructure into codebase Mar 5, 2023

This was referenced Mar 5, 2023

merge codebase changes on main into default branch #89

Closed

merge codebase into main #91

Closed

merge codebase changes on main into default #92

Merged

cameronraysmith changed the base branch from codebase to main March 5, 2023 18:20

cameronraysmith changed the title ~~merge codebase-hydra-restructure into codebase~~ merge codebase-hydra-restructure into main Mar 5, 2023

cameronraysmith added the enhancement and refactoring labels Mar 5, 2023

cameronraysmith added this to the 0.0.1 milestone Mar 5, 2023

cameronraysmith linked an issue Mar 5, 2023 that may be closed by this pull request

merge "codebase" work into default branch #93

Closed

cmvcordova and others added 8 commits March 7, 2023 11:21

WIP new folder structure

9367948

ema parameter fix

d6e84e8

Base dataloader instantiation with full hydraconfig successful, missing full params

8e38455

Update sequence_dataloader.py

ecfbdb0

Remove outputs folder, update .gitignore

c4719d7

Update network.py

d1845fc

Update sequence_datamodule.py

9fc83e0

Update sequence_datamodule.py

bbe8da4

cameronraysmith force-pushed the codebase-hydra-restructure branch from 6897a69 to bbe8da4 Compare March 7, 2023 16:24

cameronraysmith self-assigned this Mar 7, 2023

cameronraysmith requested review from mateibejan1, ssenan, LucasSilvaFerreira and cmvcordova March 7, 2023 16:27

cameronraysmith marked this pull request as ready for review March 7, 2023 16:28

cameronraysmith merged commit e9794d3 into main Mar 7, 2023

cameronraysmith deleted the codebase-hydra-restructure branch March 10, 2023 04:18
