merge codebase-hydra-restructure into main #90

Merged
merged 8 commits into main from codebase-hydra-restructure
Mar 7, 2023

Conversation

cameronraysmith
Collaborator

@cameronraysmith cameronraysmith commented Mar 5, 2023

To be followed by #92.

@cameronraysmith cameronraysmith changed the title merge codebase hydra restructure into codebase merge codebase-hydra-restructure into codebase Mar 5, 2023
@cameronraysmith cameronraysmith changed the base branch from codebase to main March 5, 2023 18:20
@cameronraysmith cameronraysmith changed the title merge codebase-hydra-restructure into codebase merge codebase-hydra-restructure into main Mar 5, 2023
@cameronraysmith cameronraysmith added the enhancement and refactoring labels Mar 5, 2023
@cameronraysmith cameronraysmith added this to the 0.0.1 milestone Mar 5, 2023
@cameronraysmith cameronraysmith linked an issue Mar 5, 2023 that may be closed by this pull request
@cameronraysmith cameronraysmith self-assigned this Mar 7, 2023
@cameronraysmith cameronraysmith marked this pull request as ready for review March 7, 2023 16:28
@cameronraysmith cameronraysmith merged commit e9794d3 into main Mar 7, 2023
cameronraysmith added a commit that referenced this pull request Mar 7, 2023
* wip: dataloader first draft

* Fixing train, val, and test path

* Added initial project structure

Added directories with (mostly) empty placeholder .py files so that everyone can see how the project will be structured. In addition to the present directories, there will also be a datasets directory and a logs directory, the latter created dynamically at training or validation time.

* rename file, remove one-hot encode

* Revert "wip: dataloader first draft"

* Updating component loading section

* sequence dataloader baseline model

* fixing a couple typos

* Delete src/metrics directory

Deleting metrics directory as it was decided we'll have only one file with all metrics.

* Added refactored DDPM and UNet from notebook V2

Refactored Lucas's DDPM, UNet and units and added them as PL modules.

* Update diffusion.py

Added "instantiate_from_config" import.

* Update ddpm.py

Added nucleotides as a parameter with a default of 4 to the sample method.

* wip: separate train/val/test subclasses

* Delete codebase/src/data directory

* Updated PL dataloader

* placeholder test file

* Update unet_lucas.py

Added default function import.

* Added matching dummy test files

* complete: initial dataloader

* Added config template

Designed the config template mainly for PL-related parameters. Keeping the multiprocessing arguments set for multi-GPU for the first test; we'll change them to multi-node later. Diffusion and UNet parameters can easily vary.

* Delete dummy_config.yaml

* delete test_diffusion

* fix: fixed function naming convention

* feat: Add initial CI proposal

* feat: Add a simple pyproject config file

* wip: train.py + configs

* config folder structure update

* fix datapath param of datasets

* add additional sequence encoding schemes + separate transforms

* add tests for sequence dataloader

* add additional asserts for data batches

* check sequence lengths in datasets

* add more tests for invalid data

* style: run black

* feat: Refactor schedules and remove time_difference

* feat: Add type hints to schedule utility functions

* feat: Refactor noise schedule fn

* feat: refactor q_sample fn

* feat: add type hints to q_sample

* feat: drop bit_scale

* feat: run black and switch to torch.log

* feat: drop t_index

* feat: refactor p_sample fn

* feat: refactor p_sample_loop fn

* feat: refactor sample fn

* feat: refactor training_step fn

* feat(ci): Add `codebase` branch to CI

Based on discussion with @mateibejan1, running the tests on the `codebase` branch is also essential. It's the branch which is under heavy development and we should ensure all tests pass before we merge into `codebase` as well.

* reqs: add `pandas` to requirements.txt

* reqs: add `torch` to requirements.txt

* reqs: bump torch to `1.11.0` for compatibility

* fix(ci): run pytest as a module

* reqs: add torchvision `0.12.0`

* reqs: add `pytorch-lightning`

* fix: failing CI tests for dataloader across platforms

* fix: failing CI tests for dataloader - wrap transforms

* fix: failing CI tests for dataloader - no multiprocessing for transforms

* Add Lucas' conditioned UNet

* Update EMA with Lucas' version

* Added mean_flat util from P2 paper

* Added P2 weighting skeleton.

Need to figure out how to use P2 weighting on DNA data.

* misc: create a PR template

Fixes #51

* misc: add doc strings and type hints to the PR template

cc: @mateibejan1

* Add files via upload

* Add files via upload

* Add files via upload

Updated DDPM with Noah's refactored notebook version. Preemptively added p2_weighting; still need to figure out if/how it works on bit sequences.

* Add files via upload

* Add files via upload

* style: run black

* feat: add type hints to `utils/misc.py`

* feat: add type hints to utils/metrics

* feat: add type hints to utils/schedules

* feat: add type hints to unet_bitdiffusion

* feat: add type hints to unet_lucas

* feat: add type hints to ddim

* feat: add type hints to seq dataloader

* feat: add type hints to unet_lucas_cond

* Delete ddim.py

Deprecated.

* Delete unet_bitdiffusion.py

Deprecated.

* Update unet_conditional.yaml

Changed default number of timesteps from 1000 to 200.

* Update unet_conditional.yaml

Moved the unet_config params inside the diffusion model's params, so that the config mirrors the hierarchical relationship between the diffusion class and the UNet class.
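
A minimal sketch of what this nesting could look like, written as a plain Python dict rather than the actual YAML. The `ToyDiffusion` and `ToyUNet` classes, the `dim` and `channels` parameters, and the registry-based `instantiate_from_config` are all hypothetical stand-ins for illustration; the project's own `instantiate_from_config` helper and the real contents of `unet_conditional.yaml` may differ. Only the `timesteps` default of 200 and the `unet_config` key come from the commits above.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping a config "target" name to a class.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(cls):
    """Class decorator that makes a class addressable by name from a config."""
    REGISTRY[cls.__name__] = cls
    return cls


def instantiate_from_config(config: Dict[str, Any]) -> Any:
    """Simplified stand-in: build config["target"] with config["params"] as kwargs."""
    cls = REGISTRY[config["target"]]
    return cls(**config.get("params", {}))


@register
class ToyUNet:
    """Hypothetical network; parameter names are illustrative only."""

    def __init__(self, dim: int = 64, channels: int = 1) -> None:
        self.dim = dim
        self.channels = channels


@register
class ToyDiffusion:
    """Hypothetical diffusion wrapper that owns its UNet."""

    def __init__(self, timesteps: int, unet_config: Dict[str, Any]) -> None:
        self.timesteps = timesteps
        # The nested block is resolved by the parent, mirroring the hierarchy.
        self.unet = instantiate_from_config(unet_config)


# unet_config sits inside the diffusion model's params, not beside them.
config = {
    "target": "ToyDiffusion",
    "params": {
        "timesteps": 200,
        "unet_config": {
            "target": "ToyUNet",
            "params": {"dim": 64, "channels": 1},
        },
    },
}

model = instantiate_from_config(config)
```

With this layout the diffusion object is the single entry point: building it also builds the network it wraps, so a caller never has to pass the two configs separately.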

* Update misc.py

Minor dict property name changes.

* Update diffusion.py

* Update diffusion.py

* Update default.yaml

* Update unet_lucas.py

* initial test lucas unet

* add test vq

* ddm

* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci

* merge codebase-hydra-restructure into main (#90)

* WIP new folder structure

* ema parameter fix

* Base dataloader instantiation with full Hydra config successful; full params still missing

* Update sequence_dataloader.py

* Remove outputs folder, update .gitignore

* Update network.py

* Update sequence_datamodule.py

* Update sequence_datamodule.py

---------

Co-authored-by: cmvcordova <cmvcordova@github.com>
Co-authored-by: cmvcordova <cmvcordova@pm.me>
Co-authored-by: Matei Bejan <24592776+mateibejan1@users.noreply.github.com>

---------

Co-authored-by: ssenan <simonsenan@gmail.com>
Co-authored-by: Matei Bejan <24592776+mateibejan1@users.noreply.github.com>
Co-authored-by: Bendidi Ihab <ihabnobendidi@gmail.com>
Co-authored-by: Saurav Maheshkar <sauravvmaheshkar@gmail.com>
Co-authored-by: Jan Sobotka <jsobotka1188@gmail.com>
Co-authored-by: ceziegler <cheyenneeziegler@gmail.com>
Co-authored-by: jamesthesnake <james.ryan.hennessy@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: cmvcordova <cmvcordova@github.com>
Co-authored-by: cmvcordova <cmvcordova@pm.me>
@cameronraysmith cameronraysmith deleted the codebase-hydra-restructure branch March 10, 2023 04:18
Labels
enhancement, refactoring
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

merge "codebase" work into default branch
3 participants