
Adding the modified NB #60

Merged: 1 commit, Dec 2, 2022

Conversation

noahweber1
Collaborator

Concrete improvements from the previous notebook:

  1. Complete restructuring and refactoring of the code.
  2. Deleted unnecessary code.
  3. Set a seed for complete reproducibility (so that we know when improvements are real and not due to randomness).
  4. Decoupled the data-loading and processing logic.
  5. Wrote docstrings for most of the classes and functions.
  6. Exposed hyperparameters at the very beginning (for easier Hydra refactoring).
  7. Added more functionality (such as injectable/interchangeable metric functions).
  8. Documented all of the main chapters and subchapters, with a brief theoretical description of Stable Diffusion, the architecture used, Classifier-Free Diffusion Guidance, and EMA.

The main goals of this notebook are to:

  1. Serve as a major pre-refactoring step so that we can easily move this into the codebase.
  2. Provide an easier entry point for newcomers.
  3. Serve as the new benchmark for prototyping.
  4. While the refactoring is ongoing, keep all of the code here free of imports from the codebase.

@review-notebook-app

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

Collaborator


Line #1.    def compare_motif_list(df_motifs_a, df_motifs_b, motif_scoring_metric=motif_scoring_KL_divergence, plot_motif_probs=False):
  • Are df_motifs_a and df_motifs_b going to be pandas Series?
  • We should add a Callable type hint to motif_scoring_metric.
  • We should add a bool type hint to plot_motif_probs.
  • We should add the torch.Tensor type hint to the function output.
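Taken together, the suggested hints might look like the sketch below; pandas and torch are imported only for type checking, and the body (including the motif_scoring_KL_divergence default) is elided since the real implementation lives in the notebook.

```python
# Hypothetical sketch of the suggested signature, not the notebook's code.
from __future__ import annotations

from typing import TYPE_CHECKING, Callable, Optional

if TYPE_CHECKING:  # annotation-only imports, not needed at runtime
    import pandas as pd
    import torch

def compare_motif_list(
    df_motifs_a: pd.Series,
    df_motifs_b: pd.Series,
    motif_scoring_metric: Optional[Callable] = None,  # notebook default: motif_scoring_KL_divergence
    plot_motif_probs: bool = False,
) -> torch.Tensor:
    ...  # actual comparison logic lives in the notebook
```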


Collaborator


Line #1.    def metric_comparison_between_components(original_data, generated_data, x_label_plot, y_label_plot):
  • We should add the Dict type hint to original_data and generated_data.
  • If I understand correctly, x_label_plot and y_label_plot are strings (type: str)? If yes, we should add that.
  • We should add the None type hint to the function output.
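With those hints the signature would read as below; the body is a hypothetical stand-in (it just prints per-component values side by side), since the notebook presumably plots the comparison.

```python
from typing import Dict

def metric_comparison_between_components(
    original_data: Dict,
    generated_data: Dict,
    x_label_plot: str,
    y_label_plot: str,
) -> None:
    # Hypothetical body: the notebook plots this comparison; here we
    # simply print each component's original vs. generated value.
    for component, original_value in original_data.items():
        print(
            f"{component}: {x_label_plot}={original_value!r}, "
            f"{y_label_plot}={generated_data.get(component)!r}"
        )
```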


Collaborator


Line #1.    def one_hot_encode(seq, nucleotides, max_seq_len):
  • Are the seq and nucleotides parameters lists or strings? The corresponding type hints should be added.
  • We should add the int type hint to max_seq_len.
  • We should add the np.ndarray type hint to the function output (viz. -> np.ndarray).
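A hinted version could look like the sketch below; the body is a guess at a typical padded one-hot encoder rather than the notebook's exact code, and the .index lookup works equally well whether nucleotides is a string or a list.

```python
import numpy as np

def one_hot_encode(seq: str, nucleotides: str, max_seq_len: int) -> np.ndarray:
    """One-hot encode a nucleotide sequence, zero-padded to max_seq_len.

    Hypothetical implementation; seq and nucleotides could equally be lists.
    """
    encoding = np.zeros((max_seq_len, len(nucleotides)), dtype=np.float32)
    for i, base in enumerate(seq[:max_seq_len]):
        encoding[i, nucleotides.index(base)] = 1.0
    return encoding

one_hot_encode("ACG", "ACGT", 6)  # (6, 4) array; the last three rows stay zero
```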


Collaborator


Line #1.    def log(t, eps = 1e-20):

We should add the torch.Tensor type hint to the parameter t and the function output.
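That is, something like the sketch below; torch is imported only for type checking, and the body is elided (the notebook presumably clamps t at eps before taking the log, but that is an assumption).

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # annotation-only import, not needed at runtime
    import torch

def log(t: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    """Numerically stable log; sketch of the hinted signature only."""
    ...  # notebook body, roughly: torch.log(torch.clamp(t, min=eps))
```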



Collaborator


Same goes for all class methods except update_average. As they don't return any values, we should add None to the method outputs.



Collaborator


Line #2.        def __init__(self, beta):
  • We should ideally add class docstrings. The markdown comments above would be ideal.
  • We should add a float type hint to the beta parameter.


Collaborator


Line #7.        def update_model_average(self, ma_model, current_model):

We should add nn.Module type hints to ma_model and current_model and None to the method output.
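Putting the EMA-related suggestions together, the class might be hinted like the sketch below; the update rule shown is the standard EMA formula and is an assumption about the notebook's implementation, and nn is imported only for type checking.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # annotation-only import, not needed at runtime
    from torch import nn

class EMA:
    """Exponential moving average (EMA) of model parameters (sketch)."""

    def __init__(self, beta: float) -> None:
        self.beta = beta

    def update_model_average(self, ma_model: nn.Module, current_model: nn.Module) -> None:
        # Blend each averaged parameter toward the current model's parameters.
        for ma_p, cur_p in zip(ma_model.parameters(), current_model.parameters()):
            ma_p.data = self.update_average(ma_p.data, cur_p.data)

    def update_average(self, old, new):
        # Standard EMA update: keep beta of the old value, mix in the rest.
        return old * self.beta + (1.0 - self.beta) * new
```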



Collaborator


Great work on the abstraction 👍, type hints would be helpful.



Collaborator


Can we also do garbage collection and empty the CUDA cache after each step through the dataloader, viz.

import gc
import torch

for epoch in range(...):
    for idx, sample in enumerate(dataloader):
        ...
        # ⭐️⭐️ Garbage Collection
        torch.cuda.empty_cache()
        _ = gc.collect()


@SauravMaheshkar added the codebase, enhancement (New feature or request), and refactoring (Refactoring) labels on Nov 28, 2022
@LucasSilvaFerreira LucasSilvaFerreira merged commit ca068bc into dna-diffusion Dec 2, 2022
Labels
codebase, enhancement (New feature or request), refactoring (Refactoring)
3 participants