
Adding the modified NB #60

Merged: 1 commit, Dec 2, 2022

Conversation

noahweber1
Collaborator

Concrete improvements from the previous notebook:

  1. Complete restructuring and refactoring of the code.
  2. Deleted unnecessary code.
  3. Set a seed for complete reproducibility (so that we know when improvements are real and not due to randomness).
  4. Decoupled the data-loading and processing logic.
  5. Wrote docstrings for most of the classes and functions.
  6. Exposed hyperparameters at the very beginning (for easier Hydra refactoring).
  7. Added more functionality (such as injectable/interchangeable metric functions).
  8. Documented all of the main chapters and subchapters, with a brief theoretical description of Stable Diffusion, the architecture used, Classifier-Free Diffusion Guidance, and EMA.

The main goals of this notebook are to:

  1. Serve as a major pre-refactoring step so that we can easily move this into the codebase.
  2. Provide an easier entry point for newcomers.
  3. Serve as the new benchmark for prototyping.
  4. While the refactoring is ongoing, keep all of the code here free of imports from the codebase.

@review-notebook-app

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

Collaborator


Line #1.    def compare_motif_list(df_motifs_a, df_motifs_b, motif_scoring_metric=motif_scoring_KL_divergence, plot_motif_probs=False):
  • Are df_motifs_a and df_motifs_b going to be pandas Series?
  • We should add a Callable type hint to motif_scoring_metric.
  • We should add a bool type hint to plot_motif_probs.
  • We should add the torch.Tensor type hint to the function output.
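Taken together, the suggested hints might look like the sketch below; pandas and torch are imported only for type checking, and the body (including the motif_scoring_KL_divergence default) is elided since the real implementation lives in the notebook.

```python
# Hypothetical sketch of the suggested signature, not the notebook's code.
from __future__ import annotations

from typing import TYPE_CHECKING, Callable, Optional

if TYPE_CHECKING:  # annotation-only imports, not needed at runtime
    import pandas as pd
    import torch

def compare_motif_list(
    df_motifs_a: pd.Series,
    df_motifs_b: pd.Series,
    motif_scoring_metric: Optional[Callable] = None,  # notebook default: motif_scoring_KL_divergence
    plot_motif_probs: bool = False,
) -> torch.Tensor:
    ...  # actual comparison logic lives in the notebook
```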


Collaborator


Line #1.    def metric_comparison_between_components(original_data, generated_data, x_label_plot, y_label_plot):
  • We should add the Dict type hint to original_data and generated_data.
  • If I understand correctly, x_label_plot and y_label_plot are strings (type: str)? If yes, we should add that.
  • We should add the None type hint to the function output.
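With those hints the signature would read as below; the body is a hypothetical stand-in (it just prints per-component values side by side), since the notebook presumably plots the comparison.

```python
from typing import Dict

def metric_comparison_between_components(
    original_data: Dict,
    generated_data: Dict,
    x_label_plot: str,
    y_label_plot: str,
) -> None:
    # Hypothetical body: the notebook plots this comparison; here we
    # simply print each component's original vs. generated value.
    for component, original_value in original_data.items():
        print(
            f"{component}: {x_label_plot}={original_value!r}, "
            f"{y_label_plot}={generated_data.get(component)!r}"
        )
```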


Collaborator


Line #1.    def one_hot_encode(seq, nucleotides, max_seq_len):
  • Are the seq and nucleotides parameters lists or strings? The corresponding type hints should be added.
  • We should add the int type hint to max_seq_len.
  • We should add the np.ndarray type hint to the function output (viz. -> np.ndarray).
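A hinted version could look like the sketch below; the body is a guess at a typical padded one-hot encoder rather than the notebook's exact code, and the .index lookup works equally well whether nucleotides is a string or a list.

```python
import numpy as np

def one_hot_encode(seq: str, nucleotides: str, max_seq_len: int) -> np.ndarray:
    """One-hot encode a nucleotide sequence, zero-padded to max_seq_len.

    Hypothetical implementation; seq and nucleotides could equally be lists.
    """
    encoding = np.zeros((max_seq_len, len(nucleotides)), dtype=np.float32)
    for i, base in enumerate(seq[:max_seq_len]):
        encoding[i, nucleotides.index(base)] = 1.0
    return encoding

one_hot_encode("ACG", "ACGT", 6)  # (6, 4) array; the last three rows stay zero
```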


Collaborator


Line #1.    def log(t, eps = 1e-20):

We should add the torch.Tensor type hint to the parameter t and the function output.
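That is, something like the sketch below; torch is imported only for type checking, and the body is elided (the notebook presumably clamps t at eps before taking the log, but that is an assumption).

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # annotation-only import, not needed at runtime
    import torch

def log(t: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    """Numerically stable log; sketch of the hinted signature only."""
    ...  # notebook body, roughly: torch.log(torch.clamp(t, min=eps))
```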



Collaborator


Same goes for all class methods except update_average. As they don't return any values, we should add None to the method outputs.



Collaborator


Line #2.        def __init__(self, beta):
  • We should ideally add class docstrings. The markdown comments above would be ideal.
  • We should add a float type hint to the beta parameter.


Collaborator


Line #7.        def update_model_average(self, ma_model, current_model):

We should add nn.Module type hints to ma_model and current_model and None to the method output.
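Putting the EMA-related suggestions together, the class might be hinted like the sketch below; the update rule shown is the standard EMA formula and is an assumption about the notebook's implementation, and nn is imported only for type checking.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # annotation-only import, not needed at runtime
    from torch import nn

class EMA:
    """Exponential moving average (EMA) of model parameters (sketch)."""

    def __init__(self, beta: float) -> None:
        self.beta = beta

    def update_model_average(self, ma_model: nn.Module, current_model: nn.Module) -> None:
        # Blend each averaged parameter toward the current model's parameters.
        for ma_p, cur_p in zip(ma_model.parameters(), current_model.parameters()):
            ma_p.data = self.update_average(ma_p.data, cur_p.data)

    def update_average(self, old, new):
        # Standard EMA update: keep beta of the old value, mix in the rest.
        return old * self.beta + (1.0 - self.beta) * new
```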



Collaborator


Great work on the abstraction 👍, type hints would be helpful.



Collaborator


Can we also do garbage collection and empty the CUDA cache after each step through the dataloader, viz.

import gc
import torch

for epoch in range(...):
    for idx, sample in enumerate(dataloader):
        ...
        # ⭐️⭐️ Garbage Collection
        torch.cuda.empty_cache()
        _ = gc.collect()


@SauravMaheshkar added the codebase, enhancement (New feature or request), and refactoring (Refactoring) labels on Nov 28, 2022
@LucasSilvaFerreira LucasSilvaFerreira merged commit ca068bc into dna-diffusion Dec 2, 2022
Labels
codebase, enhancement (New feature or request), refactoring (Refactoring)
3 participants