Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Final round of text reviews for T011-T018 #106

Merged
merged 18 commits into from
Aug 27, 2021
Merged

Conversation

AndreaVolkamer
Copy link
Member

@AndreaVolkamer AndreaVolkamer commented Jun 4, 2021

Talktorials:

Address in another PR:

  • (T018) whole pipeline still tba
  • T019
  • T020
  • T021
  • T022

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@AndreaVolkamer
Copy link
Member Author

AndreaVolkamer commented Jun 4, 2021

@dominiquesydow can you please check two things in the KLIFS notebook (T012):

  • cell 39 produces this output, 3740 ligand bound structures for that kinase sound too many to me?

Selected kinases: PKR
Chosen KLIFS entry: PDB ID 2a19 with chain B and alternate model - (human).
Number of structure-bound ligands: 3740

  • the shown complex in cell 41 has a weird looking ligand (only a bond is shown)?

@AndreaVolkamer
Copy link
Member Author

@dominiquesydow or @jaimergp in T014 there are two TODO's mentioned, not sure if they are still planned (biopython) or already implemented (opencadd)? Can you please double check?

  • cell 21:

TODO: Change _construct_df to read_pdb_from_lines once biopandas
cuts a new release (currently: 0.2.7), see BioPandas/biopandas#72

  • cell 28:

TODO: With new opencadd release: Check if casting to int still necessary

@dominiquesydow dominiquesydow mentioned this pull request Jul 9, 2021
27 tasks
@AndreaVolkamer
Copy link
Member Author

AndreaVolkamer commented Aug 15, 2021

@schallerdavid minor questions/remarks on T015

  • can you check sentence: Also, the rise of artificial intelligence lead to the development of algorithms able to generate libraries with billions of theoretically synthesizable molecules (Nat Rev Chem (2019), 3, 119-128). -> is AI necessarily the cause for larger libraries? Maybe reformulate.
  • Fig2: Can we write 'pose evaluation'? and should we add also a forth category of ML/DL-based scoring fcts? (@AndreaVolkamer provide paper ref)
  • There are several warnings when writing the file, could you double check if me can get rid of them (cell 4):

UserWarning: Found no information for attr: 'tempfactors' Using default value of '0.0' warnings.warn("Found no information for attr: '{}'"

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray. np.array(sorted(unique_bonds)), 4)

DeprecationWarning: np.bool is a deprecated alias for the builtin bool. To silence this warning, use bool by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.bool_ here.

  • in binding site definition paragraph -> should we link to the bs talktorial (T014) for more detail?
  • cell10, same MDanalysis warning (see above)
  • cell 11: in run_smina function, what do we need 'universal_newlines=True,' for, maybe add a comment?
  • 'smina is a command line tool, which currently does not provide a Python API' -> please just double check if this still holds true, I think @dominiquesydow mentioned sth in that direction lately in teams?
  • cell 12: should we add a warning that the actual docking can take some time?
  • after cell 16, could we discuss in a bit more detail what we see (overall good agreement between pose and x-ray ligand, ring systems nicely overlap, only solvent exposed part points in different direction]

@AndreaVolkamer
Copy link
Member Author

@t-kimber minor questions/remarks on T016:

  • description of PLIP algorithm: which equals the maximum distance between the centroid of the ligand and a ligand atom. -> Can you double check the sentence, not sure I understand it corretcly?
  • warnings when downloading the structure (adding @schallerdavid here, same as in T015):

UserWarning: Found no information for attr: 'tempfactors' Using default value of '0.0' warnings.warn("Found no information for attr: '{}'"

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray. np.array(sorted(unique_bonds)), 4)

DeprecationWarning: np.bool is a deprecated alias for the builtin bool. To silence this warning, use bool by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.bool_ here. Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations guessed = np.asarray(guessed, dtype=np.bool)

  • in cell 10: can you please add a comment what is stored in interactions exactly, it might currently be a bit abstract

@AndreaVolkamer
Copy link
Member Author

AndreaVolkamer commented Aug 18, 2021

@jaimergp and @dominiquesydow minor questions/remarks on T017:

  • theory part: 'The NGL implementation also considers some types of measurements as representations.' -Is the vdW surface considered a measurement, same for a label? maybe reformulate
    ⦁ Figure: Step by step procedure to represent geometric measurements in NGLView -> Can you please annotate where is this taken from?
    ⦁ Comment after cell 3: Check if referring to T020 is correct, maybe add topic of the talktorial (in case numbers change)?
    ⦁ cell 8: can you add what factor=2 is needed for? [see comment further down that this seems to be introduced at two locations]
    ⦁ cell 11: the structure is shown for me although the text below says 'The above widget is now completely blank.'? -> so I do not see the described updating? Wait: Actually, everything is as described, if I execute this and the next cell cell-by-cell, but once executing the second that also changes the representation of the first? Maybe add a comment for people that execute the whole notebook at once - like me - to go cell by cell in this notebook :) [see one more comment on this effect below]
    ⦁ The 'Details' above cell 13 are a bit repetitive to the first time this is introduced, maybe fully introduce first time. [see above]
    ⦁ Cell 20: just a questions, we choose add_line for the waters, but they are rather displayed as dots?
    ⦁ Before cell 27: what do you consider multimodel PDBs? A pdb file that contains several frames of a trajectory? Maybe specify.
    ⦁ Before cell 29: when do we use RCSB, when PDB?
    ⦁ Before cell 30: you describe the phenomena I observed above: 'Note that the view above will get updated automatically:' Can you explain why? and maybe do it at the first occurrence? I think this would be important, it confused me many times.
    ⦁ cell 53: could you explain what the "/0" stand for in 'view.add_ribbon("/0")'

Additional comment: After running the notebook a few times, I now get weird artefacts for the cells trying to view structures - maybe just a problem on my laptop - or could it be that some cache needs to be cleared or so?

@dominiquesydow
Copy link
Collaborator

@dominiquesydow or @jaimergp in T014 there are two TODO's mentioned, not sure if they are still planned (biopython) or already implemented (opencadd)? Can you please double check?

  • cell 21:

TODO: Change _construct_df to read_pdb_from_lines once biopandas
cuts a new release (currently: 0.2.7), see rasbt/biopandas#72

Still a TODO - the new biopandas version may not be on conda-forge yet: BioPandas/biopandas#86
Add this TODO to our refactoring base PR: #74

  • cell 28:

TODO: With new opencadd release: Check if casting to int still necessary

Removed this TODO - we will by default cast to int here (regardless of what opencadd does).

@dominiquesydow
Copy link
Collaborator

@dominiquesydow can you please check two things in the KLIFS notebook (T012):

  • cell 39 produces this output, 3740 ligand bound structures for that kinase sound too many to me?

Selected kinases: PKR
Chosen KLIFS entry: PDB ID 2a19 with chain B and alternate model - (human).
Number of structure-bound ligands: 3740

Thanks, you are right, we fetched co-crystallized ligands for multiple kinases instead of only the one kinase we are interested in. Fixed that.

  • the shown complex in cell 41 has a weird looking ligand (only a bond is shown)?

The new complex shows a nicer ligand :)

@dominiquesydow
Copy link
Collaborator

@jaimergp and @dominiquesydow minor questions/remarks on T017:

  • theory part: 'The NGL implementation also considers some types of measurements as representations.' -Is the vdW surface considered a measurement, same for a label? maybe reformulate

@jaimergp can you please take a look?

⦁ Figure: Step by step procedure to represent geometric measurements in NGLView -> Can you please annotate where is this taken from?

I am pretty sure that Jaime made this video himself.

⦁ Comment after cell 3: Check if referring to T020 is correct, maybe add topic of the talktorial (in case numbers change)?

Is correct. I added "You can see it in action in Talktorial T020, a talktorial on analyzing molecular dynamics simulations."

⦁ cell 8: can you add what factor=2 is needed for? [see comment further down that this seems to be introduced at two locations]

Moved the text regarding this -- later in the notebook -- up here.

⦁ cell 11: the structure is shown for me although the text below says 'The above widget is now completely blank.'? -> so I do not see the described updating? Wait: Actually, everything is as described, if I execute this and the next cell cell-by-cell, but once executing the second that also changes the representation of the first? Maybe add a comment for people that execute the whole notebook at once - like me - to go cell by cell in this notebook :) [see one more comment on this effect below]

Added an asterisk with comment "If the widget is not black, you may have executed the whole talktorial at once; please restart and execute the talktorial cell by cell."

⦁ The 'Details' above cell 13 are a bit repetitive to the first time this is introduced, maybe fully introduce first time. [see above]

Move this up to first instance. [see above]

⦁ Cell 20: just a questions, we choose add_line for the waters, but they are rather displayed as dots?

I added "Note: Crystallographic waters consist of an oxygen atom only since hydrogen atoms are not resolved by X-ray crystallography, so we see dots instead of lines."

⦁ Before cell 27: what do you consider multimodel PDBs? A pdb file that contains several frames of a trajectory? Maybe specify.

@jaimergp, multimodels refers to trajectory or also to alternative models and chains?

⦁ Before cell 29: when do we use RCSB, when PDB?

I usually refer to the PDB. I think the RCSB is one of the maintainers and tool providers for the PDB archives (other are ePDB, PDBj). We use the RCSB PDB website. @jaimergp can we replace RCSB with PDB (or RCSB PDB) in the talktorial text?

⦁ Before cell 30: you describe the phenomena I observed above: 'Note that the view above will get updated automatically:' Can you explain why? and maybe do it at the first occurrence? I think this would be important, it confused me many times.

I cannot explain way, @jaimergp you? I have added a "Look how the widget updates over the next steps (including in the view above)." when we first encounter this magic.

⦁ cell 53: could you explain what the "/0" stand for in 'view.add_ribbon("/0")'

It's stated above the cell in the text "and add the ribbon only on model N with the syntax /N". Refers to the model.
Added a short description "(atoms can have multiple possible positions based on the experimental data, which can be deposited in the PDB file as alternative models)"

Additional comment: After running the notebook a few times, I now get weird artefacts for the cells trying to view structures - maybe just a problem on my laptop - or could it be that some cache needs to be cleared or so?

I cannot reproduce this issue, sorry. I would go for "Restart and clear all output" or shutting down Jupyter Lab and restart.

@t-kimber
Copy link
Contributor

@AndreaVolkamer,
Concerning T016:

  1. For the cutoff, I added another sentence that hopefully makes it clearer, as well as the reference from the paper where it mentions it (in the suppl. data).
  2. I wasn't able to reproduce the exact same error as yours. However, I added an additional import warnings warnings.filterwarnings('ignore'). If you execute the cell twice, all warnings disappear. I haven't found a better solution for now, how about you @schallerdavid ?
  3. I added a detailed description for interactions that hopefully makes it less abstract. For sake of completeness, linking here to the corresponding plip code.

@schallerdavid
Copy link
Collaborator

@AndreaVolkamer,

thanks for the review! Here the implementations of the different suggestions:

can you check sentence: Also, the rise of artificial intelligence lead to the development of algorithms able to generate libraries with billions of theoretically synthesizable molecules (Nat Rev Chem (2019), 3, 119-128). -> is AI necessarily the cause for larger libraries? Maybe reformulate.

  • --> Also, improvements in the fields of cheminformatics and machine learning lead to the development ...

Fig2: Can we write 'pose evaluation'? and should we add also a forth category of ML/DL-based scoring fcts? (@AndreaVolkamer provide paper ref)

  • I adapted the figure accordingly and added a new section for ML/DL-based scoring functions citing Talia's paper and the Gnina paper.
    • ML/DL-based scoring functions are machine learning (ML)/deep learning (DL) models that were trained on a set of available protein-ligand complexes with known binding affinity. The protein-ligand complexes are thereby encoded in a computer-readable format, e.g. as interaction fingerprints or graph. Such scoring functions can be applied during post-processing to rank hit compounds more accurately or during the pose evaluation step.

There are several warnings when writing the file, could you double check if me can get rid of them (cell 4)

  • I added a new cell to get rid of the warnings
# filter warnings
warnings.filterwarnings("ignore")
ob_log_handler = pybel.ob.OBMessageHandler()
pybel.ob.obErrorLog.SetOutputLevel(0)

in binding site definition paragraph -> should we link to the bs talktorial (T014) for more detail?

  • --> If no co-crystallized ligand is available, one can also use binding site detection algorithms to identify regions of interest for docking (see Talktorial T014 on binding site detection algorithms).

cell10, same MDAnalysis warning (see above)

  • warnings disabled (see above)

cell 11: in run_smina function, what do we need 'universal_newlines=True,' for, maybe add a comment?

  • added a comment --> # needed to capture output text

'smina is a command line tool, which currently does not provide a Python API' -> please just double check if this still holds true, I think @dominiquesydow mentioned sth in that direction lately in teams?

  • smina still does not provide a python API, though the recent release of Vina 1.2 has a python api, so we could think about switching to this tool in the future

cell 12: should we add a warning that the actual docking can take some time?

  • I added a warning: Note: Depending on the CPU speed and the docking settings this step may take a few minutes.

after cell 16, could we discuss in a bit more detail what we see (overall good agreement between pose and x-ray ligand, ring systems nicely overlap, only solvent exposed part points in different direction]

  • Depending on random seeds and future smina releases those results might actually change for every user. So I find it a little hard to discuss the docking poses.
  • However I added two sentences, so the user can analyze few things: --> With the provided visualization you can now check if the docking program was able to recapitulate the binding pose observed in the X-ray structure. Also, you can check if the docking pose with the best score (docking_pose_id = 1) is closer to the X-ray structue then the docking pose with worst score (docking_pose_id = 10).

Another note that might become also important to other notebooks:
With the new release of MDAnalysis 2.0.0 the OpenCADD fetch of PDB structures by PDB codes will not have elements assigned in the mda.Universe. So by default this will not be written to a PDB file. However openbabel requires this information. I found a workaround to add the element information to the mda.Universe in case it is not present:

# retrieve structure from the Protein Data Bank
pdb_id = "2ito"
structure = Structure.from_pdbid(pdb_id)
# element information maybe missing, but important for subsequent PDBQT conversion
if not hasattr(structure.atoms, "elements"):
    structure.add_TopologyAttr('elements', structure.atoms.types)
structure

@dominiquesydow
Copy link
Collaborator

smina still does not provide a python API, though the recent release of Vina 1.2 has a python api, so we could think about switching to this tool in the future

@schallerdavid could you please raise an issue so that we do not forget about this option (incl. relevant links)?

@dominiquesydow
Copy link
Collaborator

Concerning T016:

For the cutoff, I added another sentence that hopefully makes it clearer, as well as the reference from the paper where it mentions it (in the suppl. data).

@t-kimber and I decided to keep only a general sentence on the binding site detection and refer to the SI for details.

@dominiquesydow
Copy link
Collaborator

dominiquesydow commented Aug 27, 2021

I am closing open issues in T017 now as follows:

  • theory part: 'The NGL implementation also considers some types of measurements as representations.' -Is the vdW surface considered a measurement, same for a label? maybe reformulate

@jaimergp can you please take a look?

Added "Labels count as representations, too."

⦁ Figure: Step by step procedure to represent geometric measurements in NGLView -> Can you please annotate where is this taken from?

I am pretty sure that Jaime made this video himself.

@jaimergp I am assuming now that this is your video. If not please ping me, I can add a reference also later.

⦁ Before cell 27: what do you consider multimodel PDBs? A pdb file that contains several frames of a trajectory? Maybe specify.

@jaimergp, multimodels refers to trajectory or also to alternative models and chains?

In my understanding this is restricted to alternative models, added "(multimodel structure = multiple sets of coordinates for the same structure in one PDB file)"
Here we do not mean trajectories or chains.

⦁ Before cell 29: when do we use RCSB, when PDB?

I usually refer to the PDB. I think the RCSB is one of the maintainers and tool providers for the PDB archives (other are ePDB, PDBj). We use the RCSB PDB website. @jaimergp can we replace RCSB with PDB (or RCSB PDB) in the talktorial text?

Added at first instance of nv.show_pdbid that PDBs are fetched from the RCSB PDB resource; we use now the term "RCSB PDB" in the text.

@dominiquesydow dominiquesydow self-requested a review August 27, 2021 08:53
@jaimergp
Copy link
Contributor

I am assuming now that this is your video. If not please ping me, I can add a reference also later.

Correct, I recorded that myself @dominiquesydow

Also, sorry I haven't kept up with these notifications. I am getting way too many lately. Is there anything else you need me to do? Feel free to ping me in the Choderalab Slack if I ever missed a tag!

Copy link
Collaborator

@dominiquesydow dominiquesydow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talktorials T011-T017 are looking good to me now.

@dominiquesydow
Copy link
Collaborator

dominiquesydow commented Aug 27, 2021

Is there anything else you need me to do? Feel free to ping me in the Choderalab Slack if I ever missed a tag!

Thanks @jaimergp, going through the CI in our base branch #74 now. I will ping you on slack re issues I don't get fixed, thank you.

@dominiquesydow dominiquesydow changed the title Final round of text reviews Final round of text reviews for T011-T018 Aug 27, 2021
@dominiquesydow dominiquesydow merged commit 3737e87 into t011-base Aug 27, 2021
@dominiquesydow dominiquesydow deleted the t011-base-review branch August 27, 2021 09:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants