Final round of text reviews for T011-T018 #106

AndreaVolkamer · 2021-06-04T15:33:54Z

Talktorials:

Address in another PR:

(T018) whole pipeline still tba
T019
T020
T021
T022

review-notebook-app · 2021-06-04T15:33:57Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

AndreaVolkamer · 2021-06-04T16:52:29Z

@dominiquesydow can you please check two things in the KLIFS notebook (T012):

cell 39 produces this output, 3740 ligand bound structures for that kinase sound too many to me?

Selected kinases: PKR
Chosen KLIFS entry: PDB ID 2a19 with chain B and alternate model - (human).
Number of structure-bound ligands: 3740

the shown complex in cell 41 has a weird looking ligand (only a bond is shown)?

AndreaVolkamer · 2021-06-07T15:24:07Z

@dominiquesydow or @jaimergp in T014 there are two TODO's mentioned, not sure if they are still planned (biopython) or already implemented (opencadd)? Can you please double check?

cell 21:

TODO: Change _construct_df to read_pdb_from_lines once biopandas
cuts a new release (currently: 0.2.7), see BioPandas/biopandas#72

cell 28:

TODO: With new opencadd release: Check if casting to int still necessary

AndreaVolkamer · 2021-08-15T18:48:50Z

@schallerdavid minor questions/remarks on T015

can you check sentence: Also, the rise of artificial intelligence lead to the development of algorithms able to generate libraries with billions of theoretically synthesizable molecules (Nat Rev Chem (2019), 3, 119-128). -> is AI necessarily the cause for larger libraries? Maybe reformulate.
Fig2: Can we write 'pose evaluation'? and should we add also a forth category of ML/DL-based scoring fcts? (@AndreaVolkamer provide paper ref)
There are several warnings when writing the file, could you double check if me can get rid of them (cell 4):

UserWarning: Found no information for attr: 'tempfactors' Using default value of '0.0' warnings.warn("Found no information for attr: '{}'"

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray. np.array(sorted(unique_bonds)), 4)

DeprecationWarning: np.bool is a deprecated alias for the builtin bool. To silence this warning, use bool by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.bool_ here.

in binding site definition paragraph -> should we link to the bs talktorial (T014) for more detail?
cell10, same MDanalysis warning (see above)
cell 11: in run_smina function, what do we need 'universal_newlines=True,' for, maybe add a comment?
'smina is a command line tool, which currently does not provide a Python API' -> please just double check if this still holds true, I think @dominiquesydow mentioned sth in that direction lately in teams?
cell 12: should we add a warning that the actual docking can take some time?
after cell 16, could we discuss in a bit more detail what we see (overall good agreement between pose and x-ray ligand, ring systems nicely overlap, only solvent exposed part points in different direction]

AndreaVolkamer · 2021-08-17T07:57:37Z

@t-kimber minor questions/remarks on T016:

description of PLIP algorithm: which equals the maximum distance between the centroid of the ligand and a ligand atom. -> Can you double check the sentence, not sure I understand it corretcly?
warnings when downloading the structure (adding @schallerdavid here, same as in T015):

UserWarning: Found no information for attr: 'tempfactors' Using default value of '0.0' warnings.warn("Found no information for attr: '{}'"

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray. np.array(sorted(unique_bonds)), 4)

DeprecationWarning: np.bool is a deprecated alias for the builtin bool. To silence this warning, use bool by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.bool_ here. Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations guessed = np.asarray(guessed, dtype=np.bool)

in cell 10: can you please add a comment what is stored in interactions exactly, it might currently be a bit abstract

AndreaVolkamer · 2021-08-18T10:30:31Z

@jaimergp and @dominiquesydow minor questions/remarks on T017:

theory part: 'The NGL implementation also considers some types of measurements as representations.' -Is the vdW surface considered a measurement, same for a label? maybe reformulate
⦁ Figure: Step by step procedure to represent geometric measurements in NGLView -> Can you please annotate where is this taken from?
⦁ Comment after cell 3: Check if referring to T020 is correct, maybe add topic of the talktorial (in case numbers change)?
⦁ cell 8: can you add what factor=2 is needed for? [see comment further down that this seems to be introduced at two locations]
⦁ cell 11: the structure is shown for me although the text below says 'The above widget is now completely blank.'? -> so I do not see the described updating? Wait: Actually, everything is as described, if I execute this and the next cell cell-by-cell, but once executing the second that also changes the representation of the first? Maybe add a comment for people that execute the whole notebook at once - like me - to go cell by cell in this notebook :) [see one more comment on this effect below]
⦁ The 'Details' above cell 13 are a bit repetitive to the first time this is introduced, maybe fully introduce first time. [see above]
⦁ Cell 20: just a questions, we choose add_line for the waters, but they are rather displayed as dots?
⦁ Before cell 27: what do you consider multimodel PDBs? A pdb file that contains several frames of a trajectory? Maybe specify.
⦁ Before cell 29: when do we use RCSB, when PDB?
⦁ Before cell 30: you describe the phenomena I observed above: 'Note that the view above will get updated automatically:' Can you explain why? and maybe do it at the first occurrence? I think this would be important, it confused me many times.
⦁ cell 53: could you explain what the "/0" stand for in 'view.add_ribbon("/0")'

Additional comment: After running the notebook a few times, I now get weird artefacts for the cells trying to view structures - maybe just a problem on my laptop - or could it be that some cache needs to be cleared or so?

dominiquesydow · 2021-08-25T09:37:29Z

@dominiquesydow or @jaimergp in T014 there are two TODO's mentioned, not sure if they are still planned (biopython) or already implemented (opencadd)? Can you please double check?

cell 21:

TODO: Change _construct_df to read_pdb_from_lines once biopandas
cuts a new release (currently: 0.2.7), see rasbt/biopandas#72

Still a TODO - the new biopandas version may not be on conda-forge yet: BioPandas/biopandas#86
Add this TODO to our refactoring base PR: #74

cell 28:

TODO: With new opencadd release: Check if casting to int still necessary

Removed this TODO - we will by default cast to int here (regardless of what opencadd does).

dominiquesydow · 2021-08-25T09:42:27Z

@dominiquesydow can you please check two things in the KLIFS notebook (T012):

cell 39 produces this output, 3740 ligand bound structures for that kinase sound too many to me?

Selected kinases: PKR
Chosen KLIFS entry: PDB ID 2a19 with chain B and alternate model - (human).
Number of structure-bound ligands: 3740

Thanks, you are right, we fetched co-crystallized ligands for multiple kinases instead of only the one kinase we are interested in. Fixed that.

the shown complex in cell 41 has a weird looking ligand (only a bond is shown)?

The new complex shows a nicer ligand :)

dominiquesydow · 2021-08-25T10:21:11Z

@jaimergp and @dominiquesydow minor questions/remarks on T017:

theory part: 'The NGL implementation also considers some types of measurements as representations.' -Is the vdW surface considered a measurement, same for a label? maybe reformulate

@jaimergp can you please take a look?

⦁ Figure: Step by step procedure to represent geometric measurements in NGLView -> Can you please annotate where is this taken from?

I am pretty sure that Jaime made this video himself.

⦁ Comment after cell 3: Check if referring to T020 is correct, maybe add topic of the talktorial (in case numbers change)?

Is correct. I added "You can see it in action in Talktorial T020, a talktorial on analyzing molecular dynamics simulations."

⦁ cell 8: can you add what factor=2 is needed for? [see comment further down that this seems to be introduced at two locations]

Moved the text regarding this -- later in the notebook -- up here.

⦁ cell 11: the structure is shown for me although the text below says 'The above widget is now completely blank.'? -> so I do not see the described updating? Wait: Actually, everything is as described, if I execute this and the next cell cell-by-cell, but once executing the second that also changes the representation of the first? Maybe add a comment for people that execute the whole notebook at once - like me - to go cell by cell in this notebook :) [see one more comment on this effect below]

Added an asterisk with comment "If the widget is not black, you may have executed the whole talktorial at once; please restart and execute the talktorial cell by cell."

⦁ The 'Details' above cell 13 are a bit repetitive to the first time this is introduced, maybe fully introduce first time. [see above]

Move this up to first instance. [see above]

⦁ Cell 20: just a questions, we choose add_line for the waters, but they are rather displayed as dots?

I added "Note: Crystallographic waters consist of an oxygen atom only since hydrogen atoms are not resolved by X-ray crystallography, so we see dots instead of lines."

⦁ Before cell 27: what do you consider multimodel PDBs? A pdb file that contains several frames of a trajectory? Maybe specify.

@jaimergp, multimodels refers to trajectory or also to alternative models and chains?

⦁ Before cell 29: when do we use RCSB, when PDB?

I usually refer to the PDB. I think the RCSB is one of the maintainers and tool providers for the PDB archives (other are ePDB, PDBj). We use the RCSB PDB website. @jaimergp can we replace RCSB with PDB (or RCSB PDB) in the talktorial text?

⦁ Before cell 30: you describe the phenomena I observed above: 'Note that the view above will get updated automatically:' Can you explain why? and maybe do it at the first occurrence? I think this would be important, it confused me many times.

I cannot explain way, @jaimergp you? I have added a "Look how the widget updates over the next steps (including in the view above)." when we first encounter this magic.

⦁ cell 53: could you explain what the "/0" stand for in 'view.add_ribbon("/0")'

It's stated above the cell in the text "and add the ribbon only on model N with the syntax /N". Refers to the model.
Added a short description "(atoms can have multiple possible positions based on the experimental data, which can be deposited in the PDB file as alternative models)"

Additional comment: After running the notebook a few times, I now get weird artefacts for the cells trying to view structures - maybe just a problem on my laptop - or could it be that some cache needs to be cleared or so?

I cannot reproduce this issue, sorry. I would go for "Restart and clear all output" or shutting down Jupyter Lab and restart.

t-kimber · 2021-08-26T07:16:17Z

@AndreaVolkamer,
Concerning T016:

For the cutoff, I added another sentence that hopefully makes it clearer, as well as the reference from the paper where it mentions it (in the suppl. data).
I wasn't able to reproduce the exact same error as yours. However, I added an additional import warnings warnings.filterwarnings('ignore'). If you execute the cell twice, all warnings disappear. I haven't found a better solution for now, how about you @schallerdavid ?
I added a detailed description for interactions that hopefully makes it less abstract. For sake of completeness, linking here to the corresponding plip code.

schallerdavid · 2021-08-26T12:05:10Z

@AndreaVolkamer,

thanks for the review! Here the implementations of the different suggestions:

can you check sentence: Also, the rise of artificial intelligence lead to the development of algorithms able to generate libraries with billions of theoretically synthesizable molecules (Nat Rev Chem (2019), 3, 119-128). -> is AI necessarily the cause for larger libraries? Maybe reformulate.

--> Also, improvements in the fields of cheminformatics and machine learning lead to the development ...

Fig2: Can we write 'pose evaluation'? and should we add also a forth category of ML/DL-based scoring fcts? (@AndreaVolkamer provide paper ref)

I adapted the figure accordingly and added a new section for ML/DL-based scoring functions citing Talia's paper and the Gnina paper.
- ML/DL-based scoring functions are machine learning (ML)/deep learning (DL) models that were trained on a set of available protein-ligand complexes with known binding affinity. The protein-ligand complexes are thereby encoded in a computer-readable format, e.g. as interaction fingerprints or graph. Such scoring functions can be applied during post-processing to rank hit compounds more accurately or during the pose evaluation step.

There are several warnings when writing the file, could you double check if me can get rid of them (cell 4)

I added a new cell to get rid of the warnings

# filter warnings
warnings.filterwarnings("ignore")
ob_log_handler = pybel.ob.OBMessageHandler()
pybel.ob.obErrorLog.SetOutputLevel(0)

in binding site definition paragraph -> should we link to the bs talktorial (T014) for more detail?

--> If no co-crystallized ligand is available, one can also use binding site detection algorithms to identify regions of interest for docking (see Talktorial T014 on binding site detection algorithms).

cell10, same MDAnalysis warning (see above)

warnings disabled (see above)

cell 11: in run_smina function, what do we need 'universal_newlines=True,' for, maybe add a comment?

added a comment --> # needed to capture output text

'smina is a command line tool, which currently does not provide a Python API' -> please just double check if this still holds true, I think @dominiquesydow mentioned sth in that direction lately in teams?

smina still does not provide a python API, though the recent release of Vina 1.2 has a python api, so we could think about switching to this tool in the future

cell 12: should we add a warning that the actual docking can take some time?

I added a warning: Note: Depending on the CPU speed and the docking settings this step may take a few minutes.

after cell 16, could we discuss in a bit more detail what we see (overall good agreement between pose and x-ray ligand, ring systems nicely overlap, only solvent exposed part points in different direction]

Depending on random seeds and future smina releases those results might actually change for every user. So I find it a little hard to discuss the docking poses.
However I added two sentences, so the user can analyze few things: --> With the provided visualization you can now check if the docking program was able to recapitulate the binding pose observed in the X-ray structure. Also, you can check if the docking pose with the best score (docking_pose_id = 1) is closer to the X-ray structue then the docking pose with worst score (docking_pose_id = 10).

Another note that might become also important to other notebooks:
With the new release of MDAnalysis 2.0.0 the OpenCADD fetch of PDB structures by PDB codes will not have elements assigned in the mda.Universe. So by default this will not be written to a PDB file. However openbabel requires this information. I found a workaround to add the element information to the mda.Universe in case it is not present:

# retrieve structure from the Protein Data Bank
pdb_id = "2ito"
structure = Structure.from_pdbid(pdb_id)
# element information maybe missing, but important for subsequent PDBQT conversion
if not hasattr(structure.atoms, "elements"):
    structure.add_TopologyAttr('elements', structure.atoms.types)
structure

dominiquesydow · 2021-08-27T07:05:03Z

smina still does not provide a python API, though the recent release of Vina 1.2 has a python api, so we could think about switching to this tool in the future

@schallerdavid could you please raise an issue so that we do not forget about this option (incl. relevant links)?

dominiquesydow · 2021-08-27T08:18:27Z

Concerning T016:

For the cutoff, I added another sentence that hopefully makes it clearer, as well as the reference from the paper where it mentions it (in the suppl. data).

@t-kimber and I decided to keep only a general sentence on the binding site detection and refer to the SI for details.

dominiquesydow · 2021-08-27T08:52:37Z

I am closing open issues in T017 now as follows:

theory part: 'The NGL implementation also considers some types of measurements as representations.' -Is the vdW surface considered a measurement, same for a label? maybe reformulate

@jaimergp can you please take a look?

Added "Labels count as representations, too."

⦁ Figure: Step by step procedure to represent geometric measurements in NGLView -> Can you please annotate where is this taken from?

I am pretty sure that Jaime made this video himself.

@jaimergp I am assuming now that this is your video. If not please ping me, I can add a reference also later.

⦁ Before cell 27: what do you consider multimodel PDBs? A pdb file that contains several frames of a trajectory? Maybe specify.

@jaimergp, multimodels refers to trajectory or also to alternative models and chains?

In my understanding this is restricted to alternative models, added "(multimodel structure = multiple sets of coordinates for the same structure in one PDB file)"
Here we do not mean trajectories or chains.

⦁ Before cell 29: when do we use RCSB, when PDB?

I usually refer to the PDB. I think the RCSB is one of the maintainers and tool providers for the PDB archives (other are ePDB, PDBj). We use the RCSB PDB website. @jaimergp can we replace RCSB with PDB (or RCSB PDB) in the talktorial text?

Added at first instance of nv.show_pdbid that PDBs are fetched from the RCSB PDB resource; we use now the term "RCSB PDB" in the text.

jaimergp · 2021-08-27T08:54:25Z

I am assuming now that this is your video. If not please ping me, I can add a reference also later.

Correct, I recorded that myself @dominiquesydow

Also, sorry I haven't kept up with these notifications. I am getting way too many lately. Is there anything else you need me to do? Feel free to ping me in the Choderalab Slack if I ever missed a tag!

dominiquesydow

Talktorials T011-T017 are looking good to me now.

dominiquesydow · 2021-08-27T08:59:33Z

Is there anything else you need me to do? Feel free to ping me in the Choderalab Slack if I ever missed a tag!

Thanks @jaimergp, going through the CI in our base branch #74 now. I will ping you on slack re issues I don't get fixed, thank you.

T011 reviewed

b42d24a

T012 reviewed

19c3f11

AndreaVolkamer added 2 commits June 7, 2021 16:46

T013 reviewed

b53e4c0

T014 reviewed

e4485a9

dominiquesydow mentioned this pull request Jul 9, 2021

Base branch for new talktorials #74

Merged

27 tasks

T015 reviewed

c6a557d

T016 reviewed

082cf27

T017 reviewed

30e9310

T012: Resolve @AndreaVolkamer's review requests

2a7ce97

T013: Resolve @AndreaVolkamer's review requests

3838fea

T017: Resolve @AndreaVolkamer's review requests

9fdd6bc

Address AV's comments.

8be3f28

t-kimber and others added 2 commits August 26, 2021 09:19

fix minor typo.

246f529

implement suggestions for T015

c9d861f

dominiquesydow added 2 commits August 27, 2021 09:20

T013: Fix typo + table of contents

c2d56f7

T013: Fix typo in "fixed" typo...

1595b08

dominiquesydow mentioned this pull request Aug 27, 2021

MDAnalysis 2.0.0: Structures fetched by PDB code do not contain elements volkamerlab/opencadd#93

Open

T016: Refer to SI for binding site detection procedure in PLIP

cd00dc9

T016: Fix typo in code comment

a001a3d

T017: Add comments on RCSB, labels=representations, multimodels

8a8b9c3

dominiquesydow self-requested a review August 27, 2021 08:53

dominiquesydow approved these changes Aug 27, 2021

View reviewed changes

dominiquesydow changed the title ~~Final round of text reviews~~ Final round of text reviews for T011-T018 Aug 27, 2021

dominiquesydow merged commit 3737e87 into t011-base Aug 27, 2021

dominiquesydow deleted the t011-base-review branch August 27, 2021 09:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Final round of text reviews for T011-T018 #106

Final round of text reviews for T011-T018 #106

AndreaVolkamer commented Jun 4, 2021 •

edited by dominiquesydow

Loading

review-notebook-app bot commented Jun 4, 2021

AndreaVolkamer commented Jun 4, 2021 •

edited

Loading

AndreaVolkamer commented Jun 7, 2021

AndreaVolkamer commented Aug 15, 2021 •

edited

Loading

AndreaVolkamer commented Aug 17, 2021

AndreaVolkamer commented Aug 18, 2021 •

edited

Loading

dominiquesydow commented Aug 25, 2021

dominiquesydow commented Aug 25, 2021

dominiquesydow commented Aug 25, 2021

t-kimber commented Aug 26, 2021

schallerdavid commented Aug 26, 2021

dominiquesydow commented Aug 27, 2021

dominiquesydow commented Aug 27, 2021

dominiquesydow commented Aug 27, 2021 •

edited

Loading

jaimergp commented Aug 27, 2021

dominiquesydow left a comment

dominiquesydow commented Aug 27, 2021 •

edited

Loading

Final round of text reviews for T011-T018 #106

Final round of text reviews for T011-T018 #106

Conversation

AndreaVolkamer commented Jun 4, 2021 • edited by dominiquesydow Loading

review-notebook-app bot commented Jun 4, 2021

AndreaVolkamer commented Jun 4, 2021 • edited Loading

AndreaVolkamer commented Jun 7, 2021

AndreaVolkamer commented Aug 15, 2021 • edited Loading

AndreaVolkamer commented Aug 17, 2021

AndreaVolkamer commented Aug 18, 2021 • edited Loading

dominiquesydow commented Aug 25, 2021

dominiquesydow commented Aug 25, 2021

dominiquesydow commented Aug 25, 2021

t-kimber commented Aug 26, 2021

schallerdavid commented Aug 26, 2021

dominiquesydow commented Aug 27, 2021

dominiquesydow commented Aug 27, 2021

dominiquesydow commented Aug 27, 2021 • edited Loading

jaimergp commented Aug 27, 2021

dominiquesydow left a comment

Choose a reason for hiding this comment

dominiquesydow commented Aug 27, 2021 • edited Loading

AndreaVolkamer commented Jun 4, 2021 •

edited by dominiquesydow

Loading

AndreaVolkamer commented Jun 4, 2021 •

edited

Loading

AndreaVolkamer commented Aug 15, 2021 •

edited

Loading

AndreaVolkamer commented Aug 18, 2021 •

edited

Loading

dominiquesydow commented Aug 27, 2021 •

edited

Loading

dominiquesydow commented Aug 27, 2021 •

edited

Loading