Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

T019 - Molecular dynamics simulation #56

Merged
merged 23 commits into from
Apr 27, 2021
Merged

T019 - Molecular dynamics simulation #56

merged 23 commits into from
Apr 27, 2021

Conversation

Mika-Le
Copy link
Collaborator

@Mika-Le Mika-Le commented Oct 26, 2020

Details

  • Talktorial ID: 019
  • Title: Molecular dynamics simulation
  • Original authors: Pietro Gerletti
  • Reviewer(s): Mareike Leja, David Schaller
  • Date of review: 09/2020-present

Content review

  • Potential labels or categories (e.g. machine learning, small molecules, online APIs): interaction, small molecule, protein
  • One line summary: Perform a molecular dynamics simulation of the SARS-CoV-2 main protease in complex with an inhibitor.
  • The table of contents reflects the talktorial story-line; order of #, ##, ### headers is correct
  • URLs are linked with meaningful words, instead of pasting the URL directly or linking words like here.
  • I have spell-checked the notebook
  • Images have enough resolution to be rendered with quality, without being too heavy.
  • All figures have a description
  • Markdown cell content is still in-line with code cell output (whenever results are discussed)
  • I have checked that cell outputs are not incredibly long (this applies also to DataFrames)
  • Formatting looks correctly on the Sphinx render (bold, italics, figure placing)

Code review

  • Time it took to execute (approx.):
  • Variable and function names follow snake case rules (e.g. a_variable_name vs aVariableName)
  • Spacing follows PEP8 (run Black on the code cells if needed)
  • Code line are under 99 characters each (run black -l 99)
  • Comments are useful and well placed
  • There are no unpythonic idioms like for i in range(len(list)) (see slides)
  • All 3rd party dependencies are listed at the top of the notebook
  • I have marked all code cell with output referenced in markdown cells with the label # TODO: CI
  • I have identified potential candidates for a code refactor / useful functions
  • All import ... lines are at the top (practice part) cell, ordered by standard library / 3rd party packages / our own (teachopencadd.*)
  • I have update the relative paths to absolute paths.
  • List here unfamiliar libraries you find in the imports and their intended use:

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@jaimergp jaimergp changed the base branch from master to packaging October 26, 2020 16:07
Base automatically changed from packaging to master November 23, 2020 10:42
@schallerdavid
Copy link
Collaborator

Hey @jaimergp,

I could not render with sphinx easily because this notebook is probably missing in some sort of configuration file. How can I include it?

@jaimergp
Copy link
Contributor

Hey @jaimergp,

I could not render with sphinx easily because this notebook is probably missing in some sort of configuration file. How can I include it?

Check docs/talktorials and create the correaponding nblink file there!

@schallerdavid
Copy link
Collaborator

@AndreaVolkamer Everything works well as notebook, colab notebook or html 🎉

Ready for review!

@dominiquesydow dominiquesydow added the new talktorial New talktorial label Dec 11, 2020
@jaimergp jaimergp marked this pull request as ready for review December 11, 2020 09:28
@dominiquesydow dominiquesydow changed the title Ml 018 review T018 - Molecular dynamics simulation Dec 11, 2020
@dominiquesydow dominiquesydow mentioned this pull request Dec 11, 2020
27 tasks
@dominiquesydow dominiquesydow changed the title T018 - Molecular dynamics simulation T019 - Molecular dynamics simulation Dec 11, 2020
@dominiquesydow
Copy link
Collaborator

dominiquesydow commented Dec 11, 2020

@Mika-Le - your talktorial's final index will be T019. Could you please at some point update this PR according to that (let me know if you need help)?

@@ -0,0 +1,1155 @@
{
Copy link
Member

@AndreaVolkamer AndreaVolkamer Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add info to authors, see e.g. T012


Reply via ReviewNB

@@ -0,0 +1,1155 @@
{
Copy link
Member

@AndreaVolkamer AndreaVolkamer Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prepare the protein ligand comples -> complex

Suggestion: I would move the 'Merge protein and ligand' one level lower:

  • Prepare the protein ligand complex
  • Prepare protein
  • Prepare ligand
  • Merge protein and ligand


Reply via ReviewNB

@@ -0,0 +1,1155 @@
{
Copy link
Member

@AndreaVolkamer AndreaVolkamer Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You so nicely explained what to expect in all references, despite

Pierre-Simon Laplace, Oeuvres Complètes de Laplace. Théorie Analytique des Probabilités (volume VII Gauthier-Villars (1820), 3rd ed)

Could you add sth here also?


Reply via ReviewNB

@@ -0,0 +1,1155 @@
{
Copy link
Member

@AndreaVolkamer AndreaVolkamer Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use a block quote for the cited text, see eg T012 (same as linked above)


Reply via ReviewNB

@@ -0,0 +1,1155 @@
{
Copy link
Member

@AndreaVolkamer AndreaVolkamer Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw in some other notebooks that they adapted the figure size, maybe one could apply this here too? check T016 figure 1


Reply via ReviewNB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I remembered from our first Hackathon that we want to avoid HTML Syntax, which is the only way I know to scale images in notebooks. But I cannot find the guidelines anymore and might be mistaken. Maybe Jaime can tell us.

@@ -0,0 +1,1155 @@
{
Copy link
Member

@AndreaVolkamer AndreaVolkamer Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe also here one could comment a bit more on the solvent part


Reply via ReviewNB

@@ -0,0 +1,1155 @@
{
Copy link
Member

@AndreaVolkamer AndreaVolkamer Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is starting to go into the details (and maybe too much for an introductory talktorial), but yould we at least link to where to find more information about what e.g. the 'LangevinIntegrator' does?


Reply via ReviewNB

Copy link
Collaborator Author

@Mika-Le Mika-Le Jan 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good Point! I added two sentences and links for further info for now.

@@ -0,0 +1,1155 @@
{
Copy link
Member

@AndreaVolkamer AndreaVolkamer Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set-up or set up, try to use it consistently throughout the notebook


Reply via ReviewNB

@@ -0,0 +1,1155 @@
{
Copy link
Member

@AndreaVolkamer AndreaVolkamer Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments here and in the next few cells could rather become markdown cells that guide through the process:

Minimize energy and output the minimized system

simulation.minimizeEnergy()
with open(DATA / "topology.pdb", "w") as pdb_file:
    app.PDBFile.writeFile(simulation.topology,
                      simulation.context.getState(getPositions=True, enforcePeriodicBox=True).getPositions(),
                      file=pdb_file,
                      keepIds=True)

Reply via ReviewNB

@@ -0,0 +1,1155 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the number for the 'next' notebook already fixed? If yes, please include it here.

... refer to Txxx ... (maybe even with a link if available)

At the very and, we usually have a small quiz (three questions) can you think of any?


Reply via ReviewNB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number is T020.

We could link to the right folder in GitHub or do we wait for the website?

@AndreaVolkamer
Copy link
Member

AndreaVolkamer commented Dec 17, 2020

@Mika-Le the talktorial is GREAT!!! Very easy to follow and very good explanations, well done!
I've added my very little feedback above.
MD simulations are very complex and require quite some tools/libraries, and I think the depth you are covering is great. Since - especially in the practical part some people might still want to know more about the individual steps or libraries, could you please include the links to those there again.
[Note, I did only go through it on reviewnb, without running it on colab, I'll do that as soon as time allows]

@Mika-Le
Copy link
Collaborator Author

Mika-Le commented Jan 20, 2021

Hi @AndreaVolkamer, thank you for the great feedback. I updated the notebook, it is ready to be reviewed again.

@AndreaVolkamer
Copy link
Member

@Mika-Le very well done (@schallerdavid), the notebook is great! I only added tiny textual adaptions.

Unfortunately, I couldn't get it running on colab due to the mentioned GLIBCXX_3.4.26 import error. I tried some fixes from this link, but after the rdkit error was fixed, I got a No module named 'mdtraj.formats.dcd' error?
Maybe @jaimergp since you looked into colab for the other course you have a fix in hand?

As soon as the colab thing is solved, the notebook is ready to be merged!

@jaimergp
Copy link
Contributor

As per our chat, the issue is solved but we can't make sure it works reliably right now because Anaconda.org servers are a bit irresponsive now and installation fails. We have decided to reconvene in a week, since by then openmmforcefields might be on conda forge already (and does not suffer from server issues since it runs on separate network).

@jaimergp
Copy link
Contributor

jaimergp commented Feb 2, 2021

openmmforcefields not yet on conda-forge due to blocking openmm/openmmforcefields#151. We'll wait a bit so we can merge with a conda-forge only env!

@schallerdavid
Copy link
Collaborator

@jaimergp,
everything works out with installing only via conda-forge 🎉. I would be done from my side.


## Categories

This talktorial is part of the following categories: [Collections overview](link)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is outdated. Are we writing the categories in the notebook now?

@@ -0,0 +1,1250 @@
{
Copy link
Contributor

@jaimergp jaimergp Feb 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that !command does not stop the cell from being executed. "Dependencies successfully installed" will be showed regardless the success of the conda install command. This, in addition to the cleared outputs, can be very misleading... We should check whether things were actually installed or not, maybe try importing openmm or something similar. Only if that succeeds we clear outputs and report success. Otherwise we want the logs to see what went wrong (e.g. the random network errors we saw).


Reply via ReviewNB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it pythonic/ok to nest try except blocks? Something along the lines of

try:
	import condacolab 
    [...]
	try: 
			import rdkit
            clear_output()
            print("Installation successful")
  	except:
      		print("Dependencies not installed")
except: 
  	print("Not on Colab")

If not, what would be the better way to check for successful installation? Any If-else-checks possible for installed dependencies?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use the else clause of the try block, which will only be reached if no exception was raised during try

try:
    import condacolab 
    ...
except: 
    print("Not on Colab")
else:
    try: 
        import rdkit
        clear_output()
        print("Installation successful")
    except:
        print("Dependencies not installed!")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that is way better

@@ -0,0 +1,1250 @@
{
Copy link
Contributor

@jaimergp jaimergp Feb 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We capitalize Python and Protein Data Bank.


Reply via ReviewNB

@@ -0,0 +1,1250 @@
{
Copy link
Contributor

@jaimergp jaimergp Feb 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RCSB offers a simple way to get PDBs via URLs: https://files.rcsb.org/download/3POZ.pdb. Do we need the get_pdb_file function? We could use requests instead. Up to you!


Reply via ReviewNB

@@ -0,0 +1,1250 @@
{
Copy link
Contributor

@jaimergp jaimergp Feb 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PDBFixer should be able to deal with Path objects but it does not now. We should open an issue for that.

Also, can we parameterize the pH value with an optional keyword? (..., pH=7.0):


Reply via ReviewNB

@@ -0,0 +1,1250 @@
{
Copy link
Contributor

@jaimergp jaimergp Feb 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalize Python.


Reply via ReviewNB

@@ -0,0 +1,1250 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought OpenForceField could convert to OpenMM just fine. Am I missing something?


Reply via ReviewNB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we have to convert the unit of the positions we extract topology and positions separately and create a modeller object from there. Maybe a modeller object could be created from the openff Molecule directly, but what about the unit conversion then? We could probably do it at another step of the process, if that's better. Any ideas?

OpenFF Molecule Docs

@@ -0,0 +1,1250 @@
{
Copy link
Contributor

@jaimergp jaimergp Feb 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you need that +1 in complex_positions[len(protein.positions) + 1: total_atoms]? Python indexing is [start, end), so I would assume that we can start the 2nd range where the 1st ended, not shifted +1. What's the extra atom? You can also drop 0 and total_atoms. In other words:

    complex_positions = unit.Quantity(np.zeros([total_atoms, 3]), unit=unit.nanometers)
    complex_positions[:len(protein.positions)] = protein.positions  # add protein positions
    complex_positions[len(protein.positions):] = ligand.positions  # add ligand positions

Reply via ReviewNB

@@ -0,0 +1,1250 @@
{
Copy link
Contributor

@jaimergp jaimergp Feb 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Link to TIP3P and GAFF references, maybe?

Capitalize Python.


Reply via ReviewNB

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't find anything suitable about tip3p, especially not in context with amber. Added a link to the wikipedia article about water models which includes a section for tip3p. Better recommendations are welcome

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the canonical citation: https://aip.scitation.org/doi/10.1063/1.445869

@@ -0,0 +1,1250 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This # NBVAL_CHECK_OUTPUT will almost certainly fail because the minimization is a stochastic process and we are drawing velocities from a distribution, so for sure we will not get the same exact numbers every time.

Remove this comment and maybe, in a new cell, check if everything ran correctly by checking that the simulation objects has reached the desired number of steps.


Reply via ReviewNB

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Mika-Le

How about removing the #NBVAL_CHECK_OUTPUT from the simulation step cell and adding another cell afterwards with the following content:

simulation_time = round(simulation.context.getState().getTime().value_in_unit(unit.femtoseconds))
if simulation_time == steps * 2:
print("Simulation finished successfully!")
else:
print("Simulation failed!")
# NBVAL_CHECK_OUTPUT

@jaimergp
Copy link
Contributor

This is awesome! I am sure OpenMM users will like this material a lot! Thanks for putting it together!

I have added some minor comments that do not change the outcome or purpose of the talktorial, just some technical details we need to take into account.

@jaimergp
Copy link
Contributor

Ping me again when the comments are addressed so I resolve the conflicts and merge!

@jaimergp jaimergp changed the base branch from master to t011-base April 27, 2021 08:19
@jaimergp jaimergp merged commit a5f38e2 into t011-base Apr 27, 2021
@jaimergp jaimergp deleted the ml-018-review branch April 27, 2021 08:20
@jaimergp
Copy link
Contributor

Thanks @Mika-Le! 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new talktorial New talktorial
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants