T012 - Data acquisition from KLIFS #79

dominiquesydow · 2020-12-02T16:47:23Z

Details

Talktorial ID: T012
Title: Data acquisition from KLIFS
Original authors: @jaimergp, @dominiquesydow
Reviewer(s): @dominiquesydow, @schallerdavid
Date of review: 2020-12-02

Content review

Potential labels or categories (e.g. machine learning, small molecules, online APIs): kinases, online APIs, protein-ligand complex, EGFR, profiling data, off-targets
One line summary: Structures and bound ligands for a query kinase are fetched from KLIFS using the website's Swagger API.
The table of contents reflects the talktorial story-line; order of #, ##, ### headers is correct
URLs are linked with meaningful words, instead of pasting the URL directly or linking words like here.
I have spell-checked the notebook
Images have enough resolution to be rendered with quality, without being too heavy.
All figures have a description
Markdown cell content is still in-line with code cell output (whenever results are discussed)
I have checked that cell outputs are not incredibly long (this applies also to DataFrames)
Formatting looks correctly on the Sphinx render (bold, italics, figure placing)

Code review

TODOs

Update environment: Use latest opencadd version
Update opencadd.databases.klifs TODO after PR is merged!
Reread the full talktorial one more time to check again for typos
Prepare talktorial TOC
Replace wiki reference with EGFR and Gefitinib references
Add discussion

review-notebook-app · 2020-12-02T16:47:27Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

review-notebook-app · 2020-12-07T15:05:27Z

View / edit / reply to this conversation on ReviewNB

jaimergp commented on 2020-12-07T15:05:27Z
----------------------------------------------------------------

Maybe?

programmatic access to this database

dominiquesydow commented on 2020-12-07T15:41:11Z
----------------------------------------------------------------

Yes, better, thanks!

review-notebook-app · 2020-12-07T15:05:28Z

View / edit / reply to this conversation on ReviewNB

jaimergp commented on 2020-12-07T15:05:27Z
----------------------------------------------------------------

There are 518 protein kinases encoded in the human genome, which were clustered based on their sequence into eight main kinase groups (AGC, CAMK, CK1, CMGC, STE, TK, TKL and Other

This depends on your source... We have seen slightly different numbers at openkinome/kinodata. I guess you need to cite Manning 2002 for this specific number.

Protein kinases catalyze the phosphorylation of tyrosine, serine and theorine residues of themselves or other kinases using their bound ATP.

Histidine residues can also be targeted!

dominiquesydow commented on 2020-12-07T15:41:21Z
----------------------------------------------------------------

Good point, added both.

review-notebook-app · 2020-12-07T15:05:29Z

View / edit / reply to this conversation on ReviewNB

jaimergp commented on 2020-12-07T15:05:28Z
----------------------------------------------------------------

to programmatically access ~~- from your computer -~~

Luckily, there is a solution - some websites provide you with a document that defines the REST API schema for you (OpenAPI specification).

Link: https://swagger.io/docs/specification/about/. Swagger opensourced the whole thing and now the schema is called OpenAPI :) Replace Swagger definitions for OpenAPI definitions in the rest of the cell/notebook too!

dominiquesydow commented on 2020-12-07T15:44:42Z
----------------------------------------------------------------

Thanks for clarifying!

review-notebook-app · 2020-12-07T15:05:29Z

View / edit / reply to this conversation on ReviewNB

jaimergp commented on 2020-12-07T15:05:29Z
----------------------------------------------------------------

~~Volkamber~~ Volkamer lab

review-notebook-app · 2020-12-07T15:05:30Z

View / edit / reply to this conversation on ReviewNB

jaimergp commented on 2020-12-07T15:05:30Z
----------------------------------------------------------------

When you only want the first element of a comprehension you can use a generator to avoid iterating (and storing) the whole thing on memory. Like this:

kinase_klifs_id = next(kinase.kinase_ID for kinase in kinases if kinase.name==kinase_name)

Same applies to other instances of this notebook

review-notebook-app · 2020-12-07T15:05:31Z

View / edit / reply to this conversation on ReviewNB

jaimergp commented on 2020-12-07T15:05:31Z
----------------------------------------------------------------

I don't see the image here but it might be a reviewnb thing

dominiquesydow commented on 2020-12-07T15:47:18Z
----------------------------------------------------------------

I did not push the rendered images, yet, will do so with the version for the full review for David.

review-notebook-app · 2020-12-07T15:05:32Z

View / edit / reply to this conversation on ReviewNB

jaimergp commented on 2020-12-07T15:05:31Z
----------------------------------------------------------------

Truncate input list to avoid that warning:

show_ligands(ligands["ligand.smiles"].to_list()[:10])

review-notebook-app · 2020-12-07T15:05:33Z

View / edit / reply to this conversation on ReviewNB

jaimergp commented on 2020-12-07T15:05:32Z
----------------------------------------------------------------

I think multiple returns can be expressed like this in docstrings:

    Returns
    -------
    molcomplex : str
        Complex structural data
    protein : str
        protein structural data
    ligands : list of str
        list of ligand SMILES.

jaimergp · 2020-12-07T15:06:48Z

Awesome content @dominiquesydow! I left a few comments clarifying some terms and definitions, and a couple of unimportant technicalities.

Always a pleasure to review high quality PRs such as this one!

dominiquesydow · 2020-12-07T15:41:12Z

Yes, better, thanks!

View entire conversation on ReviewNB

review-notebook-app · 2020-12-08T16:34:23Z

View / edit / reply to this conversation on ReviewNB

schallerdavid commented on 2020-12-08T16:34:23Z
----------------------------------------------------------------

typo?: the following information ~~about~~:

review-notebook-app · 2020-12-08T16:34:24Z

View / edit / reply to this conversation on ReviewNB

schallerdavid commented on 2020-12-08T16:34:24Z
----------------------------------------------------------------

typo: (IFP) for the example

review-notebook-app · 2020-12-08T16:34:25Z

View / edit / reply to this conversation on ReviewNB

schallerdavid commented on 2020-12-08T16:34:25Z
----------------------------------------------------------------

Show only the first 10 file rows.

below it looks like you dont use the first 10 rows, but rows 100-110

review-notebook-app · 2020-12-08T16:34:26Z

View / edit / reply to this conversation on ReviewNB

schallerdavid commented on 2020-12-08T16:34:26Z
----------------------------------------------------------------

I dont think all ligands in ChEMBL have drug-like properties

dominiquesydow commented on 2020-12-11T08:14:11Z
----------------------------------------------------------------

Good point. I took it from the first sentence on their website:

https://www.ebi.ac.uk/chembl/

ChEMBL is a manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs.

Rephrased this to "...ChEMBL, a manually curated compound database"

review-notebook-app · 2020-12-08T16:34:27Z

View / edit / reply to this conversation on ReviewNB

schallerdavid commented on 2020-12-08T16:34:26Z
----------------------------------------------------------------

example ligand W19: 10

not sure what this means

dominiquesydow commented on 2020-12-11T08:17:51Z
----------------------------------------------------------------

This is the number of ChEMBL bioactivity values available at KLIFS ligand W19 (i.e. 10 values).

Rephrased from

Number of bioactivity values for example ligand W19: 10

to

Number of ChEMBL bioactivity values available at KLIFS for example ligand W19: 10

Is that better? Or am I missing the point you were making?

schallerdavid commented on 2020-12-11T20:33:11Z
----------------------------------------------------------------

Sorry my fault, I just did not read the code :). All clear from my side

review-notebook-app · 2020-12-08T16:34:27Z

View / edit / reply to this conversation on ReviewNB

schallerdavid commented on 2020-12-08T16:34:27Z
----------------------------------------------------------------

the opencadd version in test_env.yaml does not contain a databases module

dominiquesydow commented on 2020-12-11T08:20:35Z
----------------------------------------------------------------

True! This is still one TODO in the PR description - will take care of this today.

[ ] Update environment: Use latest opencadd version

review-notebook-app · 2020-12-08T16:34:28Z

View / edit / reply to this conversation on ReviewNB

schallerdavid commented on 2020-12-08T16:34:27Z
----------------------------------------------------------------

typo: per residues position or per residues positions

review-notebook-app · 2020-12-08T16:34:29Z

View / edit / reply to this conversation on ReviewNB

schallerdavid commented on 2020-12-08T16:34:28Z
----------------------------------------------------------------

typo: and ~~keep~~ show the top 10 structures.

review-notebook-app · 2020-12-08T16:34:29Z

View / edit / reply to this conversation on ReviewNB

schallerdavid commented on 2020-12-08T16:34:29Z
----------------------------------------------------------------

typo: there are a lot of

typo: of Gefitinib is EFGFR

typo: showning high activity

review-notebook-app · 2020-12-08T16:34:30Z

View / edit / reply to this conversation on ReviewNB

schallerdavid commented on 2020-12-08T16:34:30Z
----------------------------------------------------------------

typo: You can specify one or more kinases of your choice to narrow down ...

review-notebook-app · 2020-12-08T16:34:31Z

View / edit / reply to this conversation on ReviewNB

schallerdavid commented on 2020-12-08T16:34:31Z
----------------------------------------------------------------

typo: are available for the kinase CDK2?

schallerdavid · 2020-12-08T16:37:18Z

Great talktorial @dominiquesydow !

I mainly found typos. Everything ran smoothly. Only installing the environment from the test_env.yaml resulted in installation of an older opencadd version without the databases module.

dominiquesydow · 2020-12-11T08:14:11Z

Good point. I took it from the first sentence on their website:

https://www.ebi.ac.uk/chembl/

ChEMBL is a manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs.

Rephrased this to "...ChEMBL, a manually curated compound database"

View entire conversation on ReviewNB

dominiquesydow · 2020-12-11T08:17:52Z

This is the number of ChEMBL bioactivity values available at KLIFS ligand W19 (i.e. 10 values).

Rephrased from

Number of bioactivity values for example ligand W19: 10

to

Number of ChEMBL bioactivity values available at KLIFS for example ligand W19: 10

Is that better? Or am I missing the point you were making?

View entire conversation on ReviewNB

dominiquesydow · 2020-12-11T08:20:36Z

True! This is still one TODO in the PR description - will take care of this today.

[ ] Update environment: Use latest opencadd version

View entire conversation on ReviewNB

dominiquesydow · 2020-12-11T08:24:39Z

Thanks, I have rephrased the sentence to a broader meaning:

"EGFR was the first receptor for which a relationship between mutations, overexpression and cancer (tumor growth) has been shown. "

View entire conversation on ReviewNB

dominiquesydow · 2020-12-11T08:32:05Z

Thanks @schallerdavid! My apologies for the many typos!
Will now take care of adding the latest opencadd version to the yml files.

schallerdavid · 2020-12-11T20:33:12Z

Sorry my fault, I just did not read the code :). All clear from my side

View entire conversation on ReviewNB

Last edits: Zoom in ligand to see hinge region highlights; fix typos/format errors; set seed for random kinase selection

Start branch

768be48

dominiquesydow self-assigned this Dec 2, 2020

dominiquesydow changed the base branch from master to t011-base December 2, 2020 16:48

dominiquesydow added 4 commits December 2, 2020 20:58

Add theory (kinases, KLIFS, KLIFS Swagger API)

e87393f

Add practical part (incl. opencadd)

86d3bec

Add bioactivity/IFP analysis + more theory

c27a915

Add tabular to env

eec2b87

dominiquesydow changed the title ~~Refactor and extend KLIFS talktorial T012~~ T012 revamp Dec 5, 2020

dominiquesydow added 11 commits December 5, 2020 08:38

Add bioactivty/IFP analysis + more theory / 2 commits ago add temp utils

a839b53

Merge remote-tracking branch 'origin/ds-012-revamp' into ds-012-revamp

6a8d7a2

Text edits

bb0a045

Apply black-nb; add nbval output check

0da9731

Remove opencadd mprint (use display.Markdown for now)

3c831df

Add talktorial to website

25f7699

Add T012 figure (TOC)

2b36190

Update EGFR/Gefitinib theory (add references)

973fc86

Add discussion

80d5ae2

Rerun notebook (with rendered nglview images)

b8ccbf7

Remove tabular from env

e5d89ed

dominiquesydow mentioned this pull request Dec 11, 2020

Base branch for new talktorials #74

Merged

27 tasks

dominiquesydow added the new talktorial New talktorial label Dec 11, 2020

dominiquesydow changed the title ~~T012 revamp~~ T012 - Data acquisition from KLIFS Dec 11, 2020

dominiquesydow added 3 commits December 11, 2020 11:53

Update yml with latest opencadd

9e748c2

Add @schallerdavid's suggestions

d6e74c2

Update 3 env files with latest opencadd version

1b7d644

dominiquesydow added 4 commits December 13, 2020 15:18

Update 3 env files again with latest opencadd version (PR#62)

e210b2a

Add @schallerdavid to author list; update opencadd usage; proof-reading

4b5d3ca

Apply black-nb -l 99

6db51be

Final proof-reading in rendered version

1a1bd96

Last edits: Zoom in ligand to see hinge region highlights; fix typos/format errors; set seed for random kinase selection

dominiquesydow merged commit 2e99614 into t011-base Dec 14, 2020

dominiquesydow deleted the ds-012-revamp branch December 14, 2020 20:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

T012 - Data acquisition from KLIFS #79

T012 - Data acquisition from KLIFS #79

dominiquesydow commented Dec 2, 2020 •

edited

Loading

review-notebook-app bot commented Dec 2, 2020

review-notebook-app bot commented Dec 7, 2020 •

edited

Loading

review-notebook-app bot commented Dec 7, 2020 •

edited

Loading

review-notebook-app bot commented Dec 7, 2020 •

edited

Loading

review-notebook-app bot commented Dec 7, 2020 •

edited

Loading

review-notebook-app bot commented Dec 7, 2020 •

edited

Loading

review-notebook-app bot commented Dec 7, 2020 •

edited

Loading

review-notebook-app bot commented Dec 7, 2020 •

edited

Loading

review-notebook-app bot commented Dec 7, 2020 •

edited

Loading

jaimergp commented Dec 7, 2020

dominiquesydow commented Dec 7, 2020

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

schallerdavid commented Dec 8, 2020

dominiquesydow commented Dec 11, 2020

dominiquesydow commented Dec 11, 2020

dominiquesydow commented Dec 11, 2020

dominiquesydow commented Dec 11, 2020

dominiquesydow commented Dec 11, 2020

schallerdavid commented Dec 11, 2020

T012 - Data acquisition from KLIFS #79

T012 - Data acquisition from KLIFS #79

Conversation

dominiquesydow commented Dec 2, 2020 • edited Loading

Details

Content review

Code review

TODOs

review-notebook-app bot commented Dec 2, 2020

review-notebook-app bot commented Dec 7, 2020 • edited Loading

review-notebook-app bot commented Dec 7, 2020 • edited Loading

review-notebook-app bot commented Dec 7, 2020 • edited Loading

review-notebook-app bot commented Dec 7, 2020 • edited Loading

review-notebook-app bot commented Dec 7, 2020 • edited Loading

review-notebook-app bot commented Dec 7, 2020 • edited Loading

review-notebook-app bot commented Dec 7, 2020 • edited Loading

review-notebook-app bot commented Dec 7, 2020 • edited Loading

jaimergp commented Dec 7, 2020

dominiquesydow commented Dec 7, 2020

review-notebook-app bot commented Dec 8, 2020 • edited Loading

review-notebook-app bot commented Dec 8, 2020 • edited Loading

review-notebook-app bot commented Dec 8, 2020 • edited Loading

review-notebook-app bot commented Dec 8, 2020 • edited Loading

review-notebook-app bot commented Dec 8, 2020 • edited Loading

review-notebook-app bot commented Dec 8, 2020 • edited Loading

review-notebook-app bot commented Dec 8, 2020 • edited Loading

review-notebook-app bot commented Dec 8, 2020 • edited Loading

review-notebook-app bot commented Dec 8, 2020 • edited Loading

review-notebook-app bot commented Dec 8, 2020 • edited Loading

review-notebook-app bot commented Dec 8, 2020 • edited Loading

schallerdavid commented Dec 8, 2020

dominiquesydow commented Dec 11, 2020

dominiquesydow commented Dec 11, 2020

dominiquesydow commented Dec 11, 2020

dominiquesydow commented Dec 11, 2020

dominiquesydow commented Dec 11, 2020

schallerdavid commented Dec 11, 2020

dominiquesydow commented Dec 2, 2020 •

edited

Loading

review-notebook-app bot commented Dec 7, 2020 •

edited

Loading

review-notebook-app bot commented Dec 7, 2020 •

edited

Loading

review-notebook-app bot commented Dec 7, 2020 •

edited

Loading

review-notebook-app bot commented Dec 7, 2020 •

edited

Loading

review-notebook-app bot commented Dec 7, 2020 •

edited

Loading

review-notebook-app bot commented Dec 7, 2020 •

edited

Loading

review-notebook-app bot commented Dec 7, 2020 •

edited

Loading

review-notebook-app bot commented Dec 7, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading

review-notebook-app bot commented Dec 8, 2020 •

edited

Loading