T012 - Data acquisition from KLIFS #79
jaimergp commented on 2020-12-07T15:05:27Z: Maybe? "programmatic access to this database"
dominiquesydow commented on 2020-12-07T15:41:11Z: Yes, better, thanks!
jaimergp commented on 2020-12-07T15:05:27Z:
"There are 518 protein kinases encoded in the human genome, which were clustered based on their sequence into eight main kinase groups (AGC, CAMK, CK1, CMGC, STE, TK, TKL and Other)"
This depends on your source... We have seen slightly different numbers at
"Protein kinases catalyze the phosphorylation of tyrosine, serine and threonine residues of themselves or other kinases using their bound ATP."
Histidine residues can also be targeted!
dominiquesydow commented on 2020-12-07T15:41:21Z: Good point, added both.
jaimergp commented on 2020-12-07T15:05:28Z:
"to programmatically access"
"Luckily, there is a solution - some websites provide you with a document that defines the REST API schema for you (OpenAPI specification)."
Link: https://swagger.io/docs/specification/about/. Swagger open-sourced the whole thing and now the schema is called OpenAPI :) Replace "Swagger definitions" with "OpenAPI definitions" in the rest of the cell/notebook too!
dominiquesydow commented on 2020-12-07T15:44:42Z: Thanks for clarifying!
jaimergp commented on 2020-12-07T15:05:30Z:
When you only want the first element of a comprehension, you can use a generator to avoid iterating over (and storing) the whole thing in memory. Like this:
kinase_klifs_id = next(kinase.kinase_ID for kinase in kinases if kinase.name == kinase_name)
The same applies to other instances in this notebook.
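A minimal sketch of the difference described in this comment, assuming kinase records that expose `kinase_ID` and `name` attributes as in the suggested line. The `Kinase` class and its values are hypothetical stand-ins, not data returned by KLIFS.

```python
from dataclasses import dataclass


@dataclass
class Kinase:
    """Hypothetical stand-in for a kinase record; not the real KLIFS client object."""

    kinase_ID: int
    name: str


# Made-up records for illustration only; the IDs are not real KLIFS IDs.
kinases = [Kinase(1, "AAA"), Kinase(2, "BBB"), Kinase(3, "CCC")]
kinase_name = "BBB"

# List comprehension: scans every record and builds a full list before indexing.
kinase_klifs_id_list = [k.kinase_ID for k in kinases if k.name == kinase_name][0]

# Generator expression + next(): stops at the first match and stores nothing else.
kinase_klifs_id = next(k.kinase_ID for k in kinases if k.name == kinase_name)

# next() raises StopIteration when nothing matches; passing a default avoids that.
missing_id = next((k.kinase_ID for k in kinases if k.name == "ZZZ"), None)
```

Both forms give the same ID when a match exists. The default argument in the last line is an optional addition that the comment itself does not mention.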
jaimergp commented on 2020-12-07T15:05:31Z: I don't see the image here, but it might be a ReviewNB thing.
dominiquesydow commented on 2020-12-07T15:47:18Z: I did not push the rendered images yet; I will do so with the version for the full review for David.
jaimergp commented on 2020-12-07T15:05:31Z:
Truncate the input list to avoid that warning:
show_ligands(ligands["ligand.smiles"].to_list()[:10])
jaimergp commented on 2020-12-07T15:05:32Z:
I think multiple returns can be expressed like this in docstrings:
    Returns
    -------
    molcomplex : str
        Complex structural data
    protein : str
        Protein structural data
    ligands : list of str
        List of ligand SMILES.
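For context, here is a sketch of how such a `Returns` section might sit inside a complete function, assuming the NumPy docstring style. The function name `split_complex`, its parameters, and its placeholder logic are hypothetical and not taken from the notebook.

```python
def split_complex(complex_text, ligand_smiles):
    """
    Split structural data into its components (hypothetical helper).

    Parameters
    ----------
    complex_text : str
        Complex structural data.
    ligand_smiles : list of str
        Ligand SMILES.

    Returns
    -------
    molcomplex : str
        Complex structural data.
    protein : str
        Protein structural data.
    ligands : list of str
        List of ligand SMILES.
    """
    # Placeholder logic for illustration: keep only lines starting with ATOM
    # as the protein part.
    protein = "\n".join(
        line for line in complex_text.splitlines() if line.startswith("ATOM")
    )
    # Returning a tuple matches the three entries documented under Returns.
    return complex_text, protein, list(ligand_smiles)


molcomplex, protein, ligands = split_complex("ATOM 1\nHETATM 2", ["CCO"])
```

Each returned value gets its own name, type, and description, in the same order as the returned tuple.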
Awesome content @dominiquesydow! I left a few comments clarifying some terms and definitions, and a couple of unimportant technicalities. Always a pleasure to review high quality PRs such as this one!
schallerdavid commented on 2020-12-08T16:34:23Z: typo?: the following information
schallerdavid commented on 2020-12-08T16:34:24Z: typo: (IFP) for the example
schallerdavid commented on 2020-12-08T16:34:25Z: "Show only the first 10 file rows." Below, it looks like you don't use the first 10 rows, but rows 100-110.
schallerdavid commented on 2020-12-08T16:34:26Z: I don't think all ligands in ChEMBL have drug-like properties.
dominiquesydow commented on 2020-12-11T08:14:11Z: Good point. I took it from the first sentence on their website:
https://www.ebi.ac.uk/chembl/
"ChEMBL is a manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs."
Rephrased this to "...ChEMBL, a manually curated compound database"
schallerdavid commented on 2020-12-08T16:34:26Z: "example ligand W19: 10" Not sure what this means.
dominiquesydow commented on 2020-12-11T08:17:51Z: This is the number of ChEMBL bioactivity values available at KLIFS for ligand W19 (i.e. 10 values).
Rephrased from "Number of bioactivity values for example ligand W19: 10" to "Number of ChEMBL bioactivity values available at KLIFS for example ligand W19: 10".
Is that better? Or am I missing the point you were making?
schallerdavid commented on 2020-12-11T20:33:11Z: Sorry, my fault, I just did not read the code :). All clear from my side.
schallerdavid commented on 2020-12-08T16:34:27Z: The opencadd version in test_env.yaml does not contain a databases module.
dominiquesydow commented on 2020-12-11T08:20:35Z: True! This is still one TODO in the PR description - will take care of this today.
[ ] Update environment: Use latest
schallerdavid commented on 2020-12-08T16:34:27Z: typo: per residue
schallerdavid commented on 2020-12-08T16:34:28Z: typo: and
schallerdavid commented on 2020-12-08T16:34:29Z:
typo: there are a lot of
typo: of Gefitinib is E
typo: showning high activity
schallerdavid commented on 2020-12-08T16:34:30Z: typo: You can specify one or more kinases of your choice to narrow down ...
schallerdavid commented on 2020-12-08T16:34:31Z: typo: are available for the kinase CDK2?
Great talktorial @dominiquesydow! I mainly found typos. Everything ran smoothly. Only installing the environment from test_env.yaml resulted in the installation of an older opencadd version without the databases module.
Thanks, I have rephrased the sentence to a broader meaning:
"EGFR was the first receptor for which a relationship between mutations, overexpression and cancer (tumor growth) has been shown."
Thanks @schallerdavid! My apologies for the many typos!
Last edits: Zoom in ligand to see hinge region highlights; fix typos/format errors; set seed for random kinase selection
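The PR text does not show how the seed was set. As an illustration only, here is a sketch of why fixing a seed makes a random kinase pick reproducible, assuming the standard library `random` module; the list of kinase names is an example, not the notebook's data.

```python
import random

# Example kinase names for illustration; the notebook's actual list may differ.
kinase_names = ["EGFR", "CDK2", "ABL1", "BRAF"]

# Seeding the generator fixes the sequence of pseudo-random draws.
random.seed(42)
first_pick = random.choice(kinase_names)

# Re-seeding with the same value reproduces the same draw.
random.seed(42)
second_pick = random.choice(kinase_names)
```

With a fixed seed, rerunning the notebook selects the same kinase each time, which keeps the rendered outputs stable across runs.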
Details
Content review
- ... here
- ... DataFrames
Code review
- ... a_variable_name vs aVariableName
- ... black -l 99
- ... for i in range(len(list)) (see slides)
- ... # TODO: CI - added # NBVAL_CHECK_OUTPUT
- ... opencadd
- import ... lines are at the top (practice part) cell, ordered by standard library / 3rd party packages / our own (teachopencadd.*)
TODOs
- ... opencadd version ... opencadd.databases.klifs
- TODO after PR is merged!
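The code review item mentioning `for i in range(len(list))` presumably flags index-based loops as a pattern to avoid; the slides it refers to are not part of this conversation. A sketch of the usual alternatives, using the kinase group names quoted earlier in the review as example data:

```python
kinase_groups = ["AGC", "CAMK", "CK1", "CMGC", "STE", "TK", "TKL", "Other"]

# Discouraged: index bookkeeping only to reach each element.
by_index = []
for i in range(len(kinase_groups)):
    by_index.append(kinase_groups[i])

# Preferred: iterate over the elements directly.
direct = [group for group in kinase_groups]

# If the position is also needed, enumerate provides both.
numbered = [
    (position, group) for position, group in enumerate(kinase_groups, start=1)
]
```

All three loops visit the same elements; the direct and `enumerate` forms avoid manual indexing.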