Materials for the Computer-aided molecular design course at THM Gießen WS2019/2020

dkuhn/THM_CAMD_2019

THM_CAMD_2019

Materials for the Computer-aided molecular design course at THM Gießen WS2019/2020

In this practical exercise, which takes place between the two lecture blocks, you will work as a computational chemist and collect affinity data for a protein kinase target. If you do not know which protein kinase target to work on, please send me an email.

Please perform the following steps:

  1. Go to UniProt and find the human UniProt identifier for your protein kinase.
  2. Select the IC50 assays for your target in ChEMBL and remove PAINS compounds.
  3. Create a pandas DataFrame containing the IC50 data, keeping the values that carry operators.
  4. Average the IC50 values that have no operator. If a compound has several IC50 values with operators at different ligand concentrations, keep just one, the one with the most information content: given >10 and >1, choose >10. If a compound has IC50 values both with and without an operator, consider only the values without an operator.
  5. Create training and test data sets (20%).
  6. Build five different categorical (classification) ML models that predict kinase activity, each using a different scikit-learn learner. Use 1 µM as the activity threshold.
  7. Analyse the models in terms of accuracy, sensitivity and specificity using cross-validation. Check your final, best model on the test data set.
  8. Select one model with good recall and one model with good precision.
  9. Create another training/test split and build a regression model. Select the best model based on AUC.
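The aggregation rule in step 4 is the easiest one to get subtly wrong, so here is a minimal sketch of one way to read it. Everything in it is an assumption for illustration: the column names `molecule_chembl_id`, `standard_relation` and `standard_value` are modelled on ChEMBL activity fields, the function name `aggregate_ic50` is hypothetical, "average" is taken to mean the arithmetic mean, and the treatment of `<` values (keep the smallest bound) is an extension that the steps above do not spell out.

```python
import pandas as pd


def aggregate_ic50(df):
    """Collapse repeated IC50 measurements to one row per compound.

    Assumed columns: molecule_chembl_id, standard_relation, standard_value.
    """
    rows = []
    for mol_id, group in df.groupby("molecule_chembl_id"):
        exact = group[group["standard_relation"] == "="]
        if not exact.empty:
            # Values without an operator take priority: average them and
            # ignore any qualified values for the same compound.
            rows.append((mol_id, "=", exact["standard_value"].mean()))
            continue
        greater = group[group["standard_relation"].isin([">", ">="])]
        lower = group[group["standard_relation"].isin(["<", "<="])]
        if not greater.empty:
            # ">10" says more than ">1", so keep the largest lower bound.
            rows.append((mol_id, ">", greater["standard_value"].max()))
        elif not lower.empty:
            # Assumption: by symmetry, keep the smallest upper bound.
            rows.append((mol_id, "<", lower["standard_value"].min()))
    return pd.DataFrame(
        rows,
        columns=["molecule_chembl_id", "standard_relation", "standard_value"],
    )


# Toy data, invented purely to exercise the three cases of step 4.
example = pd.DataFrame(
    {
        "molecule_chembl_id": ["A", "A", "B", "B", "C", "C"],
        "standard_relation": ["=", "=", ">", ">", "=", ">"],
        "standard_value": [100.0, 300.0, 1.0, 10.0, 50.0, 10.0],
    }
)
print(aggregate_ic50(example))
```

Compound A is averaged, compound B keeps only its larger `>` bound, and compound C keeps only the value without an operator. Whether you average raw IC50 values or their logarithms is a design choice worth stating explicitly in your script.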

Materials used

For each step, please revisit the excellent TeachOpenCADD talktorials (Jupyter notebooks) that we discussed during the lecture.

You can work with Jupyter notebooks, but in the end please provide a Python script (it can be based on the notebooks) that can be executed from the command line and performs every analysis step. Make PAINS filtering optional.
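One possible way to make PAINS filtering optional from the command line is a simple flag, sketched below with the standard-library `argparse` module. The flag name `--skip-pains`, the positional argument `uniprot_id`, and the functions `build_parser`, `plan_steps` and `main` are all hypothetical names chosen for illustration; the step descriptions are placeholders for your own analysis code.

```python
import argparse


def build_parser():
    """Create the command-line interface (names are illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Run the kinase IC50 modelling workflow."
    )
    parser.add_argument(
        "uniprot_id",
        help="human UniProt identifier of the protein kinase target",
    )
    parser.add_argument(
        "--skip-pains",
        action="store_true",
        help="do not remove PAINS compounds (filtering is on by default)",
    )
    return parser


def plan_steps(args):
    """Return the ordered list of analysis steps for the parsed options."""
    steps = ["collect IC50 data for " + args.uniprot_id]
    if not args.skip_pains:
        steps.append("remove PAINS compounds")
    steps.extend(
        [
            "aggregate IC50 values",
            "split into training and test data",
            "build and evaluate models",
        ]
    )
    return steps


def main(argv=None):
    """Parse the arguments and print the steps that would be executed."""
    args = build_parser().parse_args(argv)
    for step in plan_steps(args):
        print(step)


# Example call with an explicit argument list instead of sys.argv.
main(["P00533", "--skip-pains"])
```

In a real script you would replace the example call at the bottom with `main()` under an `if __name__ == "__main__":` guard, so that the arguments are read from the command line.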

Thanks and enjoy!
