Skip to content

Initial check-in of CoRE-MOF notebook #75

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 5, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 21 additions & 0 deletions notebooks/CoRE-MOF/README.md
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add to the ReadMe some information about the dependency required, the Author. I can see there is some of this info in the Notebook itself, but it would be nice to have it in the ReadMe so folk can see it when they open the folder in github

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look better? While you're here, I see sonarcloud found a problem that I don't really think is a problem, but no idea how to tell it that - any ideas?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree its not an issue, we can formally accept it, but its not blocking merging now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aye, I've accepted the issue, as it does not apply to this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the notebook need PACMANCharges? I'd just add it under a requirements header

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added in the requirement for PACMAN charge

Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# CoRE-MOF

## Introduction

This package contains shared functionality that is used to export entries used by CoRE-MOF

download_unmodified_MOFs_from_CSD.ipynb: a script for downloading structures of CSD-unmodified dataset.
list_coremof_csd_unmodified_20250227.json: a list of structures of CSD-unmodified dataset.
structures: an successful case by using the script.

## Requirements

[CoRE-MOF tools](https://coremof-tools.readthedocs.io/en/latest/index.html), [PACMAN Charge](https://pubs.acs.org/doi/10.1021/acs.jctc.4c00434), CSD Python API

## Licensing Requirements

CSD-Core or better

## Authors

[CoRE-MOF](https://coremof-tools.readthedocs.io/en/latest/index.html)
368 changes: 368 additions & 0 deletions notebooks/CoRE-MOF/download_unmodified_MOFs_from_CSD.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,368 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "43d1d18f-6770-4d51-acdc-b6abeefb20e1",
"metadata": {},
"source": [
"## get list of CoRE MOF - CSD - unmodified"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "f2ffddd5-373d-4569-ad30-6227c7fe746e",
"metadata": {},
"outputs": [],
"source": [
"import json"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "0c5aff3d-56fc-47b6-8644-61dbfb9c67c5",
"metadata": {},
"outputs": [],
"source": [
"with open(\"./list_coremof_csd_unmodified_20250227.json\", \"r\") as f:\n",
" csd_unmodified = json.load(f)"
]
},
{
"cell_type": "markdown",
"id": "8c3447f3-9938-48d4-a8b8-939ac3ce7f40",
"metadata": {},
"source": [
"### there are two subset (CR & NCR)\n",
"\n",
"```python\n",
" CSD Unmodified Dataset # (N = 12,261)\n",
" │\n",
" ├── CR # computation-ready (N = 4,703)\n",
" │ │\n",
" │ ├── ASR # all solvent removed (N = 1,894)\n",
" │ ├── FSR # free solvent removed (N = 2,657)\n",
" │ └── Ion # with ion (N = 152)\n",
" │\n",
" └── NCR # not computation-ready (N = 7,558)\n"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "55deb24d-8d13-441f-acd6-bca911ee75ae",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['CR', 'NCR'])"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"csd_unmodified.keys()"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "0936cfd5-bb33-4e14-9de4-30282bd7d40a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['ASR', 'FSR', 'ION'])"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"csd_unmodified[\"CR\"].keys()"
]
},
{
"cell_type": "markdown",
"id": "1333bfb5-811e-480b-96e2-1b822449a2a6",
"metadata": {},
"source": [
"**Note that the refcode in the CoRE MOF DB has an additional string ending such as “_ASR_pacman”.**"
]
},
{
"cell_type": "markdown",
"id": "4a9fc947-3205-46d4-83a2-5fed8176d900",
"metadata": {},
"source": [
"**For CR subset, there are CoRE MOF ID and REFCODE.**\n",
"**For NCR subset, there is only REFCODE.**"
]
},
{
"cell_type": "code",
"execution_count": 60,
"id": "73643f82-05ca-4905-8699-1e717aa16f2c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ABUXUT_ASR_pacman\n",
"2016[Cu][nan]3[ASR]1\n"
]
}
],
"source": [
"# example for CR\n",
"print(csd_unmodified[\"CR\"][\"ASR\"][1][0])\n",
"print(csd_unmodified[\"CR\"][\"ASR\"][1][1])"
]
},
{
"cell_type": "code",
"execution_count": 64,
"id": "ec0e0fd1-e20f-4459-bcc0-3ddcca5cd9d8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ABECIX_FSR_pacman\n"
]
}
],
"source": [
"# example for NCR\n",
"print(csd_unmodified[\"NCR\"][0])"
]
},
{
"cell_type": "markdown",
"id": "a847770f-4dac-4888-a469-f050e330d959",
"metadata": {},
"source": [
"## download original CIFs from CSD\n",
"you need install [*CSD python API*](https://downloads.ccdc.cam.ac.uk/documentation/API/installation_notes.html) and activate the licence first."
]
},
{
"cell_type": "markdown",
"id": "ae5f4c8c-2266-4638-b47c-aeb3ad293f30",
"metadata": {},
"source": [
"and install [CoREMOF_tools](https://coremof-tools.readthedocs.io/en/latest/index.html) by `pip install CoREMOF-tools`"
]
},
{
"cell_type": "code",
"execution_count": 70,
"id": "91208e59-7bf6-4812-89b6-13943ba35a99",
"metadata": {},
"outputs": [],
"source": [
"from CoREMOF.structure import download_from_CSD"
]
},
{
"cell_type": "code",
"execution_count": 74,
"id": "be948ea1-8a5e-4a72-be5e-7a8c0059043f",
"metadata": {},
"outputs": [],
"source": [
"### download ASR structure\n",
"\n",
"for refcodes in csd_unmodified[\"CR\"][\"ASR\"][:10]: # test for 10 structures\n",
" refcode = refcodes[0].replace(\"_ASR_pacman\", \"\")\n",
" download_from_CSD(refcode=refcode, output_folder=\"./structures/CR/ASR\")"
]
},
{
"cell_type": "code",
"execution_count": 76,
"id": "75e0c9d2-bdb2-424d-9158-29003a9fde16",
"metadata": {},
"outputs": [],
"source": [
"### download FSR structure\n",
"\n",
"for refcodes in csd_unmodified[\"CR\"][\"FSR\"][:10]: # test for 10 structures\n",
" refcode = refcodes[0].replace(\"_FSR_pacman\", \"\")\n",
" download_from_CSD(refcode=refcode, output_folder=\"./structures/CR/FSR\")"
]
},
{
"cell_type": "code",
"execution_count": 78,
"id": "a0399d5e-2a57-4ede-b95e-0ba263fd4ae6",
"metadata": {},
"outputs": [],
"source": [
"### download Ion structure\n",
"\n",
"for refcodes in csd_unmodified[\"CR\"][\"ION\"][:10]: # test for 10 structures\n",
" refcode = refcodes[0].replace(\"_ion_pacman\", \"\")\n",
" download_from_CSD(refcode=refcode, output_folder=\"./structures/CR/Ion\")"
]
},
{
"cell_type": "code",
"execution_count": 86,
"id": "691e839a-387b-44d9-88b2-2e27b44bee94",
"metadata": {},
"outputs": [],
"source": [
"### download NCR structure\n",
"\n",
"for refcode in csd_unmodified[\"NCR\"][:10]: # test for 10 structures\n",
" refcode = refcode.split(\"_\")[0]\n",
" download_from_CSD(refcode=refcode, output_folder=\"./structures/NCR/\")"
]
},
{
"cell_type": "markdown",
"id": "61e48f31-6cc1-4388-9095-b88270351ec1",
"metadata": {},
"source": [
"### process the structures"
]
},
{
"cell_type": "markdown",
"id": "b450249b-421c-429d-9f81-8cffd0611c0e",
"metadata": {},
"source": [
"since solvent removal is not required, only make primitive cell and make P1 are needed."
]
},
{
"cell_type": "code",
"execution_count": 110,
"id": "5a5af1da-b1a5-4ea7-ab76-9c6a074f4497",
"metadata": {},
"outputs": [],
"source": [
"from CoREMOF.structure import make_primitive_p1"
]
},
{
"cell_type": "code",
"execution_count": 116,
"id": "1abd657b-4771-49e4-9bef-9bd2f1cb97eb",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"structure_pri = make_primitive_p1(filename=\"./structures/CR/ASR/\"+csd_unmodified[\"CR\"][\"ASR\"][0][0].split(\"_\")[0]+\".cif\") # ABAVOP "
]
},
{
"cell_type": "markdown",
"id": "d0120d89-9667-45ea-b71a-aa19d6ab90fc",
"metadata": {},
"source": [
"predict partial atom charge by [PACMAN Charge](https://pubs.acs.org/doi/10.1021/acs.jctc.4c00434)\n",
"install by `pip install pip install PACMAN-charge`"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "001ba16b-c921-41bf-acb9-da36bdf6f8c6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CIF Name: ./structures/CR/ASR/ABAVOP.cif\n",
"Charge Type: DDEC6\n",
"Digits: 10\n",
"Atom Type: True\n",
"Neutral: True\n",
"Keep Connect: False\n",
"Compelete and save as ./structures/CR/ASR/ABAVOP_pacman.cif\n"
]
}
],
"source": [
"from PACMANCharge import pmcharge\n",
"pmcharge.predict(cif_file=\"./structures/CR/ASR/\"+csd_unmodified[\"CR\"][\"ASR\"][0][0].split(\"_\")[0]+\".cif\", # ABAVOP \n",
" charge_type=\"DDEC6\",\n",
" digits=10,\n",
" atom_type=True,neutral=True,\n",
" keep_connect=False)"
]
},
{
"cell_type": "markdown",
"id": "a7c0a0af-4549-405b-96df-7b4d56525b51",
"metadata": {},
"source": [
"**if you want to use CoRE MOF ID, you can change REFCODE to CoRE MOF ID**"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "4117005e-abde-4d54-bc4b-62a38c0758dd",
"metadata": {},
"outputs": [],
"source": [
"import os"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "3069a5d1-4b34-45c4-9e78-ba8f5468a4ea",
"metadata": {},
"outputs": [],
"source": [
"os.rename(\"./structures/CR/ASR/\"+csd_unmodified[\"CR\"][\"ASR\"][0][0].split(\"_\")[0]+\"_pacman.cif\", \"./structures/CR/ASR/\" + csd_unmodified[\"CR\"][\"ASR\"][0][1]+\".cif\") \n",
"# ABAVOP_pacman.cif -> 2004[Co][rtl]3[ASR]2.cif"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e294f9dd-d61e-4078-be4e-57b3c34b0846",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:base] *",
"language": "python",
"name": "conda-base-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Loading
Loading