This project aims to develop a model to predict ADHD diagnosis and sex in children and adolescents using functional brain imaging data and socio-demographic information. Special attention is given to ensuring that the model avoids gender bias, as girls are often underdiagnosed with ADHD. The model should also be explainable, using methods like SHAP or LIME, to meet GDPR requirements. The data includes four datasets: `Metadata_A`, `Metadata_B`, `Func`, and `Label`.
Download the dataset from the provided source. It contains four datasets:
- `Metadata_A`: Socio-demographic and emotional data
- `Metadata_B`: Parenting information
- `Func`: fMRI data
- `Label`: ADHD diagnosis and sex labels
Python Version: 3.11.11
- NumPy: 1.26.4
- Pandas: 2.2.2
- scikit-learn: 1.6.1
- Seaborn: 0.13.2
- Kneed: 0.8.5
- SciPy: 1.13.1
- Plotly: 5.24.1
- Matplotlib: 3.10.0
- TensorFlow: 2.15.0
Follow the steps below to run the code in the correct order:
Before running the code, ensure that Python 3.11.11 is installed. If necessary, update to this version.
Install the required libraries by running the following command in your terminal:
pip install numpy==1.26.4 pandas==2.2.2 scikit-learn==1.6.1 seaborn==0.13.2 kneed==0.8.5 scipy==1.13.1 plotly==5.24.1 matplotlib==3.10.0 tensorflow==2.15.0
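As an optional check after installation, the sketch below compares the installed package versions against the versions given in the library list above. It is not part of the project notebooks; the `PINNED` dictionary and the `check_versions` helper are illustrative names introduced here.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the library list in this README (illustrative dictionary).
PINNED = {
    "numpy": "1.26.4",
    "pandas": "2.2.2",
    "scikit-learn": "1.6.1",
    "seaborn": "0.13.2",
    "kneed": "0.8.5",
    "scipy": "1.13.1",
    "plotly": "5.24.1",
    "matplotlib": "3.10.0",
    "tensorflow": "2.15.0",
}


def check_versions(pinned):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    print("Python:", ".".join(str(part) for part in sys.version_info[:3]))
    for name, (expected, installed) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

If the script prints only the Python version, every listed package is installed at the listed version.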
- Ensure Python version 3.11.11 is installed.
- Install the required libraries.
- Load the notebook `sm23788_Data_Exploration.ipynb`.
- Download the dataset and place it in the specified directory.
  - Ensure the dataset files (e.g., `Metadata_A`, `Metadata_B`, `Func`, `Label`) are located in the correct directory.
- Update the paths:
  - `data_path`: The path from which the raw dataset is loaded. Update the notebook with the correct file path where the raw data is stored.
  - `preprocessed_data_path`: The path where the preprocessed (cleaned) data will be saved after processing. Update the notebook with the correct file path where the preprocessed data should be stored (e.g., `preprocessed_dataset.csv`).
- Run `sm23788_Data_Exploration.ipynb` and start the analysis.
- After running the setup, the clean (preprocessed) dataset will be generated and saved in the specified `preprocessed_data_path`. The following files will be saved:
  - `X_train.csv`: Training features
  - `X_test.csv`: Testing features
  - `y_train.csv`: Training labels
  - `y_test.csv`: Testing labels
- Load the notebook `sm23788_Modelling_Result.ipynb`.
- Update the path:
  - `data_path`: The path where you saved the preprocessed dataset (`preprocessed_data_path`). Update the notebook with the correct file path where the preprocessed data is stored.
- Run `sm23788_Modelling_Result.ipynb` and start the analysis.
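Before running the modelling notebook, it can help to confirm that `data_path` actually contains the four split files produced by the exploration notebook. The sketch below is one way to do that. It assumes `data_path` points to a directory holding those files, which this README does not state explicitly; `EXPECTED_FILES` and `missing_split_files` are hypothetical names and are not taken from the notebooks.

```python
from pathlib import Path

# File names listed in this README as the output of the exploration notebook.
EXPECTED_FILES = ("X_train.csv", "X_test.csv", "y_train.csv", "y_test.csv")


def missing_split_files(data_path):
    """Return the expected split files that are not present in data_path.

    Assumption: data_path is a directory containing the four CSV files.
    """
    folder = Path(data_path)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    # Replace "." with the value you set for data_path in the notebook.
    missing = missing_split_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All split files found.")
```

An empty result means the modelling notebook should be able to find all four files. Otherwise, rerun `sm23788_Data_Exploration.ipynb` or correct the path.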