Evaluating Robustness and Uncertainty of Graph Models Under Structural Distributional Shifts

This repository provides the official implementation of the experimental framework for the paper Evaluating Robustness and Uncertainty of Graph Models Under Structural Distributional Shifts (NeurIPS 2023).

Overview

To evaluate the performance of graph models, it is important to test them on diverse and meaningful distributional shifts. However, most graph benchmarks that consider distributional shifts for node-level problems focus mainly on node features, while data in graph problems is primarily defined by its structural properties. In this work, we propose a general approach for inducing diverse distributional shifts based on graph structure.
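To make this concrete, below is a minimal toy sketch of the general idea: nodes are ranked by a simple structural property (plain node degree here, used only as a stand-in), and the most atypical nodes are held out as the out-of-distribution part. This is an illustration rather than the exact procedure from the paper; the actual split strategies are available through the library transforms described in the next section.

import networkx as nx

# Toy illustration: induce a structural shift by ranking nodes by a
# structural property and holding out the extreme nodes as OOD.
graph = nx.karate_club_graph()

# Rank nodes from "typical" to "atypical" according to the chosen property
# (node degree is used here purely as a simple stand-in).
ranked_nodes = sorted(graph.nodes, key=lambda node: graph.degree[node])

# The half with the largest property values becomes the OOD part, while the
# rest stays in-distribution (to be split further into train/valid/test).
split_point = len(ranked_nodes) // 2
in_distribution_nodes = ranked_nodes[:split_point]
out_of_distribution_nodes = ranked_nodes[split_point:]

print(f"ID nodes: {len(in_distribution_nodes)}, OOD nodes: {len(out_of_distribution_nodes)}")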

Implementation of Structural Shifts in Graph ML Frameworks

Our approach to creating data splits with structural distributional shifts is available in DGL via dgl.data.add_node_property_split and in PyG via torch_geometric.transforms.NodePropertySplit.
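For a quick illustration, the PyG transform can be applied to any node-level Data object roughly as follows. This is a sketch based on our reading of the PyG interface, not code from this repository: the set of supported property names, the order of the ratios, and the names of the resulting node masks should be checked against the documentation of your installed PyG version.

import torch
from torch_geometric.data import Data
from torch_geometric.transforms import NodePropertySplit

# A small random graph used as placeholder data, purely for illustration.
num_nodes = 1000
data = Data(
    x=torch.randn(num_nodes, 16),
    edge_index=torch.randint(0, num_nodes, (2, 5000)),
    y=torch.randint(0, 3, (num_nodes,)),
)

# 'popularity' is one of the structural node properties from the paper; the
# five ratios are assumed to define the ID train / ID valid / ID test /
# OOD valid / OOD test parts of the node set.
transform = NodePropertySplit('popularity', [0.3, 0.1, 0.1, 0.3, 0.2])
data = transform(data)

# The transform attaches boolean node masks for the resulting parts to `data`
# (see the PyG documentation for their exact attribute names).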

Installation

This code requires the packages listed in environment.yaml. You can create a separate conda environment with these dependencies by running the following command in the root directory of this project:

conda env create -f environment.yaml

Alternatively, you can use instruction.txt, which lists the conda commands that were run to create the same environment and install the necessary packages.

Running Experiments

From the root directory of this project, you can run an experiment on the <dataset_name> graph dataset with the <strategy_name> split strategy for the <version_name> version of the <method_name> method using the following command:

python main.py --run_config_path ./configs/run_configs/<dataset_name>/<strategy_name>/<method_name>/<version_name>/run_config.yaml

For instance, to run an experiment with the standard version of GPN (Graph Posterior Network) on the AmazonComputer dataset with the popularity data split, use:

python main.py --run_config_path ./configs/run_configs/amazon-computer/popularity/gpn/standard/run_config.yaml

Other possible values for <dataset_name> and <method_name> correspond to the names of the config files in the respective configs subdirectories: configs/dataset_configs/ and configs/method_configs/.

Repository Structure

This repository is organised as follows:

configs

Here you can see various subdirectories containing structured .yaml files with experiment configurations:

  • datamodule_configs: configurations for pl.LightningDataModule that are used by pl.Trainer managers
  • dataset_configs: descriptions of datasets, including basic dataset properties and the exploited split strategies
  • experiment_configs: instructions for pl.Trainer managers on how to perform training and inference
  • method_configs: parameters of pl.LightningModule that describe how pl.Trainer managers should use the underlying models
  • run_configs: intermediate configurations that store paths to the other config files and the target directory for saving experiments
  • trainer_configs: parameters for pl.Trainer managers that directly perform training and inference

datasets

This subdirectory contains all the proposed datasets and corresponding data splits. For more technical information, please refer to the README.md file inside the datasets subdirectory.

source

Here you can find the source code of our experimental framework:

  • data: everything related to data processing and loading
  • experiment: classes used to set up the necessary dependencies and run experiments
  • layers: implementations of the considered model architectures
  • metrics: various routines for computing metrics
  • modules: classes that describe how particular models are used by managers at a specific training or evaluation stage
  • utils: general utilities that do not belong to source.data, source.layers or source.modules, but support the execution of experiments (syncing configs, saving results, etc.)

main.py

This is the main script for loading experiment configs and performing training or evaluation.

If you want to change the parameters of your experiment, whether it is the data split strategy, the hidden dimension of a model layer, the index of the GPU on your server, or something else, please check the corresponding configs subdirectory. If you need to access the proposed graph datasets or the associated data splits, please refer to the datasets subdirectory. Finally, if you are interested in the source code of our experimental pipeline, including models, methods and metrics, take a look at the source subdirectory.