SUEP Coffea Dask

Repository for SUEP using fastjet with awkward input from PFnano nanoAOD samples


An example workflow for SUEP is included.

Each workflow can be a separate "processor" file that creates the mapping from NanoAOD to the histograms we need. Workflow processors are passed to the script along with the fileset they should run over. Several executors are available: for now iterative (one by one), uproot/futures (multiprocessing), and dask-slurm.
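The split between processor and executor can be pictured with a small plain-Python sketch. It does not use the coffea API: `ToyProcessor`, `run_iterative`, and `run_futures` are hypothetical stand-ins, the "files" are just lists of numbers, and a thread pool is used in place of real multiprocessing to keep the example short. It only illustrates that the same processor can be handed to different executors, which loop over the fileset and merge the partial histograms.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class ToyProcessor:
    """Hypothetical stand-in for a workflow processor (not the coffea API)."""

    def process(self, events):
        # Map one chunk of "events" to a partial histogram (bin index -> count).
        return Counter(int(value) // 10 for value in events)


def run_iterative(fileset, processor):
    # Iterative executor: process the chunks one by one and merge the results.
    total = Counter()
    for chunk in fileset:
        total += processor.process(chunk)
    return total


def run_futures(fileset, processor, workers=2):
    # Futures-style executor: process the chunks concurrently, then merge.
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(processor.process, fileset):
            total += partial
    return total


# Hypothetical fileset: each inner list plays the role of one input file.
fileset = [[1, 12, 25], [3, 14, 27, 29]]
hist_iter = run_iterative(fileset, ToyProcessor())
hist_fut = run_futures(fileset, ToyProcessor())
print(hist_iter == hist_fut)
```

Both executors should give the same merged histogram, which is the property that lets one switch executors without touching the processor.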

To run the example, run:

python --workflow SUEP

Example plots can be found in make_some_plots.ipynb though we might want to make that more automatic in the end.


Coffea installation with Miniconda

For installing Miniconda, see also

# Run and follow instructions on screen

NOTE: always make sure that conda, python, and pip point to local Miniconda installation (which conda etc.).
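The check in the note above can also be scripted. This is a minimal sketch, assuming Miniconda was installed under `~/miniconda3` (adjust `MINICONDA_ROOT` if it lives elsewhere); `tool_locations` and `outside_prefix` are hypothetical helper names.

```python
import shutil
from pathlib import Path

# Assumption: Miniconda was installed in the default location under $HOME.
MINICONDA_ROOT = Path.home() / "miniconda3"


def tool_locations(tools=("conda", "python", "pip")):
    # Resolve each tool the same way the shell's `which` would.
    return {tool: shutil.which(tool) for tool in tools}


def outside_prefix(locations, prefix):
    # Return the tools that are missing or that resolve outside `prefix`.
    prefix = Path(prefix).resolve()
    bad = []
    for tool, location in locations.items():
        if location is None or prefix not in Path(location).resolve().parents:
            bad.append(tool)
    return bad


locations = tool_locations()
print(locations)
print("not under", MINICONDA_ROOT, ":", outside_prefix(locations, MINICONDA_ROOT))
```

An empty list on the last line means all three tools resolve inside the Miniconda installation.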

You can either use the default environment `base` or create a new one:

# create new environment with python 3.7, e.g. environment of name `coffea`
conda create --name coffea python=3.7
# activate environment `coffea`
conda activate coffea

Install coffea, xrootd, and more. The SUEP analysis uses fastjet with awkward array input (fastjet>= and vector):

pip install git+ #latest published release with `pip install coffea`
conda install -c conda-forge xrootd
conda install -c conda-forge ca-certificates
conda install -c conda-forge ca-policy-lcg
conda install -c conda-forge dask-jobqueue
conda install -c anaconda bokeh 
conda install -c conda-forge 'fsspec>=0.3.3'
conda install dask
conda install pytables
pip install --pre fastjet
pip install vector

For work at the LPC or on coffea-casa, the fastjet package is already included in the relevant singularity image, so it does not need to be installed in the local environment (see below).

Other installation options for coffea


Running jupyter remotely

See also

  1. On your local machine, edit .ssh/config:
Host lxplus*
  User <your-user-name>
  ForwardX11 yes
  ForwardAgent yes
  ForwardX11Trusted yes
Host *_f
  LocalForward localhost:8800 localhost:8800
  ExitOnForwardFailure yes
  2. Connect to remote with ssh lxplus_f
  3. Start a jupyter notebook:
jupyter notebook --ip= --port 8800 --no-browser
  4. The URL for the notebook will be printed; copy it and open it in a local browser.

Scale-out (Sites)

Scaling out can be notoriously tricky between different sites. Coffea's integration of slurm and dask makes this quite a bit easier, and for some sites the "native" implementation is sufficient, e.g. Condor@DESY. However, some sites have certain restrictions for various reasons, in particular Condor@CERN and Condor@FNAL.


Condor@FNAL (LPC)

The fastjet package is already included in the relevant singularity image, so it does not need to be installed in the local environment.

Follow the setup instructions at

After starting the singularity container, run with:

python --wf SUEP --executor dask/lpc --isMC=1 --era=2018

Condor@CERN (lxplus)

Only one port is available per node, so it's possible that one has to try different nodes until hitting one where 8786 is open. Other than that, no additional configuration should be necessary.

python --wf SUEP --executor dask/lxplus --isMC=1 --era=2018
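Before launching, one can check whether port 8786 is usable on the current node. The sketch below assumes that "open" here means that nothing else is already bound to the port; `port_is_free` is a hypothetical helper, not part of this repository.

```python
import socket


def port_is_free(port, host=""):
    """Return True if a TCP socket can be bound to `port` on this node."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port.
            return False
    return True


print("port 8786 free:", port_is_free(8786))
```

If this reports that the port is not free, logging in to a different lxplus node and trying again is the approach described above.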

Coffea-casa (Nebraska AF)

The fastjet package is already included in the relevant singularity image, so it does not need to be installed in the local environment.

Coffea-casa is a JupyterHub-based analysis facility hosted at Nebraska. For more information and setup instructions see

After setting up and checking out this repository (either via the online terminal or the git widget utility), run with:

python --wf SUEP --executor dask/casa --isMC=1 --era=2018

Authentication is handled automatically via a login auth token instead of a proxy. In file paths the xrootd redirector needs to be replaced with "xcache"; this is done automatically.
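The path replacement can be pictured as follows. This is only a sketch of the idea, not the code used by this repository: `to_xcache` is a hypothetical helper, `redirector.example.org` is a placeholder host, and the exact form of the cache address is an assumption.

```python
import re


def to_xcache(path, cache_host="xcache"):
    # Swap the host of a root:// URL for the cache host.
    # Paths that do not start with root://<host>/ are returned unchanged.
    return re.sub(r"^root://[^/]+/", f"root://{cache_host}/", path)


# redirector.example.org is a placeholder, not a real redirector.
print(to_xcache("root://redirector.example.org//store/user/file.root"))
print(to_xcache("/local/path/file.root"))
```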

Condor@MIT (T3home000)

After setting up and checking out this repository (either via the online terminal or the git widget utility), run with:

python --wf SUEP --executor dask/mit --isMC=1 --era=2018

This setup uses 'dashboard_address': 8000, so forward that port to view the dashboard locally: ssh -L 8000:localhost:8000


To Run Locally for testing

To run the producer:

python3 --isMC=0/1 --era=201X --dataset=<dataset> --infile=XXX.root

If you do not have the requirements set up, you can also run this through the docker container that the coffea team provides. Enter the Singularity container and then issue the command above. To do this, use:

singularity shell -B ${PWD}:/work /cvmfs/

If files in other folders are needed (the folder with your NTuples, for example), you can bind additional folders. The following makes the files in the /mnt directory accessible:

export SINGULARITY_BIND="/mnt"

Manually control condor jobs rather than Dask

The file submits Condor jobs for all the files in the specified datasets. This submission currently uses xrdfs to find the files stored on Kraken. An example submission can be seen below:

python --isMC=1 --era=2018 --tag=<tag name> --input=filelist/list_2018_MC_A01.txt 
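As an illustration of the `--input` argument, the sketch below reads a list file. The actual format of the files under filelist/ is not described here, so this assumes one dataset name per line, with blank lines and `#` comments ignored; `read_dataset_list` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def read_dataset_list(path):
    """Return the non-empty, non-comment lines of a list file."""
    datasets = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            datasets.append(line)
    return datasets


# Demonstration on a throwaway file with made-up dataset names.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "list_demo.txt"
    demo.write_text("# made-up entries\nDatasetA\n\nDatasetB\n")
    print(read_dataset_list(demo))
```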

The submission creates a directory in the output directory, named after the tag name you provide. If the tag already exists, use the --force option to resubmit or overwrite.
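The tag and --force behaviour can be sketched as below. This is an assumption about the logic, not the submission script itself: `prepare_tag_dir` is a hypothetical helper, and it assumes that overwriting means starting again from an empty directory.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_tag_dir(output_dir, tag, force=False):
    """Create output_dir/tag, refusing to reuse an existing tag unless forced."""
    tag_dir = Path(output_dir) / tag
    if tag_dir.exists():
        if not force:
            raise FileExistsError(f"tag '{tag}' already exists; use --force to overwrite")
        # Assumption: overwriting starts again from an empty directory.
        shutil.rmtree(tag_dir)
    tag_dir.mkdir(parents=True)
    return tag_dir


# Demonstration in a throwaway output directory.
with tempfile.TemporaryDirectory() as out:
    prepare_tag_dir(out, "demo_tag")
    try:
        prepare_tag_dir(out, "demo_tag")
    except FileExistsError as err:
        print(err)
    print(prepare_tag_dir(out, "demo_tag", force=True).name)
```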

Note that this submission will look for the dataset xsec in xsections_.yaml.

To monitor and resubmit jobs we can use the file.

python --tag=<tag name> --input=filelist/list_2018_MC_A01.txt

To resubmit, you must request it explicitly with -r=1, as below:

python --tag=<tag name> --input=filelist/list_2018_MC_A01.txt -r=1