Config Templates#
For configuration management in the benchmark, we use Hydra. The configuration files are stored in the proteinworkshop/config directory.
Configs are built by composing individual configs for each component (dataset, features, encoder, task, and so on) according to a template schema.
Below, we document the different config templates used in the benchmark.
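As a brief illustration of the composition mechanism: each entry in a template's defaults list names a config group that can be swapped out with Hydra's command-line override syntax. The commands below are a minimal sketch, assuming the train.py entry point referenced in the templates' comments and using only group names and parameters that appear in the templates:

```bash
# Run training with the template's defaults (dataset=cath, encoder=egnn, ...)
python train.py

# Swap a composed config group (logger) and override scalar parameters
# (seed, name) set directly in the template
python train.py logger=wandb seed=7 name="egnn_cath_baseline"

# Enable the debugging config mentioned in the template's comments
python train.py debug=default
```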
Training#
This config can be used both for training a model from scratch and for pre-training a model.
```yaml
# @package _global_

# === 1. Set config parameters ===
name: "" # default name for the experiment, "" means the logger (e.g. wandb) will generate a unique name
seed: 52 # seed for random number generators in pytorch, numpy and python.random
num_workers: 16 # number of subprocesses to use for data loading

# === 2. Specify defaults here. Defaults will be overwritten by equivalently named options in this file ===
defaults:
  - env: default
  - dataset: cath
  - features: ca_seq
  - encoder: egnn
  - decoder: default
  - transforms: default
  - callbacks: default
  - optimiser: adam
  - scheduler: none
  - trainer: gpu
  - extras: default
  - hydra: default
  - metrics: none
  - task: inverse_folding
  - logger: csv # Also supported: tensorboard, wandb
  # debugging config (enable through the command line, e.g. `python train.py debug=default`)
  - debug: null
  - optional hparams: ${encoder}_${features}
  - _self_ # see: https://hydra.cc/docs/upgrades/1.0_to_1.1/default_composition_order/. Adding _self_ at the bottom means values in this file override the defaults.

task_name: "train"
test: False
#compile: True
```
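Because `_self_` sits at the bottom of the defaults list, values set directly in this template (name, seed, task_name, test) take precedence over the composed defaults, and they can in turn be overridden at launch. A short sketch; note that the effect of `test=True` (running a held-out evaluation after training) is an assumption, not something the template itself states:

```bash
# Override a value set directly in the template; that `test` triggers an
# evaluation pass after training is an assumption, not stated above
python train.py test=True

# Hydra's `+` prefix adds a key absent from the composed config, which is
# needed here because `compile` is commented out in the template
python train.py +compile=True
```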
Finetuning#
This config should be used to finetune a pre-trained model on a downstream task.
```yaml
# @package _global_

# === 1. Set config parameters ===
name: "" # default name for the experiment, "" means the logger (e.g. wandb) will generate a unique name
seed: 52 # seed for random number generators in pytorch, numpy and python.random
num_workers: 16 # number of subprocesses to use for data loading

# === 2. Specify defaults here. Defaults will be overwritten by equivalently named options in this file ===
defaults:
  - env: default
  - dataset: cath
  - features: ca_seq
  - encoder: egnn
  - decoder: default
  - transforms: none
  - callbacks: default
  - optimiser: adam
  - scheduler: none
  - trainer: gpu
  - extras: default
  - hydra: default
  - metrics: none
  - task: inverse_folding # See: proteinworkshop/config/task/
  - logger: wandb # wandb, tensorboard, csv
  - finetune: default # Specifies the finetuning config. See: proteinworkshop/config/finetune/
  # debugging config (enable through the command line, e.g. `python train.py debug=default`)
  - debug: null
  - optional hparams: ${encoder}_${features}
  - _self_ # see: https://hydra.cc/docs/upgrades/1.0_to_1.1/default_composition_order/. Adding _self_ at the bottom means values in this file override the defaults.

task_name: "finetune"

#compile: True
compile: False

# simply provide a checkpoint path to resume training
ckpt_path: null
```
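Finetuning is launched the same way, with ckpt_path pointing at the pre-trained weights. A minimal sketch, assuming a finetune.py entry point analogous to train.py (the entry-point name is an assumption; the template only sets `task_name: "finetune"`):

```bash
# The checkpoint path is a placeholder; replace it with your own weights.
# (finetune.py is an assumed entry point, by analogy with train.py.)
python finetune.py ckpt_path=/path/to/pretrained.ckpt task=inverse_folding

# The finetune config group selects the finetuning configuration;
# see proteinworkshop/config/finetune/ for the available options
python finetune.py ckpt_path=/path/to/pretrained.ckpt finetune=default
```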