Downloading Datasets#

Raw datasets will be downloaded from their respective sources and built the first time it is used. This may take a while.

Processed datasets are available from Zenodo (https://zenodo.org/record/8282470) and we provide a CLI tool to download them.

Note

If you wish to specify a custom location for the datasets, you can set the DATA_PATH environment variable.

export DATA_PATH=/path/to/where/you/want/datasets # e.g., `export DATA_PATH="proteinworkshop/data"`
workshop download <DATASET_NAME>
# Download pre-training datasets
workshop download pdb
workshop download afdb_rep_v4
workshop download cath

# Download downstream datasets
workshop download ec_reaction
workshop download fold_classification
workshop download antibody_developability
...

See also

Dataset