This repository contains scripts for generating ESM2 embeddings. If your machine has a GPU, the script will automatically utilize it for faster embedding generation. If you're running the script on a cluster with SLURM, a pre-configured bash script is provided for this purpose.
You can select the desired ESM2 model by uncommenting the appropriate line in the `compute-esm.py` file:

```python
# model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
model, alphabet = esm.pretrained.esm2_t36_3B_UR50D()
# model, alphabet = esm.pretrained.esm2_t48_15B_UR50D()
```
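For reference, GPU selection and embedding extraction with the fair-esm API typically look like the minimal sketch below. This is only an illustration of what `compute-esm.py` is described as doing; the representation layer, pooling, and output handling used by the actual script may differ, and the example sequence is a placeholder.

```python
import torch
import esm

# Load the chosen ESM2 model (esm2_t36_3B has 36 layers, hence repr_layers=[36]).
model, alphabet = esm.pretrained.esm2_t36_3B_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

# Use the GPU automatically when one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Placeholder sequence; the real script reads sequences from the --input files.
data = [("example_protein", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG")]
labels, strs, tokens = batch_converter(data)
tokens = tokens.to(device)

with torch.no_grad():
    results = model(tokens, repr_layers=[36], return_contacts=False)

# Mean-pool the per-residue representations, skipping the BOS/EOS tokens.
embedding = results["representations"][36][0, 1:len(strs[0]) + 1].mean(0)
```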
To generate embeddings, run the `compute-esm.py` script using the following command:

```bash
python3 compute-esm.py --input /path/to/input --output /path/to/output
```

Here, `--input` specifies the path to the text files containing the sequences (refer to the example format in the `data` folder), and `--output` defines the path where the generated embeddings will be saved.
To run the script on a SLURM-managed cluster, use the provided bash script:

```bash
sbatch --gpus=1 run-sbatch.sh
```

Before submitting the job, make sure to update the `--input` and `--output` parameters inside the `run-sbatch.sh` script.
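The repository already ships `run-sbatch.sh`, so the sketch below is only an illustration of the kind of content to expect and which lines to edit before submitting; the SBATCH directives and paths shown here are assumptions, not the actual script.

```bash
#!/bin/bash
#SBATCH --job-name=esm2-embed   # assumed job name, adjust as needed
#SBATCH --time=24:00:00         # assumed wall-time limit

# Edit these two paths before submitting the job.
INPUT=/path/to/input
OUTPUT=/path/to/output

python3 compute-esm.py --input "$INPUT" --output "$OUTPUT"
```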