Schumer lab: Submitting slurm jobs
Getting started
There are some great resources for using slurm, including the Sherlock cluster's guide [[1]] and FAS Research Computing's guide [[2]].
The very basics
Example slurm script header for Sherlock:
#!/bin/bash
#SBATCH --job-name=short_queue_test_job
#SBATCH --time=00:01:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=32000
#SBATCH --mail-user=youremail@stanford.edu
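Note that --mail-user on its own does not generate any emails: slurm only sends mail for the event types listed in --mail-type (the default is NONE). If you want notifications, add a line like the following (END,FAIL is just one common choice):
#SBATCH --mail-type=END,FAIL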
You will probably want to edit the following for each job:
- name your job: --job-name
- give it a time limit: --time=hours:minutes:seconds
- set the memory and other resource requests (see the Sherlock documentation for details): --cpus-per-task, --ntasks, --mem
To actually run this job, you need to generate a file with this header, followed by the job command you'd like to run. See the following example:
cat /home/groups/schumer/example_slurm_short.sh
To submit this example job, navigate to that directory and type:
sbatch example_slurm_short.sh
To check on the status of your job, type:
squeue -u $USER
To cancel your job, copy the job id that you see when checking the queue status and type:
scancel [job id]
Remember to load any modules your job needs inside the slurm script, after the header. For example:
module load biology
module load bwa
bwa index ref_genome.fa
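Putting this together, a complete submission script is just the header followed by the module loads and the command you want to run. Here is a minimal sketch combining the pieces above (the job name, time limit, and file names are placeholders, and this is not the contents of the shared example script):
#!/bin/bash
#SBATCH --job-name=bwa_index
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=32000
#SBATCH --mail-user=youremail@stanford.edu
# load the software the job needs
module load biology
module load bwa
# the actual work
bwa index ref_genome.fa
Save this as, say, bwa_index.sh and submit it with sbatch bwa_index.sh.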
Partitions
You can run slurm jobs on a number of different sets of nodes, called partitions, on Sherlock. The partition you choose to run your job on depends on how much time and memory you want to allocate to complete your job.
There are two kinds of slurm jobs: interactive and batch jobs.
Interactive jobs
You can work interactively on a compute node by launching an sdev session:
sdev
This allocates one core and 4 GB of memory to you for one hour. If you want your session to last longer than an hour, you can specify how long to keep the session open (e.g., 2 hours):
sdev -t 2:00:00
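If you need more resources than sdev gives you by default, you can also request an interactive shell directly with srun. This is standard slurm rather than anything Sherlock-specific, so treat the resource values below as an example (here, 2 cores and 8 GB for 2 hours on the dev partition):
srun --pty -p dev --cpus-per-task=2 --mem=8G -t 2:00:00 bash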
Batch jobs
Below is a list of slurm partitions available to all Sherlock users for batch jobs:
queue name | default time limit per job | max runtime per job | max cores per user | default memory per core | max jobs per user | max jobs in queue per user | purpose |
---|---|---|---|---|---|---|---|
normal | 2 hr | 48 hrs | 512 | 4 GB | 256 | 3000 | normal production |
dev | 1 hr | 2 hrs | 2 | 4 GB | 2 | 4 | interactive and/or development |
bigmem | 2 hr | 16 hrs | 32 | 48 GB | 1 | 20 | large memory 48 GB/core, 32 cores/node |
gpu | 2 hr | 48 hrs | 32 | 16 GB | 16 | 300 | 16 GPU/node, 16 cores/node, 16 GB/core |
long | 2 hr | 7 days | 256 | 16 GB | 16 | 64 | queue for long-running jobs (up to 16 jobs and/or 128 cores per user); requires the "--qos=long" option |
There are also 2 nodes with 1.5 TB of RAM each. If your job needs more than 64 GB of RAM, use these by adding '#SBATCH --qos=bigmem --partition=bigmem' to your slurm script.
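For example, the header of a large-memory job might look like the following sketch (the job name, time limit, and memory request are placeholders; size them to your actual job):
#!/bin/bash
#SBATCH --job-name=big_memory_job
#SBATCH --time=12:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --qos=bigmem
#SBATCH --partition=bigmem
#SBATCH --mem=256000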
There are three additional partitions available to our lab group:
- hns - for all users in Humanities and Sciences
- owners - for all users whose lab group has purchased compute nodes
- schumer - our personal compute nodes (see "Lab specific nodes" section)
A note about the "owners" partition:
This partition is the set of all lab groups' compute nodes, so when you run a job on "owners", your job is allocated to a currently free compute node owned by a different lab group. Lab groups get priority on their own nodes, so if the owning group starts a job on the node while yours is running on it, your job will be killed. If you choose to use this partition, be aware that this is a possibility, and be sure to monitor your slurm output for "JOB KILLED" error messages. In practice this is fairly rare unless your job is very large.
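One way to check after the fact whether a job was killed this way is to look up its final state with sacct (a sketch using standard sacct format fields; a preempted job will typically show a state like PREEMPTED or CANCELLED rather than COMPLETED):
sacct -j [job id] --format=JobID,JobName,Partition,State,Elapsed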
To run your job on a specific partition, add the following to your slurm script:
#SBATCH -p schumer
Typically, it's helpful to specify several partitions. The scheduler will send your job to whichever of the listed partitions can start it first with the time and memory you requested, so you don't have to wait until sufficient resources are available on a single partition.
#SBATCH -p schumer,owners,hns,normal
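Before settling on a set of partitions, it can help to glance at how busy each one is. sinfo will summarize the partitions you care about, one line per partition and node state (a sketch using standard sinfo format fields):
sinfo -p schumer,owners,hns,normal -o "%P %a %l %D %T"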
Lab specific nodes
We have 96 dedicated cores for the lab on Sherlock! To use these instead of the general queues, simply add this line to your slurm script:
#SBATCH -p schumer
You can also compare how long it will take a job to start running on our lab nodes versus the normal queue:
sbatch --test-only -p schumer myjob.sh
sbatch --test-only -p normal myjob.sh
Useful slurm commands
Cancel all of your jobs:
scancel -u $USER
Cancel all of your pending jobs:
scancel -t PENDING -u $USER
Cancel a job by job name:
scancel --name [job name]
for example:
scancel --name bwa-mem
Estimate how long it will take for a job to start running (does not actually submit the job):
sbatch --test-only myjob.sh
There is also a script in Lab_shared_scripts that will take a list of slurm job ids or slurm stdout files and cancel those jobs. Usage is:
perl /home/groups/schumer/shared_bin/Lab_shared_scripts/slurm_cancel_jobs_list.pl list_to_cancel
For example, if you want to cancel a batch of submitted jobs which all start with 3161, you could do the following to generate your list of jobs to cancel:
squeue -u $USER | grep 3161 | perl -pi -e 's/ +/\t/g' | cut -f 2 > list_to_cancel
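If you prefer to skip the perl/cut step, squeue can also print bare job IDs directly; the following is a roughly equivalent one-liner using standard squeue options (-h suppresses the header, -o "%i" prints only the job ID):
squeue -u $USER -h -o "%i" | grep '^3161' > list_to_cancel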
or if you'd like to stop all jobs that are currently running in a folder:
ls slurm-*.out > list_to_cancel
To submit a job that runs only after another job is done, you can add one or more job dependencies. For example, to submit a job that starts after job 39584578 has finished successfully:
sbatch --dependency=afterok:39584578 myjob.sh
To submit after both jobs 39584578 and 39584579 have finished:
sbatch --dependency=afterok:39584578:39584579 myjob.sh
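If you are submitting jobs from a script rather than by hand, sbatch's --parsable flag prints just the job ID, which makes it easy to capture for the dependency. A minimal sketch (the script names are placeholders):
# submit the first job and keep its job ID
first_id=$(sbatch --parsable first_job.sh)
# submit the second job so it only starts after the first finishes successfully
sbatch --dependency=afterok:$first_id second_job.sh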