Schumer lab: Submitting slurm jobs

From OpenWetWare


Getting started

There are some great resources for using slurm, including a guide for the Sherlock cluster [[1]] and FAS Research Computing's guide [[2]].


The very basics

Example slurm script header for Sherlock:

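#!/bin/bash
#SBATCH --job-name=short_queue_test_job
#SBATCH --time=00:01:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# --mem is in megabytes when no unit is given (32000 MB is about 32 GB)
#SBATCH --mem=32000
#SBATCH --mail-user=youremail@stanford.edu
# --mail-type chooses which events trigger an email to --mail-user
#SBATCH --mail-type=END,FAIL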


You will probably want to edit the following for each job:

name your job: --job-name

give it a time limit: --time=hours:minutes:seconds (or days-hours:minutes:seconds for longer jobs)

request memory and other resources (see the Sherlock documentation for details): --cpus-per-task, --ntasks, --mem

use your own email address: --mail-user


To run a job, create a file containing this header followed by the command(s) you want the job to run. To see a complete example, type:

cat /home/groups/schumer/example_slurm_short.sh


To submit this example job, navigate to /home/groups/schumer and type:

sbatch example_slurm_short.sh
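If you write many similar jobs, it can help to generate the script file from a template instead of copying the header by hand. Below is a minimal sketch using a hypothetical helper called write_job_script (not one of the lab's shared scripts); the job name, time limit, memory and output file name in the example call are illustrative values only.

```shell
# Hypothetical helper (not a lab script):
#   write_job_script NAME TIME CPUS MEM_MB OUTFILE COMMANDS
# Writes a slurm script with a header like the Sherlock example above,
# followed by the commands the job should run.
write_job_script() {
  local name=$1 time=$2 cpus=$3 mem=$4 out=$5 cmds=$6
  # Unquoted EOF: the ${...} variables below are expanded when the file is written.
  cat > "$out" <<EOF
#!/bin/bash
#SBATCH --job-name=${name}
#SBATCH --time=${time}
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=${cpus}
#SBATCH --mem=${mem}

${cmds}
EOF
}

# Example with illustrative values: 1 hour, 1 CPU, 8000 MB of memory.
# $'...' turns each \n into a newline, giving one command per line.
write_job_script bwa_index 01:00:00 1 8000 bwa_index_job.sh \
  $'module load biology\nmodule load bwa\nbwa index ref_genome.fa'
```

The generated file can then be submitted as usual with sbatch bwa_index_job.sh.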


To check on the status of your job, type:

squeue -u $USER
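If you want a terminal session (or another script) to wait until a particular job has left the queue, you can poll squeue for that job id. This is a rough sketch with a hypothetical helper called wait_for_job and a polling interval we chose arbitrarily; note that it treats any empty squeue output, including a failed squeue call, as "the job is done".

```shell
# Hypothetical helper (not a lab script): wait_for_job JOBID [SECONDS]
# Polls squeue until JOBID no longer appears in the queue.
#   -h  : omit the header line
#   -j  : only report the given job id
wait_for_job() {
  local jobid=$1 interval=${2:-60}
  # squeue errors are discarded, so an error also ends the loop.
  while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
    sleep "$interval"
  done
}

# Usage: wait_for_job 39584578 120
```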


To cancel your job, copy the job id that you see when checking the queue status and type:

scancel [job id]
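Instead of copying the job id from the squeue output, you can capture it when you submit. sbatch's --parsable option prints just the job id (followed by ";clustername" on multi-cluster setups). A minimal sketch, using a hypothetical helper called submit_job:

```shell
# Hypothetical helper (not a lab script): submit_job [sbatch options] SCRIPT
# Prints only the new job id. `sbatch --parsable` outputs "jobid" or
# "jobid;cluster", so everything from the first ";" onward is dropped.
submit_job() {
  local out
  out=$(sbatch --parsable "$@") || return 1
  printf '%s\n' "${out%%;*}"
}

# Usage:
#   jobid=$(submit_job example_slurm_short.sh)
#   scancel "$jobid"
```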


Remember to load the modules your job needs inside the slurm script, after the header. For example:

module load biology
module load bwa
bwa index ref_genome.fa

Lab-specific nodes

We have 96 dedicated cores for the lab on Sherlock! To use these instead of the general (normal) queue, add this line to the header of your slurm script:

#SBATCH -p schumer

You can also compare the estimated start time of a job on our lab nodes versus the normal queue:

sbatch --test-only -p schumer myjob.sh

sbatch --test-only -p normal myjob.sh
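If you run this comparison often, the two commands can be wrapped in a loop over partitions. A minimal sketch, assuming a hypothetical helper called compare_partitions; like the commands above, --test-only only estimates a start time and does not submit anything.

```shell
# Hypothetical helper (not a lab script): compare_partitions SCRIPT PARTITION...
# Runs `sbatch --test-only` on SCRIPT once per partition.
compare_partitions() {
  local script=$1
  shift
  local p
  for p in "$@"; do
    echo "== partition: $p"
    # The estimate may be written to stderr, so 2>&1 merges it into stdout.
    sbatch --test-only -p "$p" "$script" 2>&1
  done
}

# Usage: compare_partitions myjob.sh schumer normal
```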

Useful slurm commands

Cancel all of your jobs:

scancel -u $USER


Cancel all of your pending jobs:

scancel -t PENDING -u $USER


Cancel a job by job name:

scancel --name [job name]

For example:

scancel --name bwa-mem
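scancel --name matches the whole job name, so it will not cancel, say, every job whose name begins with "bwa-". One way to do that is to list your job ids and names with squeue (-h drops the header, -o "%i %j" prints the job id and job name) and filter by prefix. A minimal sketch, assuming a hypothetical helper called cancel_by_prefix:

```shell
# Hypothetical helper (not a lab script): cancel_by_prefix PREFIX
# Cancels each of your jobs whose name starts with PREFIX.
cancel_by_prefix() {
  local prefix=$1 id name
  squeue -u "$USER" -h -o "%i %j" | while read -r id name; do
    # "$prefix" is matched literally; the trailing * allows any suffix.
    if [[ $name == "$prefix"* ]]; then
      scancel "$id"
    fi
  done
}

# Usage: cancel_by_prefix bwa-   # would cancel bwa-mem, bwa-index, ...
```

It is worth running the squeue command on its own first to check which jobs match before cancelling anything.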


Estimate how long it will take for a job to start running (does not actually submit the job):

sbatch --test-only myjob.sh


There is also a script in Lab_shared_scripts that takes a list of slurm job ids or slurm stdout files and cancels those jobs. Usage:

perl /home/groups/schumer/shared_bin/Lab_shared_scripts/slurm_cancel_jobs_list.pl list_to_cancel


For example, if you want to cancel a batch of submitted jobs whose job ids all start with 3161, you could generate your list of jobs to cancel like this (-h drops the header line and -o "%i" prints only the job id):

squeue -u $USER -h -o "%i" | grep '^3161' > list_to_cancel


Or, if you'd like to cancel all jobs whose slurm stdout files (slurm-<jobid>.out) are in the current folder:

ls slurm-*.out > list_to_cancel
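For illustration, here is a pure-shell sketch of the same idea. It is not the lab's perl script and may not parse its input the same way; it assumes each line of the list is either a bare job id or a default-named stdout file, slurm-<jobid>.out (slurm-<jobid>_<index>.out for array jobs). The helper name cancel_from_list is hypothetical.

```shell
# Hypothetical helper (not the lab's slurm_cancel_jobs_list.pl):
#   cancel_from_list LISTFILE
# Each line is a job id or a slurm stdout file such as slurm-39584578.out.
cancel_from_list() {
  local list=$1 entry id
  while read -r entry; do
    [ -z "$entry" ] && continue   # skip blank lines
    id=${entry##*/}               # drop any leading directory
    id=${id#slurm-}               # slurm-39584578.out -> 39584578.out
    id=${id%.out}                 # 39584578.out -> 39584578
    scancel "$id"                 # array ids like 39584580_2 are passed as-is
  done < "$list"
}

# Usage: cancel_from_list list_to_cancel
```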


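To submit a job that runs only after another job is done, you can add one or more job dependencies. For example, to start after job 39584578 has completed successfully:

sbatch --dependency=afterok:39584578 myjob.sh

To start after jobs 39584578 and 39584579 have both completed successfully:

sbatch --dependency=afterok:39584578:39584579 myjob.sh

afterok only releases your job if the earlier jobs exit successfully (exit code 0); if one of them fails, the dependent job will never start and should be cancelled with scancel. To start regardless of how the earlier jobs ended, use afterany instead of afterok.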
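Dependencies are most useful when the job ids are captured automatically, so a multi-step pipeline can be submitted in one go. A minimal sketch, assuming a hypothetical helper called submit_after and illustrative script names; it relies on sbatch --parsable printing just the job id, which holds on a single-cluster setup (multi-cluster setups append ";clustername", which would need stripping).

```shell
# Hypothetical helper (not a lab script): submit_after SCRIPT JOBID [JOBID ...]
# Submits SCRIPT with --dependency=afterok:JOBID1:JOBID2:..., so it starts
# only after every listed job has completed successfully, and prints the
# new job id.
submit_after() {
  local script=$1
  shift
  local deps
  deps=$(IFS=:; echo "$*")   # join the remaining arguments with ":"
  sbatch --parsable --dependency="afterok:${deps}" "$script"
}

# Usage sketch (script names are illustrative):
#   jid1=$(sbatch --parsable align_sample1.sh)
#   jid2=$(sbatch --parsable align_sample2.sh)
#   submit_after merge_bams.sh "$jid1" "$jid2"
```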