iCIS Intra Wiki
ClusterHelp
How to run a job on a Linux cluster partition X with Slurm
You can run jobs on the cluster with the Slurm cluster software.
Concepts
A node is a single machine in the cluster.
A cluster consists of a set of nodes.
A partition is a defined subset of nodes of the whole cluster.
Besides being a subset of nodes, a partition can also limit the resources available to a job.
A job is typically run in a partition of the cluster.
A job step is a (possibly parallel) task within a job.
For each partition, only members of certain Unix groups are allowed to run jobs in that partition.
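To make the difference between a job and its job steps concrete, here is a minimal sketch. It writes a small batch script (the file name steps.sh is only an example) that requests two tasks in the "cnczshort" test partition, which the Example section describes as open to all users. Inside the job, every srun line starts a separate job step. The task count and the time limit are illustrative assumptions, not site recommendations, and the script is only submitted when sbatch is found on the machine.

```shell
#!/bin/bash
# Write a batch script; the file name steps.sh is only an example.
cat > steps.sh <<'EOF'
#!/bin/bash
#SBATCH --partition=cnczshort
#SBATCH --ntasks=2
#SBATCH --time=00:05:00

# First job step: srun launches hostname once per task (2 tasks, possibly on different nodes).
srun hostname

# Second job step: a single task inside the same job allocation.
srun --ntasks=1 echo "second step finished"
EOF

# Submit the job only on a machine where Slurm's sbatch command is available.
if command -v sbatch >/dev/null 2>&1; then
    sbatch steps.sh
fi
```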
Example
For example, for education you could run jobs in the "csedu" partition of the cluster, which contains only the compute nodes cn47 and cn48. You have to be a member of the "csedu" Unix group to run jobs in this partition.
# the following command shows information about the "csedu" partition:
$ scontrol show -a partitions csedu
PartitionName=csedu
   AllowGroups=csedu AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=01:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=cn[47-48]
   PriorityJobFactor=10 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=FORCE:1
   OverTimeLimit=NONE PreemptMode=GANG,SUSPEND
   State=UP TotalCPUs=96 TotalNodes=2 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerCPU=2048 MaxMemPerNode=UNLIMITED
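The partition record is a list of Key=Value pairs, so single fields such as AllowGroups (who may submit) or Nodes (which machines belong to the partition) are easy to extract. The helper below, partition_field, is a hypothetical convenience function and not part of Slurm; it assumes that values contain no spaces, which holds for the partition output shown above.

```shell
#!/bin/bash
# partition_field FIELD: print the value of FIELD from `scontrol show partition`
# output read on stdin. Hypothetical helper; assumes Key=Value pairs without spaces.
partition_field() {
    # Put every Key=Value pair on its own line, then print the value of the exact key.
    tr -s '[:space:]' '\n' | sed -n "s/^$1=//p"
}

if command -v scontrol >/dev/null 2>&1; then
    # Which Unix groups may use the csedu partition, and which nodes does it contain?
    scontrol show partition csedu | partition_field AllowGroups
    scontrol show partition csedu | partition_field Nodes
fi
```

Anchoring the key at the start of the line matters: without the `^`, asking for Nodes would also match AllocNodes, MaxNodes and TotalNodes.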
Let's see what happens when the current user is not a member of the 'csedu' Unix group but tries to run a job in the 'csedu' partition:
$ groups | grep csedu   # empty result: the current user is not a member of the csedu Unix group
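Note that `groups | grep csedu` also matches group names that merely contain "csedu" (a hypothetical group "csedu-staff", for instance). A stricter sketch compares whole group names only; in_group is a hypothetical helper name, not a standard command.

```shell
#!/bin/bash
# in_group GROUP: succeed only if the current user is a member of GROUP
# (exact, whole-name match). Hypothetical helper built on `id -nG`.
in_group() {
    # One group name per line; -x whole line, -F fixed string, -q quiet.
    id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

if in_group csedu; then
    echo "member of csedu"
else
    echo "not a member of csedu"
fi
```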
$ cat hello.csedu.sh
#! /bin/bash
#SBATCH --partition=csedu
sleep 60
echo "Hello world!"
$ sbatch hello.csedu.sh
sbatch: error: Batch job submission failed: User's group not permitted to use this partition
If you want to run a job in the 'csedu' partition, ask the owner of that partition to grant you access. For the 'csedu' partition, the owner can do this by adding your user account to the 'csedu' Unix group.
The partitions "cncz" and "cnczshort" are accessible to all users, but only for test purposes. So we can modify the script to use the "cnczshort" partition instead, and the job submission will succeed. For more details on how to run and query this job, see the C&CZ wiki page about Slurm.
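A minimal sketch of that change, assuming hello.csedu.sh from the example above is in the current directory (the block recreates it only if it is missing, so it can run on its own; the new file name hello.cnczshort.sh is just an example). sed rewrites only the partition directive. If sbatch is available, the job is submitted with `--parsable`, which makes sbatch print just the job ID, and squeue then shows the job's state. Passing the option on the command line, as in `sbatch --partition=cnczshort hello.csedu.sh`, should also work, because sbatch command-line options take precedence over #SBATCH directives; editing a copy keeps the choice recorded in the file.

```shell
#!/bin/bash
# Recreate the example script only if it is not already present.
[ -f hello.csedu.sh ] || cat > hello.csedu.sh <<'EOF'
#! /bin/bash
#SBATCH --partition=csedu
sleep 60
echo "Hello world!"
EOF

# Copy the script, switching only the partition directive to the open test partition.
sed 's/^#SBATCH --partition=csedu$/#SBATCH --partition=cnczshort/' \
    hello.csedu.sh > hello.cnczshort.sh

if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
    jobid=$(sbatch --parsable hello.cnczshort.sh | cut -d ';' -f 1)
    echo "submitted job $jobid"
    # Show the job while it is pending or running (the script sleeps for 60 seconds).
    squeue --jobs "$jobid"
    # By default the job's output goes to slurm-<jobid>.out in the submit directory.
    echo "output file: slurm-${jobid}.out"
fi
```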
More info:
- for a good introduction to Slurm, see the Slurm Quickstart documentation
- useful commands and tips on the C&CZ wiki page about Slurm
- Slurm documentation