Right-Sizing Slurm Jobs
đź’ˇ If you have a useful monitoring tool, add it! </aside>
Before you run your job on multiple samples, make sure you've optimized your resource request.
There are two main resources you can request: CPU(s) and RAM.
CPU (aka --cpus-per-task)
The CPU is the central processing unit - the thinker. To make use of more than 1 CPU, your code needs to have been developed to work in parallel. Some R and Python packages can take advantage of parallel processing, but may require an accessory package.
**Takeaway #1:** If your tools were not designed to take advantage of parallel processing, adding CPUs to your job request will only make the request take longer to schedule without speeding up the actual processing - it's counterproductive.
**Takeaway #2:** If your tools were designed for parallel processing and you tell them to use more than 1 CPU, you MUST request that same number of CPUs with --cpus-per-task in your SBATCH directives (see the sketch below).
DIY - https://berkeley-scf.github.io/tutorial-parallelization/
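As a minimal sketch of what that looks like (the tool name and its --threads flag are placeholders - substitute your own software and its parallelism option), note how the tool's thread count and --cpus-per-task stay in sync:

```bash
#!/bin/bash
#SBATCH --job-name=parallel_example
#SBATCH --cpus-per-task=8        # CPUs reserved for this task
#SBATCH --mem-per-cpu=4G         # 8 CPUs x 4G = 32G total RAM
#SBATCH --time=04:00:00
#SBATCH --output=parallel_example_%j.log

# "mytool" and "--threads" are hypothetical - use your own command and flag.
# $SLURM_CPUS_PER_TASK is set by Slurm to match the request above, so the
# tool never uses more (or fewer) CPUs than you asked for.
mytool --threads "$SLURM_CPUS_PER_TASK" --input sample1.fastq --output sample1.out
```

If the tool has no such flag, leave --cpus-per-task at 1 (see Takeaway #1).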
RAM (aka --mem-per-cpu)
RAM is volatile memory used for your working tasks. It's way faster than disk access. If you don't have enough RAM to hold your objects, your job may suffer from a lot of shuffling of data to/from disk. Each node on Exacloud has a different quantity of RAM - use scontrol to find out how much if you need a lot (see the sketch below).
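For example (a minimal sketch; replace <nodename> with a real node from the sinfo output), sinfo can list memory per node and scontrol can show the detail for one node:

```bash
# Memory (in MB) per node, one line per node
sinfo -N -o "%N %P %m"

# Full detail for a single node, including RealMemory and AllocMem
scontrol show node <nodename>
```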
Gentle reminder about prefixes -
RAM is quantified in Bytes (each of which is 8 bits in the olden days…).
- 1000 bytes = 1KB (kilo)
- 1000000 bytes = 1000KB = 1MB (mega)
- 1000MB = 1GB (giga)
- 1000GB = 1TB (tera)
- 1000TB = 1PB (peta)
- It goes on - google it! (useful for astronomers, but even we have not filled this much space YET!)
Process
Consider developing your workflow on a toy data set so you don’t have to wait forever to learn you’ve made a mistake…
Once it’s stable, run a sample. Request more resources than you think you need.
Once the job is complete, run this command:
sacct --units=G --format=JobIdRaw,JobName%30,User,Group,State,Submit,Start,End,Cluster,Partition,AllocNodes,ReqCPUs,AllocCPUs,TotalCPU,CPUTime,UserCPU,AveCPU,SystemCPU,Elapsed,Timelimit,ReqMem,MaxRSS,MaxVMSize,State,MaxDiskWrite,MaxDiskRead,CPUTimeRaw,ElapsedRaw,TimelimitRaw,SubmitLine --parsable2 -a -A cedar,cedar2 --starttime=2023-08-25 --endtime=2023-09-10 --user={yourID} > sacct_2023.09.15.txt
Note that this command is only accurate for COMPLETED jobs.
Keep in mind that sacct "polls" usage at a sampling interval, so peak values like MaxRSS are not a perfect reflection of what you actually used.
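For a quick look at a single completed job (the job ID below is a placeholder), a shorter query covering just the fields discussed in the rest of this section is often enough:

```bash
# Requested vs. used resources for one finished job
sacct -j 1234567 --units=G \
      --format=JobIDRaw,Submit,Start,End,Elapsed,ReqCPUS,TotalCPU,CPUTime,ReqMem,MaxRSS,State
```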
What to look for?
How long was my job queued? How long did it run for?
Job 12 below was submitted and waited 36 hours in the queue, then ran in about 4 hours.
Job 15 got resources after about 1 second (it requested far fewer resources).
| JobIDRaw | Submit | Start | End |
|---|---|---|---|
| 12 | 2023-08-19T20:08:29 | 2023-08-21T08:08:36 | 2023-08-21T12:33:41 |
| 12.batch | 2023-08-21T08:08:36 | 2023-08-21T08:08:36 | 2023-08-21T12:33:41 |
| 15 | 2023-08-20T10:31:32 | 2023-08-20T10:31:33 | 2023-08-20T10:37:22 |
| 15.batch | 2023-08-20T10:31:33 | 2023-08-20T10:31:33 | 2023-08-20T10:37:22 |
How much RAM (memory) did I ask for? How much did I use? The more resources you request, the longer it will take to allocate them and the more you will impact your fair-share priority.
Job 41 requested 320G but used at most ~16G - an efficiency of ~5%.
Job 521 requested 32G and used ~11G - about 30% efficiency.
For an explanation of MaxRSS vs MaxVMSize, see https://supercloud.mit.edu/submitting-jobs
| JobIDRaw | ReqMem | MaxRSS | MaxVMSize |
|---|---|---|---|
| 41 | 320G | | |
| 41.batch | | 11.19G | 15.50G |
| 521 | 32G | | |
| 521.batch | | 10.37G | 10.67G |
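To run the same check on your own job (the job ID is a placeholder), compare ReqMem to MaxRSS/MaxVMSize; if the seff utility is installed on your cluster, it will compute the memory efficiency for you:

```bash
# Requested vs. peak memory for one completed job
sacct -j 1234567 --units=G --format=JobIDRaw,ReqMem,MaxRSS,MaxVMSize

# Optional: seff summarizes CPU and memory efficiency, if it is available
seff 1234567
```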
How much CPU did I ask for? How much did I use?
It’s not possible to request <1 CPU…
There are a lot of numbers that can be reported. Some key ones:
- Elapsed - the wall-clock time the job took
- CPUTime = Elapsed * the number of CPUs allocated
- TotalCPU = UserCPU + SystemCPU
- UserCPU - CPU time spent actually running your code
- SystemCPU - CPU time the system kernel spent working on your behalf (e.g., I/O)
Efficiency is TotalCPU / CPUTime (a sketch for computing this on your own jobs follows the table below):
- Job 328: efficiency ~98% - it's big and long, but it's using its resources well.
- Job 3281: efficiency = 1:26:50 / 4:55:24 ≈ 30%
- Job 4429: efficiency = 0:02:13 / 1:02:40 ≈ 3%
| JobIDRaw | ReqCPUS | AllocCPUS | TotalCPU | UserCPU | AveCPU | CPUTime | SystemCPU | Elapsed |
|---|---|---|---|---|---|---|---|---|
| 328 | 1 | 44 | 65-08:02:27 | 57-01:51:38 | | 65-08:18:24 | 8-06:10:48 | 1-11:38:36 |
| 328.batch | 44 | 44 | 65-08:02:27 | 57-01:51:38 | 65-07:58:37 | 65-08:18:24 | 8-06:10:48 | 1-11:38:36 |
| 3281 | 4 | 4 | 1:26:50 | 1:09:30 | | 4:55:24 | 17:20.0 | 1:13:51 |
| 3281.batch | 4 | 4 | 1:26:50 | 1:09:30 | 1:26:12 | 4:55:24 | 17:20.0 | 1:13:51 |
| 4429 | 16 | 16 | 02:12.6 | 01:46.1 | | 1:02:40 | 00:26.5 | 0:03:55 |
| 4429.batch | 16 | 16 | 02:12.6 | 01:46.1 | 0:00:47 | 1:02:40 | 00:26.5 | 0:03:55 |
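As a rough sketch (the job ID is a placeholder), the same ratio can be computed for any completed job by converting TotalCPU and CPUTime to seconds:

```bash
#!/bin/bash
# Rough CPU-efficiency check: TotalCPU / CPUTime for one completed job.
jobid=1234567

sacct -j "$jobid" --noheader --parsable2 --format=JobIDRaw,TotalCPU,CPUTime |
awk -F'|' '
  # Convert Slurm [DD-]HH:MM:SS[.ms] time strings to seconds
  function to_sec(t,    d, dt, parts, n, s, i) {
    d = 0
    if (t ~ /-/) { split(t, dt, "-"); d = dt[1]; t = dt[2] }
    n = split(t, parts, ":")
    s = 0
    for (i = 1; i <= n; i++) s = s * 60 + parts[i]
    return d * 86400 + s
  }
  # The first line is the overall job record
  NR == 1 {
    total = to_sec($2); cputime = to_sec($3)
    if (cputime > 0) printf "%s CPU efficiency: %.1f%%\n", $1, 100 * total / cputime
  }
'
```

(seff, mentioned above, reports the same percentage if it is installed on your cluster.)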
How much did I write to disk?
Estimate how much space you need to execute your workflow (one way to do this is sketched after the table below).
**This would benefit from some estimated/actual usage info!**
| JobIDRaw | MaxDiskWrite | MaxDiskRead |
|---|---|---|
| 3 | | |
| 3.batch | 593.49G | 850.14G |
| 325 | | |
| 325.batch | 71.39G | 72.03G |
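One practical approach (a sketch; the paths are placeholders): run a single sample, measure what it wrote, and multiply by the number of samples you plan to run. Also check how much space is free where the results will land.

```bash
# Total size of the output from one completed sample
du -sh /path/to/results/sample1/

# Free space on the filesystem that will hold the full run
df -h /path/to/results/
```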