Right-Sizing Slurm Jobs
đź’ˇ If you have a useful monitoring tool, add it! </aside>
Before you run your job on multiple samples, make sure you've optimized your resource request.
There are two main resources you can request: CPU(s) and RAM.
CPU (aka --cpus-per-task)
The CPU is the central processing unit - the thinker. To make use of more than 1 CPU, your code needs to have been developed to work in parallel. Some R and Python packages can take advantage of parallel processing, but may require an accessory package.
**Takeaway #1:** If your tools were not designed to take advantage of parallel processing, adding CPUs to your job request will only make the request take longer to schedule without speeding up the actual processing - it's counterproductive.
**Takeaway #2:** If your tools were designed for parallel processing and you tell them to use more than 1 CPU, you MUST request that same number of CPUs with --cpus-per-task in your SBATCH directives (see the sketch below).
DIY - https://berkeley-scf.github.io/tutorial-parallelization/
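As a minimal sketch of what that looks like (the tool name and its --threads flag are placeholders - substitute your own software and its parallelism option), note how the tool's thread count and --cpus-per-task stay in sync:

```bash
#!/bin/bash
#SBATCH --job-name=parallel_example
#SBATCH --cpus-per-task=8        # CPUs reserved for this task
#SBATCH --mem-per-cpu=4G         # 8 CPUs x 4G = 32G total RAM
#SBATCH --time=04:00:00
#SBATCH --output=parallel_example_%j.log

# "mytool" and "--threads" are hypothetical - use your own command and flag.
# $SLURM_CPUS_PER_TASK is set by Slurm to match the request above, so the
# tool never uses more (or fewer) CPUs than you asked for.
mytool --threads "$SLURM_CPUS_PER_TASK" --input sample1.fastq --output sample1.out
```

If the tool has no such flag, leave --cpus-per-task at 1 (see Takeaway #1).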
RAM (aka --mem-per-cpu)
RAM is volatile memory used for your working tasks. It's way faster than disk access. If you don't have enough RAM to hold your objects, your job may suffer from a lot of shuffling of data to/from disk. Each node on Exacloud has a different quantity of RAM - use scontrol to find out how much if you need a lot (see the sketch below).
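For example (a minimal sketch; replace <nodename> with a real node from the sinfo output), sinfo can list memory per node and scontrol can show the detail for one node:

```bash
# Memory (in MB) per node, one line per node
sinfo -N -o "%N %P %m"

# Full detail for a single node, including RealMemory and AllocMem
scontrol show node <nodename>
```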
Gentle reminder about prefixes -
RAM is quantified in Bytes (each of which is 8 bits in the olden days…).
- 1000 bytes = 1KB (kilo)
- 1000000 bytes = 1000KB = 1MB (mega)
- 1000MB = 1GB (giga)
- 1000GB = 1TB (tera)
- 1000TB = 1PB (peta)
- It goes on - google it! (useful for astronomers, but even we have not filled this much space YET!)
Process
Consider developing your workflow on a toy data set so you don’t have to wait forever to learn you’ve made a mistake…
Once it’s stable, run a sample. Request more resources than you think you need.
Once the job is complete, run this command:
sacct --units=G --format=JobIdRaw,JobName%30,User,Group,State,Submit,Start,End,Cluster,Partition,AllocNodes,ReqCPUs,AllocCPUs,TotalCPU,CPUTime,UserCPU,AveCPU,SystemCPU,Elapsed,Timelimit,ReqMem,MaxRSS,MaxVMSize,State,MaxDiskWrite,MaxDiskRead,CPUTimeRaw,ElapsedRaw,TimelimitRaw,SubmitLine --parsable2 -a -A cedar,cedar2 --starttime=2023-08-25 --endtime=2023-09-10 --user={yourID} > sacct_2023.09.15.txt
Note that this command is only accurate for COMPLETED jobs.
Keep in mind that sacct "polls" usage at a sampling interval, so peak values like MaxRSS are not a perfect reflection of what you actually used.
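For a quick look at a single completed job (the job ID below is a placeholder), a shorter query covering just the fields discussed in the rest of this section is often enough:

```bash
# Requested vs. used resources for one finished job
sacct -j 1234567 --units=G \
      --format=JobIDRaw,Submit,Start,End,Elapsed,ReqCPUS,TotalCPU,CPUTime,ReqMem,MaxRSS,State
```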
What to look for?
How long was my job queued? How long did it run for?
Job 12 below was submitted and waited 36 hours in the queue, then ran in about 4 hours.
Job 15 got resources after about 1 second (it requested far fewer resources).
| JobIDRaw | Submit | Start | End |
|---|---|---|---|
| 12 | 2023-08-19T20:08:29 | 2023-08-21T08:08:36 | 2023-08-21T12:33:41 |
| 12.batch | 2023-08-21T08:08:36 | 2023-08-21T08:08:36 | 2023-08-21T12:33:41 |
| 15 | 2023-08-20T10:31:32 | 2023-08-20T10:31:33 | 2023-08-20T10:37:22 |
| 15.batch | 2023-08-20T10:31:33 | 2023-08-20T10:31:33 | 2023-08-20T10:37:22 |
How much RAM (memory) did I ask for? How much did I use? The more resources you request, the longer it will take to allocate them and the more you will impact your fair-share priority.
Job 41 requested 320G but used at most ~16G - an efficiency of ~5%.
Job 521 requested 32G and used ~11G - about 30% efficiency.
For an explanation of MaxRSS vs MaxVMSize, see https://supercloud.mit.edu/submitting-jobs
| JobIDRaw | ReqMem | MaxRSS | MaxVMSize |
|---|---|---|---|
| 41 | 320G | | |
| 41.batch | | 11.19G | 15.50G |
| 521 | 32G | | |
| 521.batch | | 10.37G | 10.67G |
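To run the same check on your own job (the job ID is a placeholder), compare ReqMem to MaxRSS/MaxVMSize; if the seff utility is installed on your cluster, it will compute the memory efficiency for you:

```bash
# Requested vs. peak memory for one completed job
sacct -j 1234567 --units=G --format=JobIDRaw,ReqMem,MaxRSS,MaxVMSize

# Optional: seff summarizes CPU and memory efficiency, if it is available
seff 1234567
```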
How much CPU did I ask for? How much did I use?
It’s not possible to request <1 CPU…
There are a lot of numbers that can be reported. Some key ones:
- Elapsed - the wall-clock time the job took
- CPUTime = Elapsed * the number of CPUs allocated
- TotalCPU = UserCPU + SystemCPU
- UserCPU - CPU time spent actually running your code
- SystemCPU - CPU time the system kernel spent working on your behalf (e.g., I/O)
Efficiency is TotalCPU / CPUTime (a sketch for computing this on your own jobs follows the table below):
- Job 328: efficiency ~98% - it's big and long, but it's using its resources well.
- Job 3281: efficiency = 1:26:50 / 4:55:24 ≈ 30%
- Job 4429: efficiency = 0:02:13 / 1:02:40 ≈ 3%
| JobIDRaw | ReqCPUS | AllocCPUS | TotalCPU | UserCPU | AveCPU | CPUTime | SystemCPU | Elapsed |
|---|---|---|---|---|---|---|---|---|
| 328 | 1 | 44 | 65-08:02:27 | 57-01:51:38 | | 65-08:18:24 | 8-06:10:48 | 1-11:38:36 |
| 328.batch | 44 | 44 | 65-08:02:27 | 57-01:51:38 | 65-07:58:37 | 65-08:18:24 | 8-06:10:48 | 1-11:38:36 |
| 3281 | 4 | 4 | 1:26:50 | 1:09:30 | | 4:55:24 | 17:20.0 | 1:13:51 |
| 3281.batch | 4 | 4 | 1:26:50 | 1:09:30 | 1:26:12 | 4:55:24 | 17:20.0 | 1:13:51 |
| 4429 | 16 | 16 | 02:12.6 | 01:46.1 | | 1:02:40 | 00:26.5 | 0:03:55 |
| 4429.batch | 16 | 16 | 02:12.6 | 01:46.1 | 0:00:47 | 1:02:40 | 00:26.5 | 0:03:55 |
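As a rough sketch (the job ID is a placeholder), the same ratio can be computed for any completed job by converting TotalCPU and CPUTime to seconds:

```bash
#!/bin/bash
# Rough CPU-efficiency check: TotalCPU / CPUTime for one completed job.
jobid=1234567

sacct -j "$jobid" --noheader --parsable2 --format=JobIDRaw,TotalCPU,CPUTime |
awk -F'|' '
  # Convert Slurm [DD-]HH:MM:SS[.ms] time strings to seconds
  function to_sec(t,    d, dt, parts, n, s, i) {
    d = 0
    if (t ~ /-/) { split(t, dt, "-"); d = dt[1]; t = dt[2] }
    n = split(t, parts, ":")
    s = 0
    for (i = 1; i <= n; i++) s = s * 60 + parts[i]
    return d * 86400 + s
  }
  # The first line is the overall job record
  NR == 1 {
    total = to_sec($2); cputime = to_sec($3)
    if (cputime > 0) printf "%s CPU efficiency: %.1f%%\n", $1, 100 * total / cputime
  }
'
```

(seff, mentioned above, reports the same percentage if it is installed on your cluster.)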
How much did I write to disk?
Estimate how much space you need to execute your workflow (one way to do this is sketched after the table below).
**This would benefit from some estimated/actual usage info!**
| JobIDRaw | MaxDiskWrite | MaxDiskRead |
|---|---|---|
| 3 | | |
| 3.batch | 593.49G | 850.14G |
| 325 | | |
| 325.batch | 71.39G | 72.03G |
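One practical approach (a sketch; the paths are placeholders): run a single sample, measure what it wrote, and multiply by the number of samples you plan to run. Also check how much space is free where the results will land.

```bash
# Total size of the output from one completed sample
du -sh /path/to/results/sample1/

# Free space on the filesystem that will hold the full run
df -h /path/to/results/
```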