Before You Start A Project
- Talk with your team about the estimated number of samples you will process.
- Review pilot data in its rawest form: how big is it? Is this the format you want to archive? Can it be compressed?
  For example, truly raw sequencing data arrives in BCL format and is very large. Step 1 is conversion to fastq.gz, which is still big, but smaller. A quick way to size up pilot data is sketched below.
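A minimal sketch for sizing up pilot data; the run-folder and FASTQ paths below are placeholders, so substitute your own:

    du -sh /path/to/pilot_run_folder                          # total size of the raw run folder
    du -h --max-depth=1 /path/to/pilot_run_folder | sort -hr  # which subdirectories dominate
    gzip -k /path/to/pilot_sample.fastq                       # compress a copy (-k keeps the original)
    ls -lh /path/to/pilot_sample.fastq*                       # compare compressed vs. uncompressed sizes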
As You Develop Your Workflow
- Do you really need to uncompress data? Many tools read .gz files directly or from a stream (see the sketch after this list).
- Where do you store any interim files?
  Consider gscratch (/home/exacloud/gscratch/CEDAR) for faster I/O, but NOT for storage.
  Bonus points: consider local/node scratch (/mnt/scratch) for the fastest I/O, but make sure you delete any files at the conclusion of your workflow (see the job-script sketch after this list).
- Know how much disk space your jobs are using; sacct reports per-job disk writes (AveDiskWrite/MaxDiskWrite):
  sacct --units=G --format=JobIdRaw,Submit,Start,End,AllocNodes,ReqCPUs,AllocCPUs,TotalCPU,ReqMem,AveRSS,MaxRSS,AveVMSize,MaxVMSize,User,State,AveDiskWrite,MaxDiskWrite,NTasks --parsable2 -a -A cedar,cedar2 --starttime=2023-08-29
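The sacct call above reports how much each job wrote to disk. A trimmed variation restricted to your own jobs, piped through column for readability:

    sacct --units=G --format=JobIdRaw,Start,End,ReqMem,MaxRSS,MaxDiskWrite,State --parsable2 -u $USER -A cedar,cedar2 --starttime=2023-08-29 | column -t -s '|'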
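On the uncompress question: many tools read gzipped files directly, and a stream often works where they do not. A small sketch, with sample_R1.fastq.gz and my_tool standing in for your actual file and command:

    zcat sample_R1.fastq.gz | head -n 8                   # peek at a compressed FASTQ without unpacking it
    echo $(( $(zcat sample_R1.fastq.gz | wc -l) / 4 ))    # read count (4 lines per FASTQ record)
    my_tool --input <(zcat sample_R1.fastq.gz)            # stream decompressed data in via process substitution (bash)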
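And for node scratch cleanup: a minimal sbatch sketch that keeps interim files on /mnt/scratch and deletes them however the job ends. The account, my_tool, and the long-term storage path are placeholders:

    #!/bin/bash
    #SBATCH --account=cedar              # placeholder; use your group's account
    #SBATCH --job-name=scratch_example

    # Interim files live on node-local scratch for the fastest I/O
    WORKDIR=/mnt/scratch/$USER/$SLURM_JOB_ID
    mkdir -p "$WORKDIR"
    trap 'rm -rf "$WORKDIR"' EXIT        # scratch is deleted even if a step fails

    # Placeholder workflow step: my_tool writes interim and final files under $WORKDIR
    my_tool --tmpdir "$WORKDIR" --out "$WORKDIR/final_output.bam"

    # Copy only the final result off scratch before the trap cleans up
    cp "$WORKDIR/final_output.bam" /path/to/long_term_storage/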