Parallelizing Multiple Small Jobs

Independent jobs can be run in parallel on an HPC cluster by placing srun commands inside an sbatch job. SLURM first allocates resources based on the batch job's parameters; multiple srun job steps then launch tasks within that allocation.

This gives you the flexibility to run independent tasks simultaneously while the whole set counts as only one job against submission limits.

Introductory Example

The following example uses multiple srun commands within a single batch job to run independent tasks concurrently:

#!/bin/bash
#SBATCH --job-name=multi_parallel
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=00:15:00
#SBATCH --output=output_%j.txt

# Launch 4 independent jobs in parallel
srun --exclusive -N1 -n1 ./task1 &
srun --exclusive -N1 -n1 ./task2 &
srun --exclusive -N1 -n1 ./task3 &
srun --exclusive -N1 -n1 ./task4 &

wait

Here’s what’s happening:

  • The --exclusive flag ensures each srun step gets its own dedicated share of the job allocation, so the steps do not pile onto the same CPUs.
  • The ampersand (&) sends each task to the background, allowing the next command to run.
  • wait ensures the script doesn’t exit until all tasks finish.

This pattern is beneficial for embarrassingly parallel workloads—those that can easily be split into independent tasks.
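The four srun lines above generalize naturally to a loop. In the sketch below, a plain shell command stands in for each task so the pattern can run anywhere; in the actual batch script, each background command would be wrapped in srun --exclusive -N1 -n1 exactly as above.

```shell
#!/bin/bash
# Loop form of the pattern above: start N tasks in the background,
# then wait for all of them. The sh -c command is a stand-in for
# `srun --exclusive -N1 -n1 ./taskN` from the batch script.
ntasks=4
for i in $(seq 1 "$ntasks"); do
    sh -c "echo task $i finished > task$i.out" &
done
wait   # do not exit until every background task is done
echo "all $ntasks tasks finished"
```

The number of concurrent steps should match the --ntasks requested in the batch job, so every step has a CPU to run on.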

Parallelizing SAS Jobs

SAS can be run in batch mode on the Explorer cluster. You can use multiple srun commands to execute independent SAS scripts in parallel.

#!/bin/bash
#SBATCH --job-name=sas_parallel
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=00:20:00
#SBATCH --output=sas_output_%j.txt

module load SAS

# Launch 4 independent SAS programs in parallel
srun --exclusive -n1 sas my_analysis1.sas -log my_analysis1.log &
srun --exclusive -n1 sas my_analysis2.sas -log my_analysis2.log &
srun --exclusive -n1 sas my_analysis3.sas -log my_analysis3.log &
srun --exclusive -n1 sas my_analysis4.sas -log my_analysis4.log &

wait
echo "All SAS analyses complete."

Parallelizing Python Jobs

A similar setup can be done for Python jobs too.

#!/bin/bash
#SBATCH --job-name=python_parallel
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=00:20:00
#SBATCH --output=python_output_%j.txt

module load anaconda3
source activate /path/to/the/conda/environment
cd /my/python/scripts/

# Launch 4 independent Python programs in parallel
srun --exclusive -n1 python my_script1.py > my_script1.out 2>&1 &
srun --exclusive -n1 python my_script2.py > my_script2.out 2>&1 &
srun --exclusive -n1 python my_script3.py > my_script3.out 2>&1 &
srun --exclusive -n1 python my_script4.py > my_script4.out 2>&1 &

wait
echo "All Python scripts complete."
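One caveat of a bare wait: it does not tell you which task failed. Recording each background PID and waiting on them individually lets the batch script report per-task exit statuses. The sketch below uses placeholder shell commands in place of the srun lines (task 3 is made to fail on purpose) so it runs anywhere:

```shell
#!/bin/bash
# Track each background task's PID so exit statuses can be checked
# individually instead of relying on a single blanket `wait`.
pids=""
for i in 1 2 3 4; do
    # Placeholder for `srun --exclusive -n1 ... &`; task 3 fails on purpose.
    sh -c "[ $i -ne 3 ]" &
    pids="$pids $!"
done

failed=0
n=0
for pid in $pids; do
    n=$((n + 1))
    if ! wait "$pid"; then   # `wait PID` returns that task's exit status
        echo "task $n failed"
        failed=$((failed + 1))
    fi
done
echo "$failed of $n tasks failed"
```

The same loop works unchanged when each background command is an srun step, since wait reports the srun command's exit status.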

How Is Parallelizing Jobs Different from Job Arrays?

Job arrays (using the --array flag) are the standard way of submitting many similar jobs to the cluster. Each method suits different situations, and the following table provides guidance on which to use.

Concept             | Job Array (sbatch --array)                                | Multiple sruns inside one sbatch
--------------------|-----------------------------------------------------------|---------------------------------
What it creates     | Many independent jobs                                     | One job with many job steps
Scheduler view      | Each array element is a separate job                      | All sruns share one allocation
Resource allocation | Each array task gets its own resources (and can run independently, even on different nodes at different times) | All steps share the same allocation granted to the parent job
Job limits          | Counts toward limits like “max 50 running jobs”           | Only counts as one job (the batch job)
Failure handling    | Array elements can fail or be resubmitted individually    | If the parent job ends or fails, all steps end too
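For comparison, the same four-task workload expressed as a job array might look like the sketch below. The job name is illustrative and the echo is a placeholder for the real program; each array element is submitted, scheduled, and accounted as its own job:

```shell
#!/bin/bash
#SBATCH --job-name=array_example
#SBATCH --array=1-4
#SBATCH --ntasks=1
#SBATCH --time=00:15:00
#SBATCH --output=output_%A_%a.txt

# SLURM sets SLURM_ARRAY_TASK_ID for each array element; default to 1
# so the script can also be sanity-checked outside the scheduler.
task_id=${SLURM_ARRAY_TASK_ID:-1}
echo "array element ${task_id} starting" > element_${task_id}.log
# ./my_program input${task_id}.dat    # placeholder for the real work
```

In the output filename pattern, %A is the parent job ID and %a is the array task ID, so each element writes its own output file.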

Conclusion

Multiple small jobs can be parallelized as srun job steps within a single sbatch allocation. SLURM grants the requested resources once, and the srun steps then share them to run every task. This is an effective way to execute many small jobs with a single pass through the queue.

Need More Help?

The RC team offers weekly virtual drop-in office hours, 1:1 consultations at your convenience, and recorded introductory HPC training. You can also send the RC team an email at rchelp@northeastern.edu. We are happy to work through any questions you may have!

Arsalan Akhter

Research Computing Specialist, Research Computing

Arsalan supports researchers with their software-related queries on the cluster, and also interfaces with the HPC at a system level. In addition, he contributes to documenting technical and policy details of the HPC cluster for the Northeastern community. Arsalan brings more than 15 years of expertise in Robotics to Research Computing at Northeastern, and is passionate about connecting with and supporting researchers in Robotics, ML, Operations Research, and Systems Engineering.
