HPC Cluster Partition Improvements

Jan 27, 2025 | Announcements


The Research Computing team recently made some exciting operational enhancements to improve the user experience and reduce queue wait times. A partition is a logical collection of nodes with its own hardware resources and limits; together, the partitions accommodate the wide variety of jobs scheduled on the cluster. Occasionally, the Research Computing team may update the partitions, based on monitoring of job submissions, to help reduce job wait times. As our cluster grows, these changes also help ensure that resources are distributed fairly and efficiently across all jobs submitted to the cluster.

Increased Maximum Wall Clock Time Limit for CPU Jobs

Researchers can now submit CPU jobs to the ‘short’ partition with a wall clock time of up to 48 hours (the previous limit was 24 hours).
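
As a rough illustration, assuming the cluster is scheduled by Slurm (this post does not name the scheduler), a full 48-hour request on ‘short’ corresponds to a time value of `2-00:00:00`. The minimal Python sketch below formats an hour count in Slurm's `days-hours:minutes:seconds` form and builds a matching `sbatch` command line, rejecting requests above the new 48-hour limit. The helper names and the `job.sh` script are hypothetical.

```python
# Minimal sketch, assuming a Slurm scheduler (not stated in the announcement).
# Helper names and "job.sh" are hypothetical, for illustration only.
SHORT_MAX_HOURS = 48  # new 'short' partition limit (previously 24 hours)


def slurm_time(hours: float) -> str:
    """Format a duration in hours as a Slurm D-HH:MM:SS time string."""
    total_seconds = round(hours * 3600)
    days, rem = divmod(total_seconds, 86400)
    hh, rem = divmod(rem, 3600)
    mm, ss = divmod(rem, 60)
    return f"{days}-{hh:02d}:{mm:02d}:{ss:02d}"


def short_cpu_command(script: str, hours: float) -> list[str]:
    """Build an sbatch command line for a CPU job on the 'short' partition."""
    if hours <= 0 or hours > SHORT_MAX_HOURS:
        raise ValueError(
            f"'short' jobs must request more than 0 and at most {SHORT_MAX_HOURS} hours"
        )
    return ["sbatch", "--partition=short", f"--time={slurm_time(hours)}", script]


print(short_cpu_command("job.sh", 48))
# prints ['sbatch', '--partition=short', '--time=2-00:00:00', 'job.sh']
```

Check your site's documentation for the exact submission options; the point here is only that the new ceiling is two full days.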

Introduction of a New GPU Queue (gpu-short)

A new partition, ‘gpu-short’, has been created to reduce queue wait times for GPU jobs. Jobs submitted to this partition have a default wall clock time of one hour and a maximum of two hours. The ‘gpu-short’ partition is configured with a higher priority than our existing ‘gpu’ partition, which lets shorter GPU jobs start more quickly and reduces queue wait times for users. It is particularly well suited to Open OnDemand and interactive sessions.

You can continue to use the existing ‘gpu’ and ‘multi-gpu’ partitions for longer-running jobs and for jobs that require more than one GPU. Jobs in the ‘multi-gpu’ partition can run for up to 24 hours. More information about partitions is available on our Partitions page.
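
To summarize the limits described in this post, the Python sketch below suggests a partition from a job's requested hours and GPU count. It encodes only what the announcement states, plus two assumptions labeled in the code: that ‘gpu-short’ and ‘gpu’ are meant for single-GPU jobs, and that the ‘gpu’ partition's own maximum wall time (not given here) is enough for the request. The function name is hypothetical; the Partitions page remains the authoritative source.

```python
# Sketch only: encodes the limits stated in this announcement.
# Assumptions (not stated in the post): 'gpu-short' and 'gpu' are single-GPU
# partitions, and the 'gpu' partition's maximum wall time is not known here.
SHORT_MAX_HOURS = 48      # CPU jobs on 'short'
GPU_SHORT_MAX_HOURS = 2   # 'gpu-short' maximum (default is 1 hour)
MULTI_GPU_MAX_HOURS = 24  # 'multi-gpu' maximum


def suggest_partition(hours: float, gpus: int = 0) -> str:
    """Return a partition name for a job needing `hours` of wall time and `gpus` GPUs."""
    if hours <= 0 or gpus < 0:
        raise ValueError("hours must be positive and gpus non-negative")
    if gpus == 0:
        if hours > SHORT_MAX_HOURS:
            raise ValueError("CPU jobs over 48 hours are not covered by this announcement")
        return "short"
    if gpus > 1:
        if hours > MULTI_GPU_MAX_HOURS:
            raise ValueError("'multi-gpu' jobs are limited to 24 hours")
        return "multi-gpu"
    # Single-GPU job: prefer the higher-priority short queue when the job fits.
    if hours <= GPU_SHORT_MAX_HOURS:
        return "gpu-short"
    return "gpu"


for hours, gpus in [(1, 1), (6, 1), (12, 4), (30, 0)]:
    print(hours, gpus, suggest_partition(hours, gpus))
```

The single-GPU branch prefers ‘gpu-short’ whenever the job fits in two hours, since that partition has the higher priority.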