HPC Cluster Partition Improvements

Jan 27, 2025 | Announcements

The Research Computing team recently updated the cluster's partitions to improve the user experience and reduce queue wait times. A partition is a logical collection of nodes with specific hardware resources and limits, set up to accommodate the wide variety of jobs scheduled on the cluster. From time to time, the Research Computing team adjusts the partitions based on its monitoring of job submissions in order to reduce job wait times. As our cluster grows, changes to the partitions also help ensure the fair, efficient distribution of resources across all submitted jobs.

Increased Maximum Wall Clock Time Limit for CPU Jobs

Researchers can now submit CPU jobs to the ‘short’ partition that run for up to 48 hours (the previous limit was 24 hours).
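
As an illustration of the new limit, here is a minimal Python sketch that builds the header of a batch script for the ‘short’ partition and rejects requests above 48 hours. It assumes the cluster's scheduler is Slurm, so the header uses the standard sbatch options --partition, --time, and --job-name. The helper names slurm_time and short_job_header and the default job name are hypothetical, made up for this example, and are not part of any Research Computing tooling.

```python
from datetime import timedelta

# Limit stated in this announcement for the 'short' partition.
SHORT_MAX = timedelta(hours=48)


def slurm_time(limit: timedelta) -> str:
    """Format a duration as a Slurm D-HH:MM:SS time string."""
    total = int(limit.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


def short_job_header(hours: float, job_name: str = "cpu-job") -> str:
    """Build #SBATCH header lines for a CPU job on the 'short' partition.

    Hypothetical helper: it only checks the 48-hour limit from the
    announcement and assumes a Slurm scheduler.
    """
    requested = timedelta(hours=hours)
    if requested <= timedelta(0):
        raise ValueError("wall time must be positive")
    if requested > SHORT_MAX:
        raise ValueError("the 'short' partition allows at most 48 hours")
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --partition=short",
        f"#SBATCH --time={slurm_time(requested)}",
    ])


if __name__ == "__main__":
    # A 36-hour request, which the previous 24-hour limit would not have allowed.
    print(short_job_header(36))
```

Checking the request before submission is only a convenience. The scheduler remains the authority on what a partition accepts.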

Introduction of New GPU Partition (gpu-short)

A new partition, ‘gpu-short’, has been created to reduce the queue wait time for GPU jobs. Jobs submitted to this partition have a default wall clock time limit of one hour and a maximum of two hours. Because ‘gpu-short’ is configured with a higher priority than our existing ‘gpu’ partition, shorter GPU jobs can start more quickly. The ‘gpu-short’ partition is particularly suitable for Open OnDemand and interactive sessions.

You can continue using the existing ‘gpu’ and ‘multi-gpu’ partitions for longer-running jobs and for jobs that require more than one GPU. The ‘multi-gpu’ partition allows GPU jobs to run for up to 24 hours. More information about partitions is available on our Partitions page.
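
The guidance above can be sketched as a small Python function that suggests a GPU partition from the requested wall time and GPU count. The function pick_gpu_partition is hypothetical. It also assumes that ‘gpu-short’ is meant for single-GPU jobs, which this announcement does not state, and it does not check the time limit of the ‘gpu’ partition because that limit is not given here. Consult the Partitions page for the authoritative limits.

```python
def pick_gpu_partition(hours: float, gpus: int = 1) -> str:
    """Suggest a GPU partition for a job (illustrative sketch only).

    Rules taken from the announcement:
      - 'gpu-short' has a two-hour maximum wall clock time
      - 'multi-gpu' serves jobs needing more than one GPU, up to 24 hours
      - 'gpu' serves the remaining, longer-running jobs
    Assumption: 'gpu-short' is treated as a single-GPU partition.
    """
    if hours <= 0:
        raise ValueError("wall time must be positive")
    if gpus < 1:
        raise ValueError("at least one GPU must be requested")
    if gpus > 1:
        if hours > 24:
            raise ValueError("'multi-gpu' jobs can run for at most 24 hours")
        return "multi-gpu"
    if hours <= 2:
        return "gpu-short"
    # The time limit of 'gpu' is not stated in the announcement,
    # so it is deliberately not enforced here.
    return "gpu"


if __name__ == "__main__":
    for hours, gpus in [(1, 1), (6, 1), (12, 4)]:
        print(f"{hours} h, {gpus} GPU(s): {pick_gpu_partition(hours, gpus)}")
```

Requesting a wall time close to the expected runtime matters here: a job that fits within two hours is eligible for the higher-priority ‘gpu-short’ partition only if its requested time reflects that.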
