Cluster Maintenance

Routine Cluster Maintenance

Routine cluster maintenance is performed on the first Tuesday of each month. The RC team posts all maintenance information to the Status Updates page and sends a reminder email that tells users when the upcoming maintenance window is, what the maintenance involves, and how they will be affected.

MGHPCC Annual Shutdown

The Massachusetts Green High Performance Computing Center (MGHPCC) conducts an annual shutdown for maintenance work. During this shutdown, all RC-managed services are powered down and unavailable for approximately four days. RC will send frequent reminders leading up to the shutdown to ensure that users are able to plan accordingly.

 

Preparing for Cluster Maintenance

To ensure that your jobs account for the scheduled shutdown period of the cluster, use the --time option when submitting them. For example, if maintenance is set to start less than 24 hours after you submit your job, request less than 24 hours of time with your srun or sbatch command. Otherwise, the scheduler may hold the job until maintenance is over.

Note that if you usually run your jobs on a partition with short time limits (e.g., debug or express), you only need to ensure that those time limits (20 and 60 minutes, respectively) fit within the time remaining before the scheduled start of maintenance.

  • If you usually use the srun command:

srun --time=12:00:00 <srun args>

  • If you usually use the sbatch command to submit batch jobs:

sbatch --time=12:00:00 script.sbatch
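Rather than estimating the remaining time by hand, the value passed to --time can be computed from the maintenance start time. The following is a minimal sketch, not an RC-provided tool: the function name slurm_time_before and the default 10-minute safety margin are assumptions made for illustration, and the commented usage lines assume GNU date and a placeholder maintenance date.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the longest Slurm time limit (D-HH:MM:SS)
# that still ends before maintenance begins.
# Arguments: current time (epoch seconds), maintenance start (epoch seconds),
# optional safety margin in seconds (default 600, an assumed value).
slurm_time_before() {
    local now=$1 start=$2 margin=${3:-600}
    local remaining=$(( start - now - margin ))
    if (( remaining <= 0 )); then
        echo "no time left before maintenance" >&2
        return 1
    fi
    # Split the remaining seconds into days, hours, minutes, and seconds.
    printf '%d-%02d:%02d:%02d\n' \
        $(( remaining / 86400 )) \
        $(( remaining % 86400 / 3600 )) \
        $(( remaining % 3600 / 60 )) \
        $(( remaining % 60 ))
}

# Example usage (placeholder date; replace with the announced start time):
# limit=$(slurm_time_before "$(date +%s)" "$(date -d '2030-01-01 08:00' +%s)")
# sbatch --time="$limit" script.sbatch
```

The function only does arithmetic on the two timestamps, so it can be checked without access to the cluster. If your job needs less time than the computed limit, request the smaller value.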

The RC team can help you set up a default and maximum time configuration on your partition. This configuration can significantly alleviate the issues you may experience with job runtime. By defining default and maximum time limits, you can establish a predefined window for job execution without explicitly specifying the runtime for each job.

However, even with the default and maximum time configuration in place, there will always be a window before maintenance, equal in length to the default time limit, during which explicitly specifying the job's runtime is still helpful. Doing so gives you better control over job scheduling within the time that remains.
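To see which default and maximum time limits a partition currently has, you can inspect the scheduler's partition settings. The sketch below assumes the cluster runs Slurm and that the scontrol command is available on the login node. The helper name partition_time_limits is hypothetical. It filters the DefaultTime and MaxTime fields out of the output of scontrol show partition.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `scontrol show partition` output on stdin and
# print only the DefaultTime and MaxTime fields, one per line.
partition_time_limits() {
    grep -oE '(DefaultTime|MaxTime)=[^[:space:]]+'
}

# Example usage on a login node ("express" is one of the partitions named above):
# scontrol show partition express | partition_time_limits
```

If no default time is configured for the partition, jobs submitted without --time typically receive the partition's maximum time limit, which makes an explicit --time more important ahead of maintenance.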

If you want to set up the default and maximum time configuration on your partition, or if you have any concerns or questions regarding job runtime management, consider joining our Office Hours or scheduling a Consultation.
