Data Management Plan
Data Management Plan for Funding Agencies
The Northeastern Research Computing (RC) team provides high-end research computing resources to all Northeastern University-affiliated faculty, researchers, and students. The team also manages Northeastern’s partnership with the Massachusetts Green High Performance Computing Center (MGHPCC). Resources available to the Northeastern community include a centralized high-performance computing (HPC) cluster, storage, software, high-level technical and scientific consultations, education, documentation, and training. All of these resources are accessible to all faculty and students, with RC staff available to assist researchers through consultations on how to leverage hardware and software for scientific applications and workflows.
As of August 2024, the Discovery cluster provides access to over 50,000 CPU cores and over 525 GPUs to all Northeastern faculty and students free of charge. Hardware currently available for research consists of a combination of Intel Xeon (Cascade Lake, Skylake, Broadwell, Haswell, Sandy Bridge, and Ivy Bridge) and AMD (Zen, Zen 2) CPU microarchitectures, along with a selection of NVIDIA Pascal (P100), Volta (V100), Turing (T4), Ampere (A100), and Hopper (H100) GPUs. Discovery is connected to the university network over 10 Gbps Ethernet (10 GbE) for high-speed data transfer and provides 6 PB of available storage on a high-performance file system. Compute nodes are connected with either 10 GbE or high data rate InfiniBand (200 Gbps or 100 Gbps), supporting all types and scales of computational workloads.
A dedicated team of staff, including PhD scientists, manages the RC environment and supports researchers in their use of the Discovery cluster resources. The RC team updates the computational resources available through Discovery with the newest technologies on a yearly cycle to support the cutting-edge research being performed by Northeastern faculty and students.
Research groups that require access to dedicated computational resources can request the “buy-in” option, which integrates their hardware into the Discovery cluster and gives research group members unified access to both private and shared compute nodes. Faculty-owned hardware that is part of the Discovery cluster is fully managed and maintained by the RC staff at no charge.
Secure Data Enclave (SDE)
In response to research funding requirements, Northeastern University has developed a high-performance Secure Data Enclave (SDE) for researchers to store and process secure data. The SDE is built around an Infinidat InfiniBox data storage system located in the Massachusetts Green High Performance Computing Center (MGHPCC) in Holyoke, Massachusetts. The SDE also includes dedicated CPUs and GPUs for computing on secure data within the enclave. Scientific and statistical software is available pre-installed on the SDE compute nodes. All components of the SDE are behind a network firewall.
Researchers affiliated with Northeastern must request an access account on the SDE. Globus is used for secure transfer of data.
Storage Services
The SDE’s Infinidat system, located at the MGHPCC in Holyoke, MA, is configured with FIPS 140-2 validated encryption. The Infinidat system includes snapshots but does not have off-site replication.
Data is staged into and out of the cluster using Globus, a data management system that offers a higher assurance level to meet secure data compliance requirements.
Compute Services
The SDE also offers compute services via Open OnDemand (OOD). OOD is a web-based interface to an HPC cluster that presents interfaces to traditional batch processing, conventional “desktop-style” research, and client-server programs. The HPC cluster is managed by Slurm, making the SDE a conventional research cluster with enhanced security.
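For readers unfamiliar with batch processing under Slurm, the following is a minimal sketch of what a batch job script looks like, rendered from Python. The `#SBATCH` directives shown are standard Slurm options, but the function name, the partition name `short`, the resource values, and the example command are all hypothetical placeholders, not the SDE's actual configuration.

```python
def render_batch_script(job_name, command, partition="short",
                        cpus=4, mem_gb=16, time_limit="01:00:00"):
    """Return the text of a Slurm batch script.

    All default values (including the partition name) are assumptions
    for illustration; check the cluster documentation for real ones.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",      # name shown in the queue
        f"#SBATCH --partition={partition}",    # hypothetical partition
        f"#SBATCH --cpus-per-task={cpus}",     # CPU cores for the task
        f"#SBATCH --mem={mem_gb}G",            # memory per node
        f"#SBATCH --time={time_limit}",        # wall-clock limit HH:MM:SS
        "",
        command,                               # the work to run
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical analysis command; replace with a real workload.
    print(render_batch_script("demo-analysis", "python analyze.py"))
```

The resulting text would typically be saved to a file and submitted with `sbatch`; the scheduler then runs the command once the requested resources are free.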
Data Management Plan (DMP)
Research groups are provided with up to 35 TB of complimentary high-performance data storage, with options to purchase additional storage. This storage is owned and maintained by the university and is physically located in the Massachusetts Green High Performance Computing Center, a secure data center. All storage systems are on uninterruptible power supply (UPS)-backed power, so that a power outage or interruption allows a graceful shutdown, minimizing the possibility of data loss. Snapshots of files are created one to several times a day, depending on their usage category. The Research Computing team maintains a Globus endpoint on Discovery for secure, reliable data access, enabling researchers to access and share their data.
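When planning data movement for a proposal, it can help to estimate how long a transfer would take. The sketch below is a rough calculation only: it combines the 35 TB allocation with the 10 Gbps network link mentioned earlier, assumes decimal terabytes, and uses a hypothetical `efficiency` factor to stand in for protocol overhead and shared-link contention. Real Globus transfer rates depend on both endpoints and are not stated in this document.

```python
def transfer_time_hours(size_tb, link_gbps, efficiency=1.0):
    """Estimate hours needed to move size_tb terabytes over a link.

    size_tb    -- data size in decimal terabytes (1 TB = 1e12 bytes)
    link_gbps  -- nominal link speed in gigabits per second
    efficiency -- assumed fraction of the nominal speed actually
                  achieved (hypothetical; 1.0 is the ideal case)
    """
    if size_tb < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, speed > 0, 0 < efficiency <= 1")
    bits = size_tb * 1e12 * 8                       # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)  # bits / (bits per second)
    return seconds / 3600


if __name__ == "__main__":
    ideal = transfer_time_hours(35, 10)
    halved = transfer_time_hours(35, 10, efficiency=0.5)
    print(f"35 TB at 10 Gbps, ideal: {ideal:.1f} h")
    print(f"35 TB at 10 Gbps, 50% efficiency: {halved:.1f} h")
```

Under these assumptions, a full 35 TB allocation would take roughly 7.8 hours at the ideal line rate and about twice that at 50% efficiency, which suggests budgeting on the order of a day for moving a complete allocation.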
Northeastern University provides free website hosting for students, staff, and faculty through Sites at Northeastern.
How Can Research Computing Support You?
Accelerate your research at any stage by leveraging our online user guides, hands-on training sessions, and one-on-one guidance.