Bioinformatics in Research Computing

Feb 6, 2023 | Tech Life


Northeastern University’s Research Computing (RC) team works diligently to give the research community access to centralized high performance computing (HPC) clusters, storage, visualization, software, high-level technical and scientific consultations, documentation, and training. While the technical capabilities of RC’s HPC cluster are impressive, what makes RC at Northeastern invaluable is the people who keep the cluster running.

Associate bioinformatician Shobana Sekar, who has a doctorate in bioinformatics, came to Northeastern with more than seven years of experience analyzing next-generation sequencing (NGS) data, including whole genome, RNA, exome, and ChIP-seq datasets. She has led and supported bioinformatic analyses for several collaborative projects in the neurogenomics and cancer genomics research spaces and has co-authored several peer-reviewed publications. Her interest lies in NGS data analysis related to cancer and neurodegenerative diseases, such as Alzheimer’s.

 Genomic data generation has grown exponentially over the past decade and continues to expand; such large-scale data generation necessitates advanced computing infrastructure for data storage and processing, which is where an HPC cluster comes into play.

Almost all bioinformatics tools for data pre-processing, analysis, and post-processing are Linux-friendly and can be installed on the cluster. In fact, several bioinformatics packages, including quality-control tools, aligners, and variant callers, are already installed on Northeastern’s cluster, making them easy for lab teams to use. The cluster’s storage capabilities let researchers persistently store raw sequencing data and analysis results, providing an end-to-end solution for storing and processing genomic sequencing data.

During her tenure as an RC bioinformatician, Sekar collaborated with genomics-focused faculty, students, and researchers and facilitated their research, enabling them to run their bioinformatics workflows efficiently on the HPC cluster. She leveraged the cluster’s Linux environment to implement analysis pipelines, install bioinformatics tools, and provide data-transfer solutions.

“Research is not always about coming up with brilliant ideas to save the world,” Sekar said. “It takes a lot of reading, understanding, experimenting, and ideating, which can get taxing. However, going through the process and gathering new insights from study results is a very rewarding experience that motivates us and keeps us going.”

Sekar has been working with the Ionescu lab at Northeastern, collaborating with lab head and principal investigator Andreia Ionescu, who holds a Ph.D. in biochemistry and biophysics, and with other lab members to perform cutting-edge research. The group has been mapping out skeletal stem cells and studying their effect on skeletal development, tissue regeneration, and repair after injury.

The study is currently exploring single-cell transcriptomic methods to understand gene signatures during skeletal growth and tissue regeneration. While studying the tissue and looking for different cell populations, the team found a particular cell type that had not been previously characterized, leading to the discovery of a population of long-term stem cells in the physeal cartilage, the growth plate in children’s bones that helps with bones’ longitudinal growth. The team is seeking to understand how these newly identified stem cells contribute to skeletal growth and the role they play in cartilage repair after injury. These findings could be relevant for helping children who fracture their growth plates and develop bony bars that stunt their growth.

 Another project Sekar has collaborated on is the genomic surveillance of COVID-19 on campus, led by Northeastern’s Life Sciences Testing Center (LSTC). By analyzing the viral genome, this study plays a crucial role in tracking the variants of the coronavirus circulating on campus. LSTC performs next-generation sequencing of COVID-positive samples, which are then analyzed using bioinformatics workflows to determine the viral variant behind the positive test. The analysis workflow is now implemented on the RC cluster and was developed using the Nextflow workflow manager and various publicly available bioinformatics tools.

 For more information about Research Computing, visit the RC website, read through the group’s documentation, or book a one-on-one consultation. The Research Computing team looks forward to helping you overcome any obstacles that may impact your workload.