Deep Origin and NVIDIA help researchers save time and money by bringing GPU-accelerated genomics tools to Nextflow

Ben Siranosian

January 9, 2024

The integration of the NVIDIA Parabricks GPU-accelerated genomics tools into Nextflow modules enables researchers to easily build pipelines that accelerate genomics workloads by up to 100X. Deep Origin ComputeBenches are cloud-based development environments for bioinformatics, where you can use Parabricks interactively, without the need for Docker. Try out Deep Origin ComputeBenches with $500 in credits for your first month, and join our NVIDIA Parabricks and GATK virtual training project at the end of January!

The cost to generate omics data has fallen faster than Moore's law; however, the cost to store, process, and make sense of all that data certainly hasn't. That's why Deep Origin joined forces with NVIDIA to bring the Parabricks GPU-accelerated genomics tools to the nf-core modules open-source repository.

NVIDIA Parabricks is a suite of GPU-accelerated versions of popular software tools for processing genomic data, including alignment, variant calling, and RNA-seq analysis. Parabricks produces results equivalent to those of industry-standard tools, such as GATK and DeepVariant, while providing up to 100X acceleration. Parabricks is distributed as a Docker image and is freely available, with an option to obtain enterprise-level support via NVIDIA AI Enterprise.

nf-core modules simplify creating GPU-accelerated workflows

To make Parabricks easier for researchers to use, Deep Origin and NVIDIA worked together with the open-source community to create Nextflow modules for many of the individual Parabricks tools. nf-core modules provide a unified API for interacting with the tools, come with built-in testing and version reporting, and can be integrated into any Nextflow workflow you create. For example, to use the GPU-accelerated alignment tool fq2bam in your Nextflow workflow, first install the module into your pipeline repository by running this command (you need the nf-core package):

nf-core modules install parabricks/fq2bam

Then, include the module in your Nextflow workflow:

#!/usr/bin/env nextflow
nextflow.enable.dsl = 2

include { PARABRICKS_FQ2BAM } from './modules/nf-core/modules/parabricks/fq2bam/main'

// input files, fasta reference, and known_sites are defined earlier in the workflow
BWA_INDEX ( fasta )
PARABRICKS_FQ2BAM ( input, fasta, BWA_INDEX.out.index, known_sites )
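
Because Parabricks is distributed as a Docker image and needs a GPU, the container running the module must be able to see the GPU on the host. The fragment below is a minimal sketch of a nextflow.config that could accompany the workflow above. It assumes a local Docker setup with the NVIDIA Container Toolkit installed; the process selector and the '--gpus all' option are illustrative choices rather than settings taken from the module itself, and other executors or container engines will need different options.

```nextflow
// nextflow.config (sketch): run the Parabricks module in Docker with GPU access.
// Assumption: Docker and the NVIDIA Container Toolkit are installed on the host.
docker.enabled = true

process {
    // Apply the GPU option only to the Parabricks process, not to CPU-only steps.
    withName: 'PARABRICKS_FQ2BAM' {
        // Expose all host GPUs to the container; adjust for your environment.
        containerOptions = '--gpus all'
    }
}
```

Scoping the option with withName keeps the other processes in the pipeline unchanged, so only the accelerated step requests GPU access.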

So far, the following Parabricks tools have been integrated into nf-core modules, with more on the way:

fq2bam (accelerated alignment, equivalent to BWA-MEM)
deepvariant (accelerated version of Google’s DeepVariant)
haplotypecaller (accelerated version of GATK HaplotypeCaller)
mutectcaller (accelerated version of GATK Mutect2)
applybqsr (accelerated alignment pre-processing tool)
indexgvcf (accelerated variant post-processing tool)
genotypegvcf (accelerated variant post-processing tool)

Use Parabricks interactively on a Deep Origin ComputeBench

nf-core modules are a key ingredient in building GPU-accelerated workflows. Researchers also need an environment for developing, testing, debugging and running these workflows. That's where the Deep Origin ComputeBench comes in.

ComputeBenches are cloud-based development environments for bioinformaticians and computational biologists. They give scientists the power of scalable, GPU-enabled computing without the hassle of infrastructure configuration or package management. With the “GPU-accelerated genomics” software blueprint, you can use Parabricks interactively on a ComputeBench, without messing with Docker containers, volume bind arguments, or GPU drivers. Everything just works! Request access to get started with $500 in credits for your first month.

Join our NVIDIA Parabricks and GATK virtual training project

Additionally, Deep Origin and MILRD have worked together to create a guided, half-day Virtual Training Project (VTP) on variant calling with NVIDIA Parabricks and GATK. The VTP will include tutorials on processing diverse datasets, downstream analysis, and visualization, and will draw on data from human tumor-normal whole exome sequencing, microbial whole genome sequencing, and a pharmacogenomics diagnostic panel.

Tuesday, February 27, 10:00 am to 1:30 pm PT.

Space is limited, so sign up now to reserve a slot.

What’s next for Parabricks in nf-core

The individual Parabricks modules are just the start of this collaboration. Modules help researchers build GPU-accelerated Nextflow pipelines, but they fall short of our goal of seamlessly accelerating genomics workflows everywhere. Next, we’re working to integrate Parabricks into nf-core/sarek, the well-known and trusted somatic and germline variant calling workflow.

Once this is complete, changing a single configuration variable will enable the workflow to use GPU-accelerated tools for all the appropriate steps. Researchers will get up to 100X acceleration of the alignment and variant calling parts of the workflow, and save both time and money if they are working in the cloud.

References and acknowledgments

This effort was also supported by members of the nf-core community in development and code review, especially the following contributors:

Maxime Garcia (@maxulysse)
Eugenio Franzoso (@Furentsu)
Friederike Hanssen (@FriederikeHanssen)
Francesco Lescai (@lescai)
Matthias Hörtenhuber (@mashehu)

The nf-core framework for community-curated bioinformatics pipelines. Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
