The race to understand protein structures has never been more critical. From accelerating drug discovery to preparing for future pandemics, the ability to predict how proteins fold determines the capacity to solve humanity’s most pressing biological challenges. Since the release of AlphaFold2, AI inference for determining protein structures has skyrocketed. Unoptimized tools for protein structure inference can cost organizations millions due to lost research time and prolonged compute utilization.
The new NVIDIA RTX PRO 6000 Blackwell Server Edition GPU fundamentally changes this. Despite the AlphaFold2 breakthrough, CPU-bound multiple sequence alignment (MSA) generation and inefficient GPU inference remained rate-limiting steps. Building on previous collaborative efforts, new accelerations developed by the NVIDIA Digital Biology Research lab enable faster-than-ever protein structure inference using OpenFold at no accuracy cost compared to AlphaFold2.
In this post, we will show how to run large-scale protein analysis using RTX PRO 6000 Blackwell Server Edition GPUs, providing unprecedented protein structure inference performance to software platforms, cloud providers, and research institutions.
Why do speed and scale matter in protein structure prediction?
Protein folding sits at the intersection of the most computationally demanding workloads in computational biology. Modern drug discovery pipelines require analyzing thousands of protein structures. At the same time, enzyme engineering projects demand rapid iteration cycles to optimize biological functions, and agricultural biotech applications require screening massive protein libraries to develop climate-resilient crops.
The computational challenge can become immense: a single protein structure prediction can involve metagenomic-scale MSAs, iterative refinement steps, and ensemble calculations that typically require hours of compute time. When scaled across entire proteomes or drug target libraries, these workloads become prohibitively time-consuming on CPU-based infrastructures.
For example, in a direct comparison of multiple sequence alignment tools, MMseqs2-GPU on a single NVIDIA L40S GPU completed alignments 177x faster than CPU-based JackHMMER on a 128-core CPU, and up to 720x faster when distributed across eight NVIDIA L40S GPUs. These speedups show how much of the computational bottleneck in protein bioinformatics GPU acceleration can remove.
How does NVIDIA enable the fastest protein structure AI available?
Building on recent releases like cuEquivariance and the Boltz-2 NIM microservice, the NVIDIA Digital Biology Research lab validated groundbreaking performance improvements for OpenFold using RTX PRO 6000 Blackwell Server Edition and NVIDIA TensorRT across industry-standard benchmarks (Figure 1).
Figure 1. Protein structure prediction with MMseqs2-GPU and OpenFold2
By leveraging new instructions and TensorRT, MMseqs2-GPU and OpenFold on RTX PRO 6000 Blackwell deliver transformational performance for protein structure prediction, executing folding over 138x faster than AlphaFold2 and approximately 2.8x faster than ColabFold, while maintaining identical TM-scores.
First, MMseqs2-GPU on RTX PRO 6000 Blackwell accelerates MSA generation, running approximately 190x faster than JackHMMER and HHblits on a dual-socket AMD EPYC 7742 CPU system. In addition, bespoke TensorRT optimizations targeting OpenFold increased its inference speed 2.3x compared to baseline OpenFold. Validated on 20 CASP14 protein targets, these benchmarks establish RTX PRO 6000 Blackwell as a breakthrough solution for end-to-end protein structure prediction.
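To see why both stages need acceleration, the two reported factors can be combined with an Amdahl-style estimate. The sketch below is illustrative only: the baseline stage times are hypothetical placeholders, not measurements from these benchmarks, and the 190x and 2.3x factors were reported against different baselines, so the result indicates a trend rather than a benchmark figure.

```python
def end_to_end_speedup(stage_seconds, stage_speedups):
    """Amdahl-style estimate: total baseline time / total accelerated time.

    stage_seconds  -- baseline wall-clock time per pipeline stage
    stage_speedups -- speedup factor per stage (stages not listed stay at 1x)
    """
    baseline = sum(stage_seconds.values())
    accelerated = sum(
        seconds / stage_speedups.get(name, 1.0)
        for name, seconds in stage_seconds.items()
    )
    return baseline / accelerated


# HYPOTHETICAL baseline split for one target (assumed, not measured):
# MSA generation dominates, structure inference is the smaller share.
baseline_seconds = {"msa": 3000.0, "inference": 600.0}

# Per-stage factors quoted in the text above.
both = end_to_end_speedup(baseline_seconds, {"msa": 190.0, "inference": 2.3})
msa_only = end_to_end_speedup(baseline_seconds, {"msa": 190.0})
inference_only = end_to_end_speedup(baseline_seconds, {"inference": 2.3})

print(f"MSA + inference accelerated: {both:.1f}x")
print(f"MSA accelerated only:        {msa_only:.1f}x")
print(f"Inference accelerated only:  {inference_only:.1f}x")
```

Under these assumed stage times, accelerating only the MSA stage leaves inference as the new bottleneck, which is why the TensorRT optimizations matter even though their per-stage factor is much smaller.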
Eliminate memory bottlenecks
With 96 GB of high-bandwidth memory (1.6 TB/s), RTX PRO 6000 Blackwell can fold entire protein ensembles and large MSAs while keeping the full workflow GPU-resident. Its Multi-Instance GPU (MIG) functionality enables a single RTX PRO 6000 Blackwell to act like four GPUs, each powerful enough to outperform an NVIDIA L4 Tensor Core GPU. This allows multiple users or workflows to share a server without compromising speed or accuracy.
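One way a batch of folding jobs could be spread across four MIG instances is sketched below. It assumes MIG is already enabled and partitioned on the GPU, and it relies on the standard CUDA behavior that `CUDA_VISIBLE_DEVICES` can pin a process to a single MIG device. The device identifiers and sequence names are hypothetical placeholders; the real identifiers would come from `nvidia-smi -L`.

```python
import os


def partition_round_robin(items, n_workers):
    """Split items into n_workers buckets, assigning them in turn."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    buckets = [[] for _ in range(n_workers)]
    for index, item in enumerate(items):
        buckets[index % n_workers].append(item)
    return buckets


def worker_env(mig_device_id, base_env=None):
    """Return a copy of the environment that exposes only one MIG device."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = mig_device_id
    return env


# HYPOTHETICAL placeholders: replace with the MIG device identifiers
# that `nvidia-smi -L` reports on your server.
MIG_DEVICES = ["MIG-<uuid-0>", "MIG-<uuid-1>", "MIG-<uuid-2>", "MIG-<uuid-3>"]

# HYPOTHETICAL job names standing in for protein sequences to fold.
jobs = ["seq_a", "seq_b", "seq_c", "seq_d", "seq_e"]

assignments = partition_round_robin(jobs, len(MIG_DEVICES))
for device, bucket in zip(MIG_DEVICES, assignments):
    env = worker_env(device, base_env={})
    print(device, "->", bucket, "| CUDA_VISIBLE_DEVICES =", env["CUDA_VISIBLE_DEVICES"])
```

Each environment could then be passed to `subprocess.Popen(..., env=env)` so that every worker process sees exactly one MIG instance. Round-robin is the simplest policy; balancing by sequence length would likely give more even finish times.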
Here’s a complete example demonstrating how to use the performance of RTX PRO 6000 Blackwell for rapid protein structure prediction. The first step is deploying the OpenFold2 NIM on your local machine.
Once the NIM has been deployed locally, you can construct inference requests and use the local endpoint to generate protein structure predictions.
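As a minimal sketch of such a request, the Python below builds a JSON payload and posts it to a locally running NIM using only the standard library. The host, port, endpoint path, and payload field names (`sequence`, `alignments`, `selected_models`, `relax_prediction`, and the `small_bfd` database key) are assumptions and should be checked against the OpenFold2 NIM API reference for your deployed version.

```python
import json
import urllib.request

# ASSUMPTIONS (verify against the NIM API reference): local host and port,
# endpoint path, and every payload field name used below.
BASE_URL = "http://localhost:8000"
PREDICT_PATH = "/biology/openfold/openfold2/predict-structure-from-msa-and-template"


def build_request(sequence, a3m_text=None, selected_models=(1,), relax_prediction=False):
    """Build the JSON payload for one structure prediction request."""
    # Strip whitespace and newlines so FASTA-style wrapped input is accepted.
    cleaned = "".join(sequence.split()).upper()
    if not cleaned or not cleaned.isalpha():
        raise ValueError("sequence must contain only amino acid letters")
    payload = {
        "sequence": cleaned,
        "selected_models": list(selected_models),
        "relax_prediction": relax_prediction,
    }
    if a3m_text is not None:
        # Optional precomputed MSA in A3M format, keyed by an assumed database name.
        payload["alignments"] = {
            "small_bfd": {"a3m": {"alignment": a3m_text, "format": "a3m"}}
        }
    return payload


def predict_structure(payload, base_url=BASE_URL, timeout_s=600):
    """POST the payload to the local endpoint and return the parsed JSON response."""
    request = urllib.request.Request(
        base_url + PREDICT_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the NIM running, a call such as `predict_structure(build_request("MKTAYIAKQR"))` would submit a single short sequence; the shape of the response depends on the NIM version, so inspect it before parsing out structure files.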
Get started accelerating protein AI workflows
Whereas AlphaFold2 once required heterogeneous high-performance computing nodes, NVIDIA accelerations for protein structure prediction on RTX PRO 6000 Blackwell, including modular components in cuEquivariance, TensorRT, and MMseqs2-GPU, enable folding on a single server at world-class speed. This makes proteome-scale folding accessible to any lab or software platform, with the fastest time-to-prediction to date.
Whether you’re developing software platforms for drug discovery, building agricultural biotech solutions, or conducting pandemic preparedness research, the unprecedented performance of RTX PRO 6000 Blackwell will transform your computational biology workflows. The power of RTX PRO 6000 Blackwell Server Edition is available today in NVIDIA RTX PRO Servers from global system makers as well as in cloud instances from leading cloud service providers.
Ready to get started? Find a partner for NVIDIA RTX PRO 6000 Blackwell Server Edition and experience protein folding at unprecedented speed and scale.
Acknowledgments
We’d like to thank the researchers from NVIDIA, University of Oxford, and Seoul National University who contributed to this research, including Christian Dallago, Alejandro Chacon, Kieran Didi, Prashant Sohani, Fabian Berressem, Alexander Nesterovskiy, Robert Ohannessian, Mohamed Elbalkini, Jonathan Cogan, Ania Kukushkina, Anthony Costa, Arash Vahdat, Bertil Schmidt, Milot Mirdita, and Martin Steinegger.



