[Pictured: NVIDIA founder and CEO Jensen Huang]
NVIDIA has announced the acquisition of SchedMD, the company behind Slurm, the world’s most widely used workload manager for high-performance computing (HPC) and AI. While the financial terms were not disclosed, the move marks another step in NVIDIA’s broader strategy to extend its control beyond accelerator hardware into the critical software layers that govern how the most valuable compute resources in AI are actually used.
SchedMD is a U.S.-based company founded in 2010 by the original developers of Slurm, though the technology itself dates back even further. Slurm was first developed in the early 2000s at Lawrence Livermore National Laboratory as an open alternative to proprietary schedulers for large-scale computing clusters. Since then, it has become the de facto standard: today, Slurm runs on roughly half of the supercomputers on the TOP500 list and is used by universities, research institutes, defense organizations, pharmaceutical companies, financial institutions, and increasingly by enterprises operating in-house AI infrastructure.
At its core, Slurm is the engine that decides who gets compute resources, when, and how. It manages queues, allocates CPUs, memory, and GPUs, and ensures workloads are executed fairly and efficiently across clusters that may span thousands of servers. In the AI era—where model training consumes massive amounts of GPU capacity—Slurm has become a mission-critical component of the workflow. Without intelligent scheduling, a significant portion of these extremely expensive resources would simply go to waste.
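To make that concrete, here is a minimal sketch of a Slurm batch script. The partition name, script name, and resource amounts are hypothetical and vary by cluster, but each `#SBATCH` directive is a real request the scheduler must satisfy before the job can start:

```bash
#!/bin/bash
# Hypothetical Slurm batch script: every #SBATCH line is a resource
# request the scheduler matches against the cluster before starting the job.
#SBATCH --job-name=train-model   # label shown in the queue
#SBATCH --partition=gpu          # partition (queue) name; cluster-specific
#SBATCH --nodes=1                # number of servers
#SBATCH --cpus-per-task=8        # CPU cores
#SBATCH --mem=64G                # system memory
#SBATCH --gres=gpu:4             # generic resources: four GPUs
#SBATCH --time=12:00:00          # wall-clock limit; also helps backfill scheduling

srun python train.py             # launch the workload on the allocated resources
```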
Slurm’s primary users are not application developers, but infrastructure teams—the operators of data centers and compute clusters. AI developers typically encounter Slurm only indirectly, when submitting jobs, without visibility into the allocation logic running behind the scenes. In public cloud environments, similar scheduling mechanisms usually exist as internal systems, largely opaque to customers.
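In practice, that indirect contact usually amounts to a handful of commands. A minimal sketch, assuming the hypothetical batch script above is saved as `train.sh`:

```bash
sbatch train.sh    # submit the job; Slurm replies with a job ID
squeue -u $USER    # watch your jobs wait in the queue, then run
scancel <jobid>    # cancel a job; the allocation logic itself stays invisible
```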
It is also important to distinguish Slurm from platforms such as Run:AI, which NVIDIA acquired earlier. While Slurm operates as the foundational scheduler of a cluster—a low-level infrastructure layer that is aware of physical resources—Run:AI sits above Kubernetes as an intelligent optimization layer, with awareness of teams, projects, experiments, and business priorities. Put simply: Slurm allocates the “iron,” while Run:AI allocates it in an organizational and business context. Together, they form a continuous stack—from hardware all the way up to enterprise-level AI workload management.
This is where the strategic significance of the acquisition becomes clear. Although Slurm is open source, control over the organization that leads its development gives NVIDIA substantial influence over the project’s direction, development velocity, and hardware optimization priorities. Slurm is already well tuned for NVIDIA GPUs, but the acquisition paves the way for even tighter integration with CUDA, NVLink, InfiniBand, and capabilities such as MIG (Multi-Instance GPU), which partitions a single physical GPU into multiple isolated instances that can run workloads in parallel. The result is higher GPU utilization, which ultimately translates into greater demand for NVIDIA hardware.
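The MIG angle is easiest to see at the job level. On clusters configured to expose MIG slices as schedulable generic resources, a job can request a fraction of a GPU instead of a whole one. The sketch below is hypothetical: the profile name (`1g.10gb`, one compute slice with 10 GB of memory) is illustrative and depends entirely on how the cluster’s gres configuration names its slices:

```bash
#!/bin/bash
# Hypothetical request for a MIG slice rather than a whole GPU.
# The gres type must match what the cluster's slurm.conf/gres.conf exposes.
#SBATCH --gres=gpu:1g.10gb:1    # one 1g.10gb MIG instance
#SBATCH --cpus-per-task=4
#SBATCH --time=02:00:00

srun python small_inference.py  # small jobs share one physical GPU in parallel
```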
More broadly, NVIDIA continues to assemble end-to-end vertical control over the AI infrastructure stack: processors, networking, software libraries, workload scheduling, and enterprise management. While the SchedMD acquisition may appear modest compared to some of NVIDIA’s blockbuster deals, it targets one of the most critical choke points in the AI world: who controls compute time. In a domain where every minute of GPU usage carries significant economic value, that level of control is nothing short of strategic.
