New options in Azure AI Foundry give businesses an enterprise-grade platform to build, deploy, and scale AI applications and agents.
Microsoft and NVIDIA are deepening our partnership to power the next wave of AI industrial innovation. For years, our companies have helped fuel the AI revolution, bringing the world’s most advanced supercomputing to the cloud, enabling breakthrough frontier models, and making AI more accessible to organizations everywhere. Today, we’re building on that foundation with new advancements that deliver greater performance, capability, and flexibility.
With added support for NVIDIA RTX PRO 6000 Blackwell Server Edition on Azure Local, customers can deploy AI and visual computing workloads in distributed and edge environments with the same seamless orchestration and management used in the cloud. New NVIDIA Nemotron and NVIDIA Cosmos models in Azure AI Foundry give businesses an enterprise-grade platform to build, deploy, and scale AI applications and agents. With NVIDIA Run:ai on Azure, enterprises can get more from every GPU to streamline operations and accelerate AI. Finally, Microsoft is redefining AI infrastructure with the world’s first deployment of NVIDIA GB300 NVL72.
Today’s announcements mark the next chapter in our full-stack AI collaboration with NVIDIA, empowering customers to build the future faster.
Expanding GPU support to Azure Local
Microsoft and NVIDIA continue to drive advancements in artificial intelligence, offering innovative solutions that span the public and private cloud, the edge, and sovereign environments.
As highlighted in the March blog post for NVIDIA GTC, Microsoft will offer NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs on Azure. Now, with expanded availability of NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs on Azure Local, organizations can optimize their AI workloads regardless of location, giving customers greater flexibility and more options than ever. Azure Local leverages Azure Arc to empower organizations to run advanced AI workloads on-premises while retaining the management simplicity of the cloud, or to operate in fully disconnected environments.
NVIDIA RTX PRO 6000 Blackwell GPUs provide the performance and flexibility needed to accelerate a broad range of use cases, from agentic AI, physical AI, and scientific computing to rendering, 3D graphics, digital twins, simulation, and visual computing. This expanded GPU support unlocks a variety of edge use cases that satisfy the stringent requirements of critical infrastructure for our healthcare, retail, manufacturing, government, defense, and intelligence customers. These may include real-time video analytics for public safety, predictive maintenance in industrial settings, rapid medical diagnostics, and secure, low-latency inferencing for essential services such as energy production and critical infrastructure. The NVIDIA RTX PRO 6000 Blackwell also enables improved virtual desktop support by leveraging NVIDIA vGPU technology and Multi-Instance GPU (MIG) capabilities. This not only accommodates higher user density, but also powers AI-enhanced graphics and visual compute capabilities, offering an efficient solution for demanding virtual environments.
Earlier this year, Microsoft announced a number of AI capabilities at the edge, all enriched with NVIDIA accelerated computing:
- Edge Retrieval-Augmented Generation (RAG): Empowers sovereign AI deployments with fast, secure, and scalable inferencing on local data, supporting mission-critical use cases across government, healthcare, and industrial automation.
- Azure AI Video Indexer enabled by Azure Arc: Enables real-time and recorded video analytics in disconnected environments, ideal for public safety and critical infrastructure monitoring or post-event analysis.
With Azure Local, customers can meet strict regulatory, data residency, and privacy requirements while harnessing the latest AI innovations powered by NVIDIA.
Whether you need ultra-low latency for business continuity, robust local inferencing, or compliance with industry regulations, we’re dedicated to delivering cutting-edge AI performance wherever your data resides. Customers can now access the breakthrough performance of NVIDIA RTX PRO 6000 Blackwell GPUs in new Azure Local solutions, including Dell AX-770, HPE ProLiant DL380 Gen12, and Lenovo ThinkAgile MX650a V4.
To find out more about upcoming availability and sign up for early ordering, visit:
Powering the future of AI with new models on Azure AI Foundry
At Microsoft, we’re committed to bringing the most advanced AI capabilities to our customers, wherever they need them. Through our partnership with NVIDIA, Azure AI Foundry now brings world-class multimodal reasoning models directly to enterprises, deployable anywhere as secure, scalable NVIDIA NIM™ microservices. The portfolio spans a range of diverse use cases:
NVIDIA Nemotron Family: High-accuracy open models and datasets for agentic AI
- Llama Nemotron Nano VL 8B is available now and is tailored for multimodal vision-language tasks, document intelligence and understanding, and mobile and edge AI agents.
- NVIDIA Nemotron Nano 9B is available now and supports enterprise agents, scientific reasoning, advanced math, and coding for software engineering and tool calling.
- NVIDIA Llama 3.3 Nemotron Super 49B 1.5 is coming soon and is designed for enterprise agents, scientific reasoning, advanced math, and coding for software engineering and tool calling.
NVIDIA Cosmos Family: Open world foundation models for physical AI
- Cosmos Reason-1 7B is available now and supports robotics planning and decision making, training data curation and annotation for autonomous vehicles, and video analytics AI agents that extract insights and perform root-cause analysis from video data.
- NVIDIA Cosmos Predict 2.5 is coming soon and is a generalist model for world state generation and prediction.
- NVIDIA Cosmos Transfer 2.5 is coming soon and is designed for structural conditioning and physical AI.
Microsoft TRELLIS by Microsoft Research: High-quality 3D asset generation
- Microsoft TRELLIS by Microsoft Research is available now and enables digital twins by generating accurate 3D assets from simple prompts, immersive retail experiences with photorealistic product models for AR and virtual try-ons, and game and simulation development by turning creative ideas into production-ready 3D content.
Together, these open models reflect the depth of the Azure and NVIDIA partnership: combining Microsoft’s adaptive cloud with NVIDIA’s leadership in accelerated computing to power the next generation of agentic AI for every industry. Learn more about the models here.
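For illustration, NIM microservices expose an OpenAI-compatible HTTP API, so invoking a deployed model comes down to posting a standard chat-completions payload. The sketch below shows the request shape; the endpoint URL and model id are placeholder assumptions, not values from this announcement, so substitute the ones shown in your own deployment.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat-completions payload, the request
    shape that NIM microservices accept."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def post_chat_completion(endpoint: str, api_key: str, payload: dict) -> dict:
    """POST the payload to a deployed endpoint (requires a live deployment)."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical model id for illustration only.
payload = build_chat_request(
    "nvidia/nemotron-nano-9b",
    "Summarize the root cause of the failed build in two sentences.",
)
# With a live deployment you would then call, for example:
# post_chat_completion("https://<your-endpoint>/v1/chat/completions", "<api-key>", payload)
```

Because the API is OpenAI-compatible, existing client code and SDKs that speak that protocol can generally be pointed at the deployed endpoint without changes.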
Maximizing GPU utilization for enterprise AI with NVIDIA Run:ai on Azure
As an AI workload and GPU orchestration platform, NVIDIA Run:ai helps organizations make the most of their compute investments, accelerating AI development cycles and driving faster time-to-market for new insights and capabilities. By bringing NVIDIA Run:ai to Azure, we’re giving enterprises the ability to dynamically allocate, share, and manage GPU resources across teams and workloads, helping them get more from every GPU.
NVIDIA Run:ai on Azure integrates seamlessly with core Azure services, including Azure NC- and ND-series instances, Azure Kubernetes Service (AKS), and Azure identity management, and offers compatibility with Azure Machine Learning and Azure AI Foundry for unified, enterprise-ready AI orchestration. We’re bringing hybrid scale to life to help customers transform static infrastructure into a flexible, shared resource for AI innovation.
With smarter orchestration and cloud-ready GPU pooling, teams can drive faster innovation, reduce costs, and unleash the power of AI across their organizations with confidence. NVIDIA Run:ai on Azure enhances AKS with GPU-aware scheduling, helping teams allocate, share, and prioritize GPU resources more efficiently. Operations are streamlined with one-click job submission, automated queueing, and built-in governance, so teams spend less time managing infrastructure and more time focused on building what’s next.
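To make the quota-and-sharing idea behind GPU-aware scheduling concrete, here is a toy sketch (not Run:ai’s actual algorithm): each team first receives up to its guaranteed quota, and any idle GPUs are then distributed among teams that still have unmet demand.

```python
def allocate_gpus(total_gpus: int, teams: dict) -> dict:
    """Toy fair-share allocator. `teams` maps a team name to
    {"quota": guaranteed GPUs, "demand": GPUs currently requested}.
    Assumes quotas sum to at most total_gpus."""
    # Step 1: satisfy each team's guaranteed quota (capped by its demand).
    alloc = {name: min(t["quota"], t["demand"]) for name, t in teams.items()}
    free = total_gpus - sum(alloc.values())
    # Step 2: hand idle GPUs out one at a time to teams with unmet demand.
    while free > 0:
        hungry = [n for n, t in teams.items() if alloc[n] < t["demand"]]
        if not hungry:
            break
        for name in hungry:
            if free == 0:
                break
            alloc[name] += 1
            free -= 1
    return alloc


# Example: 8 GPUs; one team wants more than its quota, one wants less.
print(allocate_gpus(8, {
    "research": {"quota": 4, "demand": 6},
    "prod":     {"quota": 4, "demand": 2},
}))  # → {'research': 6, 'prod': 2}
```

The point of the sketch is that idle capacity is lent out rather than stranded: the research team borrows the two GPUs the prod team is not using, which is the behavior that raises overall utilization.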
This impact spans industries, supporting the infrastructure and orchestration behind transformative AI workloads at every stage of enterprise growth:
- Healthcare organizations can use NVIDIA Run:ai on Azure to advance medical imaging analysis and drug discovery workloads across hybrid environments.
- Financial services organizations can orchestrate and scale GPU clusters for complex risk simulations and fraud detection models.
- Manufacturers can accelerate computer vision model training for improved quality control and predictive maintenance in their factories.
- Retail companies can power real-time recommendation systems for more personalized experiences through efficient GPU allocation and scaling, ultimately better serving their customers.
Powered by Microsoft Azure and NVIDIA, Run:ai is purpose-built for scale, helping enterprises move from isolated AI experimentation to production-grade innovation.
Reimagining AI at scale: First to deploy an NVIDIA GB300 NVL72 supercomputing cluster
Microsoft is redefining AI infrastructure with the new NDv6 GB300 VM series, delivering the first at-scale production cluster of NVIDIA GB300 NVL72 systems, featuring over 4,600 NVIDIA Blackwell Ultra GPUs connected via NVIDIA Quantum-X800 InfiniBand networking. Each NVIDIA GB300 NVL72 rack integrates 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace™ CPUs, delivering over 130 TB/s of NVLink bandwidth and up to 136 kW of compute power in a single cabinet. Designed for the most demanding workloads, such as reasoning models, agentic systems, and multimodal AI, GB300 NVL72 combines ultra-dense compute, direct liquid cooling, and smart rack-scale management to deliver breakthrough efficiency and performance within a standard datacenter footprint.
Azure’s co-engineered infrastructure complements GB300 NVL72 with technologies like Azure Boost for accelerated I/O and integrated hardware security modules (HSMs) for enterprise-grade security. Each rack arrives pre-integrated and self-managed, enabling rapid, repeatable deployment across Azure’s global fleet. As the first cloud provider to deploy NVIDIA GB300 NVL72 at scale, Microsoft is setting a new standard for AI supercomputing, empowering organizations to train and deploy frontier models faster, more efficiently, and more securely than ever before. Together, Azure and NVIDIA are powering the future of AI.
Learn more about Microsoft’s systems approach to delivering GB300 NVL72 on Azure.
Unleashing the performance of ND GB200-v6 VMs with NVIDIA Dynamo
Our collaboration with NVIDIA focuses on optimizing every layer of the computing stack to help customers maximize the value of their existing AI infrastructure investments.
To deliver high-performance inference for compute-intensive reasoning models at scale, we’re bringing together a solution that combines the open-source NVIDIA Dynamo framework, our ND GB200-v6 VMs with NVIDIA GB200 NVL72, and Azure Kubernetes Service (AKS). We’ve demonstrated the performance this combined solution delivers at scale, with the gpt-oss 120b model processing 1.2 million tokens per second in a production-ready, managed AKS cluster, and have published a deployment guide so developers can get started today.
Dynamo is an open-source, distributed inference framework designed for multi-node environments and rack-scale accelerated compute architectures. By enabling disaggregated serving, LLM-aware routing, and KV caching, Dynamo significantly boosts performance for reasoning models on Blackwell, unlocking up to 15x more throughput compared to the prior Hopper generation and opening new revenue opportunities for AI service providers.
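To give a feel for what LLM-aware routing means in practice, here is a toy sketch (not Dynamo’s implementation): the router sends each request to the worker whose cached prompt prefix overlaps it most, so more of the existing KV cache can be reused instead of recomputed.

```python
def shared_prefix_len(a: list, b: list) -> int:
    """Length of the common token-id prefix of two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route_request(prompt_tokens: list, workers: dict) -> str:
    """Pick the worker whose cached prefixes overlap the prompt most.
    `workers` maps a worker name to a list of cached token-id sequences.
    Falls back to the first worker when nothing overlaps."""
    best_worker, best_overlap = next(iter(workers)), 0
    for name, cached in workers.items():
        overlap = max((shared_prefix_len(prompt_tokens, c) for c in cached), default=0)
        if overlap > best_overlap:
            best_worker, best_overlap = name, overlap
    return best_worker


workers = {
    "worker-a": [[1, 2, 3, 4]],    # long matching prefix already cached
    "worker-b": [[9, 9], [1, 2]],  # only short overlaps cached
}
print(route_request([1, 2, 3, 4, 5], workers))  # → worker-a
```

A real router weighs cache overlap against load and memory pressure across many workers; the sketch only illustrates why routing on cached prefixes saves prefill work.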
These efforts enable AKS production customers to take full advantage of NVIDIA Dynamo’s inference optimizations when deploying frontier reasoning models at scale. We’re dedicated to bringing the latest open-source software innovations to our customers, helping them fully realize the potential of the NVIDIA Blackwell platform on Azure.
Learn more about Dynamo on AKS.
Get more AI resources

