GPUaaS on Cisco AI PODs with Rafay

October 13, 2025

Enterprises are making bold moves into AI, and Cisco AI PODs provide a strong, pre-validated foundation for deploying AI infrastructure at scale. They bring together compute, storage, and networking in a modular design that simplifies procurement and deployment. However, deploying hardware is only the beginning. The next critical step is making this powerful infrastructure consumable as a service.

This is where Rafay complements Cisco AI PODs. Rafay’s GPU Platform as a Service (PaaS) provides the critical consumption layer, turning the hardware into a governed, self-service GPU cloud. Together, Cisco and Rafay enable organizations to operationalize AI faster by offering secure, multi-tenant access, standardized workload SKUs, and policy-driven governance.

This post explores how this joint solution transforms raw GPU power into a production-ready AI platform, enabling developer self-service while maintaining enterprise-grade control.

From Infrastructure to Consumption: The Platform Challenge

Organizations have accelerated investments in AI infrastructure, deploying platforms like Cisco AI PODs with the latest NVIDIA hardware to enable generative AI, Retrieval-Augmented Generation (RAG), and large-scale inference. As adoption grows, a new challenge emerges: how to enable multiple teams to securely and efficiently consume this shared infrastructure.

Platform teams must balance access across different groups, each with unique needs and security requirements. Without a standardized consumption layer, this leads to several problems:

Underutilized GPUs: Industry benchmarks report that average GPU utilization rates often fall below 30%. This is partly because AI workloads are “bursty” and most environments lack the mechanisms to slice and share GPU resources efficiently. When expensive GPUs sit idle, it represents a significant opportunity cost.
Manual Provisioning: Platform teams often rely on manual configurations, ad-hoc scripts, and service tickets to manage access. These workflows slow down delivery, introduce inconsistencies, and make it difficult to enforce governance.
Siloed Resources: Without a unified platform, GPU infrastructure often becomes siloed by team, limiting sharing and preventing a holistic view of utilization and costs. Developers and researchers must navigate complex internal processes just to run a job.

To solve this, enterprises need to operate their GPU infrastructure as a service, one that supports shared resources, multitenant isolation, and automated policy enforcement.
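The opportunity cost of idle GPUs described above can be made concrete with a quick calculation. The following is a minimal sketch, not a costing model: the fleet size and the per-GPU-hour rate are hypothetical placeholders, and only the 30% utilization figure comes from the text.

```python
def idle_gpu_cost(gpu_count: int, hourly_cost: float, utilization: float,
                  hours: float = 24 * 30) -> float:
    """Cost of GPU capacity that is paid for but left unused over a period.

    utilization is the average fraction of capacity in use (0.0 to 1.0).
    hours defaults to a 30-day month.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return gpu_count * hourly_cost * hours * (1.0 - utilization)


# Hypothetical inputs: 64 GPUs at an assumed 2.50 per GPU-hour, 30% utilized.
monthly_idle = idle_gpu_cost(gpu_count=64, hourly_cost=2.50, utilization=0.30)
print(f"Idle capacity cost per 30-day month: {monthly_idle:,.0f}")
```

Substituting your own amortized hardware and operating cost per GPU-hour gives a rough upper bound on what better sharing could recover.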

The Joint Solution: Cisco AI PODs + Rafay GPU PaaS

Cisco and Rafay have collaborated to deliver a modular, fully validated GPU cloud architecture. This solution combines Cisco’s best-in-class AI POD infrastructure with Rafay’s GPU Platform as a Service, transforming GPU hardware into a secure, self-service, multitenant cloud.

Cisco AI PODs provide the compute, fabric, storage, and pre-validated design. Based on Cisco Validated Designs (CVDs), they integrate next-generation Cisco UCS platforms (like the C885A M8 Server) and the latest NVIDIA GPUs to power the entire AI lifecycle.
Rafay GPU PaaS delivers the orchestration, policy enforcement, and developer abstraction layer. It transforms the foundational hardware into a production-grade GPU cloud that’s simple to consume.

This combined architecture allows organizations to rapidly launch and operate GPU clouds with full-stack orchestration, declarative SKU provisioning, and built-in cost attribution.

Developer Self-Service Through a Curated Catalog

At the core of Rafay’s platform is the SKU Studio, a purpose-built catalog system that empowers platform teams to deliver AI-ready infrastructure and applications as reusable SKUs.

Each SKU is a modular abstraction that bundles:

Compute Configuration: GPU/MIG profiles, CPU, memory, and storage.
Application Stack: Pre-integrated tools like vLLM, Triton, or Jupyter Notebooks.
Policy Controls: Time-to-Live (TTLs), RBAC, multitenancy, and quotas.
Billing Metadata: Usage units and cost attribution.

Developers can access GPU environments directly through a self-service portal (GUI, API, or CLI) without needing to file support tickets. For example, a data scientist can select an “H100-Inference-vLLM” SKU, which automatically provisions a specific GPU slice, deploys a secure container, and applies a 48-hour TTL. This streamlines workflows and ensures security best practices are applied consistently.
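To illustrate how a SKU bundles these four concerns, here is a minimal Python sketch of a catalog entry and the 48-hour TTL from the example. It is not Rafay’s schema or API: the `Sku` and `Instance` classes, the `launch` function, the role names, and every field value other than the SKU name and TTL are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Sku:
    """Hypothetical catalog entry mirroring the four bundles listed above."""
    name: str
    # Compute configuration
    gpu_profile: str          # illustrative MIG-style profile label
    cpu_cores: int
    memory_gib: int
    storage_gib: int
    # Application stack
    apps: tuple
    # Policy controls
    ttl: timedelta
    allowed_roles: frozenset
    # Billing metadata
    usage_unit: str


@dataclass
class Instance:
    sku: Sku
    owner: str
    created_at: datetime

    @property
    def expires_at(self) -> datetime:
        return self.created_at + self.sku.ttl

    def is_expired(self, now: datetime) -> bool:
        return now >= self.expires_at


def launch(sku: Sku, user: str, role: str, now: datetime) -> Instance:
    """Self-service launch: the SKU's own policy decides who may use it."""
    if role not in sku.allowed_roles:
        raise PermissionError(f"role {role!r} may not launch {sku.name}")
    return Instance(sku=sku, owner=user, created_at=now)


# Assumed values; only the name and the 48-hour TTL come from the article.
H100_INFERENCE = Sku(
    name="H100-Inference-vLLM",
    gpu_profile="3g.40gb",
    cpu_cores=16,
    memory_gib=64,
    storage_gib=200,
    apps=("vLLM",),
    ttl=timedelta(hours=48),
    allowed_roles=frozenset({"data-scientist", "ml-engineer"}),
    usage_unit="gpu-hour",
)

t0 = datetime(2025, 10, 13, 9, 0, tzinfo=timezone.utc)
inst = launch(H100_INFERENCE, "alice", "data-scientist", t0)
print(inst.sku.name, "expires at", inst.expires_at.isoformat())
```

Keeping policy and billing fields on the SKU itself, rather than on each request, is what lets every instance inherit the same controls without per-user configuration.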

Secure Multi-Tenancy and Governance

Sharing expensive GPU resources requires strict isolation and governance. Rafay provides native, secure multi-tenancy that allows teams to safely share infrastructure without interference.

Key security controls are automatically enforced:

Hierarchical RBAC: Defines permissions and access scope for tenants, projects, and workspaces.
Namespace Isolation: Ensures workloads are separated at the cluster and network level.
Resource Quotas: Prevents any single team or job from monopolizing resources.
Centralized Audit Logs: Provides a complete audit trail of user actions for compliance.

These built-in protections allow platform teams to maintain full oversight and control while empowering developers with the freedom they need to innovate.
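As a rough illustration of how hierarchical RBAC and resource quotas can interact, the sketch below models a tenant, project, and workspace chain in Python. It is an assumption-laden toy, not Rafay’s implementation: the `Scope` class, the role names, and the "nearest grant wins" rule are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional


@dataclass
class Scope:
    """A node in a hypothetical tenant > project > workspace hierarchy."""
    name: str
    gpu_quota: int
    parent: Optional["Scope"] = None
    gpu_in_use: int = 0
    roles: Dict[str, str] = field(default_factory=dict)  # user -> role

    def chain(self) -> Iterator["Scope"]:
        """Yield this scope, then each ancestor up to the tenant."""
        node: Optional[Scope] = self
        while node is not None:
            yield node
            node = node.parent


def effective_role(scope: Scope, user: str) -> Optional[str]:
    """Assumed rule: the grant closest to the workspace wins."""
    for node in scope.chain():
        if user in node.roles:
            return node.roles[user]
    return None


def request_gpus(scope: Scope, user: str, count: int) -> None:
    """Check the role, then the quota at every level, before committing."""
    if effective_role(scope, user) not in {"admin", "developer"}:
        raise PermissionError(f"{user} may not allocate GPUs in {scope.name}")
    for node in scope.chain():
        if node.gpu_in_use + count > node.gpu_quota:
            raise RuntimeError(f"GPU quota exceeded at {node.name}")
    for node in scope.chain():
        node.gpu_in_use += count


tenant = Scope("acme", gpu_quota=8, roles={"alice": "developer", "carol": "viewer"})
project = Scope("nlp", gpu_quota=4, parent=tenant)
workspace = Scope("nlp-dev", gpu_quota=4, parent=project)

request_gpus(workspace, "alice", 3)  # allowed: role inherited from the tenant
print(workspace.gpu_in_use, project.gpu_in_use, tenant.gpu_in_use)
```

Checking every level before updating any counter means a rejected request leaves usage untouched, so one team cannot exhaust a parent scope’s quota through partial allocations.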

Comprehensive GPU Management and Visibility

To maximize ROI, you need to know how your GPUs are being used. Rafay provides end-to-end visibility, metering, and cost attribution tailored for multitenant environments.

Platform teams can use declarative blueprints to standardize GPU operator configurations and slicing strategies (like MIG) across all clusters. Multi-tenant dashboards offer detailed insights into:

GPU inventory and allocation
SKU usage patterns
Instance-level activity and user attribution
Health status and uptime trends

A billing metrics API aggregates usage data, calculates billable compute, and generates auditable reports, enabling chargebacks and financial accountability.
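The aggregation step behind chargeback can be sketched in a few lines of Python. This is only an illustration of the idea, not the billing metrics API itself: the `UsageRecord` shape, the per-SKU rates, and the tenant names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical metering record for one instance's lifetime."""
    tenant: str
    sku: str
    gpus: float       # whole GPUs, or a fraction for a GPU slice
    start: datetime
    end: datetime


def billable_gpu_hours(records: Iterable[UsageRecord]) -> Dict[Tuple[str, str], float]:
    """Sum GPU-hours per (tenant, sku) pair."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for r in records:
        hours = (r.end - r.start).total_seconds() / 3600
        if hours < 0:
            raise ValueError(f"record for {r.tenant} ends before it starts")
        totals[(r.tenant, r.sku)] += r.gpus * hours
    return dict(totals)


def chargeback(totals: Dict[Tuple[str, str], float],
               rates: Dict[str, float]) -> Dict[str, float]:
    """Convert GPU-hours into a cost per tenant using assumed per-SKU rates."""
    bill: Dict[str, float] = defaultdict(float)
    for (tenant, sku), gpu_hours in totals.items():
        bill[tenant] += gpu_hours * rates[sku]
    return dict(bill)


records = [
    UsageRecord("team-a", "H100-Inference-vLLM", 2.0,
                datetime(2025, 10, 13, 8), datetime(2025, 10, 13, 11)),
    UsageRecord("team-a", "H100-Inference-vLLM", 0.5,
                datetime(2025, 10, 14, 8), datetime(2025, 10, 14, 12)),
    UsageRecord("team-b", "H100-Inference-vLLM", 1.0,
                datetime(2025, 10, 13, 8), datetime(2025, 10, 13, 10)),
]
totals = billable_gpu_hours(records)
bill = chargeback(totals, rates={"H100-Inference-vLLM": 2.0})  # assumed rate
print(bill)
```

Keying totals by tenant and SKU keeps the raw GPU-hours auditable, while rates are applied only in the final step so pricing can change without re-metering.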

Who Benefits from a Unified GPU Cloud?

This jointly validated solution is designed for a diverse range of customers who need to operationalize GPU infrastructure with security, speed, and scale.

Enterprise IT Teams: Gain federated self-service, quota enforcement, and centralized visibility. This reduces infrastructure duplication and embeds governance into daily operations.
Sovereign & Public Sector Organizations: Meet compliance needs in air-gapped environments with secure multitenancy, policy enforcement, and centralized audit logging.
Cloud & Managed Service Providers: Monetize GPU infrastructure with a white-labeled, multitenant platform that includes automated tenant onboarding and built-in chargeback metering.
Existing Cisco Customers: Extend the ROI of existing UCS deployments by adding GPU orchestration as a seamless overlay with no re-architecture required.
Greenfield AI Builders: Start fresh with a pre-validated, fully integrated solution that reduces the time from procurement to operational AI services from months to weeks.

Operationalize Your AI Infrastructure Today

Pairing Cisco’s validated AI infrastructure with Rafay’s GPU PaaS control plane allows organizations to transform GPU systems into fully governed internal platforms. The result is a consumption-driven architecture where developers gain self-service access, operators enforce quotas and track consumption, and the business maximizes the value of its AI investments.

This architecture offers a clear path forward: deliver GPU infrastructure as a service, enable secure and compliant multitenancy, and make consumption predictable and cost-aligned from day one.

To see this powerful solution in action, join our upcoming webinar. Experts from Cisco and Rafay will demonstrate how to transform your GPU infrastructure into a production-ready AI service.

Live Webinar: From AI PODs to GPU Cloud
October 21, 2025 at 8:00 a.m. PST / 3:00 p.m. GMT

