Introduction
At Databricks, Kubernetes is at the heart of our internal systems. Within a single Kubernetes cluster, the default networking primitives like ClusterIP services, CoreDNS, and kube-proxy are typically sufficient. They offer a simple abstraction to route service traffic. But when performance and reliability matter, these defaults begin to show their limits.
In this post, we'll share how we built an intelligent, client-side load balancing system to improve traffic distribution, reduce tail latencies, and make service-to-service communication more resilient.
If you're a Databricks user, you don't need to understand this blog to use the platform to its fullest. But if you're curious to take a peek under the hood, read on to hear about some of the cool stuff we've been working on!
Problem statement
High-performance service-to-service communication in Kubernetes poses several challenges, especially when using persistent HTTP/2 connections, as we do at Databricks with gRPC.
How Kubernetes Routes Requests by Default
- The client resolves the service name (e.g., my-service.default.svc.cluster.local) via CoreDNS, which returns the service's ClusterIP (a virtual IP).
- The client sends the request to the ClusterIP, assuming it is the destination.
- On the node, iptables, IPVS, or eBPF rules (configured by kube-proxy) intercept the packet. The kernel rewrites the destination IP to one of the backend Pod IPs based on basic load balancing, such as round-robin, and forwards the packet.
- The chosen pod handles the request, and the response is sent back to the client.
While this model generally works, it quickly breaks down in performance-sensitive environments, leading to significant limitations.
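To make the first hop concrete, here is a minimal sketch (using the JDK resolver and a placeholder service name) of what a client actually observes: DNS returns the single virtual ClusterIP, not the pod IPs, so everything after that point happens in the kernel and is invisible to the application.

```scala
import java.net.InetAddress

object ClusterIpLookup {
  def main(args: Array[String]): Unit = {
    // Inside a pod, CoreDNS resolves the service name to its ClusterIP.
    // "my-service.default.svc.cluster.local" is a placeholder name.
    val addrs = InetAddress.getAllByName("my-service.default.svc.cluster.local")

    // For a ClusterIP service this prints exactly one virtual IP; the rewrite
    // to a concrete pod IP happens later, in the kernel, via kube-proxy rules.
    addrs.foreach(a => println(a.getHostAddress))
  }
}
```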
Limitations
At Databricks, we operate hundreds of stateless services communicating over gRPC within each Kubernetes cluster. These services are typically high-throughput, latency-sensitive, and run at significant scale.
The default load balancing model falls short in this environment for several reasons:
- High tail latency: gRPC uses HTTP/2, which maintains long-lived TCP connections between clients and services. Since Kubernetes load balancing happens at Layer 4, the backend pod is chosen only once per connection. This leads to traffic skew, where some pods receive significantly more load than others. As a result, tail latencies increase and performance becomes inconsistent under load (the toy simulation after this list illustrates the effect).
- Inefficient resource utilization: When traffic is not evenly spread, it becomes hard to predict capacity requirements. Some pods get CPU or memory starved while others sit idle. This leads to over-provisioning and waste.
- Limited load balancing strategies: kube-proxy supports only basic algorithms like round-robin or random selection. There is no support for strategies like power of two choices (P2C), zone-affinity routing, or custom pluggable policies.
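To see why once-per-connection balancing skews load, consider the following toy simulation (not our production model; all numbers are made up). Connections are spread perfectly evenly across pods, yet per-pod request load still diverges because clients send at different rates:

```scala
import scala.util.Random

object ConnectionSkewSim {
  def main(args: Array[String]): Unit = {
    val pods    = 10
    val clients = 40
    val rng     = new Random(42)
    val qpsPerPod = Array.fill(pods)(0)

    for (client <- 0 until clients) {
      val pod = client % pods               // L4 behavior: one pick per connection
      val clientQps = 1 + rng.nextInt(100)  // clients are not uniform
      qpsPerPod(pod) += clientQps           // all requests stick to that pod
    }

    println(s"per-pod QPS: ${qpsPerPod.mkString(", ")}")
    println(s"min: ${qpsPerPod.min}, max: ${qpsPerPod.max}")
  }
}
```

Per-request balancing removes this stickiness: each request is free to land on the least-loaded pod at that moment.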
These limitations pushed us to rethink how we handle service-to-service communication within a Kubernetes cluster.
Our Approach: Client-Side Load Balancing with Real-Time Service Discovery
To address the limitations of kube-proxy and default service routing in Kubernetes, we built a proxyless, fully client-driven load balancing system backed by a custom service discovery control plane.
Our fundamental requirement was to support load balancing at the application layer while removing DNS from the critical path. A Layer 4 load balancer like kube-proxy can't make intelligent per-request decisions for Layer 7 protocols (such as gRPC) that use persistent connections. This architectural constraint creates bottlenecks, necessitating a more intelligent approach to traffic management.
The following table summarizes the key differences and the advantages of a client-side approach:
Table 1: Default Kubernetes LB vs. Databricks' Client-Side LB
Feature/Aspect | Default Kubernetes Load Balancing (kube-proxy) | Databricks' Client-Side Load Balancing |
---|---|---|
Load Balancing Layer | Layer 4 (TCP/IP) | Layer 7 (Application/gRPC) |
Decision Frequency | Once per TCP connection | Per-request |
Service Discovery | CoreDNS + kube-proxy (virtual IP) | xDS-based Control Plane + Client Library |
Supported Strategies | Basic (Round-robin, Random) | Advanced (P2C, Zone-affinity, Pluggable) |
Tail Latency Impact | High (due to traffic skew on persistent connections) | Reduced (even distribution, dynamic routing) |
Resource Utilization | Inefficient (over-provisioning) | Efficient (balanced load) |
Dependency on DNS/Proxy | High | Minimal, not on the critical path |
Operational Control | Limited | Fine-grained |
This system enables intelligent, up-to-date request routing with minimal dependency on DNS or Layer 4 networking. It gives clients the ability to make informed decisions based on live topology and health data.
The figure shows our custom Endpoint Discovery Service in action. It reads service and endpoint data from the Kubernetes API and translates it into xDS responses. Both Armeria clients and API proxies stream requests to it and receive live endpoint metadata, which is then used by application servers for intelligent routing with fallback clusters as backup.
Custom Control Plane (Endpoint Discovery Service)
We run a lightweight control plane that continuously monitors the Kubernetes API for changes to Services and EndpointSlices. It maintains an up-to-date view of all backend pods for every service, along with metadata like zone, readiness, and shard labels.
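The following sketch models that view under stated assumptions: the Kubernetes watch plumbing is omitted, and the Endpoint type is an illustrative stand-in for our actual schema, not the real thing.

```scala
import java.util.concurrent.ConcurrentHashMap

final case class Endpoint(ip: String, zone: String, ready: Boolean, shard: Option[String])

final class EndpointCache {
  // service name -> (pod IP -> endpoint metadata)
  private val byService = new ConcurrentHashMap[String, Map[String, Endpoint]]()

  // Apply an EndpointSlice-style add/update event for one service.
  def upsert(service: String, endpoint: Endpoint): Unit =
    byService.merge(service, Map(endpoint.ip -> endpoint), (old, add) => old ++ add)

  // Apply a delete event when a pod goes away.
  def remove(service: String, ip: String): Unit =
    byService.computeIfPresent(service, (_, eps) => eps - ip)

  // The view pushed to subscribed clients: only ready endpoints, with metadata.
  def snapshot(service: String): Seq[Endpoint] =
    Option(byService.get(service)).map(_.values.filter(_.ready).toSeq).getOrElse(Seq.empty)
}
```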
RPC Client Integration
A strategic advantage for Databricks was the widespread adoption of a common framework for service communication across most of its internal services, which are predominantly written in Scala. This shared foundation allowed us to embed client-side service discovery and load balancing logic directly into the framework, making it easy to adopt across teams without requiring custom implementation effort.
Each service integrates with our custom client, which subscribes to updates from the control plane for the services it depends on during connection setup. The client maintains a dynamic list of healthy endpoints, along with metadata like zone or shard, and updates automatically as the control plane pushes changes.
Because the client bypasses both DNS resolution and kube-proxy entirely, it always has a live, accurate view of service topology. This allows us to implement consistent and efficient load balancing strategies across all internal services.
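In simplified form, the client side of that contract can be sketched as below (names are illustrative; the real client lives inside our RPC framework). The key property is that updates arrive as pushes and the request path only ever does a lock-free read:

```scala
import java.util.concurrent.atomic.AtomicReference

// Same illustrative shape as in the control-plane sketch above.
final case class Endpoint(ip: String, zone: String, ready: Boolean, shard: Option[String])

final class ServiceEndpoints(val service: String) {
  private val current = new AtomicReference[Vector[Endpoint]](Vector.empty)

  // Invoked by the discovery stream whenever the control plane pushes a change.
  def onPush(snapshot: Vector[Endpoint]): Unit = current.set(snapshot)

  // Invoked on the request path: no DNS, no kube-proxy, just a volatile read.
  def live: Vector[Endpoint] = current.get()
}
```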
Advanced Load Balancing in Clients
The RPC client performs request-aware load balancing using strategies like:
- Power of Two Choices (P2C): For the majority of services, a simple Power of Two Choices (P2C) algorithm has proven remarkably effective. This strategy involves randomly selecting two backend servers and then choosing the one with fewer active connections or lower load. Databricks' experience indicates that P2C strikes a strong balance between performance and implementation simplicity, consistently leading to uniform traffic distribution across endpoints (a minimal sketch follows this list).
- Zone-affinity-based: The system also supports more advanced strategies, such as zone-affinity-based routing. This capability is vital for minimizing cross-zone network hops, which can significantly reduce network latency and associated data transfer costs, especially in geographically distributed Kubernetes clusters.
The system also accounts for scenarios where a zone lacks sufficient capacity or becomes overloaded. In such cases, the routing algorithm intelligently spills traffic over to other healthy zones, balancing load while still preferring local affinity whenever possible. This ensures high availability and consistent performance, even under uneven capacity distribution across zones.
- Pluggable support: The architecture's flexibility allows for pluggable support of additional load balancing strategies as needed.
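Here is a minimal P2C sketch, assuming each endpoint tracks its in-flight request count as the load signal; it shows the shape of the algorithm rather than our production implementation:

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.util.Random

final class PooledEndpoint(val ip: String) {
  val inflight = new AtomicInteger(0) // load signal: requests currently outstanding
}

// Assumes a non-empty endpoint list; real code must handle the empty case.
final class P2CPicker(endpoints: IndexedSeq[PooledEndpoint], rng: Random = new Random) {
  def pick(): PooledEndpoint = {
    // Sample two random candidates and take the less loaded one.
    val a = endpoints(rng.nextInt(endpoints.size))
    val b = endpoints(rng.nextInt(endpoints.size))
    if (a.inflight.get() <= b.inflight.get()) a else b
  }

  // Callers bracket each request so the load signal stays accurate.
  def withEndpoint[T](f: PooledEndpoint => T): T = {
    val ep = pick()
    ep.inflight.incrementAndGet()
    try f(ep) finally ep.inflight.decrementAndGet()
  }
}
```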
More advanced strategies, like zone-aware routing, required careful tuning and deeper context about service topology, traffic patterns, and failure modes; a topic to explore in a dedicated follow-up post.
To ensure the effectiveness of our approach, we ran extensive simulations, experiments, and real-world metric analysis. We validated that load remained evenly distributed and that key metrics like tail latency, error rate, and cross-zone traffic cost stayed within target thresholds. The flexibility to adapt strategies per-service has been helpful, but in practice, keeping it simple (and consistent) has worked best.
xDS Integration with Envoy
Our control plane extends its utility beyond internal service-to-service communication. It plays a crucial role in managing external traffic by speaking the xDS API to Envoy; xDS is the discovery protocol that lets clients dynamically fetch up-to-date configuration such as clusters, endpoints, and routing rules. Specifically, it implements the Endpoint Discovery Service (EDS) to provide Envoy with consistent and up-to-date metadata about backend endpoints by programming ClusterLoadAssignment resources. This ensures that gateway-level routing (e.g., for ingress or public-facing traffic) aligns with the same source of truth used by internal clients.
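Schematically, what the control plane programs for Envoy looks like the following (the real payloads are Envoy's ClusterLoadAssignment protos; these case classes only mirror their shape, and the addresses are made up):

```scala
final case class LbEndpoint(address: String, port: Int, weight: Int)
final case class LocalityEndpoints(zone: String, endpoints: Seq[LbEndpoint])
final case class ClusterLoadAssignment(clusterName: String, localities: Seq[LocalityEndpoints])

object EdsExample {
  def main(args: Array[String]): Unit = {
    // The same endpoint view pushed to internal clients is translated into an
    // EDS response, so gateways and clients share one source of truth.
    val assignment = ClusterLoadAssignment(
      clusterName = "my-service",
      localities = Seq(
        LocalityEndpoints("us-west-2a", Seq(LbEndpoint("10.0.1.12", 8080, weight = 1))),
        LocalityEndpoints("us-west-2b", Seq(LbEndpoint("10.0.2.7", 8080, weight = 1)))
      )
    )
    println(assignment)
  }
}
```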
Summary
This architecture gives us fine-grained control over routing behavior while decoupling service discovery from the constraints of DNS and kube-proxy. The key takeaways are:
- clients always have a live, accurate view of endpoints and their health,
- load balancing strategies can be tailored per-service, improving efficiency and tail latency, and
- both internal and external traffic share the same source of truth, ensuring consistency across the platform.
Impact
After deploying our client-side load balancing system, we saw significant improvements in both performance and efficiency:
- Uniform request distribution: Server-side QPS became evenly distributed across all backend pods. Unlike the prior setup, where some pods were overloaded while others remained underutilized, traffic now spreads predictably. The top chart shows the distribution before EDS, while the bottom chart shows the balanced distribution after EDS.
- Stable latency profiles: The variation in latency across pods dropped noticeably. Latency metrics improved and stabilized across pods, reducing long-tail behavior in gRPC workloads. The diagram below shows how P90 latency became more stable after client-side load balancing was enabled.
- Resource efficiency: With more predictable latency and balanced load, we were able to reduce over-provisioned capacity. Across several services, this resulted in roughly a 20% reduction in pod count, freeing up compute resources without compromising reliability.
Challenges and Lessons Learned
While the rollout delivered clear benefits, we also uncovered several challenges and insights along the way:
- Server cold starts: Before client-side load balancing, most requests were sent over long-lived connections, so new pods were rarely hit until existing connections were recycled. After the shift, new pods began receiving traffic immediately, which surfaced cold-start issues where they handled requests before being fully warmed up. We addressed this by introducing slow-start ramp-up (sketched after this list) and biasing traffic away from pods with higher observed error rates. These lessons also reinforced the need for a dedicated warmup framework.
- Metrics-based routing: We initially experimented with skewing traffic based on resource utilization signals such as CPU. Although conceptually attractive, this approach proved unreliable: monitoring systems had different SLOs than serving workloads, and metrics like CPU were often trailing indicators rather than real-time signals of capacity. We ultimately moved away from this model and chose to rely on more trustworthy signals such as server health.
- Client-library integration: Building load balancing directly into client libraries brought strong performance benefits, but it also created some unavoidable gaps. Languages without the library, or traffic flows that still depend on infrastructure load balancers, remain outside the scope of client-side balancing.
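As a sketch of the slow-start idea mentioned above (the window and floor values are made up, not our production settings): new endpoints get a reduced weight that ramps linearly to full weight over a warmup window.

```scala
final case class WarmingEndpoint(ip: String, firstSeenMillis: Long)

object SlowStart {
  val WarmupWindowMillis: Long = 60000L // illustrative warmup window
  val MinWeight: Double = 0.1           // illustrative floor so new pods still get some traffic

  // Weight in [MinWeight, 1.0]; a weighted picker sends proportionally less
  // traffic to endpoints that are still inside their warmup window.
  def weight(ep: WarmingEndpoint, nowMillis: Long): Double = {
    val age  = (nowMillis - ep.firstSeenMillis).toDouble
    val ramp = math.min(1.0, age / WarmupWindowMillis)
    math.max(MinWeight, ramp)
  }
}
```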
Alternatives Considered
While developing our client-side load balancing approach, we evaluated several alternative solutions. Here's why we ultimately decided against them:
Headless Services
Kubernetes headless services (clusterIP: None) provide direct pod IPs via DNS, allowing clients and proxies (like Envoy) to perform their own load balancing. This approach bypasses the limitation of connection-based distribution in kube-proxy and enables the advanced load balancing strategies offered by Envoy (such as round robin, consistent hashing, and least-loaded round robin).
In theory, switching existing ClusterIP services to headless services (or creating additional headless services using the same selector) would mitigate connection reuse issues by giving clients direct endpoint visibility. However, this approach comes with practical limitations:
- Lack of endpoint weights: Headless services alone do not support assigning weights to endpoints, limiting our ability to implement fine-grained load distribution control.
- DNS caching and staleness: Clients frequently cache DNS responses, causing them to send requests to stale or unhealthy endpoints.
- No support for metadata: DNS records don't carry any additional metadata about the endpoints (e.g., zone, region, shard). This makes it difficult or impossible to implement strategies like zone-aware or topology-aware routing.
Although headless services can offer a temporary improvement over ClusterIP services, the practical challenges and limitations made them unsuitable as a long-term solution at Databricks' scale.
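Because the service has no ClusterIP, an ordinary DNS lookup is enough to see every backend, as this minimal sketch (with a placeholder service name) shows:

```scala
import java.net.InetAddress

object HeadlessLookup {
  def main(args: Array[String]): Unit = {
    // For a headless service, DNS returns the pod IPs directly.
    // "my-headless.default.svc.cluster.local" is a placeholder name.
    val podIps = InetAddress.getAllByName("my-headless.default.svc.cluster.local")
    podIps.foreach(a => println(a.getHostAddress)) // one line per backend pod
  }
}
```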
Service Meshes (e.g., Istio)
Istio provides powerful Layer 7 load balancing features using Envoy sidecars injected into every pod. These proxies handle routing, retries, circuit breaking, and more, all managed centrally through a control plane.
While this model offers many capabilities, we found it unsuitable for the environment at Databricks for a few reasons:
- Operational complexity: Managing thousands of sidecars and control plane components adds significant overhead, particularly during upgrades and large-scale rollouts.
- Performance overhead: Sidecars introduce additional CPU, memory, and latency costs per pod, which become substantial at our scale.
- Limited client flexibility: Since all routing logic is handled externally, it's difficult to implement request-aware strategies that rely on application-layer context.
We also evaluated Istio's Ambient Mesh. Since Databricks already had proprietary systems for functions like certificate distribution, and our routing patterns were relatively static, the added complexity of adopting a full mesh outweighed the benefits. This was especially true for a small infra team supporting a predominantly Scala codebase.
It's worth noting that one of the biggest advantages of sidecar-based meshes is language-agnosticism: teams can standardize resiliency and routing across polyglot services without maintaining client libraries everywhere. At Databricks, however, the environment is heavily Scala-based, and our monorepo plus fast CI/CD culture make the proxyless, client-library approach much more practical. Rather than introducing the operational burden of sidecars, we invested in building first-class load balancing directly into our libraries and infrastructure components.
Future Directions and Areas of Exploration
Our current client-side load balancing approach has significantly improved internal service-to-service communication. Yet, as Databricks continues to scale, we're exploring several advanced areas to further enhance our system:
Cross-cluster and cross-region load balancing: As we manage thousands of Kubernetes clusters across multiple regions, extending intelligent load balancing beyond individual clusters is critical. We're exploring technologies like flat L3 networking and service-mesh solutions, integrating seamlessly with multi-region Endpoint Discovery Service (EDS) clusters. This will enable robust cross-cluster traffic management, fault tolerance, and globally efficient resource utilization.
Advanced load balancing strategies for AI use cases: We plan to introduce more sophisticated strategies, such as weighted load balancing, to better support advanced AI workloads. These strategies will enable finer-grained resource allocation and intelligent routing decisions based on specific application characteristics, ultimately optimizing performance, resource consumption, and cost efficiency.
If you're interested in working on large-scale distributed infrastructure challenges like this, we're hiring. Come build with us: explore open roles at Databricks!