Thoughts on Tenant Isolation and Load Shedding

Tony Allen
6 min read · Apr 5, 2021

I spend a fair bit of time these days thinking about overloaded servers and one recent problem I’ve wrestled with is how to shed load fairly when a server is overloaded. For example, if there are two separate tenants (users, accounts, clients, etc.) consuming vastly different amounts of resources, should we be treating them the same when shedding load? If one of those tenants is “misbehaving” and flooding the system with traffic, this can begin to affect everyone else sharing the system with this “noisy-neighbor”.

The different approaches to fairness I see folks taking are to either give each tenant some static limit and shed the excess traffic, or to just ignore the problem altogether. I think we can do better. In this article, I want to take a look at this tenant fairness problem in a bit more detail and discuss some neat queuing techniques we can use to solve it, along with their tradeoffs. This one’s for the nerds.

The Noisy Neighbor Problem

Let’s say we have a simple one-node service that FIFO enqueues requests to be scheduled on some worker threads. We also have a basic, but super effective load shedding mechanism in the form of queue timeouts — if a request sits in the server queue for too long, we shed it. Each request has an associated tenant ID like a user account, originating microservice, source client, someone’s name — whatever.

Our example service architecture.
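To make this concrete, here is a minimal sketch of the queue-timeout idea in Go. It is not Bufferbloater’s implementation or anyone’s production code; the Request fields, the 100 ms timeout, and the channel-as-queue are all illustrative assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// Request carries a tenant ID and the time it entered the queue.
type Request struct {
	TenantID string
	Enqueued time.Time
}

// queueTimeout is an illustrative bound on how long a request may wait.
const queueTimeout = 100 * time.Millisecond

// worker drains a FIFO queue (a buffered channel here), shedding any
// request that sat in the queue longer than the timeout.
func worker(queue <-chan Request, handle func(Request)) {
	for req := range queue {
		if time.Since(req.Enqueued) > queueTimeout {
			fmt.Printf("shed request from tenant %s\n", req.TenantID)
			continue
		}
		handle(req)
	}
}

func main() {
	queue := make(chan Request, 1024)
	go worker(queue, func(r Request) { fmt.Println("served", r.TenantID) })

	// Two tenants enqueue a request each; neither waits long enough to be shed.
	queue <- Request{TenantID: "tenant-a", Enqueued: time.Now()}
	queue <- Request{TenantID: "tenant-b", Enqueued: time.Now()}
	time.Sleep(50 * time.Millisecond)
}
```

The important bit is that shedding happens based on time spent waiting, not on who sent the request, which is exactly why a noisy-neighbor can drag everyone else down.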

Now, let’s assume that this is an ideal world where each request is identical, uses the same amount of server resources, and has the same latency profile. If each tenant’s request rate is also identical, then the worker threads, the server queue, and the shed traffic all contain the same number of requests from each tenant. Things are peachy!

But what happens if we start drifting closer to reality and one of those tenants starts to send way more traffic than the others? Well… we’d have a noisy-neighbor problem.

A noisy-neighbor fills our server queue.

I hacked something together using Bufferbloater to illustrate what this would look like. There are two tenants sending the same small amount of traffic and one tenant continuously increasing the amount of traffic it sends. With a queue timeout in place, we get the following result:

From top to bottom for each tenant: request latencies, rate of requests sent, rate of requests throttled, and request success %.

Each color (blue, green, and orange) in the graph represents a different tenant. At the top, we have the request latencies — each point is an individual request. This is followed by the requests per second (RPS) for each tenant in the 2nd graph from the top. The noisy-neighbor tenant (in blue) continuously increases the rate of requests it sends to the server, which results in elevated latencies for everyone. As the noisy-neighbor ramps up its traffic, all of the latencies start to increase due to the queuing delays, but they eventually plateau since the queue timeouts put an upper bound on how long a request can wait!

The bottom two graphs are where we can clearly see the problem caused by the noisy-neighbor. They show the number of requests shed per second, followed by the client-observed request success rate. Even though the well-behaved tenants have not increased their RPS, their success rates suffer; the server is spending most of its time servicing the noisy-neighbor’s requests. We can hardly call that fair!

Don’t Use Fixed Resource Limits

One approach I see quite often when tackling these noisy-neighbor problems is to give each tenant some static resource quota and strictly enforce it. It can take the form of something like setting a per-tenant max request rate or a per-tenant request concurrency limit. This is at best an operational nightmare, requiring constant tweaking to get right, and at worst makes things worse than they were before.

It’s extremely difficult to know the capacity of a server. The maximum concurrency limit or RPS that can be serviced before the server degrades is a moving target. The performance characteristics of the server can change with new commits, the types of requests being sent, CPU thermal throttling, cloud vendor nonsense, heterogeneous environments, etc.
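For concreteness, this is roughly the shape of the static approach: a fixed per-tenant cap with hard rejection, where the cap is a number someone has to guess and keep re-tuning as the answer above shifts. This is a hedged sketch, not a recommendation; the limit and names are made up.

```go
package limiter

import "sync"

// staticLimiter enforces a fixed per-tenant concurrency cap, the
// approach argued against above. The cap is a static guess that has
// to be re-tuned whenever the server's real capacity changes.
type staticLimiter struct {
	mu       sync.Mutex
	inFlight map[string]int
	limit    int // fixed per-tenant in-flight request cap
}

func newStaticLimiter(limit int) *staticLimiter {
	return &staticLimiter{inFlight: make(map[string]int), limit: limit}
}

// tryAcquire reports whether the tenant may start another request.
func (l *staticLimiter) tryAcquire(tenant string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.inFlight[tenant] >= l.limit {
		return false // reject: the tenant is at its static quota
	}
	l.inFlight[tenant]++
	return true
}

// release marks one of the tenant's requests as finished.
func (l *staticLimiter) release(tenant string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.inFlight[tenant]--
}
```

Notice that the cap rejects a tenant even when the server is sitting idle, and happily lets everyone through while the server is melting, because the limit knows nothing about the server’s actual state.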

Tenant Isolation in the Queue

Rather than using a FIFO queuing discipline in our server queue, let’s use a fair or weighted fair queue (WFQ) instead.

A closer look at a fair queue.

We can give each tenant, or “input class”, its own separate logical queue. These tenant queues can then be scheduled in a fair manner, which provides isolation among the tenants. When it comes to how these fair queues are implemented, the devil is in the scheduling details. A basic approach is to simply round-robin among the tenant queues, which is sometimes referred to as a Nagle fair queue. That’s what we’ll do here for simplicity, but just know that scheduling techniques can get quite interesting and are their own area of research.
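As a rough illustration, continuing in the same file as the FIFO sketch from earlier (so the Request type is already defined), a round-robin fair queue can be as simple as one FIFO per tenant plus a scheduler that cycles through the tenant queues, taking one request at a time. Again, this is a sketch under those assumptions, not the scheduler any particular server uses.

```go
// fairQueue keeps one FIFO per tenant and dequeues in round-robin
// order across tenants, so no single tenant can monopolize the queue.
type fairQueue struct {
	queues map[string][]Request // per-tenant FIFOs
	order  []string             // tenants in round-robin order
	next   int                  // index of the next tenant to serve
}

func newFairQueue() *fairQueue {
	return &fairQueue{queues: make(map[string][]Request)}
}

// Enqueue appends the request to its tenant's queue, registering the
// tenant in the round-robin rotation if we haven't seen it before.
func (f *fairQueue) Enqueue(r Request) {
	if _, seen := f.queues[r.TenantID]; !seen {
		f.order = append(f.order, r.TenantID)
	}
	f.queues[r.TenantID] = append(f.queues[r.TenantID], r)
}

// Dequeue returns the next request in round-robin order, or false if
// every tenant queue is empty.
func (f *fairQueue) Dequeue() (Request, bool) {
	for i := 0; i < len(f.order); i++ {
		tenant := f.order[f.next]
		f.next = (f.next + 1) % len(f.order)
		if q := f.queues[tenant]; len(q) > 0 {
			f.queues[tenant] = q[1:]
			return q[0], true
		}
	}
	return Request{}, false
}
```

The queue timeout from before doesn’t go away: the worker still checks each dequeued request’s age and sheds it if it waited too long. The difference is that the waiting now happens in per-tenant queues, so one tenant’s backlog doesn’t inflate everyone else’s wait.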

When using a fair queue in the same noisy-neighbor test from before, we get wildly different results:

A noisy-neighbor test with isolated server queues.

The first thing to notice is the top-most graph, which shows the latency for each request (recall that they are colored by tenant). When we did not have a fair queue in place, all of the tenants’ request latencies suffered, but now only the noisy-neighbor sees elevated latencies! The bottom two graphs, showing the throttled request rate and overall success rate, confirm that only the noisy-neighbor has its traffic shed, completely eliminating the noisy-neighbor problem.

One neat property of the WFQ is that while it guarantees each tenant gets its “fair share” of the resources, it does so dynamically and only as needed. If a single tenant is using more than its share, it may continue to do so as long as the server has the resources to accommodate it. As the other tenants use more resources, the queuing will adapt.

The WFQ adapts to the request rates of the different tenants.

Here, the blue tenant’s requests consumed a disproportionate amount of server resources and did so consistently throughout the test. The server could accommodate this for the first half of the test, so no load shedding occurred. Once another tenant increased its request rate halfway through the test, the resulting resource contention caused the queue timeouts to begin shedding the blue tenant’s load. Compare this with the same test run against a simple FIFO queue on the server, where all three tenants begin to shed load and experience queuing delays.

In the absence of a WFQ, all tenants are throttled.
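One last implementation note on the “weighted” part of WFQ: a simple (admittedly crude) way to add weights to the round-robin sketch above is to let each tenant dequeue up to its weight’s worth of requests per pass. The weights map below is a made-up illustration; real WFQ implementations tend to use finer-grained bookkeeping such as virtual finish times or deficit counters.

```go
// dequeuePass performs one weighted round-robin pass over the
// fairQueue: each tenant may dequeue up to weight[tenant] requests
// (default 1) per pass, giving higher-weight tenants a larger share
// of the server while it is contended.
func (f *fairQueue) dequeuePass(weight map[string]int) []Request {
	var out []Request
	for _, tenant := range f.order {
		w := weight[tenant]
		if w <= 0 {
			w = 1 // tenants without an explicit weight get a share of 1
		}
		q := f.queues[tenant]
		if w > len(q) {
			w = len(q) // take whatever the tenant actually has queued
		}
		out = append(out, q[:w]...)
		f.queues[tenant] = q[w:]
	}
	return out
}
```

Because a tenant with an empty queue simply contributes nothing to the pass, the scheduler stays work-conserving: spare capacity flows to whoever has requests waiting, which is the adaptive behavior shown above.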

Final Thoughts

Hopefully this was an interesting introduction to the idea of tenant-isolated load shedding. There are shortcomings to using a WFQ that I didn’t mention. For instance, memory utilization, which scales linearly with the number of tenants, is a real concern for some workloads. If the number of tenants in the system is unbounded, there is a risk of memory exhaustion. Also, the devil is in the implementation details of the request scheduler, which I didn’t say much about. Many of these shortcomings are mitigated by WFQ variants such as the stochastic fair queue (SFQ), which warrants its own article.
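To give a flavor of the SFQ idea without spoiling that article: instead of one queue per tenant, tenants are hashed into a fixed number of queue buckets, so memory stays bounded no matter how many tenants show up. The sketch below is just the bucketing step, with hash/fnv picked arbitrarily from the Go standard library.

```go
import "hash/fnv"

// bucketFor maps a tenant ID onto one of a fixed number of queue
// buckets. Memory no longer grows with the number of tenants; the
// tradeoff is that colliding tenants share a bucket, and therefore
// a fate.
func bucketFor(tenantID string, numBuckets uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(tenantID))
	return h.Sum32() % numBuckets
}
```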

Thanks for reading! Let me know if this was interesting and if there’s interest in a deeper dive into anything I’ve covered.

Special thanks to Dale Jackson for his feedback on earlier drafts of the post.
