AWS Lambda concurrency and why you don’t need a warmer
One of the most common questions I’m asked as an AWS Solutions Architect is how to avoid AWS Lambda cold starts, and whether to use some mechanism for warming Lambda functions. This post is aimed at developers and DevOps engineers new to Lambda. Hopefully, by the end of it, you’ll understand the Lambda concurrency model and why warming functions is, in virtually all cases, either not required or of little real benefit.
The Lambda concurrency model is probably very different from what you’re used to if you’ve come from containers, VMs or bare metal. Those platforms handle many threads of execution at once. Lambda functions operate differently: a single instance of a Lambda function handles only one invocation at any one time. A Lambda instance, sometimes referred to as an execution environment or a sandbox, is a dedicated microVM. These microVMs are incredibly fast to start up and give each owner strong security guarantees around where and how their code is executed.
So each Lambda function can be made up of thousands of microVMs, all executing the same code base but each working on a different request: a fleet of instances, all managed by the Lambda service on your behalf.
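The one-invocation-per-instance rule has a simple consequence: the number of execution environments you need at any moment equals the number of requests in flight at that moment. This is a toy model, not anything from the AWS SDK, but it captures the arithmetic:

```python
# Toy model of Lambda's one-request-per-instance rule (an illustration,
# not an AWS API). Each in-flight request occupies one execution
# environment, so the environments needed equal the peak number of
# overlapping requests.

def peak_concurrency(requests):
    """requests: list of (start, end) times. Returns the maximum number
    of requests that are in flight simultaneously."""
    events = []
    for start, end in requests:
        events.append((start, 1))    # request begins: occupies an instance
        events.append((end, -1))     # request ends: frees an instance
    events.sort()                    # ends sort before starts at the same
                                     # time, modelling instance reuse
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak

# Three overlapping requests need three instances, however warm they are:
print(peak_concurrency([(0, 5), (1, 3), (2, 6)]))  # 3

# Back-to-back requests can reuse a single instance:
print(peak_concurrency([(0, 1), (1, 2)]))  # 1
```

The second case is the warm-start path: a request arriving just as another finishes can reuse the freed instance, while the first case forces the Lambda service to scale out regardless.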
It’s the Lambda service’s responsibility to make sure that each invocation of your function is executed as fast as possible. If all the current instances are busy with work, the Lambda service will start a new microVM, copy your code to it and get it ready to handle the waiting request. This process of provisioning a new microVM is called a cold start. When a microVM finishes handling a request it isn’t automatically shut down. The Lambda service expects more requests to come, so it keeps instances around for a period of time; if they aren’t kept busy, they are eventually shut down. When a request is routed to an instance that is idle but ready, that’s a warm start: the instance is up and has all your code ready to go.
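You can observe this lifecycle from inside a function. Module-level code runs once, during a cold start, while the handler runs on every invocation, so a module-level flag tells you which kind of start served each request. A minimal sketch (the handler name and return shape are my own, not an AWS convention):

```python
# Minimal sketch: distinguish cold from warm starts using module scope.
# Everything at module level runs once per execution environment (i.e.
# during a cold start); the handler body runs on every invocation.

import time

COLD_START = True          # set once, when the environment is created
INIT_TIME = time.time()    # timestamp of this environment's cold start

def handler(event, context):
    global COLD_START
    was_cold = COLD_START
    COLD_START = False     # later invocations in this environment are warm
    return {
        "cold_start": was_cold,
        "environment_age_s": time.time() - INIT_TIME,
    }
```

The first invocation in each environment reports `cold_start: true`; every reuse of that environment reports `false`. Logging this field is a cheap way to measure how often your real traffic actually hits cold starts before you invest in mitigating them.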
Cold starts are slower than warm starts. Although microVMs are quick to start up, each step in the provisioning process adds a small amount of time, so there will always be a difference between the two. In some workloads, user-facing websites for example, latency is very important and you want to do everything you can to minimize the effect of cold starts. Reducing cold start duration is a topic for a much longer post, and I’m sure many already exist. There are also a number of posts that recommend mechanisms to warm Lambda functions: injecting synthetic, non-user requests into the Lambda function so that the synthetic requests absorb the cold starts and real users hit warm starts.
Unfortunately, it’s not that simple.
The most common warming mechanisms I see use CloudWatch scheduled events to trigger a Lambda function every 15 minutes. This starts one Lambda instance; we call this a concurrency of 1. After the synthetic execution has finished, the instance is available for reuse by any other request. But if a request comes in while the instance is busy executing another request, the Lambda service creates a new instance (a cold start) and the concurrency becomes 2. The warming mechanism hasn’t prevented that second cold start. Simple synthetic traffic generators like this won’t prevent cold starts above the level of concurrency which they themselves generate. You absolutely could build a sophisticated synthetic traffic generation mechanism, but in my opinion it’s more effort than it’s worth. Lambda handles scaling quickly and effectively so you don’t have to.
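For completeness, here is what the warming pattern described above typically looks like in the function itself. The payload shape (`{"warmer": true}`) is an assumption of mine, not an AWS convention; the scheduled event just needs some marker the handler can recognize so the synthetic pings stay cheap. Note the limitation from the paragraph above baked into this design: each ping occupies, and therefore warms, exactly one execution environment.

```python
# Hedged sketch of a warmer-aware handler. A scheduled event carrying a
# marker field (here "warmer", a hypothetical name) invokes the function
# periodically; the handler short-circuits these synthetic pings.
# Each ping keeps exactly one environment warm, so concurrent real
# traffic beyond that still triggers cold starts.

def handler(event, context):
    if isinstance(event, dict) and event.get("warmer"):
        # Synthetic ping: return immediately, doing no real work.
        return {"warmed": True}

    # ... normal request handling would go here ...
    return {"statusCode": 200, "body": "real work done"}
```

If two real requests arrive while this single warmed environment is busy, one of them cold-starts anyway, which is exactly the failure mode the post describes.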
The exception is a workload with very low traffic volume where the value of each execution is high. In that case, it might be worth investing the time and effort in a warming mechanism. For the majority of workloads with unpredictable traffic, warming mechanisms won’t have the effect you want them to have.