The Other Feature of AWS Lambda Provisioned Concurrency — Saving Money
Provisioned concurrency was launched in December 2019, and its primary purpose is to support workloads that require highly consistent latency. It does this by keeping a configurable number of AWS Lambda execution environments initialized and ready for your function to use at any point. This is different to regular usage, where your function will experience a cold start whenever its concurrency increases beyond its current value. This functionality and its performance benefits are presented in the AWS blog post "Provisioned Concurrency for Lambda Functions".
In this blog post we’ll look at the lesser-known benefit of provisioned concurrency: it can also save you money. Modern-day builders have an increasingly long list of technologies they need to know about, and yes, I’m also suggesting they should learn about the cost implications of their architectures. In most cases builders will be building systems for a business, and the thing you build will have a value. Understanding the cost of producing and running that thing is important and will ultimately make you a better builder.
So let’s dive in. Lambda is billed on two components: requests and duration. Requests are the number of individual invocations made to your function, however they are triggered. An invocation could be in response to an API Gateway request or an SQS message; it doesn’t matter. Each invocation, no matter the source, is billed the same. Duration is how many GB-seconds are used, that is, the memory configured for the function multiplied by the time it runs. Using 1024 MB of memory for 1 second is billed at the same price as using 512 MB of memory for 2 seconds.
In my local region, eu-west-1 (Ireland), these are the Lambda prices at the time of writing.
Requests $0.20 per 1M requests
Duration $0.0000166667 for every GB-second
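To make the two components concrete, here is a minimal sketch of the regular billing calculation using the prices quoted above. The function name `on_demand_cost` and its parameters are my own for illustration, and the sketch ignores the free tier and any rounding of duration, so treat the output as an estimate rather than a bill.

```python
# Prices quoted above for eu-west-1 at the time of writing.
# Check the Lambda pricing page for current values.
REQUEST_PRICE_PER_MILLION = 0.20
DURATION_PRICE_PER_GB_SECOND = 0.0000166667


def on_demand_cost(invocations, avg_duration_seconds, memory_mb):
    """Estimate the regular Lambda bill in USD (hypothetical helper).

    Ignores the free tier and duration rounding.
    """
    request_cost = invocations / 1_000_000 * REQUEST_PRICE_PER_MILLION
    gb_seconds = invocations * avg_duration_seconds * (memory_mb / 1024)
    return request_cost + gb_seconds * DURATION_PRICE_PER_GB_SECOND


# 1024 MB for 1 second uses the same GB-seconds as 512 MB for 2 seconds,
# so one million invocations of either shape cost the same.
cost_1024mb_1s = on_demand_cost(1_000_000, 1.0, 1024)
cost_512mb_2s = on_demand_cost(1_000_000, 2.0, 512)
print(f"1024 MB x 1 s: ${cost_1024mb_1s:.4f}")
print(f" 512 MB x 2 s: ${cost_512mb_2s:.4f}")
```

Note how small the request charge is next to the duration charge for a function that runs for a full second. For very short functions the balance shifts the other way.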
Provisioned concurrency is billed slightly differently: it has three components. Requests are billed exactly as in regular billing. Duration is still billed per GB-second, but at a lower rate. On top of that, provisioned concurrency has a set fee for each GB-second you reserve, which is charged whether you use it or not.
Provisioned Concurrency $0.000004646 for every GB-second
Requests $0.20 per 1M requests
Duration $0.0000108407 for every GB-second
These values are so small they’re hard to comprehend, at least for me. But if you add up the two GB-second values for provisioned concurrency, the total is about 7% less than the regular duration price. However, you will only get this saving if your function is executing for the whole time the concurrency is provisioned. In other words, if you’re using the capacity you’ve requested, the billing for that capacity will be lower. If you don’t use it, you pay a premium for having it ready and waiting.
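The arithmetic behind that figure is easy to check. This short sketch adds the two provisioned concurrency GB-second prices quoted above and compares the total with the regular duration price. The constant names are mine, and the comparison assumes the provisioned capacity is busy 100% of the time.

```python
# Per GB-second prices quoted above (eu-west-1, at the time of writing).
ON_DEMAND_DURATION = 0.0000166667
PC_RESERVATION = 0.000004646   # charged whether the capacity is used or not
PC_DURATION = 0.0000108407     # charged only while the function is running

# Cost of one fully used GB-second on provisioned concurrency.
combined = PC_RESERVATION + PC_DURATION
saving = 1 - combined / ON_DEMAND_DURATION

print(f"regular:     {ON_DEMAND_DURATION:.10f} per GB-second")
print(f"provisioned: {combined:.10f} per GB-second (fully used)")
print(f"saving:      {saving:.1%}")
```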
So how do I translate this knowledge into saving money for my workload? With most workloads there will be a tipping point at which the reduced cost of provisioned concurrency is cancelled out by periods when the provisioned concurrency isn’t fully used. How and when this point is reached depends on the traffic pattern and the amount of concurrency provisioned. The way to skip this complexity is to set provisioned concurrency to your lowest sustained level of concurrency, so that what you reserve is always in use.
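If you want to locate the tipping point rather than skip it, it can be derived from the prices quoted above. The sketch below is my own back-of-the-envelope model, not an official formula: it treats utilization as the fraction of reserved GB-seconds your function actually spends running, and it leaves out the request charge because that is the same under both billing models.

```python
# Per GB-second prices quoted above (eu-west-1, at the time of writing).
ON_DEMAND_DURATION = 0.0000166667
PC_RESERVATION = 0.000004646
PC_DURATION = 0.0000108407


def cost_per_reserved_gb_second(utilization):
    """Return (regular, provisioned) cost for one reserved GB-second.

    `utilization` is the fraction (0.0 to 1.0) of the reserved capacity
    that is actually executing. Hypothetical helper for illustration.
    """
    regular = utilization * ON_DEMAND_DURATION
    provisioned = PC_RESERVATION + utilization * PC_DURATION
    return regular, provisioned


# Both models cost the same when
#   u * ON_DEMAND_DURATION == PC_RESERVATION + u * PC_DURATION
break_even = PC_RESERVATION / (ON_DEMAND_DURATION - PC_DURATION)
print(f"break-even utilization: {break_even:.1%}")

for utilization in (0.5, 0.8, 1.0):
    regular, provisioned = cost_per_reserved_gb_second(utilization)
    cheaper = "provisioned" if provisioned < regular else "regular"
    print(f"{utilization:.0%} used -> {cheaper} billing is cheaper")
```

With these prices the break-even lands at roughly 80% utilization. Below that, the reservation fee outweighs the cheaper duration rate.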
If your traffic profile has periods of complete inactivity, then you will most likely not benefit from this as a cost-saving mechanism. If you have a more consistent traffic pattern, you are more likely to benefit.
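To illustrate the difference between those two kinds of traffic, here is a rough sketch that prices two made-up hourly concurrency profiles. Everything in it is an assumption for illustration: the profiles, the 1 GB memory size, and the simplification that each concurrent execution is busy for the whole hour. Only duration charges are compared, since request charges are identical either way.

```python
# Per GB-second prices quoted above (eu-west-1, at the time of writing).
ON_DEMAND_DURATION = 0.0000166667
PC_RESERVATION = 0.000004646
PC_DURATION = 0.0000108407
SECONDS_PER_HOUR = 3600


def duration_cost(hourly_concurrency, provisioned, memory_gb=1.0):
    """Duration cost in USD for a profile of concurrency per hour.

    Simplified model: every concurrent execution runs for the full hour,
    and anything above `provisioned` spills over to regular billing.
    """
    gb_seconds = SECONDS_PER_HOUR * memory_gb
    total = 0.0
    for concurrency in hourly_concurrency:
        on_provisioned = min(concurrency, provisioned)
        overflow = concurrency - on_provisioned
        total += provisioned * gb_seconds * PC_RESERVATION
        total += on_provisioned * gb_seconds * PC_DURATION
        total += overflow * gb_seconds * ON_DEMAND_DURATION
    return total


# Hypothetical one-day profiles: concurrency observed in each hour.
profiles = {
    "steady": [10, 12, 15, 20, 18, 11] * 4,
    "bursty": [0, 0, 0, 40, 0, 0] * 4,
}

results = {}
for name, profile in profiles.items():
    results[name] = {
        "none": duration_cost(profile, provisioned=0),
        "minimum": duration_cost(profile, provisioned=min(profile)),
        "peak": duration_cost(profile, provisioned=max(profile)),
    }
    for level, cost in results[name].items():
        print(f"{name:6} provisioned at {level:7}: ${cost:.2f}")
```

In this model, provisioning at the minimum never costs more than regular billing, and it only saves money when that minimum is above zero. Provisioning at the peak costs more for both profiles, because the reserved capacity sits idle too often.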
For full details of provisioned concurrency pricing, read the Lambda pricing page.
A note on premature optimization. I’ve seen entire startups running very successfully while spending less than $100 a month on Lambda. Unless you’re using Lambda at very high volume, the time spent discussing whether to implement this change could cost more than the savings it brings. Sanity check your workload first. Remember also that if you use this as a cost optimization mechanism, you will also get the more consistent performance profile for the concurrency you reserve.