FaaS[t] Growth with Serverless Computing: Cost Optimization

FaaS[t] Growth with Serverless Computing: Cost Optimizing Your AWS Lambda

Posted February 15, 2019


Self-scaling, highly available, no infrastructure, a smaller attack surface, only pay for what you use – what’s not to love about Serverless Computing, or more precisely, Function as a Service (FaaS)?

It’s the ultimate value proposition to a developer – upload your code to the cloud, let it run and only pay for the resources consumed at runtime!

While FaaS isn’t the panacea for all computing needs, it is an excellent framework to create workflow-driven applications – which make up the majority of the business and data-management applications out there today.

While exact data on the FaaS market is not readily available, there seems to be consensus among analysts that FaaS is expected to grow faster than any of its cloud computing peers and become a dominant player.

FaaS has significant benefits for the Cloud Service Provider as well, because of the “drag effect” for the rest of their portfolio. A function doesn’t exist in a vacuum and needs other services for persistence (storage), data management (databases), connectivity, messaging, monitoring and automation. Through tight vertical integration between their back-end offerings and FaaS, a cloud provider can drive significant revenue off a FaaS platform.

All this sounds great, but most organizations are not yet taking a full plunge into serverless computing. Gartner estimates only 5% of global enterprises have deployed FaaS today. Many organizations are still dipping their toes into FaaS and have deployed it for a few key use cases while the majority of their estate in the public cloud is still dedicated to IaaS for compute, storage and networking.

In this article, we will examine what barriers organizations are encountering in their FaaS adoption and what can be done to overcome them to facilitate faster adoption.

What is Holding You Back?

For developers looking to accelerate their development, FaaS provides a compelling way to abstract infrastructure management even more. However, for Ops there are several hurdles:

Fear of vendor lock-in – There is a concern that using FaaS will lock them into a specific provider, owing to the tight integration with, and use of, a number of that provider’s other services.
Performance – Unlike IaaS, users don’t get to specify their CPU needs directly for their functions. In many cases, they will need to guess and adjust for optimal run-time performance.
Cost transparency – FaaS metering is quite different from regular IaaS metering: it’s a combination of function duration and memory allocated. While billing is done in increments of 100ms and there are typically free tiers, FaaS can end up being more expensive than IaaS if not properly architected. There is a critical need for a holistic view of costs and performance.
Outdated security/compliance practices – Traditional Role Based Access Control mechanisms have been built over years of practice with long-running servers and applications. The shift to ephemeral functions with short-lived contexts implies a shift away from traditional security and compliance practices, and ensuring the security and compliance of cloud accounts now requires a much higher level of automation around monitoring than before.

#1 (fear of vendor lock-in) should sound familiar to early adopters of IaaS. As with IaaS, this risk can be mitigated somewhat by adhering to open standards and platform-agnostic frameworks. For example, the Serverless Framework is the leading deployment framework in the FaaS world, and can deploy to AWS, Google, Azure, IBM Cloud, or any Kubernetes cluster – public or private.

At HyperGrid, we felt #3 (cost transparency) and #4 (outdated security/compliance practices) were significant barriers holding back the deployment of FaaS.

While AWS Lambda isn’t the only FaaS platform, it is certainly where we see a lot of early adoption by enterprises. Consequently, we have elected to bring Lambda support to market before the other platforms, and for the rest of this article we will use Lambda as the proxy for FaaS. The concepts we discuss here are just as applicable to Google Cloud Functions and Azure Functions.

Cost Management for AWS Lambda

Uncertainty around pricing often leads to some discomfort around choosing Lambda for large-scale deployments. EC2 may have its complications around overlapping instance types, but once you’ve chosen, there are no pricing surprises – it doesn’t matter whether the utilization is 5% or 100%, the per-second pricing stays the same.

In contrast, Lambda pricing is multi-dimensional: it is based on the number of invocations, the duration of each invocation and the memory allocated to the function at runtime. This can be difficult to estimate when rolling out a new product or feature.
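To make these dimensions concrete, here is a minimal sketch of a monthly estimate. The two price constants are assumptions based on commonly quoted AWS list prices ($0.20 per million requests and $0.0000166667 per GB-second); check the current pricing page for your region. The function name and parameters are ours, for illustration only, and the free tier is ignored.

```python
# Assumed list prices (USD); verify against the current AWS pricing page.
PRICE_PER_REQUEST = 0.20 / 1_000_000
PRICE_PER_GB_SECOND = 0.0000166667


def lambda_monthly_cost(invocations, avg_duration_s, memory_mb):
    """Rough monthly Lambda cost in USD, ignoring the free tier."""
    # Compute charge: seconds of runtime multiplied by GB of allocated memory.
    gb_seconds = invocations * avg_duration_s * (memory_mb / 1024)
    request_cost = invocations * PRICE_PER_REQUEST
    compute_cost = gb_seconds * PRICE_PER_GB_SECOND
    return request_cost + compute_cost


# One million one-second invocations at 1GB: the compute charge dwarfs the request charge.
estimate = lambda_monthly_cost(1_000_000, 1.0, 1024)
print(f"${estimate:.2f}")
```

Under these assumed prices the request charge is a small part of the total; duration and memory are what drive the bill.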

When we think of Lambda, we very often think of a model where API calls are made in response to a user action (e.g. a mobile device interaction) and perform a short-lived action (e.g. update a database table or two). However, these are not the most popular Lambda use cases. Lambda has a significant “cold-start” penalty, which can make it a difficult choice for events that need real-time responses – such as user interaction.

This is evident from the case studies AWS has highlighted on their Lambda page – the use cases highlighted are automation and integration services, event processing, data management and transformation. These tend to be resource-intensive and to have long runtimes, typically running into several minutes. Additionally, any disk I/O can result in additional wait times for the function, only increasing the cost of invocation.

While AWS does not share data about Lambda utilization, it is telling that they recently increased the Lambda timeout from 5 minutes to 15; this is a good indicator of what AWS is seeing on internal dashboards for runtime requirements of a Lambda function – we know AWS is very customer-centric!

When planning for Lambda deployments, there are two key points to keep in mind:

Your function will likely need to interact with other services (e.g. S3, DynamoDB or third-party web services). Any latency in interacting with these services is charged against your Lambda runtime. For this reason, it is worth keeping your data within AWS and investing in a caching solution (e.g. ElastiCache or DynamoDB Accelerator) to reduce access latency, and hence runtime and costs.
While Lambda bills by memory, CPU is just as important. In Lambda, CPU allocation is proportional to memory allocation: allocate more memory and AWS will automatically allocate more CPU, which should reduce the runtime and can therefore offset the higher memory charge. Conversely, when memory is reduced as part of an optimization effort, the function gets less CPU and its duration can increase, and other changes to the run-time environment (including code changes) can shift the duration as well. It’s important to consider the combined effect of duration and memory allocation when forming a view on cost optimization.
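The first point can be illustrated with a small sketch that splits each invocation into time spent working and time spent waiting on a dependency. The GB-second price is an assumed list price, and the workload figures (10 million invocations, 100ms of work, 400ms of uncached wait versus 50ms cached) are hypothetical numbers chosen for illustration.

```python
# Assumed list price (USD per GB-second); verify against the AWS pricing page.
PRICE_PER_GB_SECOND = 0.0000166667


def compute_cost(invocations, work_s, wait_s, memory_mb):
    """Compute charge in USD when each invocation works for work_s and waits for wait_s."""
    # Waiting on S3, DynamoDB or a web service is billed exactly like CPU work.
    billed_s = work_s + wait_s
    return invocations * billed_s * (memory_mb / 1024) * PRICE_PER_GB_SECOND


# Hypothetical workload: same work per call, different dependency latency.
uncached = compute_cost(10_000_000, 0.1, 0.4, 512)
cached = compute_cost(10_000_000, 0.1, 0.05, 512)
print(f"uncached: ${uncached:.2f}, cached: ${cached:.2f}")
```

In this made-up scenario most of the bill is time spent waiting, which is why reducing access latency shows up directly as lower cost.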

AWS offers a generous (and indefinite) free tier for Lambda. These numbers – 1 million requests and 400,000 GB-seconds of runtime per month – may sound large, but the numbers add up quickly.

Take, for example, a single function with 1GB allocated to it (1GB is the default if you’re using the Serverless Framework, the most popular platform-neutral deployment tool for FaaS) and a runtime of 60 seconds. Each invocation consumes 60 GB-seconds, so the free tier covers roughly 6,666 invocations a month; it would take only about one invocation every six and a half minutes to max it out. Keep in mind that we are talking about invocations and not Transactions Per Second. A single transaction may require invoking several Lambda functions before it is complete, and a single Lambda function may run for several seconds to minutes.
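The free-tier arithmetic is easy to reproduce for your own functions. The sketch below assumes the published free-tier allowances of 1 million requests and 400,000 GB-seconds per month and a 30-day month; the function names are ours.

```python
FREE_REQUESTS = 1_000_000          # free requests per month (assumed allowance)
FREE_GB_SECONDS = 400_000          # free GB-seconds per month (assumed allowance)
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month


def free_tier_invocations(memory_mb, duration_s):
    """Invocations per month that fit inside the free tier."""
    gb_seconds_per_invocation = (memory_mb / 1024) * duration_s
    by_compute = int(FREE_GB_SECONDS // gb_seconds_per_invocation)
    # Whichever allowance runs out first sets the limit.
    return min(FREE_REQUESTS, by_compute)


def seconds_between_invocations(memory_mb, duration_s):
    """Average spacing of invocations that exactly uses up the free tier."""
    return SECONDS_PER_MONTH / free_tier_invocations(memory_mb, duration_s)


print(free_tier_invocations(1024, 60))               # 6666
print(round(seconds_between_invocations(1024, 60)))  # 389
```

For the 1GB, 60-second function this works out to 6,666 invocations a month, or one roughly every 389 seconds.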

At HyperGrid, we felt the following tools would increase the cost transparency of Lambda and increase adoption:

Spend Visualization – show cost breakdown by function in near real-time
Cost Optimization – find optimization opportunities by right-sizing functions that have over-allocated resources
Waste Reduction – identify functions with a high error rate, which is indicative of a function with sub-optimal resource allocation or a poorly written one

Spend Visualization

HyperCloud Analytics provides a consolidated dashboard of all Lambda functions and their costs in near real-time, broken down by their various components – memory, runtime, invocations etc. This level of granularity can help the ops team get real-time visibility into Lambda costs and give organizations the predictability of spend that they need to increase their Lambda footprint.

Below is a screenshot of our Spend Visualization console, which gives a dashboard view of the Lambda environment and provides opportunities for Cost Optimization (Right Sized Savings) and Waste Reduction (High Error Rate).

Cost Optimization

Lambda is paid for in terms of resource allocation over runtime (GB-seconds): you’re charged for the memory allocated (regardless of use) over the duration of the function execution. It does not matter that the function may have dependencies, like I/O, that result in waiting periods; that waiting time is billed as well.
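As a sketch of how a single invocation turns into billed GB-seconds, the function below rounds the duration up to the 100ms billing increment mentioned earlier in this article and multiplies by the allocated memory, not the memory actually used. The function name is ours, and the increment should be treated as an assumption to check against current AWS billing rules.

```python
import math

BILLING_INCREMENT_MS = 100  # billing increment cited in this article


def billed_gb_seconds(duration_ms, memory_mb):
    """GB-seconds billed for one invocation."""
    # Duration is rounded up to the next billing increment.
    increments = math.ceil(duration_ms / BILLING_INCREMENT_MS)
    billed_ms = increments * BILLING_INCREMENT_MS
    # Allocated memory is charged in full, regardless of how much is used.
    return (billed_ms / 1000) * (memory_mb / 1024)


# A 130ms invocation at 512MB is billed as 200ms.
print(billed_gb_seconds(130, 512))
```

A function that spends most of those milliseconds waiting on I/O is billed the same as one that spends them computing.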

By giving the ops team the ability to identify functions that have overallocated resources, HyperCloud creates opportunities for cost optimization in real-time. When coupled with the spend visualization dashboard, it is possible to see the effects of cost-optimization in near real-time and make incremental adjustments till the right tradeoff is reached.

Keep in mind that behind the scenes, AWS allocates CPU proportional to the amount of memory allocated to the function. If your function is CPU-bound, then even though it shows up as low memory usage, you may still need to keep its memory allocation high to give it the CPU it needs. The implication is that you cannot simply reduce memory allocation without simultaneously observing duration to understand where bottlenecks may exist.

Doing so may result in higher execution times, leading to similar or even higher GB-sec consumption and therefore higher costs. HyperCloud allows customers to consider both aspects to truly optimize Lambda run-time costs. You can use the next dashboard (Waste Reduction) to get a better handle on how to spot these functions and ensure that they have the resources they need to work optimally.
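One way to reason about this tradeoff is to compare GB-seconds across memory settings using measured durations. The sketch below is not how HyperCloud does it; it is a minimal illustration, and the measurements are hypothetical numbers for a CPU-bound function whose duration grows as memory (and therefore CPU) shrinks.

```python
def gb_seconds(memory_mb, duration_s):
    """GB-seconds consumed by one invocation at a given memory setting."""
    return (memory_mb / 1024) * duration_s


def cheapest_setting(measurements):
    """Return the memory setting (MB) with the lowest GB-seconds per invocation.

    measurements maps memory in MB to the observed average duration in seconds.
    """
    return min(measurements, key=lambda mb: gb_seconds(mb, measurements[mb]))


# Hypothetical measurements: less memory means less CPU and a longer run.
observed = {1024: 2.0, 512: 4.6, 256: 10.5}
for mb, seconds in observed.items():
    print(mb, gb_seconds(mb, seconds))
print("cheapest:", cheapest_setting(observed))
```

With these made-up numbers the largest allocation is also the cheapest, because each halving of memory more than doubles the duration.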

Waste Reduction

Lambda is charged by the invocation, so each failed invocation is money wasted. By keeping an eye on wasted invocations – i.e. functions with a high error rate – it is possible to adjust the resource allocation (or the function code) and reduce the number of wasted invocations.
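To put a number on wasted invocations, the sketch below flags functions whose error rate crosses a threshold and ranks them by the money spent on failed runs. The prices are assumed list prices, the 5% threshold is an arbitrary choice, and the function names and statistics are hypothetical.

```python
from dataclasses import dataclass

# Assumed list prices (USD); verify against the current AWS pricing page.
PRICE_PER_REQUEST = 0.20 / 1_000_000
PRICE_PER_GB_SECOND = 0.0000166667


@dataclass
class FunctionStats:
    name: str
    invocations: int
    errors: int
    avg_duration_s: float
    memory_mb: int

    @property
    def error_rate(self):
        return self.errors / self.invocations if self.invocations else 0.0

    @property
    def wasted_cost(self):
        # Approximation: failed invocations are billed at the average duration.
        gb_s = self.avg_duration_s * (self.memory_mb / 1024)
        return self.errors * (PRICE_PER_REQUEST + gb_s * PRICE_PER_GB_SECOND)


def high_error_functions(stats, threshold=0.05):
    """Functions at or above the error-rate threshold, costliest waste first."""
    flagged = [s for s in stats if s.error_rate >= threshold]
    return sorted(flagged, key=lambda s: s.wasted_cost, reverse=True)


# Hypothetical statistics for three functions.
stats = [
    FunctionStats("notify", 1_000, 100, 0.2, 128),
    FunctionStats("transcode", 1_000, 400, 30.0, 1024),
    FunctionStats("thumbnail", 100_000, 100, 0.5, 512),
]
for s in high_error_functions(stats):
    print(s.name, f"{s.error_rate:.0%}", f"${s.wasted_cost:.4f}")
```

Ranking by wasted cost rather than by raw error rate keeps attention on long-running, memory-heavy functions, where each failure is most expensive.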

The next figure shows the HyperCloud dashboard for high error rate functions. In this specific screenshot, of particular note is the processVideo function. While this function has an appropriate amount of memory allocated, it repeatedly times out (average duration > timeout), which is a good indication that this function does not have enough CPU to proceed.

Keep in mind that the function could be either memory-starved or CPU-starved. Even though there is no CPU lever available in Lambda, we can get the desired result by increasing the amount of memory allocated to the function, as AWS assigns CPU in proportion to memory.
