Understanding the Cold Start Problem with AWS Lambda

AWS Lambda and Azure Functions are popular Function-as-a-Service (FaaS) offerings and are a staple of the serverless landscape. 

With FaaS offerings the cloud provider is responsible for running and managing the underlying compute that powers the service. This allows you to focus solely on building out your function. These platforms automatically scale resources with demand, allowing for highly efficient and cost-effective execution of code. They abstract away the complexities of server management, enabling rapid development, deployment, and scaling of applications.

One of the challenges to understand with serverless is the issue of cold starts. A cold start refers to the initial delay experienced when invoking a serverless function for the first time, or after it has been idle for some time. During this delay, the cloud provider must allocate resources, load the runtime, and start the function’s code, resulting in a longer response time than subsequent invocations. To understand this better, we need to look at the execution model for a serverless FaaS environment. In these examples, we’re going to focus on the AWS Lambda implementation of serverless functions.

The Serverless Execution Environment

All serverless FaaS environments have a control plane that is responsible for handling and orchestrating inbound requests for the FaaS service.

Handling a request in AWS Lambda involves the following steps:

  1. Download Function Code
    Download the code for the AWS Lambda function from Amazon S3, or from Amazon Elastic Container Registry (Amazon ECR) if a container image is being used.
  2. Start a new execution environment
    A function request executes in a micro virtual machine (microVM), so a new request needs a new microVM execution environment.
  3. Execute initialisation code
    Initialisation code runs only once as part of setting up the execution environment. It is often used for reusable and heavier-weight activities such as establishing database connections, reading configuration files, or loading machine learning models.
  4. Execute handler code
    The handler code is the core part of a function that is called in response to an event. AWS Lambda provides the handler with the event payload and a context object which contains information about the runtime environment.

The first two steps of this process are essential in setting up an execution environment and are often referred to as the cold start. During these first two steps, you are not billed for the time it takes the AWS Lambda control plane to prepare the execution environment. However, the execution of initialisation and handler code are billable activities and contribute to the invocation duration.

Once an execution environment has been used to run a function, the control plane will freeze it and retain it for a non-deterministic period of time. Many factors contribute to how long this time period is; however, these aren’t documented or communicated publicly.

When a subsequent request is received and a paused execution environment is still available for the function, it will be re-used. As a result, the only step in the process that needs to run is the “Execute handler code” step.

Environments

Non-production environments, such as development and staging environments, are more susceptible to cold starts because of their infrequent usage and activity compared to a production environment. Because these environments aren’t used as frequently, the likelihood of having paused execution environments ready is reduced.

It is also worth noting that a code change will force execution environments to be pruned so that the latest code can be downloaded on the next run. As a result, changing the code of a Lambda function will cause cold starts on its next execution.

AWS Lambda is a highly available service that manages the execution of your function across a number of availability zones (AZs). The implication is that cold starts can still occur even when a warm execution environment exists in one availability zone: through load balancing, AWS Lambda may run the function in an AZ that doesn’t yet have a warm execution environment.

Concurrency and Scaling Up

Each concurrent request requires its own execution environment. This means that cold starts can be common during scale up periods where the number of concurrent requests is increasing.

For example, suppose you have a continuous stream of 5 concurrent requests. Each of the first 5 requests will experience a cold start, and subsequent requests will reuse these warm execution environments, provided the requests arrive frequently enough that the control plane doesn’t dispose of them. If the throughput then increases to a continuous stream of 10 concurrent requests, the 5 existing execution environments can continue to be used, but 5 new execution environments are needed, each of which will experience a cold start.
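
As an illustrative model (not an AWS API), the number of cold starts triggered by a jump in concurrency can be thought of as the shortfall of warm environments:

```javascript
// Illustrative model: each concurrent request needs its own execution
// environment, so a step up in concurrency cold-starts the shortfall.
function coldStartsFor(concurrentRequests, warmEnvironments) {
    return Math.max(0, concurrentRequests - warmEnvironments);
}

// Going from 0 to 5 concurrent requests: 5 cold starts.
console.log(coldStartsFor(5, 0));  // → 5
// Stepping up from 5 to 10 with 5 warm environments: 5 more cold starts.
console.log(coldStartsFor(10, 5)); // → 5
// Dropping back down to 3 requests: no cold starts at all.
console.log(coldStartsFor(3, 5));  // → 0
```

This simplification ignores environment disposal and multi-AZ placement, but it captures why scale-up periods are where cold starts concentrate.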

For situations where a predictable start time is critical, AWS Lambda offers provisioned concurrency, which can be used to ensure that a given number of execution environments are always initialised and ready to receive new requests, running only the “Execute handler code” step of the Lambda lifecycle.
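
Provisioned concurrency is configured per published function version or alias, via the Lambda PutProvisionedConcurrencyConfig API. The sketch below shows the shape of the request parameters; the function name and alias are hypothetical placeholders, and the actual call requires valid AWS credentials:

```javascript
// Parameters for the Lambda PutProvisionedConcurrencyConfig API call.
// 'my-function' and the 'live' alias are hypothetical placeholders.
const params = {
    FunctionName: 'my-function',
    Qualifier: 'live',                  // a published version or alias
    ProvisionedConcurrentExecutions: 5  // environments kept initialised
};

// With the AWS SDK for JavaScript (v2) this would be invoked as:
//   const Lambda = require('aws-sdk/clients/lambda');
//   await new Lambda({ region: 'us-east-1' })
//       .putProvisionedConcurrencyConfig(params).promise();
```

Note that provisioned concurrency is billed for the time it is configured, whether or not the environments receive traffic, so it is best reserved for latency-critical workloads.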

The initialisation process for the serverless computing model is different to what happens for containers or virtual machines.

With containers or virtual machines, it is typical to spin up a new container or VM and wait for the instance to become healthy, based on a health-check probe, before adding it to a load balancer and allowing it to accept traffic. This can be thought of as an “offline” approach to initialisation.

With serverless, by contrast, an online initialisation approach is used, where traffic is routed to the execution environment immediately. As a result, the initialisation time becomes part of the latency of the request itself, which is why containers and VMs, whose initialisation happens offline, don’t have the same cold start issue.

Reducing The Cold Start Delay

It is possible to optimise a serverless function to reduce the amount of time required for a cold start and the overall latency of your function.

Pinging

A basic but common technique is to periodically ping your function. Frequent pings reduce the likelihood of the control plane purging the execution environment, increasing the chances that a request will be processed by a warm environment.

This approach can be limited, especially for production environments where AWS Lambda needs to scale the number of execution environments to meet the concurrent traffic needs. As a result, a pinging approach is likely to deliver benefits in lower-traffic environments such as development and staging environments but is unlikely to significantly benefit production workloads.

Reduce The Code You Load

The larger the codebase is for your function, the more bytes AWS Lambda has to download and load into the execution environment as part of the cold start. Therefore, reducing the size of the codebase and dependencies will reduce the time required to invoke the function.

As well as reducing and removing unnecessary code and dependencies, being conscious of what and how code and dependencies are loaded is also important.

For instance, a common issue with AWS Lambda function implementations is loading the entire AWS SDK, when only specific modules are required.

In the following example, the entire AWS SDK is being loaded, only for the S3 module to be used:

const AWS = require('aws-sdk');

exports.handler = async (event) => {
    AWS.config.update({
        region: 'us-east-1'
    });
    
    const s3 = new AWS.S3();
    
    try {
        // List the buckets in the account
        const data = await s3.listBuckets().promise();
        console.log("Bucket List:", data.Buckets);
        return data.Buckets;
    } catch (error) {
        console.error("Error:", error);
        return error;
    }
};

Instead, functions should only load the modules that they need:

const S3 = require('aws-sdk/clients/s3');

exports.handler = async (event) => {
    const s3 = new S3({
        region: 'us-east-1'
    });
    
    try {
        // List the buckets in the account
        const data = await s3.listBuckets().promise();
        console.log("Bucket List:", data.Buckets);
        return data.Buckets;
    } catch (error) {
        console.error("Error:", error);
        return error;
    }
};

In the above example, we only load the S3 module from the AWS SDK rather than the entire SDK.

Leverage Initialisation Code

Executing the initialisation code is the third step of the Lambda lifecycle, and it only needs to run the first time an execution environment is created. If a paused execution environment is re-used, initialisation doesn’t happen again; AWS Lambda can go straight to executing the handler code.

Initialisation code should be used to set up resources that can be re-used between invocations, such as static configuration, database connections, configuration files, or machine learning models.

For example, in the previous section we loaded the S3 module during initialisation, but still constructed the client, and set the region, in the handler:

const S3 = require('aws-sdk/clients/s3');

exports.handler = async (event) => {
    const s3 = new S3({
        region: 'us-east-1'
    });
    // ...
}

Configuring the region is a static configuration that can be re-used between invocations. Therefore, we can optimise our function to perform this during initialisation rather than during handler execution:

const S3 = require('aws-sdk/clients/s3');
const s3 = new S3({
    region: 'us-east-1'
});

exports.handler = async (event) => {
    // ...
}

This approach can also be used to re-use database connections between function invocations:

const mysql = require('mysql');
const dbConfig = {
  host: 'your-database-host',
  user: 'your-database-user',
  password: 'your-database-password',
  database: 'your-database-name'
};

// Initialise the database connection outside of the lambda handler
const connection = mysql.createConnection(dbConfig);

exports.handler = async (event, context) => {
  // Setting callbackWaitsForEmptyEventLoop to false tells AWS Lambda to
  // freeze the process as soon as the response is returned, rather than
  // waiting for the event loop to empty; this lets the open database
  // connection survive between invocations.
  context.callbackWaitsForEmptyEventLoop = false;
  
  return new Promise((resolve, reject) => {
    connection.query('SELECT * FROM your_table', (error, results, fields) => {
      if (error) {
        reject(error);
      } else {
        resolve(results);
      }
    });
  });
};

In this example, the database connection is initialised outside of the handler which means that it can be reused between invocations. Optimising database connections for serverless functions is a topic of its own – which is outside the scope of this discussion. However, it is worth noting that additional optimisations such as setting callbackWaitsForEmptyEventLoop on the context and correct configuration of database pools are important to allow for database connections to be reused without overwhelming the database server.

These are just a few examples of how a function can be optimised to improve the reusability of resources and reduce the cold start time. The specific optimisations that are relevant to your use case will depend on the needs and behaviour of your function, however, here are a few operations and configurations that are often moved out of the handler into initialisation code:

  • SDK and Library Initialisations
    Initialising SDKs (e.g., DynamoDB, S3), database clients, or any other libraries that your Lambda function uses outside of the handler allows these clients to be reused across multiple invocations, reducing the initialisation overhead for each invocation.
  • External Configuration Loading
    Loading configuration settings from environment variables or external sources (e.g., Parameter Store, Secrets Manager) during initialisation. This way, configuration data is fetched once and cached for subsequent invocations.
  • Database Connections / Connection Pools / HTTP Clients
    Establishing connections to databases / connection pools / HTTP clients outside the handler allows these connections to be reused, reducing latency.

  • Compiled Models and Assets
    If your Lambda function uses compiled assets or needs to pre-process some data (e.g., loading machine learning models), doing this during initialisation can save significant time.

Conclusion

Understanding how the execution environment and lifecycle for serverless functions work explains why cold starts happen and what can be done to reduce the cold start time.

In this article, we also covered a number of techniques for reducing the cold start time. Ultimately, techniques such as reducing the size of a Lambda function and placing the right operations in the initialiser rather than the handler will deliver the most benefit.