Managing Multiple AWS Lambda Functions
Managing the codebase, shared dependencies, monitoring and debugging, and improving performance
Let's say your team is working on a web app that facilitates a peer-to-peer (P2P) book exchange. Users can list their books for exchange, browse available books, initiate trades, and manage their profiles.
You've identified the following services, each its own Lambda function:
listBook: Allows users to list a book they'd like to exchange, storing the information in DynamoDB.
browseBooks: Retrieves the list of available books from DynamoDB.
initiateTrade: Processes trade requests between users.
manageProfile: Allows users to update their profile information, stored in Cognito.
You wrote the first version, and it seems to be working. However, you've identified the following problems:
Codebase management: There's significant code duplication. There's also no centralized error handling, meaning each function has its own way of dealing with errors, leading to inconsistencies.
Dependency management: Updating shared libraries across all functions is a pain, since each function handles its dependencies separately.
Monitoring and debugging: It's difficult to track and debug issues that span multiple functions across the same session. It's also challenging to get a comprehensive view of the app's performance.
Performance: Since there aren't a lot of users, functions aren't invoked frequently, so most users experience delays due to the cold starts of the Lambda functions.
Concurrency: Lambda functions scale really fast, but the DynamoDB table takes a bit longer. If there was a sudden spike in traffic, the DynamoDB table wouldn't be able to cope. You've already found the solution in the past issue of Simple AWS titled Using SQS to Throttle Database Writes, and you're going to implement that soon. For this issue, let's ignore this problem.
We're going to use the following AWS services:
Lambda: A serverless compute service that runs your code in response to events and automatically manages the underlying compute resources for you. It lets you focus on the core application logic without worrying about managing servers.
Lambda Layers: A distribution mechanism for libraries, custom runtimes, and other function dependencies for Lambda functions. Layers help manage code in large applications and optimize package sizes for deployment.
X-Ray: A service that collects data about requests and provides tools you can use to view, filter, and gain insights into that data to identify issues and opportunities for optimization. It helps you analyze and debug distributed applications.
Serverless Application Model (SAM): An open-source framework for building serverless applications. It extends AWS CloudFormation to provide a simplified way of defining API Gateway APIs, Lambda functions, and DynamoDB tables. SAM includes shorthand syntax to express functions, APIs, databases, and event source mappings.
How to Manage Multiple AWS Lambda Functions
Here's a simplified version of the code for the listBook function, in the file listBook.js.
exports.handler = async (event, context) => {
  const AWS = require('aws-sdk');
  const dynamodb = new AWS.DynamoDB.DocumentClient();
  const params = {
    TableName: 'BooksTable',
    Item: event.book
  };
  try {
    await dynamodb.put(params).promise();
  } catch (error) {
    console.error(error);
    let errorMessage = 'Internal Server Error';
    let statusCode = 500;
    if (error.statusCode && error.message) {
      errorMessage = error.message;
      statusCode = error.statusCode;
    }
    return {
      statusCode,
      body: JSON.stringify({ message: errorMessage }),
    };
  }
};
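For reference, here's the kind of event the handler above expects. The shape is assumed for this walkthrough; behind API Gateway's proxy integration you'd parse event.body instead:

```javascript
// Example event for the listBook handler (shape assumed for this walkthrough;
// the field names below are illustrative, not a fixed API contract).
const event = {
  book: {
    bookId: 'b-123',                      // partition key of BooksTable
    title: 'The Pragmatic Programmer',
    ownerId: 'user-42',
  },
};

// The handler stores event.book as-is in DynamoDB:
console.log(event.book.bookId); // b-123
```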
You'll also have the code for the other functions, in browseBooks.js, initiateTrade.js and manageProfile.js respectively. I'll omit them for simplicity.
You should also have a SAM template to deploy all 4 functions: functions.yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Initial version of the SAM Template
Resources:
  BooksTable:
    Type: AWS::DynamoDB::Table
    Properties:
      AttributeDefinitions:
        - AttributeName: bookId
          AttributeType: S
      KeySchema:
        - AttributeName: bookId
          KeyType: HASH
      ProvisionedThroughput:
        ReadCapacityUnits: 5
        WriteCapacityUnits: 5
  listBookFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: listBook/
      Handler: listBook.handler
      Runtime: nodejs14.x
      Events:
        HttpPost:
          Type: Api
          Properties:
            Path: /book
            Method: post
      Policies:
        - Statement:
            Effect: Allow
            Action:
              - dynamodb:GetItem
              - dynamodb:PutItem
              - dynamodb:Scan
              - dynamodb:Query
              - dynamodb:UpdateItem
            Resource: !GetAtt BooksTable.Arn
  # The same configuration applies to the other 3 functions...
You might be using Serverless Framework instead of SAM, which is also a great choice. However, if that's your case, I'll let you figure out the infrastructure code. Don't hesitate to ask if you need help!
If you're not using any form of Infrastructure as Code, check the Discussion section.
Changes to Make to Improve Managing Lambda Functions
Step 1: Move Shared Code into a Shared Library
Identify common code used across different Lambda functions and move it into a shared library. For our case, we'll do this for error handling. Create a folder called shared/nodejs and a file shared/nodejs/utils.js with the following contents:
module.exports = {
  errorHandler: (error) => {
    console.error(error);
    return {
      statusCode: error.statusCode || 500,
      body: JSON.stringify({ message: error.message || 'Internal Server Error' }),
    };
  },
  // Other common functionalities go here
};
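As a quick local sanity check (plain Node, no AWS needed), here's how the handler maps errors; this inlines a copy of the errorHandler above:

```javascript
// Inlined copy of the errorHandler from shared/nodejs/utils.js, for a local check.
const errorHandler = (error) => {
  console.error(error);
  return {
    statusCode: error.statusCode || 500,
    body: JSON.stringify({ message: error.message || 'Internal Server Error' }),
  };
};

// AWS SDK errors carry statusCode and message, so they pass through:
const notFound = Object.assign(new Error('Book not found'), { statusCode: 404 });
console.log(errorHandler(notFound).statusCode); // 404

// Any other error falls back to a generic 500:
console.log(errorHandler(new Error()).statusCode); // 500
```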
Step 2: Install Shared Libraries in the Layer Folder
In the shared/nodejs folder, run the following commands to install the aws-sdk library. Add any other shared libraries you identify.
npm init -y
npm install aws-sdk
You'll also need to uninstall these libraries from each function (run npm uninstall aws-sdk in each function's folder); otherwise the Node.js module resolution algorithm will always find the version installed with the function and ignore the one in the layer.
Step 3: Publish the Shared Folder as a Lambda Layer
Log in with the AWS CLI and run the following commands to publish the shared folder as a Lambda Layer. Note that for Node.js runtimes the nodejs folder must sit at the root of the zip, so Lambda mounts its contents under /opt/nodejs:
cd shared
zip -r ../shared_layer.zip nodejs
cd ..
aws lambda publish-layer-version \
--layer-name shared_layer \
--description "Shared libraries and utilities for Lambda functions" \
--content fileb://shared_layer.zip \
--compatible-runtimes nodejs14.x
Write down the LayerVersionArn value from the output of the publish-layer-version command, which is the ARN of the shared_layer Lambda Layer. You'll need it for Step 5.
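The publish-layer-version call returns a JSON document; the value to copy is LayerVersionArn, which ends with the layer's version number. The account ID and region below are placeholders:

```javascript
// Shape of the publish-layer-version response (account ID and region are
// placeholders). LayerVersionArn is the value to reference from the SAM template.
const response = {
  LayerArn: 'arn:aws:lambda:us-east-1:123456789012:layer:shared_layer',
  LayerVersionArn: 'arn:aws:lambda:us-east-1:123456789012:layer:shared_layer:1',
  Version: 1,
};

// Note the trailing :1 — each publish creates a new, immutable version:
console.log(response.LayerVersionArn);
```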
Step 4: Import the Shared Code and Libraries from the Layer
In each Lambda function, import the AWS SDK, the DynamoDB Document Client (if the function needs it), and the shared utils.js module from the shared layer. This is what the code for listBook.js looks like now:
exports.handler = async (event, context) => {
  const AWS = require('aws-sdk');
  const dynamodb = new AWS.DynamoDB.DocumentClient();
  const shared = require('/opt/nodejs/utils');
  const params = {
    TableName: 'BooksTable',
    Item: event.book
  };
  try {
    await dynamodb.put(params).promise();
  } catch (error) {
    // Shared error handler
    return shared.errorHandler(error);
  }
};
Step 5: Add the Lambda Layer to the SAM Template
Update the SAM template to use the shared Layer in all Lambda functions. SHARED_LAYER_ARN is the ARN you wrote down in Step 3.
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: SAM Template with Lambda Layers
Resources:
  BooksTable:
    # Same as in the initial template
  listBookFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: listBook/
      Handler: listBook.handler
      Runtime: nodejs14.x
      Layers:
        - SHARED_LAYER_ARN
      Events:
        HttpPost:
          Type: Api
          Properties:
            Path: /book
            Method: post
      Policies:
        - Statement:
            Effect: Allow
            Action:
              - dynamodb:GetItem
              - dynamodb:PutItem
              - dynamodb:Scan
              - dynamodb:Query
              - dynamodb:UpdateItem
            Resource: !GetAtt BooksTable.Arn
  # The same configuration applies to the other 3 functions...
Step 6: Configure Provisioned Concurrency
Add 5 provisioned concurrent executions to every Lambda function, to ensure there are always a few execution environments initialized and ready, avoiding the cold starts caused by the lack of constant traffic. Note that SAM requires AutoPublishAlias when you configure provisioned concurrency, because provisioned concurrency applies to a function version or alias, not to $LATEST.
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: SAM Template with provisioned concurrency
Resources:
  BooksTable:
    # Same as in the initial template
  listBookFunction:
    Type: AWS::Serverless::Function
    Properties:
      AutoPublishAlias: live
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 5
      # Rest of the properties
  # The same configuration applies to the other 3 functions...
Step 7: Move Initialization Code Outside the Handler
Move outside the handler function any code that doesn't depend on the request, such as importing libraries. Here's what the code for listBook.js looks like:
// Dependencies and clients are instantiated outside the handler
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB.DocumentClient();
const shared = require('/opt/nodejs/utils');

exports.handler = async (event, context) => {
  const params = {
    TableName: 'BooksTable',
    Item: event.book
  };
  try {
    await dynamodb.put(params).promise();
  } catch (error) {
    // Shared error handler
    return shared.errorHandler(error);
  }
};
Step 8: Add AWS X-Ray
Add the aws-xray-sdk-core library (the core X-Ray SDK for Node.js) to the Shared Layer and publish a new version:
cd shared/nodejs
npm install aws-xray-sdk-core
cd ..
zip -r ../shared_layer.zip nodejs
cd ..
aws lambda publish-layer-version \
--layer-name shared_layer \
--description "Shared libraries and utilities for Lambda functions" \
--content fileb://shared_layer.zip \
--compatible-runtimes nodejs14.x
Write down the LayerVersionArn from the output of the publish-layer-version command. It will be different from the one you wrote down in Step 3 (because it's a new version), and you'll need to update it in the SAM template (we'll do that in a second).
Add X-Ray to each Lambda function. Here's what the code for listBook.js looks like:
const AWSXRay = require('aws-xray-sdk-core');
const AWS = AWSXRay.captureAWS(require('aws-sdk'));
const dynamodb = new AWS.DynamoDB.DocumentClient();
const shared = require('/opt/nodejs/utils');

exports.handler = async (event, context) => {
  const params = {
    TableName: 'BooksTable',
    Item: event.book
  };
  // A segment represents the work of this function
  const segment = AWSXRay.getSegment();
  // Create a subsegment to represent a unit of work, e.g. interacting with DynamoDB
  const subsegment = segment.addNewSubsegment('DynamoDB: Put Book');
  try {
    await dynamodb.put(params).promise();
    // Close the subsegment once the work is done
    subsegment.close();
  } catch (error) {
    // Close the subsegment with the error so the trace records the failure,
    // then use the shared error handler
    subsegment.close(error);
    return shared.errorHandler(error);
  }
};
Modify the SAM template to add the AWSXRayDaemonWriteAccess IAM policy, so the Lambda functions have the necessary permissions for X-Ray, and the Tracing: Active configuration. Also, replace NEW_SHARED_LAYER_ARN with the new value of the ARN of the Shared Layer. Here's the final version of the SAM template:
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: SAM Template with Lambda Layers, provisioned concurrency and X-Ray
Resources:
  BooksTable:
    Type: AWS::DynamoDB::Table
    Properties:
      AttributeDefinitions:
        - AttributeName: bookId
          AttributeType: S
      KeySchema:
        - AttributeName: bookId
          KeyType: HASH
      ProvisionedThroughput:
        ReadCapacityUnits: 5
        WriteCapacityUnits: 5
  listBookFunction:
    Type: AWS::Serverless::Function
    Properties:
      AutoPublishAlias: live
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 5
      CodeUri: listBook/
      Handler: listBook.handler
      Runtime: nodejs14.x
      Tracing: Active
      Layers:
        - NEW_SHARED_LAYER_ARN
      Events:
        HttpPost:
          Type: Api
          Properties:
            Path: /book
            Method: post
      Policies:
        - AWSXRayDaemonWriteAccess
        - Statement:
            Effect: Allow
            Action:
              - dynamodb:GetItem
              - dynamodb:PutItem
              - dynamodb:Scan
              - dynamodb:Query
              - dynamodb:UpdateItem
            Resource: !GetAtt BooksTable.Arn
  # The same configuration applies to the other 3 functions...
Explaining All the Steps
Step 1: Move Shared Code into a Shared Library
The idea is to abstract repeated code into a library that's shared across all functions. This way you don't repeat that code across different functions. More importantly, if you need to change this functionality, you only need to do it once, and you maintain consistency across the whole app.
In our example the only code to abstract was error handling, but a real application will surely have a few more things.
Step 2: Install Shared Libraries in the Layer Folder
You install in a common location any libraries that all functions share. In our example we did this for the AWS SDK. By installing these libraries in a shared location, you ensure that all of your functions use the same version of each library. This is not about duplication (it's not worth it for a couple of lines that won't change much), but about dependency management.
Step 3: Publish the Shared Folder as a Lambda Layer
After separating the shared code and libraries, you need to bundle everything up into a Lambda Layer. A Lambda Layer is a distribution package that contains libraries, a custom runtime, or other function dependencies. Lambda functions will use this layer to access this code. In our case, the Layer includes our shared error handling code and the aws-sdk library. By publishing this shared folder as a Layer, we make it available so our functions can import it (which happens in the next step).
Step 4: Import the Shared Code and Libraries from the Layer
Now we need to import the shared code and libraries from the Layer into our functions. This involves modifying the require statements for our own code to import from the /opt directory, which is where Lambda Layers are mounted in the function execution environment. This ensures that our functions use the shared code from the Layer. require statements for libraries stay the same, but we need to uninstall the shared libraries from each function. That way the Node.js module resolution algorithm doesn't find them in the function's own node_modules, checks the Layer next, and imports them from there.
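To make that resolution order concrete, here's a small simulation. The two paths are the real Lambda locations (/var/task holds the function's bundle, /opt the layer), but the resolve function itself is illustrative, not Node's actual algorithm:

```javascript
// Simulated lookup order for require() in a Lambda function. The paths are
// the real mount points; the resolve function is a toy, not Node's algorithm.
const searchPaths = [
  '/var/task/node_modules',   // the function's own dependencies: checked first
  '/opt/nodejs/node_modules', // the layer: used only if not found above
];

function resolve(lib, installed) {
  for (const path of searchPaths) {
    if ((installed[path] || []).includes(lib)) return `${path}/${lib}`;
  }
  throw new Error(`Cannot find module '${lib}'`);
}

// With aws-sdk uninstalled from the function, it resolves from the layer:
console.log(resolve('aws-sdk', { '/opt/nodejs/node_modules': ['aws-sdk'] }));
// → /opt/nodejs/node_modules/aws-sdk

// If the function still bundled its own copy, that copy would win:
console.log(resolve('aws-sdk', {
  '/var/task/node_modules': ['aws-sdk'],
  '/opt/nodejs/node_modules': ['aws-sdk'],
}));
// → /var/task/node_modules/aws-sdk
```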
Step 5: Add the Lambda Layer to the SAM Template
Once our Layer is created, you need to add it to the SAM template. This ensures that when you deploy the application, the Layer is included and made available to your functions. In the SAM template, you specify the Layer under the Layers property for each function.
Step 6: Configure Provisioned Concurrency
Provisioned concurrency is a feature of AWS Lambda that allows you to keep functions initialized and ready to respond, avoiding cold starts. This is especially useful for functions that need to respond quickly, but aren't invoked often. We're configuring 5 executions for provisioned concurrency, meaning AWS will keep 5 executions of each Lambda function warmed up and ready. If we got 6 concurrent requests, a 6th execution would need to be started, and that execution would suffer from a cold start.
Provisioned concurrency runs continually and is billed separately from initialization and invocation costs; check the Lambda pricing page for the current rates. Savings Plans can be applied to Provisioned Concurrency.
Step 7: Move Initialization Code Outside the Handler
When Lambda creates a new execution environment for a function, it runs the initialization code (the code outside the handler method) once. Invocations that reuse an existing execution environment run only the code inside the handler method.
By moving outside the handler all code that doesn't depend on the request being processed, you ensure that this code runs only once per execution environment, not once per invocation. This improves the performance of every warm invocation.
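The effect can be simulated locally with plain Node (a toy model, not Lambda itself; async is omitted for brevity):

```javascript
// Toy model of execution context reuse: module-level code runs once per
// execution environment, handler code runs once per invocation.
let initializations = 0;
initializations++; // "initialization code": runs when the environment is created

let invocations = 0;
const handler = (event) => {
  invocations++; // runs on every invocation
  return { initializations, invocations };
};

handler({});                // first invocation (pays the cold start)
const result = handler({}); // second invocation reuses the environment
console.log(result);        // { initializations: 1, invocations: 2 }
```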
Step 8: Add AWS X-Ray
AWS X-Ray helps you trace requests from start to finish and provides a comprehensive view of the entire request lifecycle. By adding AWS X-Ray to the Lambda functions, you can gain insights into the behavior of your functions, understand how they're performing, and identify potential bottlenecks. In our case, we've added AWS X-Ray to all of our functions and enabled active tracing, giving us insights into each function's performance and making it easier to identify and troubleshoot issues.
A while back, I published a Simple AWS issue about AWS X-Ray. Go check it out!
Discussion On Why Multiple Lambdas are Hard to Manage
Code management and deployment
We've barely scratched the surface of the code management problem, and we never even mentioned how to deploy the functions. A few questions you might be asking yourself:
Do I use a monorepo, or a separate repo per function?
How do I structure my CI/CD pipeline?
How do I deploy a change that impacts several functions?
Another post will deal with these issues.
Infrastructure as Code
Managing infrastructure manually (known as clickops) is just asking for trouble. This is especially true when building serverless applications, because so many responsibilities are moved from the code to the serverless platform, and transformed into configurations. For example, code is shared using Lambda Layers, and you need the configuration that tells a Lambda function to use a specific Layer. Imagine deploying a new version of your Layer, and having to manually update the ARN in every single Lambda function. Imagine you missed one, and you get an error because one Lambda function is using an older version of a library. Or worse than an error, you leave an unpatched vulnerability.
Serverless Application Model (SAM) is an extension of CloudFormation (that's why the templates look just like cfn, but say Transform: AWS::Serverless-2016-10-31) made to manage serverless applications more easily. Another great option is Serverless Framework, which is basically the same idea but multi-cloud and not done by AWS. I picked SAM for this issue because we've been using CloudFormation throughout the whole newsletter and I felt it introduced less new stuff, but both are great.
Other options for Infrastructure as Code that aren't serverless specific:
CloudFormation: Anything you can do with SAM you can do with CloudFormation, it just takes more lines of code.
Terraform: Another declarative tool, really mature and honestly fantastic. If you're interested, I highly recommend the Terraform Weekly newsletter.
Pulumi: Another great tool, which mixes declarative and imperative infrastructure, trying (and mostly managing) to get the best of both worlds. It's pretty big, and has been growing a lot lately.
AWS CDK: An imperative tool, where you write infrastructure code in TypeScript and can use decision and control structures (if, while, for, etc). Made by AWS, and it compiles to a CloudFormation template.
Terraform CDK: Same as AWS CDK, but done by Terraform. It's rather new, but has been doing great so far, partly because it's great, partly because it's done by Terraform (which is also great, and already trusted by a lot of people).
Best Practices for Managing Multiple AWS Lambda Functions
Operational Excellence
Automated deployments: Always automate the deployment of your serverless applications. This reduces the potential for human error, ensures consistency across environments, and simplifies rollback in case of issues. We'll deal with this in next week's issue.
Monitoring and observability: Use services like CloudWatch and X-Ray to gain insights into your application's performance and health. Set up alarms to alert of any anomalies, and pipe those alarms to SNS to get notified.
Version management: Use Lambda versions to manage different versions of your functions. This can help in maintaining stability and rollback in case of errors. We'll deal with this in next week's issue.
Security
Least privilege principle: Always follow the principle of least privilege while setting IAM roles for your Lambda functions. Only grant the minimum permissions necessary for your function to work.
Encrypt sensitive data: If your application handles sensitive data, make sure to use encryption at rest (using KMS) and in transit (using SSL/TLS).
Keep Secrets Secret: Don't pass secret information such as API keys or database passwords in environment variables (or worse, in the code). Store them in Secrets Manager and add to your Lambda functions the code and permissions to retrieve the value.
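As a sketch of that pattern, cache the retrieved value outside the handler so you pay the lookup once per execution environment. The secret name and the fetchSecret stub below are made up for illustration; a real app would call Secrets Manager's GetSecretValue:

```javascript
// Sketch: fetch a secret once per execution context and cache it outside the
// handler. fetchSecret is a stand-in for a Secrets Manager call (stubbed for
// illustration); the secret name is hypothetical.
let cachedSecret = null;
let fetches = 0; // only here to show the cache works

const fetchSecret = async (secretId) => {
  fetches++;
  return `secret-value-for-${secretId}`; // stub: a real app calls Secrets Manager
};

const getDbPassword = async () => {
  if (cachedSecret === null) {
    cachedSecret = await fetchSecret('prod/booksApp/dbPassword');
  }
  return cachedSecret;
};
```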
Reliability
Error handling and retries: Ensure your Lambda functions handle errors correctly. Lambda automatically retries failed asynchronous invocations, so your functions should be idempotent.
Provisioned Concurrency: To avoid cold starts and ensure consistent performance, use provisioned concurrency for your Lambda functions, especially for those that require low-latency responses. We did that in this issue!
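The idempotency point above can be sketched like this (in-memory only, for illustration; a real implementation would persist the processed request IDs, e.g. with a DynamoDB conditional write, since in-memory state doesn't survive across execution environments):

```javascript
// Minimal in-memory sketch of an idempotent handler: repeated deliveries of
// the same request ID don't repeat the side effect.
const processed = new Set();
let tradesInitiated = 0; // stands in for the real side effect

const initiateTradeHandler = (event) => {
  if (processed.has(event.requestId)) {
    return { statusCode: 200, body: 'already processed' }; // safe replay
  }
  processed.add(event.requestId);
  tradesInitiated++; // side effect happens exactly once per requestId
  return { statusCode: 200, body: 'trade initiated' };
};

initiateTradeHandler({ requestId: 'req-1' });
initiateTradeHandler({ requestId: 'req-1' }); // retry: no duplicate trade
console.log(tradesInitiated); // 1
```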
Performance Efficiency
Optimize your function's memory and timeout settings: Tune these settings based on your function's requirements. Remember, CPU and network IO are proportional to the amount of memory configured. So is price.
Use Lambda Layers: Use Layers to manage your code and dependencies across multiple functions. This improves code sharing and separation of responsibilities, and allows you to manage your function code separately from your resources. We did that in this issue!
Cost Optimization
Right-size your functions: Monitor your Lambda functions' usage and adjust memory settings as needed. Over-provisioned functions cost more and under-provisioned functions may lead to performance issues.
Use cost allocation tags: Using tags can help you track your AWS resources and understand how much you're spending on each function or application.
Delete unused resources: Regularly review and delete unused resources like old Lambda function versions, unused Layers, and old CloudWatch log groups to avoid unnecessary costs.
Use Savings Plans: Savings Plans let you commit to pay for some capacity (whether you end up using it or not) at a discounted price. Lambdas are great for unpredictable traffic patterns, so sometimes it's hard to figure out whether it's worth it to commit to a Savings Plan or not. You should explore the option though. And for your provisioned concurrency, you'll be billed anyways, so it makes sense to add a Savings Plan.
Resources
We didn't touch on DynamoDB. Check out the past issues of Simple AWS on DynamoDB Database Design and Transactions in DynamoDB.
You should use API Gateway to expose your Lambda functions. Here's the issue on how to do that. Also, that's where you'll configure Cognito, like this.