Jens Båvenmark for AWS Community Builders

Posted on May 19 • Originally published at Medium

AWS Alert Validation - Lambda

#aws #cloudwatch #lambda #monitoring

We are continuing the blog series about testing your AWS alarms. The first part of the series, which looked at CloudWatch actions and EC2 alarms, can be found here.

An untested alarm is not one you can trust.

This time we will look at alarms for your Lambda functions. As before, we will test the alarms by "breaking" the Lambda, so you get the same outcome as when a real issue occurs.

Since this is Lambda, we will add code (or entire Lambdas) to make the Lambda act as we want. I will use Python, but the logic works for all the other supported languages.

I have an examples repo where you can find Terraform code that deploys the Lambdas and the required resources to AWS so you can test the alarms. You will need to connect your own alarms to the Lambda functions, though.

Remember to lower the thresholds on your alarms so they trigger more easily. If an alarm fires at the lowered threshold, the same alarm and actions will fire at your real threshold as well.

Lambda alarms

The alarms we are going to look at are:

Error Alarm
Throttling Alarm
Timeout Alarm
High Duration Alarm
Out of Memory Alarm
Log Alarm
Failed Lambda async message to DLQ Alarm
Dead Letter Errors Alarm

Error Alarm

The Error alarm is the most common Lambda alarm. To test it, we just need to make the Lambda crash or otherwise end the invocation with an error.

We will do this by deploying the following code snippet to a monitored Lambda, which makes it raise an unhandled exception:

def lambda_handler(event, context):
    raise Exception("Triggered error alarm for testing purposes.")

An example Lambda can be found here.

To trigger the alarm, invoke the Lambda with this CLI command.

aws lambda invoke --function-name {FunctionName} outfile
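If you script the test instead of running the CLI by hand, it helps to confirm that the invocation really failed before you go looking for the alarm. The sketch below assumes you have the response of a synchronous invoke as a dict (for example from boto3's `invoke` call); such a response carries a `FunctionError` field only when the function failed. The helper name `invocation_failed` and the sample responses are made up for illustration.

```python
def invocation_failed(response: dict) -> bool:
    """Return True if a Lambda invoke response reports a function error.

    `response` is assumed to be the dict returned by a synchronous
    invoke. The "FunctionError" key is only present when the function
    itself failed.
    """
    return "FunctionError" in response


# Hypothetical responses, trimmed to the fields that matter here.
crashed = {"StatusCode": 200, "FunctionError": "Unhandled"}
succeeded = {"StatusCode": 200}

print(invocation_failed(crashed))
print(invocation_failed(succeeded))
```

Note that the HTTP status code is 200 in both cases, so checking the status code alone will not tell you whether the Lambda crashed.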

Throttling Alarm

To test throttling, we need to deploy a Lambda with its reserved concurrency set to one, so only one instance of the function can run at a time.

I would suggest having the Lambda run a sleep or similar to keep it running so you can trigger multiple runs easily.

Example Lambda and Terraform can be found here.

To trigger the alarm, we need to invoke the Lambda at least twice at the same time, for example by running this command simultaneously in multiple terminals.

aws lambda invoke --function-name {FunctionName} outfile
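If juggling terminals is awkward, the simultaneous invocations can also be started from a small script. This is only a sketch: `invoke_concurrently` and `make_fake_lambda` are hypothetical helpers, and the fake stands in for a Lambda with reserved concurrency so the example runs without AWS access. For a real test you would pass a function that performs the actual invoke (for instance by wrapping a boto3 call) instead of the fake.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def invoke_concurrently(invoke, count):
    """Call invoke() from `count` threads that all start at the same moment."""
    barrier = threading.Barrier(count)

    def worker(_):
        barrier.wait()  # hold every thread until all of them are ready
        return invoke()

    with ThreadPoolExecutor(max_workers=count) as pool:
        return list(pool.map(worker, range(count)))


def make_fake_lambda(reserved_concurrency=1, run_seconds=0.2):
    """Stand-in for a Lambda with reserved concurrency (illustration only)."""
    slots = threading.Semaphore(reserved_concurrency)

    def invoke():
        if not slots.acquire(blocking=False):
            return "throttled"  # no free execution slot
        try:
            time.sleep(run_seconds)  # simulate the sleeping Lambda
            return "ok"
        finally:
            slots.release()

    return invoke


results = invoke_concurrently(make_fake_lambda(), 3)
print(results)
```

The barrier is the important part: without it the calls may be spread out enough that each one finishes before the next starts, and nothing gets throttled.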

Timeout Alarm

If you are monitoring for Lambda timeouts (a timeout writes a "Task timed out" log entry that a metric filter can match), you can test that alarm by adding a sleep to the Lambda that is longer than its configured timeout.

import time

def lambda_handler(event, context):
    time.sleep(25)

Example Lambda and Terraform can be found here.

To trigger the alarm, invoke the Lambda with this CLI command.

aws lambda invoke --function-name {FunctionName} outfile
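A hard-coded sleep only works while it stays longer than the configured timeout. As a variation, the handler can ask the Lambda context how much time is left and sleep a bit longer than that. This is a sketch, not the code from the examples repo: the helper name `seconds_to_outlast_timeout` and the two-second margin are assumptions, while `get_remaining_time_in_millis()` is the method the Python runtime provides on the context object.

```python
import time


def seconds_to_outlast_timeout(context, margin_seconds=2.0):
    """Return a sleep duration that is longer than the time the Lambda has left."""
    remaining_seconds = context.get_remaining_time_in_millis() / 1000
    return remaining_seconds + margin_seconds


def lambda_handler(event, context):
    # Sleep past the configured timeout so Lambda terminates the invocation.
    time.sleep(seconds_to_outlast_timeout(context))
```

With this version the test keeps working even if someone changes the timeout in Terraform later.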

High Duration Alarm

To test alarms for Lambdas that take a long time to finish (high duration), we use the same setup as for the Timeout Alarm, but this time the Lambda timeout must be longer than the sleep in the Lambda, so the function finishes successfully after running for a long time.

Make sure to set your alarm threshold lower than the time set to sleep in the Lambda function.

import time

def lambda_handler(event, context):
    time.sleep(15)

Example Lambda and Terraform can be found here.

To trigger the alarm, invoke the Lambda with this CLI command.

aws lambda invoke --function-name {FunctionName} outfile
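The three numbers involved here (sleep, timeout, and alarm threshold) are easy to mix up, especially since the Lambda Duration metric is reported in milliseconds while the sleep and timeout are set in seconds. The check below is a small sketch for sanity-checking a test setup before deploying it. The function name and messages are made up, and it assumes the alarm fires when Duration is greater than the threshold.

```python
def check_duration_test(sleep_seconds, timeout_seconds, alarm_threshold_ms):
    """Return a list of problems with a high-duration test setup (empty if fine)."""
    problems = []
    sleep_ms = sleep_seconds * 1000

    if sleep_ms >= timeout_seconds * 1000:
        problems.append("sleep is not shorter than the timeout; the Lambda will time out")
    if sleep_ms <= alarm_threshold_ms:
        problems.append("sleep does not exceed the alarm threshold; the alarm will not fire")
    return problems


# 15 s sleep, 30 s timeout, alarm at 10 000 ms: a valid setup.
print(check_duration_test(15, 30, 10_000))
# 15 s sleep but the alarm only fires above 20 000 ms.
print(check_duration_test(15, 30, 20_000))
```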

Out Of Memory Alarm

When a Lambda uses more memory than it has been assigned, it crashes and logs: Error Type: Runtime.OutOfMemory. If you are monitoring for these Out Of Memory (OOM) events, you can test the alarm by running a Lambda that allocates more memory than it has been assigned.

def lambda_handler(event, context):
    # Read the configured memory limit instead of hard-coding it
    mem_size_mb = int(context.memory_limit_in_mb)

    # Allocate slightly more than the limit (exceed it by 10 MB)
    bytes_to_allocate = (mem_size_mb + 10) * 1024 * 1024
    memory_hog = "X" * bytes_to_allocate
    return len(memory_hog)

Example Lambda and Terraform can be found here.

To trigger the alarm, invoke the Lambda with this CLI command.

aws lambda invoke --function-name {FunctionName} outfile

Log Alarm

To test log alarms from Lambdas, we just need to run a Lambda that logs what your metric filter is checking for.

For example, if you have a metric filter for the string "This is a test log line", run this code.

import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    logger.info("This is a test log line")

    return {"statusCode": 200, "body": "Test completed successfully."}

Example Lambda and Terraform can be found here.

To trigger the alarm, invoke the Lambda with this CLI command.

aws lambda invoke --function-name {FunctionName} outfile
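Before deploying, you can check locally that the handler really logs the phrase your metric filter looks for. Keep in mind that filter pattern terms are case sensitive, so a small difference in capitalization is enough for the alarm to stay silent. The sketch below captures the handler's log output in memory; `ListHandler` and `logged_phrase` are hypothetical helpers, and the plain substring check only approximates a single-phrase filter pattern.

```python
import logging

PHRASE = "This is a test log line"


class ListHandler(logging.Handler):
    """Logging handler that keeps formatted messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def logged_phrase(handler_fn, phrase):
    """Run a Lambda handler locally and report whether it logged `phrase`."""
    capture = ListHandler()
    root = logging.getLogger()
    previous_level = root.level
    root.addHandler(capture)
    root.setLevel(logging.INFO)
    try:
        handler_fn({}, None)  # empty event, no context needed for this handler
    finally:
        root.removeHandler(capture)
        root.setLevel(previous_level)
    return any(phrase in message for message in capture.messages)


def lambda_handler(event, context):
    logging.getLogger().info(PHRASE)
    return {"statusCode": 200, "body": "Test completed successfully."}


print(logged_phrase(lambda_handler, PHRASE))
```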

Failed Lambda async message to DLQ alarm

If a failed Lambda sends its triggering event to an SQS Dead Letter Queue (DLQ), and you monitor whether the DLQ receives messages, you can test that alarm the same way we tested the Error Alarm.

def lambda_handler(event, context):
    raise Exception("Triggered error alarm for testing purposes.")

Example Lambda and Terraform can be found here.

To trigger the alarm, invoke the Lambda with this CLI command (DLQ only works for asynchronous invocations).

aws lambda invoke --function-name {FunctionName} --invocation-type Event output.json

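It can take a few minutes before the message shows up in the DLQ, because Lambda retries a failed asynchronous invocation (twice by default) before sending the event to the DLQ. Do not stress if you don't see the message there straight away.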

Dead Letter Errors Alarm

If you are monitoring for failures to send messages to the DLQ (the DeadLetterErrors metric) when async invocations of your Lambda fail, you can test it almost the same way as the DLQ alarm above.

The difference is that we remove the Lambda's IAM permission to send messages to the DLQ. This will trigger any monitoring set on the DeadLetterErrors metric.

Example Lambda and Terraform can be found here.

After deploying the Lambda with the DLQ, you will need to remove the permissions for the Lambda to post to the DLQ. Terraform will block the setup of the Lambda with DLQ configured if it doesn't have access to post to it.
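After removing the permission, it can be useful to confirm that the Lambda's role policy really no longer allows sending to the queue; for an SQS DLQ the relevant action is `sqs:SendMessage`. The function below is a deliberately simplified sketch that inspects a policy document given as a dict. It is not a full IAM evaluation: it ignores Deny statements, conditions, and partial wildcards, and the function name and example ARN are made up.

```python
def allows_send_message(policy_document, queue_arn):
    """Roughly check whether a policy document allows sqs:SendMessage on a queue."""
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]

    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        resources = statement.get("Resource", [])
        if isinstance(actions, str):
            actions = [actions]
        if isinstance(resources, str):
            resources = [resources]

        action_ok = any(a in ("sqs:SendMessage", "sqs:*", "*") for a in actions)
        resource_ok = any(r in (queue_arn, "*") for r in resources)
        if action_ok and resource_ok:
            return True
    return False


# Hypothetical queue ARN and policies, for illustration only.
dlq_arn = "arn:aws:sqs:eu-north-1:123456789012:example-dlq"
before = {"Statement": [{"Effect": "Allow", "Action": "sqs:SendMessage", "Resource": dlq_arn}]}
after = {"Statement": [{"Effect": "Allow", "Action": "logs:PutLogEvents", "Resource": "*"}]}

print(allows_send_message(before, dlq_arn))
print(allows_send_message(after, dlq_arn))
```

Once the check reports that the permission is gone, invoke the Lambda asynchronously as in the DLQ test above and wait for the DeadLetterErrors metric to report a value.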

Final Thoughts

Testing that your alarms work as you expect can save you a lot of headaches in the future.

I hope that these tests will make your Lambda monitoring more reliable.

This was the second part in this series. In the first part, we looked at tests for EC2 alarms and CloudWatch actions. In the upcoming part we will look at alarms for other resources.
