Self-healing, Single-instance Environment with AWS EC2
How to add resiliency to an EC2 instance by making it self-healing, keeping the same public IP address

Guille Ojeda
February 22, 2023
Welcome to Simple AWS! This is issue #17, and it's coming out 2 days late. My apologies!
Use case: Self-healing environment that doesn't need to scale
Scenario
You've got a simple environment with 1 EC2 instance. Either you don't need it to scale (e.g. it's a dev environment) or you can't scale it right now (the instance isn't stateless, meaning you're saving data in the instance's EBS volumes, and you don't want to fix that right now). When it fails, you want it to fix itself automatically, but you don't want to pay for a load balancer.
Services
EC2: For your instances.
EC2 Auto Scaling: Just your typical Auto Scaling Groups. You set a minimum and maximum number of instances, and the group creates and destroys instances to stay within those numbers. You can also set scaling policies based on metrics such as average CPU usage, which adjust the desired number of instances between that min and max (which is where the scaling part comes in).
Elastic IP: Just a static IP address that exists separate from any EC2 instances (meaning it's not created or destroyed with an instance). It can be attached to an instance, and moved to another one at will. It's free while attached to a running instance. PS: It's not a separate service.
Solution
1. Allocate an Elastic IP address.
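If you prefer the CLI over the console for this step, it's a single call (a minimal sketch):
# Allocate a new Elastic IP address; note the AllocationId (eipalloc-...) in the output,
# you'll need it in Step 3. The ARN for the policy in Step 2 has the form
# arn:aws:ec2:<region>:<account-id>:elastic-ip/<AllocationId>
aws ec2 allocate-address --domain vpc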
2. Create an IAM instance profile with an IAM Role that allows the instance to associate that Elastic IP address. Here's the policy (replace {IpAddressArn} with the ARN of your Elastic IP address):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AssociateElasticIpAddress",
            "Action": [
                "ec2:AssociateAddress"
            ],
            "Effect": "Allow",
            "Resource": "{IpAddressArn}"
        }
    ]
}
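If you're setting this up from the CLI instead of the console or CloudFormation, creating the role and instance profile looks roughly like this (a sketch: the role, profile and file names are placeholders, and it assumes the policy above is saved as associate-eip-policy.json and a standard EC2 trust policy as ec2-trust-policy.json):
# Create the role with a trust policy that lets EC2 assume it
aws iam create-role --role-name self-healing-instance-role \
    --assume-role-policy-document file://ec2-trust-policy.json
# Attach the Elastic IP policy from Step 2 as an inline policy
aws iam put-role-policy --role-name self-healing-instance-role \
    --policy-name AssociateElasticIp --policy-document file://associate-eip-policy.json
# Create the instance profile and add the role to it
aws iam create-instance-profile --instance-profile-name self-healing-instance-profile
aws iam add-role-to-instance-profile --instance-profile-name self-healing-instance-profile \
    --role-name self-healing-instance-role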
3. Create a Launch Template using the instance profile created in Step 2. In the User Data section (the script that runs when an instance launches), add a command to associate the Elastic IP address with that instance. Like this (you'll need to replace ${AWS::Region} with your region and ${EIP.AllocationId} with the AllocationId of the Elastic IP address):
#!/bin/bash -ex
# Get this instance's ID from the instance metadata service
INSTANCEID=$(curl -s -m 60 http://169.254.169.254/latest/meta-data/instance-id)
# Associate the Elastic IP address with this instance
aws --region ${AWS::Region} ec2 associate-address --instance-id $INSTANCEID --allocation-id ${EIP.AllocationId}
4. Create an Auto Scaling Group with min 1 instance and max 1 instance, and associate the Launch Template (there's a CLI sketch after these steps).
The Auto Scaling Group will detect there are 0 instances and will create a new instance (to match the minimum of 1 instance), and when that instance starts it will associate the Elastic IP address with itself. When the instance fails, the Auto Scaling Group detects there are 0 healthy instances and repeats the process.
5. (Optional) Point your DNS record to the Elastic IP address.
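Here's what Step 4 could look like with the CLI (a sketch: the group name, Launch Template name and subnet ID are placeholders):
# Create an Auto Scaling Group pinned to exactly 1 instance, using the Launch Template from Step 3
aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name self-healing-instance \
    --launch-template LaunchTemplateName=self-healing-instance,Version='$Latest' \
    --min-size 1 --max-size 1 --desired-capacity 1 \
    --vpc-zone-identifier subnet-0123456789abcdef0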
Discussion
Why are we even discussing this? Can't we just put a load balancer there and be done with it? $22/month is not that expensive!
Indeed, it's not that expensive. And an Application Load Balancer has other benefits, such as easily handling the SSL certificate or integrating with WAF. However, for an app to scale horizontally you need to remove the state from it. Any data that needs to be shared across instances is part of the state of the application. This includes configs, shared files, databases and session data. For session data you can use sticky sessions (all requests for the same session go to the same instance), but the rest needs to be moved to a separate storage (S3, EFS, DynamoDB, RDS, etc).
Ok, so I just design my compute layer to be stateless! That's easy.
Yes, it is! And you should have done that in the first place! Unfortunately, not everyone does that. And if you didn't get it right from the start, changing that later is a lot of work. Still worth it, and you should still do it! But if you need a self-healing environment right now, this is the solution (while you work on removing the state from your app).
You also said something about dev environments. Shouldn't a dev environment be identical to a prod environment?
We usually call that environment staging. Dev is usually cheap and dirty. But you still don't want it to fail, since dev hours spent fixing a dev environment can add up to a lot of money. This is a good solution for a self-healing dev environment.
Best Practices
If you're in this situation, the best thing you can do is just remove the state from your app and make it horizontally scalable. I'll keep the tips focused on this issue's solution though, because I think it's a pretty creative solution that can be useful in certain situations.
Operational Excellence
Use Session Manager: I actually have a CloudFormation template to create an SSH key pair and associate it with the instance at launch. But you shouldn't use key pairs. Use Session Manager instead. Here's how to set it up.
Mind the service quotas: There's a default limit of 5 Elastic IP addresses per region per account. FYI, NAT Gateways consume 1 Elastic IP address from that quota. You can increase this limit if you want, but AWS takes a couple of weeks to do it, so do it before you hit the limit.
Store configs in SSM Parameter Store: If you have configuration values, there are three ways to handle them: hard-code them in the code (bad idea), store them on the instance (bad idea, because when the instance fails the new instance won't have them), or keep them in a separate store where the instance can read them. SSM Parameter Store is that separate store (there's a sketch after this list).
Send logs to CloudWatch Logs: Logs stored in the instance will be lost when the instance fails. Instead, set up the CloudWatch Agent to send logs to CloudWatch Logs, so you can view them later.
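As an example of the Parameter Store approach mentioned above, writing and reading a config value is just two CLI calls (the parameter name is illustrative, and the instance's IAM role needs ssm:GetParameter permission):
# Store a config value (SecureString encrypts it at rest)
aws ssm put-parameter --name /myapp/dev/db-host --value mydb.example.com --type SecureString
# Read it from the instance, e.g. in the User Data script or at app startup
aws ssm get-parameter --name /myapp/dev/db-host --with-decryption \
    --query Parameter.Value --output text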
Security
Use Session Manager: Yeah, this is a repeat from the one above. Session Manager is also more secure than an ssh key pair, so I wanted to add it here as well.
Use HTTPS: If I want HTTPS (which I do, always), I normally set up an SSL certificate in the Application Load Balancer. I can't do that here, because there is no load balancer. But we should still use secure connections! A way to do this is to set up an Nginx reverse proxy in the instance. Keep in mind that if your SSL certificate is only inside the instance, a new instance will need to recreate it.
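A minimal sketch of that reverse proxy, written as part of the User Data script (it assumes Nginx is already installed and a certificate and key already exist at the paths shown; the domain, paths and app port are placeholders):
# Write an Nginx server block that terminates TLS and proxies to the app on localhost:8080
cat <<'EOF' > /etc/nginx/conf.d/app-ssl.conf
server {
    listen 443 ssl;
    server_name example.com;
    ssl_certificate     /etc/nginx/ssl/fullchain.pem;
    ssl_certificate_key /etc/nginx/ssl/privkey.pem;
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
EOF
systemctl restart nginx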
Reliability
Use a secondary EBS volume: If you store everything in the root EBS volume, you can lose that data when the instance fails. Instead, use a secondary EBS volume with all your data. In the User Data section of the Launch Template you can add a couple of lines to automatically attach that EBS volume to the new instance. Keep in mind that an EBS volume can only be attached to an instance in the same Availability Zone.
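A sketch of those User Data lines, assuming the data volume already exists in the Availability Zone the Auto Scaling Group launches into and the instance's role allows ec2:AttachVolume (the volume ID, region, device name and mount point are placeholders):
# Get this instance's ID and attach the existing data volume to it
INSTANCEID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
aws --region us-east-1 ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id $INSTANCEID --device /dev/sdf
# Wait for the attachment, then mount the volume (on Nitro instances the device
# shows up as an NVMe device, e.g. /dev/nvme1n1, instead of /dev/sdf)
aws --region us-east-1 ec2 wait volume-in-use --volume-ids vol-0123456789abcdef0
mount /dev/sdf /data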
Performance Efficiency
Pick the right EBS volume type: If you're using a single EC2 instance for your prod environment, you're likely relying on EBS a lot, so this likely matters a lot to you. Here's a lot of info on EBS volume types and how to pick the right one.
Cost Optimization
Use Savings Plans: A Savings Plan applies to your compute usage rather than to a specific instance, so if an instance fails and a new one is spun up, the new one still gets the discount.
Turn off your dev env at night: After office hours, you don't need to keep paying for your EC2 instance. Set the min and max instance numbers to 0 when your team logs off for the night, and back to 1 when they begin work the next day. There are many ways to do this, such as a Lambda function triggered by EventBridge. Note that while the Elastic IP address is not associated with a running EC2 instance, you'll be charged $0.005/hour (that's about $3.60/month).
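You don't even need a Lambda for this: Auto Scaling scheduled actions can do it natively. A sketch (the group name and schedule are placeholders; the recurrence is a cron expression in UTC):
# Scale down to 0 instances at 22:00 UTC on weekdays
aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name self-healing-instance \
    --scheduled-action-name stop-at-night \
    --recurrence "0 22 * * 1-5" \
    --min-size 0 --max-size 0 --desired-capacity 0
# Scale back up to 1 instance at 07:00 UTC on weekdays
aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name self-healing-instance \
    --scheduled-action-name start-in-the-morning \
    --recurrence "0 7 * * 1-5" \
    --min-size 1 --max-size 1 --desired-capacity 1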
Resources
Have you checked out Application Composer? It's still a bit limited on advanced features such as multiple stacks, but it's fantastic for people jumping into serverless (that's their target customer). And it's free!
I'm looking to get the Security Specialty cert, and this is the course I'm using to prepare for it. There are courses for other certs as well. <-- Affiliate links.
If you like to stay on top of interesting things going on in the world, check out this newsletter. <-- Affiliate links.
I'm trying to lose a bit of weight, and I'm giving this app a shot. Seems interesting so far. <-- Affiliate links.
Some of the above resources are paid promotions or contain affiliate links. I only recommend resources I've tried for myself and found actually useful, regardless of whether I get paid for it or not.
Misc.
I hope this issue doesn't sound like a way to save $22/month by avoiding a load balancer. If the $22/month cost of the Load Balancer breaks your bank, you have bigger problems than your cloud infrastructure.
This issue is about understanding that best practices are contextual, and not a dogma. But above all, it's about a simple AWS solution. It took me a while to find the format for the newsletter, but the message has always been clear to me: Don't overengineer, just build simple AWS solutions that fit your actual problems, and grow them when you need to.
Thank you for reading! See ya on the next issue.