Self-healing, Single-instance Environment with AWS EC2
How to add resiliency to an EC2 instance by making it self-healing, keeping the same public IP address
You've got a simple environment with 1 EC2 instance. Either you don't need it to scale (e.g. it's a dev environment) or you can't scale it right now (the instance isn't stateless, meaning you're saving data in the instance's EBS volumes, and you don't want to fix that right now). When it fails, you want it to fix itself automatically, but you don't want to pay for a load balancer.
We're going to use the following AWS services:
EC2: To create instances.
EC2 Auto Scaling: To create an Auto Scaling Group. You set a minimum and maximum number of instances, and the group creates and terminates instances to stay within those numbers. You can also add scaling policies based on metrics such as average CPU usage, which change the number of instances between the minimum and the maximum (that's where the scaling part comes in).
Elastic IP: A static public IP address that exists separately from any EC2 instance (it's not created or destroyed along with an instance). You can associate it with an instance and move it to another one at will.
[Figure: Architecture diagram of a self-healing EC2 instance]
How to Set Up a Self-Healing EC2 Instance
Step 1: Allocate an Elastic IP address.
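If you'd rather use the AWS CLI than the console, here's a minimal sketch of Step 1. It assumes AWS CLI v2 with credentials already configured, and the function name is just illustrative. It prints the new address's AllocationId (used in Steps 2 and 3) and its public IP (used in Step 5).

```bash
#!/usr/bin/env bash
# Sketch: allocate an Elastic IP address for use in a VPC.
# Prints "<AllocationId><TAB><PublicIp>" on one line.
allocate_elastic_ip() {
  local region="$1"
  aws ec2 allocate-address \
    --region "$region" \
    --domain vpc \
    --query '[AllocationId, PublicIp]' \
    --output text
}
```

Running `allocate_elastic_ip us-east-1` prints both values on one line, separated by a tab.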
Step 2: Create an IAM instance profile
Create an IAM instance profile with an IAM Role that allows the instance to associate that Elastic IP address. Here's the permissions policy (replace ${Region}, ${Account} and ${AllocationId} with your AWS Region, AWS Account ID and the Allocation ID of your Elastic IP address):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AssociateElasticIpAddress",
      "Effect": "Allow",
      "Action": "ec2:AssociateAddress",
      "Resource": "arn:aws:ec2:${Region}:${Account}:elastic-ip/${AllocationId}"
    }
  ]
}
And here's the trust policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
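The same Step 2 with the AWS CLI might look like the sketch below. It assumes AWS CLI v2 with permission to create IAM roles, and a role and instance profile that share one name you choose; the function names are hypothetical. It embeds the two policies above and fills in the placeholders from its arguments.

```bash
#!/usr/bin/env bash
# Sketch: create the role, its inline policy and the instance profile from Step 2.

# Print the permissions policy with the placeholders filled in.
render_eip_policy() {
  local region="$1" account="$2" allocation_id="$3"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AssociateElasticIpAddress",
      "Effect": "Allow",
      "Action": "ec2:AssociateAddress",
      "Resource": "arn:aws:ec2:${region}:${account}:elastic-ip/${allocation_id}"
    }
  ]
}
EOF
}

# Print the trust policy that lets EC2 assume the role.
render_trust_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Create the role, attach the inline policy, and wrap the role in an instance profile.
create_eip_instance_profile() {
  local name="$1" region="$2" account="$3" allocation_id="$4"
  aws iam create-role --role-name "$name" \
    --assume-role-policy-document "$(render_trust_policy)" || return 1
  aws iam put-role-policy --role-name "$name" \
    --policy-name AssociateElasticIpAddress \
    --policy-document "$(render_eip_policy "$region" "$account" "$allocation_id")" || return 1
  aws iam create-instance-profile --instance-profile-name "$name" || return 1
  aws iam add-role-to-instance-profile \
    --instance-profile-name "$name" --role-name "$name"
}
```

For example: `create_eip_instance_profile self-healing-ec2 us-east-1 123456789012 eipalloc-0123456789abcdef0` (the account and allocation IDs are placeholders). A newly created instance profile can take a few seconds to become usable, so if the next step says it doesn't exist, wait and retry.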
Step 3: Create an EC2 Launch Template
Create a Launch Template that uses the instance profile created in Step 2. In the User Data section (the script that runs when an instance launches), add a command that associates the Elastic IP address with that instance, like this (replace ${AWS::Region} with your region and ${EIP.AllocationId} with the AllocationId of the Elastic IP address):
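#!/bin/bash -ex
# Get an IMDSv2 session token (works whether or not IMDSv1 is allowed), then read this instance's ID
TOKEN=$(curl -sf -m 60 -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
INSTANCEID=$(curl -sf -m 60 -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id)
# Associate the Elastic IP address with this instance
aws --region ${AWS::Region} ec2 associate-address --instance-id "$INSTANCEID" --allocation-id ${EIP.AllocationId}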
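Here's a hedged sketch of creating that Launch Template from the AWS CLI. It assumes the User Data script above is saved as a local file with the ${...} placeholders already replaced (those placeholders are CloudFormation syntax, so nothing substitutes them when you use the CLI), and the AMI ID, instance type and profile name you pass in are your own. EC2 expects the User Data in a Launch Template to be base64-encoded, which the helper handles.

```bash
#!/usr/bin/env bash
# Sketch: create the Step 3 Launch Template. All argument values are placeholders.

# Build the LaunchTemplateData JSON; UserData must be base64-encoded.
build_launch_template_data() {
  local ami_id="$1" instance_type="$2" profile_name="$3" user_data_file="$4"
  local user_data_b64
  # tr strips the line wraps some base64 implementations add
  user_data_b64=$(base64 < "$user_data_file" | tr -d '\n')
  cat <<EOF
{
  "ImageId": "${ami_id}",
  "InstanceType": "${instance_type}",
  "IamInstanceProfile": { "Name": "${profile_name}" },
  "UserData": "${user_data_b64}"
}
EOF
}

# Usage: create_self_healing_launch_template NAME AMI_ID TYPE PROFILE USER_DATA_FILE
create_self_healing_launch_template() {
  local template_name="$1"
  shift
  aws ec2 create-launch-template \
    --launch-template-name "$template_name" \
    --launch-template-data "$(build_launch_template_data "$@")"
}
```

You'll probably also want to add `SecurityGroupIds` to that JSON so the instance only accepts the traffic you expect.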
Step 4: Create an EC2 Auto Scaling Group
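Create an Auto Scaling Group that uses the Launch Template, with a minimum of 1 instance and a maximum of 1 instance.
The Auto Scaling Group will detect there are 0 instances and will create a new instance (to match the minimum of 1 instance), and when that instance starts it will associate the Elastic IP address with itself. When the instance fails, the Auto Scaling Group detects there are 0 healthy instances and repeats the process. By default the group judges health with EC2 status checks, so it replaces an instance that stops or fails those checks, but it won't notice an application that crashed on an otherwise healthy instance.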
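A CLI version of Step 4, as a sketch: the group, template and subnet names are placeholders, and the subnets are assumed to be public subnets (with a route to an internet gateway), since the Elastic IP address only helps if the instance is reachable from the internet. `$Latest` tells the group to launch the newest version of the template.

```bash
#!/usr/bin/env bash
# Sketch: create the Step 4 Auto Scaling Group.
# subnet_ids is a comma-separated list, e.g. "subnet-aaa,subnet-bbb".
create_self_healing_asg() {
  local asg_name="$1" template_name="$2" subnet_ids="$3"
  # min = max = desired = 1: the group keeps exactly one instance running
  # and launches a replacement when that instance is unhealthy.
  aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name "$asg_name" \
    --launch-template "LaunchTemplateName=${template_name},Version=\$Latest" \
    --min-size 1 \
    --max-size 1 \
    --desired-capacity 1 \
    --vpc-zone-identifier "$subnet_ids"
}
```

If you list subnets in several Availability Zones, a replacement instance can be launched in a different zone. The Elastic IP address still follows it, because Elastic IP addresses belong to a region, not to a zone.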
Step 5: Point your DNS record to the Elastic IP address
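How you do Step 5 depends on where your DNS is hosted. If it's in Route 53, a sketch like this one creates or updates an A record pointing at the Elastic IP address. The hosted zone ID, record name and the 300-second default TTL are placeholder assumptions.

```bash
#!/usr/bin/env bash
# Sketch: point an A record at the Elastic IP address in Route 53.

# Print a change batch that creates the record or updates it if it exists.
build_a_record_change() {
  local record_name="$1" ip_address="$2" ttl="${3:-300}"
  cat <<EOF
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "${record_name}",
        "Type": "A",
        "TTL": ${ttl},
        "ResourceRecords": [ { "Value": "${ip_address}" } ]
      }
    }
  ]
}
EOF
}

point_dns_to_eip() {
  local hosted_zone_id="$1" record_name="$2" ip_address="$3"
  aws route53 change-resource-record-sets \
    --hosted-zone-id "$hosted_zone_id" \
    --change-batch "$(build_a_record_change "$record_name" "$ip_address")"
}
```

Since replacement instances keep the same Elastic IP address, you only need to run this once.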
Why use an EC2 Self-Healing Instance
Let's use a bit of role play for this explanation. Questions start with "Q:" and my answers with "A:".
Q: First of all, why are we even discussing this? Can't we just put a load balancer there and be done with it? $22/month is not that expensive!
A: Indeed, it's not that expensive. And an Application Load Balancer has other benefits, such as easily handling the SSL certificate or integrating with WAF. However, for an app to scale horizontally you need to remove the state from it. Any data that needs to be shared across instances is part of the state of the application. This includes configs, shared files, databases and session data. For session data you can use sticky sessions (all requests for the same session go to the same instance), but the rest needs to be moved to separate storage (S3, EFS, DynamoDB, RDS, etc).
Q: Ok, so I just design my compute layer to be stateless! That's easy.
A: Yes, it is! And you should have done that in the first place! Unfortunately, not everyone does that. And if you didn't get it right from the start, changing that later is a lot of work. Still worth it, and you should still do it! But if you need a self-healing environment right now, this is the solution (while you work on removing the state from your app).
Q: You also said something about dev environments. Shouldn't a dev environment be identical to a prod environment?
A: We usually call that environment staging. Dev is usually cheap and dirty. But you still don't want it to fail randomly, since developer hours spent fixing a dev environment still cost you money. This is a good solution for a self-healing dev environment.
Best Practices for a Self-Healing EC2 Instance
If you're in this situation, the best thing you can do is just remove the state from your app and make it horizontally scalable. I'll keep the tips focused on this solution though, because I think it's a pretty creative solution that can be useful in certain situations.
Operational Excellence
Use Session Manager: Session Manager lets you open a shell on your EC2 instances without SSH keys or open inbound ports, which is much more secure than managing SSH keys.
Mind the service quotas: There's a default limit of 5 Elastic IP addresses per region per account. FYI, NAT Gateways consume 1 Elastic IP address from that quota. You can increase this limit if you want, but AWS takes a couple of weeks to do it, so make sure you keep track of your IP usage and request the increase well in advance.
Store configurations in Systems Manager Parameter Store: If you have configuration values, there are 3 ways to set them: hard-code them in the code (bad idea), write them to a file on the instance (bad idea, because when the instance fails the new instance won't have them), or keep them in separate storage that the instance can read them from. Systems Manager Parameter Store is that separate storage.
Send logs to CloudWatch Logs: Logs stored in the instance will be lost when the instance fails. Instead, set up the CloudWatch Agent to send logs to CloudWatch Logs, so you can view them later.
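Here's a sketch for two of these tips, assuming the role from Step 2 and a hypothetical parameter name. The first function attaches the AWS managed policy AmazonSSMManagedInstanceCore so the instance can register with Systems Manager (the SSM Agent also has to be running, which it is by default on Amazon Linux AMIs). The second reads one value from Parameter Store and could be called from User Data; for it to work, the role also needs `ssm:GetParameter` on that parameter.

```bash
#!/usr/bin/env bash
# Sketch: Session Manager access and Parameter Store reads. Names are placeholders.

# Let the instance register with Systems Manager (needed for Session Manager).
enable_session_manager() {
  local role_name="$1"
  aws iam attach-role-policy \
    --role-name "$role_name" \
    --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
}

# Print one configuration value, decrypting it if it's a SecureString.
# SecureStrings encrypted with a customer managed KMS key also need kms:Decrypt.
read_config_value() {
  local region="$1" parameter_name="$2"
  aws ssm get-parameter \
    --region "$region" \
    --name "$parameter_name" \
    --with-decryption \
    --query Parameter.Value \
    --output text
}
```

Once the instance shows up in Systems Manager, `aws ssm start-session --target <instance-id>` opens a shell on it from your machine (this needs the Session Manager plugin for the AWS CLI installed locally).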
Security
Use Session Manager: Yeah, this is a repeat of the one above. Session Manager is also more secure than an SSH key pair, so I wanted to add it here as well.
Use HTTPS: If I want HTTPS (which I do, always), I normally set up an SSL certificate in the Application Load Balancer. I can't do that here, because there is no load balancer. But we should still use secure connections! A way to do this is to set up an Nginx reverse proxy in the instance. Keep in mind that if your SSL certificate is only inside the instance, a new instance will need to recreate it.
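For the Nginx option, here's a sketch that writes a server block terminating HTTPS on the instance and forwarding requests to an app listening on a local port. Everything specific is an assumption for illustration: the domain, the certificate and key paths, the app port and the output file. Because the certificate lives on the instance, whatever runs before this in User Data has to put the certificate files in place on each new instance.

```bash
#!/usr/bin/env bash
# Sketch: write an Nginx reverse-proxy config with TLS. All values are placeholders.
write_nginx_tls_proxy() {
  local domain="$1" cert_file="$2" key_file="$3" app_port="$4" out_file="$5"
  # \$ keeps Nginx variables literal inside this unquoted heredoc
  cat > "$out_file" <<EOF
server {
    listen 443 ssl;
    server_name ${domain};

    ssl_certificate     ${cert_file};
    ssl_certificate_key ${key_file};

    location / {
        proxy_pass http://127.0.0.1:${app_port};
        proxy_set_header Host \$host;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
    }
}

# Redirect plain HTTP to HTTPS
server {
    listen 80;
    server_name ${domain};
    return 301 https://\$host\$request_uri;
}
EOF
}
```

You might call it as `write_nginx_tls_proxy app.example.com /etc/pki/app.crt /etc/pki/app.key 8080 /etc/nginx/conf.d/app.conf`, then check and load the config with `nginx -t && systemctl reload nginx`.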
Reliability
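Use a secondary EBS volume: If you store everything in the root EBS volume, you can lose that data when the instance fails, since the root volume is normally deleted when the Auto Scaling Group terminates the instance. Instead, keep all your data on a secondary EBS volume. In the User Data section of the Launch Template you can add a command to automatically attach that EBS volume to each new instance. An EBS volume can only be attached to instances in its own Availability Zone, so restrict the Auto Scaling Group to a subnet in that zone.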
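Here's a sketch of the User Data part for the secondary EBS volume, assuming you already know the volume ID and have the instance ID (the Step 3 script reads it into INSTANCEID). The volume ID, device name and retry limits are hypothetical, and the instance role needs `ec2:AttachVolume` and `ec2:DescribeVolumes`, which the Step 2 policy doesn't grant. The retry loop is there because the volume can stay attached to the failed instance for a short time while that instance is being terminated.

```bash
#!/usr/bin/env bash
# Sketch: attach an existing data volume to this instance from User Data.
attach_data_volume() {
  local region="$1" volume_id="$2" instance_id="$3" device="${4:-/dev/sdf}"
  local attempt=1 state
  # Retry the attach request (up to 30 times, 10 seconds apart) while the
  # volume is still held by the old instance.
  until aws --region "$region" ec2 attach-volume \
      --volume-id "$volume_id" --instance-id "$instance_id" --device "$device"; do
    if [ "$attempt" -ge 30 ]; then
      echo "could not attach $volume_id" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep 10
  done
  # Poll (up to 60 times, 5 seconds apart) until EC2 reports the attachment as complete.
  attempt=1
  while :; do
    state=$(aws --region "$region" ec2 describe-volumes --volume-ids "$volume_id" \
      --query 'Volumes[0].Attachments[0].State' --output text)
    [ "$state" = "attached" ] && return 0
    if [ "$attempt" -ge 60 ]; then
      echo "$volume_id never reached the attached state" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep 5
  done
}
```

After it returns you still need to mount the volume. On Nitro-based instance types the device appears as an NVMe device such as /dev/nvme1n1 rather than /dev/sdf, so look it up with `lsblk` before mounting.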
Performance Efficiency
Pick the right EBS volume type: If you're using a single EC2 instance for your prod environment, you're likely relying on EBS. Read the basics and best practices of Amazon EBS.
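If you want to check or change a volume's type from the CLI, a sketch like this works, assuming a placeholder volume ID; gp3 is only an example default, not a recommendation for every workload. With Elastic Volumes the change is applied while the volume stays attached.

```bash
#!/usr/bin/env bash
# Sketch: inspect and change an EBS volume's type. The volume ID is a placeholder.

# Print the volume's type, size (GiB) and provisioned IOPS.
show_volume_type() {
  local volume_id="$1"
  aws ec2 describe-volumes --volume-ids "$volume_id" \
    --query 'Volumes[0].[VolumeType,Size,Iops]' --output text
}

# Change the volume type in place (defaults to gp3 as an example).
change_volume_type() {
  local volume_id="$1" new_type="${2:-gp3}"
  aws ec2 modify-volume --volume-id "$volume_id" --volume-type "$new_type"
}
```

After a modification you have to wait at least six hours before modifying the same volume again, so pick the target type deliberately.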
Cost Optimization
Use Savings Plans: Savings Plans apply to your compute usage rather than to a specific instance, so if an instance fails and a new one is launched, the new one automatically gets the discount.
Turn off your dev env at night: After office hours, you don't need to keep paying for your EC2 instance. Set the min and max instance number to 0 when your team shuts off for the night, and back to 1 when they begin work the next day. There are many ways to do this, such as a Lambda function triggered by EventBridge. Note that while the Elastic IP address is not associated with a running EC2 instance, you'll be charged $0.005/hour (that's $3.60 for a 30-day month). Since February 2024, AWS charges the same $0.005/hour for public IPv4 addresses that are in use, so the Elastic IP costs that much whether the instance is running or not.
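Besides a Lambda function, EC2 Auto Scaling scheduled actions can do this on their own. Here's a sketch, assuming an 08:00 to 20:00 weekday schedule and a time zone you pass in (IANA names such as Europe/Madrid); the group name, action names and hours are hypothetical. The recurrence uses cron syntax.

```bash
#!/usr/bin/env bash
# Sketch: office-hours schedule for the dev Auto Scaling Group.
schedule_office_hours() {
  local asg_name="$1" tz="${2:-Etc/UTC}"
  # 20:00 Monday to Friday: scale down to 0 instances
  aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name "$asg_name" \
    --scheduled-action-name stop-at-night \
    --recurrence "0 20 * * 1-5" \
    --time-zone "$tz" \
    --min-size 0 --max-size 0 --desired-capacity 0 || return 1
  # 08:00 Monday to Friday: back to exactly 1 instance
  aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name "$asg_name" \
    --scheduled-action-name start-in-the-morning \
    --recurrence "0 8 * * 1-5" \
    --time-zone "$tz" \
    --min-size 1 --max-size 1 --desired-capacity 1
}
```

With this schedule the instance also stays off from Friday night to Monday morning, since both actions only run Monday to Friday.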
Recommended Tools and Resources
I'm looking to get the Security Specialty cert, and this is the course I'm using to prepare for it. There are courses for other certs as well.