Kubernetes on AWS - Basics and Best Practices
What is Kubernetes, how to set it up on AWS, and best practices for managing services on Kubernetes
Welcome to Simple AWS! Simple solutions on AWS, and how to scale and secure them with best practices. This is issue #14. Shall we?
Use case: Containerized microservices on Kubernetes on AWS
You're working on a new app, and you've decided to break it down into multiple services (could be microservices or regular services). This way, each service can scale independently, and you can deploy each one separately. However, you're worried that deploying, communicating and scaling each service separately will be a lot of work.
Enter Kubernetes. Kubernetes is a container orchestrator (like ECS), which means you define your services and the resources they need, and it will manage deploying them, scaling them, communication (networking, security) and management (logging, monitoring, etc). You focus on writing the code for your services, Kubernetes handles the ops side of things.
EKS (Elastic Kubernetes Service): This is a fully managed Kubernetes service by AWS. Setting up a Kubernetes (k8s) cluster yourself is not trivial. EKS gives you a cluster that's already set up and connected to the other AWS services it needs (IAM for permissions, EC2/Fargate for capacity, etc). It costs $0.10/hour per cluster (about $72/month), but it's well worth it.
EC2: You know this one. Why is it here? EC2 is the infrastructure that runs your containers. You create EC2 instances and tell your EKS cluster to put your containers there.
Fargate: It's the serverless way of running containers on EKS. You forget about instances and just pay per use. More expensive per resource, but allows faster scaling, and it's much simpler to start.
First, design your services. There's way too much to say about this, but for now let's just say that you should first decide whether you want a monolith or services, and then understand what each service does. (the following steps don't actually need this, but a real use case does)
Then, code and Dockerize your services. (the following steps don't actually need this, but a real use case does)
Next, we'll set up some tools (these commands are for bash):
AWS CLI: sudo curl --silent --location -o "awscliv2.zip" "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" && sudo unzip awscliv2.zip && sudo ./aws/install
Check it with aws --version
Kubectl: sudo curl -L -o /usr/local/bin/kubectl "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" && sudo chmod +x /usr/local/bin/kubectl
Check it with kubectl version --client (plain kubectl version also tries to reach a cluster, which you don't have yet)
eksctl: curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp && sudo mv -v /tmp/eksctl /usr/local/bin
Check it with eksctl version
Now we'll use eksctl to create the cluster:
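eksctl create cluster --name=simpleaws --region=us-east-1 --nodes=2 --managed --node-type=t3.medium --asg-access
(This leaves out --version, so eksctl uses its default Kubernetes version. Old versions like 1.20 are no longer supported on EKS, so if you pin a version, pick one that's currently supported.)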
Wait 15-20 minutes and check it with kubectl get nodes
Deploy an app: curl -O https://raw.githubusercontent.com/aws-containers/ecsdemo-nodejs/main/kubernetes/deployment.yaml && kubectl apply -f deployment.yaml
Expose it through a service: curl -O https://raw.githubusercontent.com/aws-containers/ecsdemo-nodejs/main/kubernetes/service.yaml && kubectl apply -f service.yaml
Clean up! (Don't forget this): kubectl delete -f service.yaml -f deployment.yaml && eksctl delete cluster --name=simpleaws --region=us-east-1
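The eksctl create cluster command from the steps above can also be kept as a config file, which fits the "define everything as code" idea. Here's a minimal sketch that writes an eksctl ClusterConfig roughly equivalent to those flags. It leaves out the Kubernetes version so eksctl picks its default, and the node group name ng-1 is a hypothetical placeholder; double-check the fields against the eksctl docs for the version you install.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write an eksctl config file roughly equivalent to the
# "eksctl create cluster" flags used above. "ng-1" is a placeholder name.
cat > cluster.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: simpleaws
  region: us-east-1
  # no "version" field: eksctl uses its default Kubernetes version
managedNodeGroups:
  - name: ng-1
    instanceType: t3.medium
    desiredCapacity: 2
    iam:
      withAddonPolicies:
        autoScaler: true   # same idea as --asg-access
EOF
```

You'd then create the cluster with eksctl create cluster -f cluster.yaml, and delete it with eksctl delete cluster -f cluster.yaml. Keeping this file in version control means anyone can recreate the same cluster.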
What did we just do??? Here's an explanation:
We decided how we want to structure our code. (again, not needed for this step by step, but you'll need to do this for your app)
We wrote some Dockerfiles. (again, not needed for this step by step, but you'll need to do this for your app)
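We installed some tools: the AWS CLI, kubectl (the Kubernetes command-line tool, which works with any cluster) and eksctl (a command-line tool specifically for creating and managing EKS clusters).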
We used a command from eksctl to create an EKS cluster. After this command (which should take 15 or 20 minutes to complete) you'll be able to use kubectl to get the nodes, and if you log in to the AWS Console and go to the EKS service you'll see a cluster created there. So far we have a cluster but no services (it's a bit like having an EC2 instance up and running but not having installed any app there).
We took a sample app from the aws-containers/ecsdemo-nodejs repository on GitHub, downloaded its deployment.yaml with the curl command, and deployed that app to our cluster with the kubectl apply command. This YAML file specifies what app will be deployed (in this case the Docker image brentley/ecsdemo-nodejs:latest) and how (how many replicas, which ports are used, the update strategy, etc). Kubernetes YAML files should remind you a bit of Docker Compose files, though they're more complex because Kubernetes is more powerful. You'll need to write your own YAMLs for your app.
We did the same as in the previous step, but for a Service. Deployments in Kubernetes aren't automatically reachable; you create a Service to give their pods a single, stable address, either inside the cluster or, depending on the Service type, from outside it. Think of it like a Load Balancer in front of all the pods of that Deployment (on AWS, a Service of type LoadBalancer actually creates one, but that's not the only way to expose it). You'll need to write your own YAMLs for your app.
We cleaned up, because this is just a sample and we don't want to be paying over $70/month because we forgot we deployed this.
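As mentioned above, a Service of type LoadBalancer is only one way to expose an app. A common alternative is a single Ingress in front of several internal (ClusterIP) Services, so you pay for one load balancer instead of one per service. The sketch below writes an Ingress that sends /orders to a Service named orders and everything else to a Service named web; both Services and the Ingress name are hypothetical. It assumes the AWS Load Balancer Controller is installed in the cluster (the steps above don't install it), which turns Ingresses with ingressClassName alb into an Application Load Balancer.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write an Ingress that puts ONE load balancer in front of two Services.
# Assumes the AWS Load Balancer Controller is installed; "orders" and "web"
# are hypothetical ClusterIP Services listening on port 80.
cat > ingress.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shop-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing   # public ALB
    alb.ingress.kubernetes.io/target-type: ip           # route straight to pod IPs
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /orders              # /orders and below go to "orders"
            pathType: Prefix
            backend:
              service:
                name: orders
                port:
                  number: 80
          - path: /                    # everything else goes to "web"
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
EOF
```

After kubectl apply -f ingress.yaml, kubectl get ingress shop-ingress should show the ALB's DNS name in the ADDRESS column once it's provisioned. The ALB forwards the path unchanged, so the orders service itself has to handle requests under /orders.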
These are the basic building blocks of Kubernetes:
Cluster: It's a set of worker nodes that run your containers, plus the control plane that manages them, in this case with Kubernetes. The control plane includes components like the API server, the scheduler, the controller manager, and etcd; on EKS, AWS runs it for you.
Pod: A pod is the smallest and simplest unit in the Kubernetes object model. It represents a single instance of a running process in your cluster. Pods contain one or more containers, storage resources, and a unique network IP.
Deployment: A deployment is an object in Kubernetes that manages a replicated application. It ensures that a specified number of replicas of your application are running at any given time. Deployments are responsible for creating, updating, and rolling back pods.
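Service: A service is a logical abstraction on top of one or more pods. It gives them a stable address and defines how they can be accessed, either internally within the cluster or externally. The Service type decides which: ClusterIP (internal only, the default), NodePort (a port opened on every node), LoadBalancer (a cloud load balancer, on AWS an Elastic Load Balancer), or ExternalName (a DNS alias for something outside the cluster).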
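To see how these pieces fit together, here's a minimal sketch that writes a Deployment and a Service to a file called hello.yaml. The name hello and the public nginx:1.25 image are placeholders, not part of the sample app used earlier. The Deployment's pod template and the Service's selector share the label app: hello, and that label is how the Service finds the pods.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a Deployment (which manages the pods) and a Service (which exposes them)
# to hello.yaml. "hello" and the nginx:1.25 image are placeholders.
cat > hello.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 2                 # keep 2 pods running at all times
  selector:
    matchLabels:
      app: hello              # the pods this Deployment manages...
  template:                   # ...are created from this pod template
    metadata:
      labels:
        app: hello
    spec:
      containers:
        - name: hello
          image: nginx:1.25
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  type: ClusterIP             # internal only; LoadBalancer would expose it outside
  selector:
    app: hello                # send traffic to any pod with this label
  ports:
    - port: 80                # port the Service listens on
      targetPort: 80          # containerPort on the pods
EOF
```

Apply it with kubectl apply -f hello.yaml and look at the result with kubectl get deployments,pods,services. Deleting one of the pods (kubectl delete pod <pod-name>) is a quick way to watch the Deployment replace it.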
As you see, Kubernetes is actually pretty complex. That's why I typically promote ECS, it's simpler. However, Kubernetes has some clear advantages. You should decide whether they're worth the extra complexity, for every particular use case. The most important benefits:
Cloud Agnostic: Kubernetes basically runs on anything. You can use a managed service like AWS EKS, Azure AKS or GCP GKE (which is actually the best of all three, and whose free tier covers the management fee for one zonal or Autopilot cluster), set it up manually on any kind of VMs, on bare metal servers, on your own laptop, and even on a Raspberry Pi. It removes vendor lock-in from any cloud provider (which is not such a big deal), and it lets you run your own servers with a really fantastic platform.
Open Source: Kubernetes is published under the Apache License 2.0, which allows anyone to modify it and use it commercially for free. It's hosted by the Cloud Native Computing Foundation (CNCF) and maintained by a large community (Google built the first versions and donated the project).
Very mature: Kubernetes was open-sourced in 2014 and has grown really quickly. It has been running small to enormous production workloads for years, so we're really sure that it works well.
REALLY powerful: Seriously, there are so many things you can do with Kubernetes, and a ton of tools around it. That level of flexibility requires a similar level of complexity, though. In my experience, for most projects it's not worth the added complexity. However, when it is worth it, it's undeniably the best solution out there, and I don't see that changing any time soon.
In my experience, the bestest best practice is deciding whether you actually need Kubernetes or not. If you do, pay attention to the best practices that follow. Keep in mind though that this is NOT an exhaustive list.
Define everything as code: Everything in Kubernetes is some form of configuration, from what app to deploy to how it's exposed. Everything can be written in YAML files, and it should be (and added to version control). Same logic as with Infrastructure as Code.
Use CI/CD: Once you have everything in YAMLs, you could just use kubectl apply -f yourfile.yaml. Don't do that (at least not for prod). Merge to prod or main and have a pipeline that runs that same command. That way you always know what version is deployed.
Set up a DNS Server: Kubernetes can inject a service's address into a pod as environment variables, but only for services that already exist when the pod is created. That means you'd first need to deploy all services, then all pods. A cluster DNS server is a much easier way to handle this: pods look services up by name (for example my-service.my-namespace.svc.cluster.local). EKS installs CoreDNS for this by default, so mostly you just need to keep it healthy.
Consider a Service Mesh: Containers need to solve 3 things: whatever they're meant to do (this is your code), where they'll live (this is what Kubernetes solves) and how they'll talk to each other (you can solve this through a lot of config). If communication between services is complex, a Service Mesh can handle traffic management, resiliency, policy, security, strong identity, and observability. That decouples your app from these operational problems by moving them out of the application layer and down to the infrastructure layer. On AWS you can use App Mesh (though AWS has announced end of support for it in September 2026), or an open-source mesh such as Istio or Linkerd.
Use Labels: If you're using Kubernetes, you'll have multiple services, deployments, pods, volumes, volume claims, ingresses, etc. That's a lot to manage. Label everything, so you know at a glance what any component is doing.
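Use Ingress Controllers: Don't set up a Load Balancer for every service. Instead, use ClusterIP Services for internal visibility, and expose them through an Ingress handled by an Ingress Controller (on EKS, the AWS Load Balancer Controller provisions an Application Load Balancer for your Ingresses).
Use namespaces: Group related resources into namespaces (for example, per team or per environment) for better organization. Namespaces are also the scope for RBAC Roles and resource quotas.
Set up logging and monitoring: You can use CloudWatch Logs (and CloudWatch Container Insights), Prometheus, Elasticsearch + Logstash + Kibana (the ELK stack), or other options.
Use an artifact repository: You're already using a Docker registry for your images. Do the same for your deployments, services, and configs: package them as Helm charts and store them in a chart repository (Amazon ECR can also store Helm charts).
Test and Scan Container Images: Before deploying to prod, run your tests and scan container images for known vulnerabilities (CVEs), for example with Amazon ECR image scanning.
Run pods with minimal permissions: Kubernetes' Pod Security Standards define three levels: privileged (unrestricted), baseline (blocks known privilege escalations) and restricted (heavily locked down). Run pods as restricted whenever you can, and only loosen that when a workload really needs it.
Use Secrets: If something is supposed to be secret (such as an API key), store it as a Kubernetes Secret instead of hardcoding it in your deployment YAMLs or images. Keep in mind that Secrets are only base64-encoded by default, so limit who can read them and enable encryption at rest (EKS supports envelope encryption of Secrets with AWS KMS).
Limit user permissions: Not everyone needs to be a Kubernetes admin. Use Role-Based Access Control (RBAC) to restrict what each user can do.
Do Blue-Green Deployments: Blue-green deployment is a strategy where you deploy a new version without destroying the old one, send traffic to the new version, and only once you're sure (with real data) that the new version works do you destroy the old version. Note that a plain Deployment does a rolling update by default, so blue-green needs some extra setup (for example, two Deployments behind one Service, or a tool like Argo Rollouts).
Always use Deployments or ReplicaSets: You can launch pods on their own, or launch a Deployment or ReplicaSet, which will in turn create the pods. Pods launched on their own are not recreated on another node if their node fails. So if you need a single pod, launch a Deployment with 1 replica.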
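Spread pods across Availability Zones: Kubernetes only sees zones through node labels (topology.kubernetes.io/zone), and it doesn't guarantee that a Deployment's replicas land in different AZs. Use topology spread constraints (or pod anti-affinity) to spread them. Pod networking itself is handled by a network plugin; on EKS that's the Amazon VPC CNI, installed by default.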
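Scale the cluster: The Horizontal Pod Autoscaler scales the number of pods, but it can't add capacity. The Cluster Autoscaler scales the number of nodes (the --asg-access flag in eksctl sets up the permissions it needs). Also check out Karpenter as an alternative node autoscaler.
Use Savings Plans: You're paying for the EKS cluster (which is the Kubernetes control plane) and for the capacity that you're using (either EC2 or Fargate). Savings Plans don't cover the cluster fee, but Compute Savings Plans cover both EC2 and Fargate, so set them up for that capacity.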
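Several of these practices end up as a few lines of YAML each. Here's a sketch that writes one manifest combining a namespace, labels, a Deployment (not bare pods), a value read from a Secret, a security context in line with the restricted Pod Security Standard, a topology spread constraint across zones, and a read-only RBAC Role. Every name in it (namespace shop, app orders, Secret orders-secrets, group shop-developers, the ECR image URI) is a hypothetical placeholder; treat it as a starting point, not a complete hardening baseline.

```shell
#!/usr/bin/env bash
# Sketch: write one manifest applying several of the practices above.
# All names (namespace "shop", app "orders", Secret "orders-secrets",
# group "shop-developers", the image URI) are hypothetical placeholders.
set -euo pipefail

NAMESPACE="shop"
APP="orders"
IMAGE="123456789012.dkr.ecr.us-east-1.amazonaws.com/orders:1.0.0"

cat > "${APP}.yaml" <<EOF
# Namespace: groups this service's resources
apiVersion: v1
kind: Namespace
metadata:
  name: ${NAMESPACE}
---
apiVersion: apps/v1
kind: Deployment            # never bare pods: failed pods get recreated
metadata:
  name: ${APP}
  namespace: ${NAMESPACE}
  labels:                   # say what this is and what it belongs to
    app.kubernetes.io/name: ${APP}
    app.kubernetes.io/part-of: ${NAMESPACE}
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: ${APP}
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ${APP}
        app.kubernetes.io/part-of: ${NAMESPACE}
    spec:
      # Spread replicas across Availability Zones when possible
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: ${APP}
      # Minimal permissions, in line with the "restricted" Pod Security Standard
      securityContext:
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: ${APP}
          image: ${IMAGE}
          ports:
            - containerPort: 8080
          env:
            - name: API_KEY       # read from a Secret, not hardcoded
              valueFrom:
                secretKeyRef:
                  name: ${APP}-secrets
                  key: api-key
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
---
# RBAC: read-only access to this namespace for a (hypothetical) group
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ${APP}-read-only
  namespace: ${NAMESPACE}
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "services", "deployments"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ${APP}-read-only
  namespace: ${NAMESPACE}
subjects:
  - kind: Group
    name: ${NAMESPACE}-developers
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: ${APP}-read-only
  apiGroup: rbac.authorization.k8s.io
EOF
```

You'd apply it with kubectl apply -f orders.yaml, then create the Secret separately so its value never ends up in version control: kubectl create secret generic orders-secrets --namespace shop --from-literal=api-key=REPLACE_ME. The pods won't start until that Secret exists, and runAsNonRoot: true means the image must be built to run as a non-root user. The shop-developers group only means something once you map IAM users or roles to it (on EKS, through access entries or the aws-auth ConfigMap).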
Securing a Kubernetes cluster is really hard. Lightspin's EKS Creation Engine gives you an (opinionated) really secure EKS cluster.
The best way I know to learn Kubernetes is the EKS workshop. When you get to the monitoring part, check out the App Mesh workshop. And compare it to the ECS workshop to see what I mean about Kubernetes being more complex.
If you're just building an MVP, don't use Kubernetes. Instead, use Namecheap to buy a domain, Canva to design stuff and Bubble to build a no-code app. Once you've got paying users, then we can talk about AWS and maybe Kubernetes. <-- This recommendation contains affiliate links.
Want to get AWS Certified? Check out Adrian Cantrill's courses. With their mix of theory and practice, they're the best I've seen. <-- This recommendation contains affiliate links.
Some of the above resources are paid promotions or contain affiliate links. I only recommend resources I've tried for myself and found actually useful, regardless of whether I get paid for it or not.
You asked for Kubernetes. I hope I delivered! If you think it's too complex, so do I. But it's good to know what we're talking about.
Thank you for reading! See ya on the next issue.