Amazon S3 Advanced Features
Amazon S3 is an object storage service by AWS. S3 stands for Amazon Simple Storage Service, and its core use case is storing a practically unlimited amount of data with high availability and durability, and allowing retrieval with high performance in constant time. The key aspect of that is storage classes, which we discussed in the last article. But as time went by, the S3 team added more and more features that support the core use case. In this article we'll explore the advanced features of Amazon S3.
S3 Object Versioning
Without object versioning, the name of the object is its unique identifier. If you write a new object with the same name, the old one is overwritten and lost, unless you rename (i.e. download and re-upload) the old one as object_old or something like that.
Object Versioning removes the risk of overwriting, and the need for an _old suffix. For every file you upload, a new version of the object is created, with its own version id and its own object metadata. Each object can have multiple versions, and one of them is marked as the latest. If you GET an object without specifying a version you get the latest one, but you can also pass an older version id to the GET operation. Moreover, when you delete a versioned object it's not actually deleted by default; instead, a Delete Marker is added as the newest version (you can still permanently delete a version if you want).
How Amazon S3 Object Versioning works.
The most important use case for object versioning is allowing updates to an object while limiting the risk of deleting data. You can't overwrite a version, so even if someone with write access uploads a new version and it becomes the latest, the old version is still there. All you need to do is Deny delete operations, and your data is safe from being deleted, but you're still able to push changes.
Object Versioning is enabled at the bucket level, and will be enabled for all objects within the bucket. You can't do half and half. But you can split your objects across multiple buckets. They're free, after all.
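Here's a minimal sketch of what that looks like with boto3, assuming a hypothetical bucket name; the same operations are available in the console, the CLI and every SDK:

import boto3

s3 = boto3.client("s3")
BUCKET = "my-example-bucket"  # hypothetical bucket name

# Enable versioning for the whole bucket (it applies to all objects in it)
s3.put_bucket_versioning(
    Bucket=BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

# Upload the same key twice; each upload creates a new version
s3.put_object(Bucket=BUCKET, Key="report.csv", Body=b"version 1")
s3.put_object(Bucket=BUCKET, Key="report.csv", Body=b"version 2")

# List all versions of the object; the newest one has IsLatest set to True
response = s3.list_object_versions(Bucket=BUCKET, Prefix="report.csv")
for version in response.get("Versions", []):
    print(version["VersionId"], version["IsLatest"])

# Retrieve a specific (older) version by its version id
oldest = response["Versions"][-1]
old_object = s3.get_object(Bucket=BUCKET, Key="report.csv", VersionId=oldest["VersionId"])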
Amazon S3 Replication Rules
The naive way to store data in two buckets is to, well, write it to both buckets. The problem with this is that it makes your application dependent on implementation details it shouldn't need to know about, such as your cloud infrastructure being multi-region or you needing to copy data to a separate account.
Of course, you could build an Event-Driven Application that's triggered every time an object is uploaded to S3 (here's an example). It's not that complex, and it solves your problem. Good news though: That's exactly what AWS does for you behind the scenes with Replication Rules, and it's much easier than building it yourself.
Replication Rules can be configured to automatically copy objects (with optional filters) from one bucket to another. They're common when you need to copy your data to another AWS region as part of a Disaster Recovery strategy. Replication rules only apply to new uploads, but you can use S3 Batch Replication to replicate objects that already existed before the rule was created (useful when setting up said DR strategy).
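As a rough sketch, here's how a replication rule could be set up with boto3. The bucket names, role ARN and rule details are placeholders; the source bucket needs versioning enabled, and the IAM role must allow S3 to read from the source and write to the destination:

import boto3

s3 = boto3.client("s3")

# All names and ARNs below are placeholders
s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-everything-to-dr-region",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},  # empty prefix = all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::dr-destination-bucket",
                    "StorageClass": "STANDARD_IA",
                },
            }
        ],
    },
)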
Static Website Hosting with Amazon S3
Static websites are websites where no processing is needed, and the content doesn't change (in the backend at least) based on any aspect of the user. That typically means serving a few HTML, CSS and JS files, but any application using a modern frontend framework like React is actually a static website, so long as it's doing Client Side Rendering (if you want to do Server Side Rendering, check out AWS Amplify).
So, if it's just serving files, why not do it through S3? There are a few extra technical aspects involved, which I won't dive into. But if you enable Static Website Hosting, S3 takes care of those, and you can indeed serve your entire website from S3. You'll need to do this if you're not using Amazon CloudFront (AWS's CDN, which serves content from edge locations). Tip: If you do use CloudFront, don't enable this, since the website endpoint doesn't work with Origin Access Control.
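For reference, here's a minimal sketch with boto3, assuming the bucket name and document names are placeholders and that you've already uploaded the files:

import boto3

s3 = boto3.client("s3")
BUCKET = "my-website-bucket"  # placeholder

# Turn the bucket into a static website endpoint, pointing at the
# index and error documents you've uploaded to the bucket.
s3.put_bucket_website(
    Bucket=BUCKET,
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)

# The site is then served at the bucket's website endpoint
# (http://<bucket>.s3-website-<region>.amazonaws.com); the objects
# must be publicly readable, e.g. via a bucket policy allowing s3:GetObject.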
Note: Dynamic websites (i.e. non-static ones) need some form of backend. Typical examples are old technologies like PHP (yes, I'm bashing on PHP!), JSP and JSF, ASP, etc. Additionally, if you want your website to implement any backend behavior, you'll need a backend for that, even with newer technologies that aren't as bad as PHP.
Amazon S3 Object Tags
Tags should be self-explanatory: key-value pairs that you add to an object to make management easier. You can filter objects by tags, both to view them and for processing with other tools like Replication Rules. You can even add cost allocation tags, which work as filters in your billing dashboard and Cost and Usage Report.
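A small sketch of tagging an object with boto3 (bucket, key and tag names are placeholders):

import boto3

s3 = boto3.client("s3")

# Tag a single object; tags can later be used to filter objects,
# scope replication rules, or allocate costs.
s3.put_object_tagging(
    Bucket="my-example-bucket",
    Key="invoices/2024-01.pdf",
    Tagging={
        "TagSet": [
            {"Key": "project", "Value": "billing"},
            {"Key": "confidentiality", "Value": "internal"},
        ]
    },
)

# Read the tags back
tags = s3.get_object_tagging(Bucket="my-example-bucket", Key="invoices/2024-01.pdf")
print(tags["TagSet"])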
Amazon S3 Multi-Region Access Points
If you build a multi-region architecture, for example for Disaster Recovery, you'll end up with at least two S3 buckets with the same data. Ideally you want your infrastructure in region A to read from the bucket in region A, and your infrastructure in region B to read from the one in region B, to avoid cross-region data charges and increased latency. But setting that up means the same app in different regions needs different configuration values. And you also need to configure the other bucket as a backup, in case S3 fails in one region (after all, that's what DR is about).
Multi-Region Access Points solve that problem. You configure a global access point that serves as the single point of access for all applications in all regions. Then in the access point you set up both your buckets, and have it automatically route operations to the appropriate bucket depending on what region the request is coming from. It also supports failover to another region if the preferred region is down. It's a bit like a DNS in that regard.
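To give an idea of what the setup looks like, here's a hedged sketch using boto3's S3 Control API. The account id and bucket names are placeholders, and both buckets should already exist, each in its own region; control-plane requests for Multi-Region Access Points are routed through us-west-2:

import boto3
import uuid

# Multi-Region Access Points are managed through the S3 Control API
s3control = boto3.client("s3control", region_name="us-west-2")

s3control.create_multi_region_access_point(
    AccountId="111122223333",                 # placeholder account id
    ClientToken=str(uuid.uuid4()),            # idempotency token
    Details={
        "Name": "my-global-access-point",     # placeholder name
        "Regions": [
            {"Bucket": "app-data-us-east-1"}, # placeholder buckets
            {"Bucket": "app-data-eu-west-1"},
        ],
    },
)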
Security Overview in Amazon S3
Alright, let's talk about security. I'll give you a rundown of what security looks like and all the stuff you can do to protect your data. But first, a couple of tips:
At the very least set up a bucket policy
If you're using CloudFront, set up Origin Access Control
If you're accessing the bucket from a VPC, set up a VPC Endpoint
Keep in mind how you can lose your data, including someone stealing/encrypting it and asking for a ransom.
Encryption in Amazon S3
By default, all objects in S3 are encrypted server-side with an encryption key managed by S3, for data protection. That has been the case since January 5, 2023. Objects uploaded before that date are not encrypted, but you can encrypt them by re-uploading them. Most data is perfectly fine with this, and if you need more you're either dealing with compliance and security audits for highly confidential data, or you really know what you're doing with regard to security.
But just in case you're in one of those situations where you should really know what you're doing and you don't (yet), here are the options:
Amazon S3 managed keys (SSE-S3): This is the default. S3 manages an AES-256 root key, and uses it to create a unique encryption key per object, which it then uses to encrypt and decrypt the object. This is transparent to you, doesn't impact performance, and can't be disabled.
AWS Key Management Service (AWS KMS) keys (SSE-KMS): Essentially the same as the above, but instead of S3 using a key that it manages, it uses a key that you manage in KMS. This allows you to configure separate keys, set up access policies for them, and log uses in CloudTrail.
Dual-layer encryption with AWS Key Management Service (AWS KMS) keys (DSSE-KMS): Same as above, but instead of encrypting the objects once (one layer), they're encrypted twice (two layers). Security experts agree this is more secure.
Customer-provided keys (SSE-C): The least managed option. You provide the keys, S3 just uses them to encrypt and decrypt.
Additionally, you can do Client-Side Encryption, which means you encrypt the data in your app (which means writing your own code for this), and then store the already encrypted object (which will then get encrypted again server-side). Retrieving the object will remove the server-side encryption and give you the client-side encrypted object, which your application will need to decrypt using custom code.
The type of encryption is set per object, and it's specified in the PutObject request. You can enforce specific types of encryption by only Allowing PutObject operations that contain the desired encryption headers. The next topic is how you achieve that.
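For example, this is roughly what specifying the encryption type looks like with boto3 (bucket name and KMS key ARN are placeholders):

import boto3

s3 = boto3.client("s3")

# Default: the object gets SSE-S3 even if you specify nothing
s3.put_object(Bucket="my-example-bucket", Key="plain.txt", Body=b"hello")

# Explicitly request SSE-KMS with a customer-managed key
s3.put_object(
    Bucket="my-example-bucket",
    Key="secret.txt",
    Body=b"hello",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
)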
Amazon S3 Bucket Policies
Bucket policies are IAM Resource Policies that restrict permissions on the bucket itself. The most important thing they can do, which is impossible to do with an IAM Policy associated with an IAM User or Role, is control what unauthenticated users can do. A bucket policy can be used to make an S3 bucket public, allowing read and even write operations from unauthenticated users (writes are a bad idea, obviously).
Additionally, it's a great idea to use them to control what users with general access to S3 can do on that specific bucket. For example, if you want to Deny delete operations on 2 out of 4 buckets, it's easier to set that up as a bucket policy than to review all policies associated with IAM Users to make sure they contain that clause. In the end the statements are combined, as per the evaluation rules of IAM Policies, but implementing bucket-specific statements in bucket policies means you're doing access management where the data lives, making it much easier for you to maintain.
Here's an example of a bucket policy that enforces a specific type of encryption:
{
  "Version": "2012-10-17",
  "Id": "PutObjPolicy",
  "Statement": [
    {
      "Sid": "DenyIncorrectEncryptionHeader",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::<bucket_name>/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "AES256"
        }
      }
    },
    {
      "Sid": "DenyUnEncryptedObjectUploads",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::<bucket_name>/*",
      "Condition": {
        "Null": {
          "s3:x-amz-server-side-encryption": "true"
        }
      }
    }
  ]
}
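To apply it, you can paste the policy in the console's bucket policy editor, or push it programmatically. A small sketch with boto3, assuming you saved the JSON above as bucket-policy.json with <bucket_name> replaced by your real bucket name:

import boto3

s3 = boto3.client("s3")

# Read the policy document from the local file and attach it to the bucket
with open("bucket-policy.json") as f:
    policy_document = f.read()

s3.put_bucket_policy(Bucket="my-example-bucket", Policy=policy_document)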
Amazon S3 VPC Endpoints
When you connect to the public endpoint of S3, you're connecting to the S3 API over the public internet. Every "public" AWS service (i.e. those that don't reside in one of your VPCs) has these public endpoints, which can be accessed from the public internet. Security features like bucket policies still apply, so this doesn't mean all your data is public. What it means is that your requests will be routed through the public internet instead of AWS's internal network, making them slower and accruing data transfer charges (remember that you pay for data transferred to the internet, but not to other AWS services).
VPC Endpoints are essentially a private entry point to S3 within your VPC, connected to the S3 service endpoint directly via the AWS internal network. For S3 this can be a Gateway Endpoint (a route in your VPC's route tables, free of charge) or an Interface Endpoint (a private IP address provided via AWS PrivateLink). Either way, you communicate with the Amazon S3 service without traversing the public internet, which serves three important purposes:
Reduces network latency by taking a more direct network route
Reduces security risks by avoiding an untrusted network like the public internet
Removes the need for internet connectivity, public IPs and NAT Gateways
VPC Endpoints are a feature of the Amazon VPC service, and they can be used to access most public AWS services. You need to set up one VPC endpoint per service. In this article I'm just focusing on S3, but here's a complete workshop on how to set up a VPC Endpoint for S3, including Endpoint Policies.
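As a rough sketch, creating a Gateway endpoint for S3 with boto3 could look like this (the region, VPC id and route table id are placeholders):

import boto3

ec2 = boto2 = boto3.client("ec2", region_name="us-east-1")

# Create a Gateway endpoint for S3 in an existing VPC. Gateway endpoints
# work by adding routes to the route tables you list here, and they're free.
ec2.create_vpc_endpoint(
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    VpcEndpointType="Gateway",
    RouteTableIds=["rtb-0123456789abcdef0"],
)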
VPC Endpoint Policies for Amazon S3
Just like there are policies for buckets, there are policies for endpoints. VPC Endpoint Policies allow you to control what a VPC Endpoint can be used for. By default, a VPC Endpoint doesn't have a policy, and if it allows access to S3, then it allows access to all S3 buckets and all S3 features. It's common for a specific workload to only need access to some S3 buckets, and that's what Endpoint Policies are for: You can restrict what operations are allowed via the VPC Endpoint, based on the actions and/or the resources. That way, you can allow only some operations, and/or allow access to only some S3 buckets. Of course, S3 Bucket Policies will still apply.
To be clear, VPC Endpoints don't introduce this excessive privilege. If you didn't have one and were accessing the S3 public endpoint, your EC2 instances could already access all buckets (subject to IAM and bucket policies). What VPC Endpoints add is a way to restrict this, via VPC Endpoint Policies.
By the way, you can use an S3 Bucket Policy to restrict access to that bucket from only a list of specific VPC Endpoints.
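Here's a sketch of what that bucket policy could look like, applied with boto3. The bucket name and endpoint id are placeholders, and be careful: a policy like this also blocks access from the console:

import boto3
import json

s3 = boto3.client("s3")
BUCKET = "my-example-bucket"        # placeholder bucket name
VPCE_ID = "vpce-0123456789abcdef0"  # placeholder endpoint id

# Deny any access to the bucket that doesn't come through the listed VPC endpoint
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AccessOnlyViaVPCEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            "Condition": {"StringNotEquals": {"aws:sourceVpce": VPCE_ID}},
        }
    ],
}

s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))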
Multi-Factor Authentication (MFA) Delete
Multi-Factor Authentication means adding another factor other than your password. Typically this means a physical device like your phone (proven via an Authenticator app) or a USB device, but it can also be biometric data like your fingerprint. It's common and highly recommended to use Multi-Factor Authentication for any login that would cause you harm if it fell in the wrong hands, like your AWS account, email account, and practically everything.
The idea behind MFA Delete in S3 is that permanently deleting data is highly destructive: once a version is permanently deleted, you can't recover it! So, even if you already required MFA to authenticate to your AWS account, you can require that the second factor of authentication is provided again for those operations. That way, if someone gains access to your unlocked computer (which they shouldn't! But it may happen), the AWS management console is open, and your AWS credentials haven't expired yet (this is why they typically expire quickly), they can read stuff, but they can't delete it without providing that MFA.
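MFA Delete works together with versioning, and it can only be enabled by the root user, through the API or CLI (not the console). A hedged sketch with boto3, where the bucket name, MFA device ARN and code are placeholders:

import boto3

s3 = boto3.client("s3")

# Enable versioning and MFA Delete in one call. The MFA parameter is the
# device serial number (here, the root user's virtual MFA ARN), a space,
# and the current code from the device.
s3.put_bucket_versioning(
    Bucket="my-example-bucket",
    VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
    MFA="arn:aws:iam::111122223333:mfa/root-account-mfa-device 123456",
)

# Once enabled, permanently deleting an object version also requires the MFA value:
# s3.delete_object(Bucket="my-example-bucket", Key="report.csv", VersionId="...",
#                  MFA="arn:aws:iam::111122223333:mfa/root-account-mfa-device 123456")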
Conclusion
At its core, S3 is a cloud storage service. But there's a lot going on around storing objects, and with time S3 grew into a pretty big service with a lot of storage management features. They're all useful in some way, and they all do a great job of supporting the core use case of storing data.
In this article we went over the most important advanced features of S3. I believe you should be aware of all of these, and what they do. Realistically, you won't need to use them all, and even if they're a good idea, they might not be your highest priority. But being aware of them gives you a lot more tools to build great solutions on AWS with best practices, focusing on security and operational excellence. Additionally, understanding storage classes and pricing will help you a lot with cost optimization.
What Security Features Should I Focus On For Amazon S3?
The short answer is: As many as you can afford to. But if you're looking for a place to start, start with Bucket Policies. They're often way too permissive, and they're easy to tighten since those permissions change very rarely.