
Authenticated Access to S3 with AWS Cognito

Serving content from S3 only to authenticated users

Continuing with the online learning platform we've been building in our previous posts on microservices design and microservices security, we have three microservices: Course Catalog, Content Delivery, and Progress Tracking. The Content Delivery service is responsible for providing access to course materials such as videos, quizzes, and assignments. These files are stored in Amazon S3, but they are currently publicly accessible. We need to secure access to these files so that only authenticated users of our app can access them.

  • S3: We have a bucket called simple-aws-courses-content where we store the videos of the course. The bucket and all objects are public right now (this is what we're going to fix).

  • API Gateway: Last week we set it up to expose our microservices, including Content Delivery.

  • Cognito: It stores the users, gives us sign up and sign in functionality, and works as an authorizer for API Gateway. We're going to use it as an authorizer for our content as well.

  • CloudFront: A CDN (basically a global cache). It's a good idea to add it to reduce costs and latency, and it's also going to allow us to run our authorization code.

  • Lambda: A regular Lambda function that runs at CloudFront's edge locations. In that role it's called Lambda@Edge.

[Diagram: complete flow for getting the content]

Securing Access to S3

Here's how this works right now:

  1. The user clicks View Content.

  2. The frontend sends a request with the auth data to the Content Delivery endpoint in API Gateway.

  3. API Gateway calls the Cognito authorizer, and Cognito approves.

  4. API Gateway forwards the request to the Content Delivery microservice.

  5. The Content Delivery microservice reads the S3 URL of the requested video from the DynamoDB table and returns that URL.

That URL is public, which is the problem. Here's how we fix it:

Step 1: Update the IAM permissions for the Content Delivery microservice

  1. Go to the IAM console.

  2. Find the IAM role associated with the Content Delivery microservice (you might need to head over to ECS if you don't remember the name).

  3. Click "Attach policies" and then "Create policy".

  4. Add the content below (shown in step 7), to grant permissions to read the CloudFront public keys.

  5. Name the policy something like "SimpleAWSCoursesContentAccess", add a description, and click "Create policy".

  6. Attach the new policy to the IAM role.

  7. Here's the content of the policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "cloudfront:ListPublicKeys",
        "cloudfront:GetPublicKey"
      ],
      "Resource": "*"
    }
  ]
}
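
If you'd rather script this step than click through the console, here's a minimal sketch with the AWS SDK for JavaScript. The role name is hypothetical; use the actual role of the Content Delivery service.

const AWS = require('aws-sdk');
const iam = new AWS.IAM();

// Same policy document as above
const policyDocument = {
  Version: '2012-10-17',
  Statement: [{
    Effect: 'Allow',
    Action: ['cloudfront:ListPublicKeys', 'cloudfront:GetPublicKey'],
    Resource: '*'
  }]
};

(async () => {
  // Create the policy, then attach it to the service's role
  const { Policy } = await iam.createPolicy({
    PolicyName: 'SimpleAWSCoursesContentAccess',
    PolicyDocument: JSON.stringify(policyDocument)
  }).promise();

  await iam.attachRolePolicy({
    RoleName: 'content-delivery-task-role', // hypothetical role name
    PolicyArn: Policy.Arn
  }).promise();
})();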

Step 2: Set up a CloudFront distribution

  1. Open the CloudFront console.

  2. Choose Create distribution.

  3. Under Origin, for Origin domain, choose the S3 bucket simple-aws-courses-content.

  4. Use all the default values.

  5. At the bottom of the page, choose Create distribution.

  6. After CloudFront creates the distribution, the value of the Status column for the distribution changes from In Progress to Deployed. This typically takes a few minutes.

  7. Write down the domain name that CloudFront assigns to your distribution. It looks something like d111111abcdef8.cloudfront.net.

Step 3: Make the S3 bucket not public

  1. Go to the S3 console.

  2. Find the "simple-aws-courses-content" bucket and click on it.

  3. Click on the "Permissions" tab and then on "Block public access".

  4. Turn on the "Block all public access" setting and click "Save".

  5. Remove any existing bucket policies that grant public access.

  6. Add this bucket policy to only allow access from the CloudFront distribution (replace ACCOUNT_ID with your Account ID and DISTRIBUTION_ID with your CloudFront distribution ID):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
            "Service": "cloudfront.amazonaws.com"
        },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::simple-aws-courses-content/*",
      "Condition": {
        "StringEquals": {
          "aws:SourceArn": "arn:aws:cloudfront::ACCOUNT_ID:distribution/DISTRIBUTION_ID"
        }
      }
    }
  ]
}
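
If you prefer to script this step too, here's a sketch with the AWS SDK: it blocks public access and then applies the same policy shown above (fill in your account and distribution IDs first).

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// Same policy document as above
const bucketPolicy = {
  Version: '2012-10-17',
  Statement: [{
    Effect: 'Allow',
    Principal: { Service: 'cloudfront.amazonaws.com' },
    Action: 's3:GetObject',
    Resource: 'arn:aws:s3:::simple-aws-courses-content/*',
    Condition: {
      StringEquals: {
        'aws:SourceArn': 'arn:aws:cloudfront::ACCOUNT_ID:distribution/DISTRIBUTION_ID'
      }
    }
  }]
};

(async () => {
  // Equivalent of turning on "Block all public access" in the console
  await s3.putPublicAccessBlock({
    Bucket: 'simple-aws-courses-content',
    PublicAccessBlockConfiguration: {
      BlockPublicAcls: true,
      IgnorePublicAcls: true,
      BlockPublicPolicy: true,
      RestrictPublicBuckets: true
    }
  }).promise();

  // This policy isn't public (specific service principal plus a SourceArn
  // condition), so Block Public Access doesn't reject it
  await s3.putBucketPolicy({
    Bucket: 'simple-aws-courses-content',
    Policy: JSON.stringify(bucketPolicy)
  }).promise();
})();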

Step 4: Create the CloudFront Origin Access Control

  1. Go to the CloudFront console.

  2. In the navigation pane, choose Origin access.

  3. Choose Create control setting.

  4. On the Create control setting form, do the following:

    1. In the Details pane, enter a Name and a Description for the origin access control.

    2. In the Settings pane, leave the default setting Sign requests (recommended).

  5. Choose S3 from the Origin type dropdown.

  6. Click Create.

  7. After the OAC is created, write down the Name. You'll need this for the following step.

  8. Go back to the CloudFront console.

  9. Choose the distribution that you created earlier, then choose the Origins tab.

  10. Select the S3 origin and click Edit.

  11. From the Origin access control dropdown menu, choose the OAC that you just created.

  12. Click Save changes.
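
For completeness, the OAC can also be created programmatically. A sketch, assuming a recent enough aws-sdk version that includes CreateOriginAccessControl (the name is made up):

const AWS = require('aws-sdk');
const cloudfront = new AWS.CloudFront();

(async () => {
  const { OriginAccessControl } = await cloudfront.createOriginAccessControl({
    OriginAccessControlConfig: {
      Name: 'simple-aws-courses-oac',      // hypothetical name
      Description: 'OAC for the courses content bucket',
      OriginAccessControlOriginType: 's3',
      SigningBehavior: 'always',           // "Sign requests (recommended)"
      SigningProtocol: 'sigv4'
    }
  }).promise();

  console.log('OAC ID:', OriginAccessControl.Id);
})();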

Step 5: Create a Lambda@Edge function for authorization

  1. Go to the Lambda console in us-east-1 (N. Virginia), since Lambda@Edge functions must be created in that region. Click "Create function" and choose "Author from scratch".

  2. Provide a name for the function, such as "CognitoAuthorizationLambda".

  3. Choose the Node.js runtime.

  4. In "Function code", put the code below. You'll need the packages npm install jsonwebtoken jwk-to-pem, and you'll need to replace , , and with the Cognito User Pool's region, ID, and the user group

  5. In "Execution role", create a new IAM Role and attach it a policy with the contents below. You'll need to replace , , and (this last one is the ID of the Cognito user pool)

  6. Click "Create function".

const https = require('https');
const jwt = require('jsonwebtoken');
const jwkToPem = require('jwk-to-pem');

const region = '<REGION>';
const userPoolId = '<USER_POOL_ID>';
// Cognito publishes the user pool's public signing keys (the JWKS) at this well-known URL
const jwksUrl = `https://cognito-idp.${region}.amazonaws.com/${userPoolId}/.well-known/jwks.json`;

let cachedKeys;

// Small helper to GET and parse a JSON document
const fetchJson = (url) => new Promise((resolve, reject) => {
  https.get(url, (res) => {
    let data = '';
    res.on('data', (chunk) => { data += chunk; });
    res.on('end', () => resolve(JSON.parse(data)));
  }).on('error', reject);
});

// Download the JWKS once and cache the keys (converted to PEM) by key ID,
// so warm invocations don't pay for the round trip
const getPublicKeys = async () => {
  if (!cachedKeys) {
    const { keys } = await fetchJson(jwksUrl);
    cachedKeys = keys.reduce((agg, current) => {
      agg[current.kid] = { instance: current, key: jwkToPem(current) };
      return agg;
    }, {});
  }
  return cachedKeys;
};

const isTokenValid = async (token) => {
  try {
    const publicKeys = await getPublicKeys();
    const tokenSections = (token || '').split('.');
    const headerJSON = Buffer.from(tokenSections[0], 'base64').toString('utf8');
    const { kid } = JSON.parse(headerJSON);

    const key = publicKeys[kid];
    if (key === undefined) {
      throw new Error('Claim made for unknown kid');
    }

    // jwt.verify throws if the signature is invalid or the token is expired
    const claim = jwt.verify(token, key.key, { algorithms: ['RS256'] });
    const groups = claim['cognito:groups'] || [];
    return groups.includes('<USER_GROUP>') && claim.token_use === 'id';
  } catch (error) {
    console.error(error);
    return false;
  }
};

exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  const headers = request.headers;

  if (headers.authorization && headers.authorization[0].value) {
    const token = headers.authorization[0].value.split(' ')[1];
    const isValid = await isTokenValid(token);

    if (isValid) {
      return request;
    }
  }

  // Return a 401 Unauthorized response if the token is not valid
  return {
    status: '401',
    statusDescription: 'Unauthorized',
    body: 'Unauthorized',
    headers: {
      'www-authenticate': [{ key: 'WWW-Authenticate', value: 'Bearer' }],
      'content-type': [{ key: 'Content-Type', value: 'text/plain' }]
    }
  };
};

And here's the policy for the execution role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "cognito-idp:ListUsers",
        "cognito-idp:GetUser"
      ],
      "Resource": "arn:aws:cognito-idp:<REGION>:<ACCOUNT_ID>:userpool/<USER_POOL_ID>"
    }
  ]
}
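
One gotcha that's easy to miss: for Lambda@Edge, the execution role's trust policy must allow both lambda.amazonaws.com and edgelambda.amazonaws.com to assume the role. It should look like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "lambda.amazonaws.com",
          "edgelambda.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}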

Step 6: Add the Lambda@Edge function to the CloudFront distribution

  1. In the CloudFront console, select the distribution you created earlier.

  2. Choose the "Behaviors" tab and edit the existing default behavior.

  3. Under "Lambda Function Associations", choose the "Viewer Request" event type.

  4. Enter the ARN of the Lambda function you created in the previous step. Lambda@Edge requires a published version (not $LATEST), so publish a version of the function and use that version's ARN.

  5. Click "Save changes" to update the behavior.

Step 7: Update the Content Delivery microservice

  1. Modify the Content Delivery microservice to return URLs pointing at the CloudFront distribution domain, something like d111111abcdef8.cloudfront.net, for the requested content (there's a sketch of the change after this list).

  2. Test and deploy these changes.
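
Here's a hypothetical sketch of that change, assuming the DynamoDB table stores the S3 object key for each piece of content (the table and attribute names are made up for illustration):

const AWS = require('aws-sdk');
const dynamo = new AWS.DynamoDB.DocumentClient();

const CLOUDFRONT_DOMAIN = 'd111111abcdef8.cloudfront.net'; // your distribution's domain

// Look up the object key for the requested content and return a CloudFront
// URL instead of the old public S3 URL
const getContentUrl = async (courseId, contentId) => {
  const { Item } = await dynamo.get({
    TableName: 'courses-content', // hypothetical table name
    Key: { courseId, contentId }
  }).promise();

  // Before: return Item.publicS3Url
  return `https://${CLOUDFRONT_DOMAIN}/${Item.s3Key}`;
};

module.exports = { getContentUrl };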

Step 8: Update the frontend code

  1. Update the frontend code to include the auth data (the Cognito ID token) in the content requests, along the lines of the sketch below.
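
A minimal sketch of what that might look like, assuming you already have the user's Cognito ID token from sign-in (how you get it depends on your auth library, e.g. Amplify or amazon-cognito-identity-js):

const fetchContent = async (contentUrl, idToken) => {
  const response = await fetch(contentUrl, {
    headers: {
      // The Lambda@Edge function expects "Bearer <token>"
      Authorization: `Bearer ${idToken}`
    }
  });
  if (!response.ok) {
    throw new Error(`Content request failed: ${response.status}`);
  }
  return response;
};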

Step 9: Test the solution end-to-end

  1. Sign in to the app and go to a course.

  2. Verify that the course content is displayed correctly and that the URLs are pointing to the CloudFront distribution domain (e.g., d111111abcdef8.cloudfront.net).

  3. Test accessing the S3 objects directly using their old public URLs, to make sure they're no longer accessible.
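
If you want to script that check, here's a minimal sketch, assuming Node 18+ (for the global fetch), a hypothetical object path, and a valid Cognito ID token in the ID_TOKEN environment variable:

const assert = require('assert');

const url = 'https://d111111abcdef8.cloudfront.net/videos/intro.mp4'; // hypothetical object

(async () => {
  // Without a token, the Lambda@Edge function should reject the request
  const anonymous = await fetch(url);
  assert.strictEqual(anonymous.status, 401);

  // With a valid token, CloudFront should serve the content
  const authed = await fetch(url, {
    headers: { Authorization: `Bearer ${process.env.ID_TOKEN}` }
  });
  assert.strictEqual(authed.status, 200);

  console.log('Both checks passed');
})();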

Solution explanation

  • Update the IAM permissions for the Content Delivery microservice

    So far our solution used minimal permissions, which means the Content Delivery microservice only had access to what it needed, and nothing more. Well, now it needs more access! We're adding permissions so it can read the public keys of the (not-yet-created) CloudFront distribution, which it needs to generate the pre-signed URLs.

  • Set up a CloudFront distribution

    CloudFront is a CDN, which means it caches content near the user.
    Without CloudFront: User -----------public internet-----------> S3 bucket.
    With CloudFront: User ---public internet---> CloudFront edge location.

    See? It's a shorter path! That's because CloudFront has multiple edge locations all over the world, and the request goes to the one nearest the user. That reduces latency for the user.

  • Make the S3 bucket not public

    With the new solution, users don't need to access the S3 bucket directly. We want everything to go through CloudFront, so we'll remove public access to the bucket.

    This is basically the same thing as what I should have done in last week's issue. We add a new layer on top of an existing resource, now we need to make sure that existing resource is only accessible through that new layer. In this case, the resource is the S3 bucket, and the new layer is the CloudFront distribution.

  • Create the CloudFront Origin Access Control

    In the previous step we restricted access to the S3 bucket. Here we're giving our CloudFront distribution a sort of “identity” that it can use to access the S3 bucket. This way, the S3 service can identify the CloudFront distribution that's trying to access the bucket, and allow the operation.

  • Create a Lambda@Edge function for authorization

    Last week I mentioned the old way of doing Cognito authorization was with a Lambda function. For API Gateway, the much simpler way is with a Cognito authorizer, just like we did. For CloudFront we don't have such luxuries, so we ended up having to write the Lambda function that checks with Cognito whether the auth headers included in the request belong to an authenticated and authorized user.

  • Add the Lambda@Edge function to the CloudFront distribution

    CloudFront can run Lambda functions when content is requested, or when a response is returned. They're called Lambda@Edge because they run at the edge locations where CloudFront caches the content, so the user doesn't need to wait for the round trip from the edge location to an AWS region. In this case, the user needs to wait anyway, because our Lambda@Edge accesses the Cognito user pool, which lives in our region. There's no way around this that I know of.

  • Update the Content Delivery microservice

    Before, our Content Delivery microservice only returned the public URLs of the S3 objects. Now, it needs to return the URL of the CloudFront distribution.

    I'm glossing over the details of how Content Delivery knows which object to access. Heck, I'm not even sure why we need the Content Delivery microservice. But let's stay focused.

  • Update the frontend code

    Before, we were just requesting a public URL, much like after we first defined our microservices. Now, we need to include the auth data in the request, just like we did in last week's issue.

  • Test the solution end-to-end

    Don't just test that it works. Understand how it can fail, and test that. In this case, a failure would be for non-users to be able to access the content, which is the very thing we're trying to fix.

Discussion

For the past weeks we've been dealing with the same scenario, focusing on different aspects. We designed our microservices, secured them, and secured the content. We found problems, we fixed them. We made mistakes (well, I did), we fixed them.

This is how cloud architecture is designed and built.

First we understand what's going on, and what needs to happen. That's why I lead every issue with the scenario. Otherwise we end up building a solution that's looking for a problem (which is as common as it is bad, unfortunately).

We don't do it from scratch though. There are styles and patterns. We consider them all, evaluate the advantages and tradeoffs, and decide on one or a few. In this case, we decided that microservices would be the best way to build this (rather arbitrarily, just because I wanted to talk about them).

Then we look for problems. Potential functionality issues, unintended consequences, security issues, etc. We analyze the limits of our architecture (for example scaling) and how it behaves at those limits (for example scaling speed). If we want to make our architecture as complete as possible, we'll look to expand those limits, ensure the system can scale as fast as possible, all attack vectors are covered, etc. That's overengineering.

If you want to make your architecture as simple as possible, you need to understand today's issues and remove things (instead of adding them) until your architecture only solves those. Keep in mind tomorrow's issues, consider the probability of them happening and the cost of preparing for them, and choose between: Solving for them now, making the architecture easy to change so you can solve them easily in the future, or not doing anything. For most issues, you don't do anything. For the rest, you make your architecture easy to change (which is part of evolutionary architectures).

I think you can guess which one I prefer. Believe it or not, it's actually harder to make it simpler. The easiest approach to architecture is to pile up stuff and let someone else live with that (high costs, high complexity, bad developer experience, etc).

Best Practices for Securing Access to S3

Operational Excellence

  • Monitoring: Metrics, metrics everywhere… CloudFront shows you a metrics dashboard for your distribution (powered by CloudWatch), and you can check CloudWatch itself as well.

  • Monitor the Lambda@Edge function: It's still a Lambda function; all the @Edge does is say that it runs at CloudFront's edge locations. Treat it as another piece of code, and monitor it accordingly. Keep in mind that Lambda@Edge writes its logs to CloudWatch Logs in the region where each invocation ran, in log groups named /aws/lambda/us-east-1.<function-name>.

Security

  • Enable HTTPS: You can configure the CloudFront distribution to use HTTPS. Use AWS Certificate Manager to get a TLS certificate for your domain (for CloudFront, the certificate must be in us-east-1), and apply it directly to the CloudFront distribution.

  • Make the S3 bucket private: We already talked about this, but it's worth repeating. If you add a new layer on top of an existing resource, make sure malicious actors can't access that resource directly. In this case, it's not just making it private, but also setting a bucket policy so only the CloudFront distribution can access it (remember not to trust anything, even if it's running inside your AWS account).

  • Set up WAF in CloudFront: Just like with API Gateway, we can also set up a WAF Web ACL associated with our CloudFront distribution, to protect against common exploits.

Reliability

  • Configure CloudFront error pages: You can customize error pages in the CloudFront distribution, so users have a better user experience when encountering errors. Not actually relevant to our solution, since CloudFront is only accessed by our JavaScript code, but I thought I should mention it.

  • Enable versioning on the S3 bucket: With versioning enabled, S3 stores old versions of the content. That way you're protected in case you accidentally push the wrong version of a video, or accidentally delete the video from the S3 bucket.
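
Turning versioning on is a one-liner with the AWS SDK, in case you want to script it:

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

(async () => {
  // Enable versioning so old/deleted objects can be recovered
  await s3.putBucketVersioning({
    Bucket: 'simple-aws-courses-content',
    VersioningConfiguration: { Status: 'Enabled' }
  }).promise();
})();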

Performance Efficiency

  • Optimize content caching: We stuck with the defaults, but you should configure the cache behavior settings in your CloudFront distribution to balance content freshness against the reduced latency of cached content. Consider how often content changes (not often, in this case!).

  • Compress content: Enable automatic content compression in CloudFront to reduce the size of the content served to users, which can help improve performance and reduce data transfer costs.

Cost Optimization

  • Configure CloudFront price class: CloudFront offers several price classes, based on the geographic locations of the edge locations it uses. Figure out where your users are, and choose the price class accordingly. Don't pick global "just in case": users outside your price class's regions can still access everything, it's just going to take them maybe 200 ms longer. Analyze the tradeoff of cost vs user experience for those users.

  • Enable CloudFront access logs: Enable CloudFront access logs to analyze usage patterns and identify opportunities for cost optimization, such as adjusting cache settings or updating the CloudFront price class. Basically, if you want to make a data-driven decision, this gets you the data you need.

  • Consider S3 storage classes and lifecycle policies: Storage classes make it cheaper to store infrequently accessed content (and more expensive to serve, but since it's infrequently accessed, you end up saving money). Lifecycle policies are rules that automate transitioning objects between storage classes; there's a sketch below. That said, if you find nobody's accessing a course, I'd rather put this effort into marketing the course, or retiring it if it's too old. But I figured this was a good opportunity to mention storage classes and lifecycle policies.
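
Here's a sketch of such a lifecycle rule with the AWS SDK. The 90-day threshold is an assumption; pick one based on your actual access patterns.

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

(async () => {
  // Move all objects to Standard-IA 90 days after creation
  await s3.putBucketLifecycleConfiguration({
    Bucket: 'simple-aws-courses-content',
    LifecycleConfiguration: {
      Rules: [{
        ID: 'move-old-content-to-ia',
        Status: 'Enabled',
        Filter: { Prefix: '' }, // applies to the whole bucket
        Transitions: [{ Days: 90, StorageClass: 'STANDARD_IA' }]
      }]
    }
  }).promise();
})();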

There have been a few updates to the Well-Architected Framework that you might want to check out.

Want to know what I do for fun? I read AWS Reference Architectures, then write this newsletter.
