DynamoDB Database Design in 2025

DynamoDB is not a relational database, and you can't design for it like one. Hopefully this isn't news to you; this must be the 12th time I've said it. But I also understand that isn't super useful: “Don't design like a relational DB” doesn't really tell you what to do instead.

In this guide, we're going deep into DynamoDB data modeling. We'll cover how to think about your data, how to structure it for the access patterns your application actually needs, and how recent features (like multi-attribute composite keys in GSIs and the updated guidance on single-table vs. multi-table design) can make your life easier. This isn't a "Getting Started with DynamoDB" tutorial; I'll assume you already know the basics. We're here to learn how to design your data properly.

Why Database Design Matters More in DynamoDB

With Postgres or MySQL, you can design your schema, normalize it, and then figure out your queries later. Sure, you might need to add an index here and there, but the database will figure out how to execute your query. It might not be fast, but it'll work, and you can almost always optimize later.

DynamoDB doesn't work that way. There's no query optimizer that figures out how to satisfy your request. Either your data is structured to answer a specific question efficiently, or it's not. And if it's not, you're paying for a full table scan, which is the difference between milliseconds and minutes (and between reasonable costs and a terrifying AWS bill).

So here's the fundamental difference: in relational databases, you design your schema first and figure out queries later. In DynamoDB, you design your schema around the queries you need to run. Access patterns come first, schema comes second.

Understanding the Basics of DynamoDB

Before we get into the design process, let's make sure we're on the same page about how DynamoDB stores and retrieves data. (If you already know this, feel free to skip ahead.)

Partition Keys and Sort Keys

Every item in DynamoDB needs a primary key. This can be either:

  • Simple primary key: Just a partition key (PK). Each item must have a unique partition key value.

  • Composite primary key: A partition key (PK) plus a sort key (SK). The combination must be unique.

The partition key determines which physical partition your data lives on. DynamoDB hashes the partition key and uses that hash to route your request to the right server. This is how DynamoDB scales horizontally: it adds more partitions as your data grows. (You don't see this happening, which is kind of the whole point of a managed service.)

The sort key, when present, determines the order of items within a partition. Items with the same partition key are stored together (this is called an item collection), sorted by the sort key. This co-location is incredibly useful because it lets you retrieve multiple related items with a single query.

Global Secondary Indexes (GSIs) and Local Secondary Indexes (LSIs)

Sometimes you need to query your data in ways that your primary key doesn't support. That's where indexes come in.

Local Secondary Indexes use the same partition key as your table but a different sort key. They must be created when you create the table (you can't add them later), and they share throughput capacity with the base table. LSIs are useful when you need an alternative sort order within the same partition.

Global Secondary Indexes can have a completely different partition key and sort key from the base table, and you can create or delete them at any time. GSIs have their own throughput capacity, separate from the base table. They're more flexible but also more expensive: you're essentially duplicating your data into a new structure (and you're paying for the write operations to do that).

Here's a recent development worth knowing: as of November 2025, DynamoDB supports multi-attribute composite keys in GSIs. Instead of concatenating values into synthetic keys like TOURNAMENT#WINTER2024#REGION#NA-EAST, you can now use up to four attributes each for the partition key and sort key. This makes your data model cleaner and eliminates the need to backfill synthetic keys when adding new indexes. More on this later.

The Access Patterns Approach

Here's how you should approach DynamoDB design:

  1. Identify your entities. What are the "things" your application deals with? Customers, orders, products, invoices, etc.

  2. Map the relationships. How do these entities relate to each other? One-to-many? Many-to-many?

  3. List your access patterns. This is the critical step. What questions does your application need to answer? Be specific. Not "get orders" but "get all orders for a customer in the last 30 days."

  4. Design your keys to satisfy those patterns. This is where the actual schema design happens.

  5. Add indexes for patterns your primary key can't handle. But only if you actually need them.

If you want to visualize your design before writing code, NoSQL Workbench (a free tool from AWS) lets you model your table and test queries against sample data. I find it invaluable for validating designs before committing to them.

Let's work through a real example. We're building an e-commerce application. Here are our entities:

  • Customers: People who buy stuff

  • Products: Stuff people buy

  • Orders: When a customer buys stuff

  • Order Items: The individual products within an order

  • Invoices: Payment records for orders

And here are our access patterns:

  1. Get customer by customerId

  2. Get product by productId

  3. Get order by orderId

  4. Get all products in an order

  5. Get the invoice for an order

  6. Get all orders containing a specific product (for a date range)

  7. Get an invoice by invoiceId

  8. Get all invoices for a customer (for a date range)

  9. Get all products ordered by a customer (for a date range)

Now we design our schema to satisfy these patterns.

Single-Table vs. Multi-Table Design

Before we dive into the actual design, we need to talk about a philosophical question in DynamoDB: should you put everything in one table, or use multiple tables?

Single-table design stores multiple entity types in the same table. You use generic attribute names like PK and SK for your keys (yes, you literally name your PK PK, and your SK SK), and an EntityType attribute to distinguish between customers, orders, products, etc. This approach was heavily promoted by AWS for years, because it lets you fetch related entities with a single query.

Multi-table design is more like traditional database design: one table per entity type. It's simpler to understand and implement, especially if you're coming from a relational background.

My recommendation is a hybrid: a single table per domain. Single-table design works well when you frequently need to query multiple entity types together, and multi-table design is fine when your access patterns don't require cross-entity queries. So group entities that need to be queried together into a single table, and end up with a handful of tables, each of which can resolve the queries of a single domain (meaning a group of entities that exist together).

For our e-commerce example, we're going to use single-table design because several of our access patterns need to fetch related entities together (like getting an order with all its items), and because it's simpler. In a larger application, we'd end up with multiple “single” tables.

Designing the E-Commerce Schema

Let's implement our access patterns one by one. I'll show you how the table evolves as we add support for each pattern.

Access Patterns 1 & 2: Get Customer or Product by ID

These are the simplest patterns. We store customers with PK=c#<customerId> and SK=c#<customerId>. Products get PK=p#<productId> and SK=p#<productId>.

Why the prefixes? Because we're storing multiple entity types in the same table, and we need the keys to be unambiguous. Also, this makes it easy to identify the entity type at a glance.

| PK      | SK      | EntityType | Name   | Email             |
|---------|---------|------------|--------|-------------------|
| c#12345 | c#12345 | customer   | Alice  | [email protected] |
| c#67890 | c#67890 | customer   | Bob    | [email protected] |
| p#001   | p#001   | product    | Widget |                   |
| p#002   | p#002   | product    | Gadget |                   |

To get a customer: Query where PK = "c#12345" AND SK = "c#12345"
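Since we know the full primary key for these lookups, a single-item GetItem is the natural call rather than a Query. Here's a small sketch of how you might wrap the key conventions in helpers; the helper names are my own, and the ECommerce table name follows this running example:

```python
def customer_key(customer_id: str) -> dict:
    """Primary key for a customer item; PK and SK hold the same value."""
    key = f"c#{customer_id}"
    return {"PK": key, "SK": key}

def product_key(product_id: str) -> dict:
    """Primary key for a product item, using the same PK == SK convention."""
    key = f"p#{product_id}"
    return {"PK": key, "SK": key}

# Usage with the boto3 resource API (sketch):
# import boto3
# table = boto3.resource('dynamodb').Table('ECommerce')
# alice = table.get_item(Key=customer_key('12345')).get('Item')
```

Centralizing key construction like this keeps the prefix conventions in one place, so a typo like "C#" vs "c#" can't quietly break a lookup.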

Access Patterns 3 & 4: Get Order and Its Products

Here's where single-table design starts to shine. An order and its items share the same partition key (o#<orderId>), but they have different sort keys. The order itself has SK=o#<orderId>, while each order item has SK=p#<productId>.

| PK    | SK    | EntityType | CustomerId | Quantity | Price |
|-------|-------|------------|------------|----------|-------|
| o#001 | o#001 | order      | c#12345    |          |       |
| o#001 | p#001 | orderItem  |            | 2        | 29.99 |
| o#001 | p#002 | orderItem  |            | 1        | 49.99 |

To get everything about an order (the order itself plus all items): Query where PK = "o#001"

To get just the products in an order: Query where PK = "o#001" AND SK begins_with "p#"

This is the power of item collections. All items with the same partition key are stored together, so DynamoDB can return them all in a single query. No joins needed.

Here's what this looks like in code:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ECommerce')

# Get the entire order (order metadata + all items)
response = table.query(
    KeyConditionExpression=Key('PK').eq('o#001')
)

# Separate order from items based on EntityType
order = None
items = []
for item in response['Items']:
    if item['EntityType'] == 'order':
        order = item
    elif item['EntityType'] == 'orderItem':
        items.append(item)

# Or get just the products in the order
response = table.query(
    KeyConditionExpression=Key('PK').eq('o#001') & Key('SK').begins_with('p#')
)

One query, multiple entity types, no round trips. That's the payoff for the design complexity.

Still with me? This is where it starts to get interesting.

Access Pattern 5: Get Invoice for an Order

The invoice belongs to the order, so it goes in the same item collection:

| PK    | SK    | EntityType | Amount | PaymentMethod |
|-------|-------|------------|--------|---------------|
| o#001 | i#001 | invoice    | 109.97 | credit_card   |

To get the invoice for an order: Query where PK = "o#001" AND SK begins_with "i#"

Access Pattern 6: Get Orders for a Product (Date Range)

This is our first pattern that requires a GSI. We need to query by product ID and filter by date, but our base table doesn't support that: product IDs are partition keys for product items, not for order items. (We've hit the wall that every DynamoDB designer eventually hits.)

We'll create a GSI (let's call it GSI1) where:

  • Partition key: productId

  • Sort key: orderDate

Wait—but we're using single-table design with generic key names. How does this work?

We add attributes to our order items: GSI1PK (the product ID) and GSI1SK (the order date). DynamoDB will index only items that have these attributes, creating what's called a sparse index. Items without these attributes won't appear in the GSI, and you don't need to pay to store them twice.

| PK    | SK    | GSI1PK | GSI1SK     | EntityType |
|-------|-------|--------|------------|------------|
| o#001 | p#001 | p#001  | 2025-03-15 | orderItem  |
| o#001 | p#002 | p#002  | 2025-03-15 | orderItem  |

To get all orders containing product p#001 in March 2025: Query GSI1 where GSI1PK = "p#001" AND GSI1SK BETWEEN "2025-03-01" AND "2025-03-31"
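Putting this together, here's a sketch of writing an order item with its GSI1 attributes and building the date-range query. I'm using DynamoDB's low-level attribute-value format here (one reasonable choice alongside the resource API shown earlier), and the helper names are mine:

```python
def order_item(order_id: str, product_id: str, order_date: str,
               quantity: int, price: str) -> dict:
    """An orderItem in low-level attribute-value format. The GSI1PK/GSI1SK
    attributes make it appear in GSI1; items without them stay out of the
    index entirely (a sparse index)."""
    return {
        "PK": {"S": f"o#{order_id}"},
        "SK": {"S": f"p#{product_id}"},
        "EntityType": {"S": "orderItem"},
        "GSI1PK": {"S": f"p#{product_id}"},
        "GSI1SK": {"S": order_date},
        "Quantity": {"N": str(quantity)},
        "Price": {"N": price},
    }

def orders_for_product_query(product_id: str, start: str, end: str) -> dict:
    """Query parameters for access pattern 6: order items for a product
    within a date range, served by GSI1."""
    return {
        "TableName": "ECommerce",   # table name from the running example
        "IndexName": "GSI1",
        "KeyConditionExpression": "GSI1PK = :pk AND GSI1SK BETWEEN :start AND :end",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"p#{product_id}"},
            ":start": {"S": start},
            ":end": {"S": end},
        },
    }

# Usage with the low-level client (sketch):
# client = boto3.client('dynamodb')
# client.put_item(TableName='ECommerce',
#                 Item=order_item('001', '001', '2025-03-15', 2, '29.99'))
# resp = client.query(**orders_for_product_query('001', '2025-03-01', '2025-03-31'))
```

Note that ISO 8601 dates (`YYYY-MM-DD`) sort lexicographically in chronological order, which is what makes the BETWEEN condition on a string sort key work.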

Access Pattern 7: Get Invoice by ID (Plus a Note on Payments)

Invoice by ID is interesting. Our invoice items are currently keyed by order ID, but we also need to retrieve them by invoice ID. Time for another entry in our GSI.

We'll add GSI1PK=i#<invoiceId> and GSI1SK=i#<invoiceId> to our invoice items:

| PK    | SK    | GSI1PK | GSI1SK | EntityType |
|-------|-------|--------|--------|------------|
| o#001 | i#001 | i#001  | i#001  | invoice    |

For payments, we could create separate payment items, but if the only pattern we care about is "get all payments for an invoice" (never an individual payment), we can store payments as a JSON attribute on the invoice. This is a valid denormalization; we're trading flexibility for simplicity.

Access Patterns 8 & 9: Customer Invoice and Product History

These patterns need customer ID as the partition key, which our current structure doesn't support. Time for a second GSI (GSI2):

PK

SK

GSI2PK

GSI2SK

EntityType

o#001

i#001

c#12345

i#2025-03-15

invoice

o#001

p#001

c#12345

p#2025-03-15

orderItem

The sort key includes both a prefix (to distinguish invoices from products) and a date (for range queries).

To get all invoices for customer c#12345 in Q1 2025: Query GSI2 where GSI2PK = "c#12345" AND GSI2SK BETWEEN "i#2025-01-01" AND "i#2025-03-31"

To get all products ordered by customer c#12345 in Q1 2025: Query GSI2 where GSI2PK = "c#12345" AND GSI2SK BETWEEN "p#2025-01-01" AND "p#2025-03-31"
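Both GSI2 queries follow the same shape: the prefix selects the entity type and the date provides the range. Here's a sketch of a single helper covering patterns 8 and 9 (helper name is mine; the synthetic `<prefix>#<date>` sort key follows the table above):

```python
def customer_history_query(customer_id: str, prefix: str,
                           start_date: str, end_date: str) -> dict:
    """GSI2 query for a customer's invoices (prefix 'i') or ordered
    products (prefix 'p') within a date range, using the synthetic
    '<prefix>#<date>' sort key."""
    return {
        "TableName": "ECommerce",
        "IndexName": "GSI2",
        "KeyConditionExpression": "GSI2PK = :pk AND GSI2SK BETWEEN :lo AND :hi",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"c#{customer_id}"},
            ":lo": {"S": f"{prefix}#{start_date}"},
            ":hi": {"S": f"{prefix}#{end_date}"},
        },
    }

# Access pattern 8: invoices for customer c#12345 in Q1 2025 (sketch)
# client.query(**customer_history_query('12345', 'i', '2025-01-01', '2025-03-31'))
# Access pattern 9: products ordered by the same customer, same window
# client.query(**customer_history_query('12345', 'p', '2025-01-01', '2025-03-31'))
```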

If this feels like a lot of moving parts, that's because it is—but you're almost through the hard part.

Making It Cleaner with Multi-Attribute Keys

Remember when I mentioned multi-attribute composite keys in GSIs? Here's where they can help. (This feature shipped in November 2025, so it's relatively new.)

Without multi-attribute keys, we created synthetic keys like GSI2SK = "i#2025-03-15" by concatenating a prefix with a date. This works, but it has drawbacks:

  • You need to build and parse these strings in your application code (error-prone and tedious)

  • Adding a new GSI requires backfilling synthetic keys across all existing items

  • The data model is harder to understand at a glance

With multi-attribute keys, you can define a GSI where the sort key is composed of multiple actual attributes. For example, instead of GSI2SK = "i#2025-03-15", you could have:

  • Sort key attribute 1: entityType (value: "invoice")

  • Sort key attribute 2: orderDate (value: "2025-03-15")

The query becomes more natural: Query where GSI2PK = "c#12345" AND entityType = "invoice" AND orderDate BETWEEN "2025-01-01" AND "2025-03-31"

You can use up to four attributes in the partition key and four in the sort key.

The Critical Query Constraint

Here's what you absolutely need to know: multi-attribute sort key queries must be left-to-right. You must specify conditions in order, and you cannot skip attributes in the middle.

Valid query patterns:

  • ✅ First attribute only: entityType = 'invoice'

  • ✅ First two attributes: entityType = 'invoice' AND orderDate = '2025-03-15'

  • ✅ First attribute equality + second attribute range: entityType = 'invoice' AND orderDate BETWEEN '2025-01-01' AND '2025-03-31'

Invalid query patterns:

  • ❌ Skipping first attribute: orderDate = '2025-03-15' (can't skip entityType)

  • ❌ Range on first, equality on second: entityType > 'invoice' AND orderDate = '2025-03-15' (inequality must be last)

  • ❌ Querying out of order: orderDate = '2025-03-15' AND entityType = 'invoice'

This constraint directly affects how you order attributes in your key definition. Put the attributes you'll always filter on first, with the most common range query attribute last.

This feature is particularly useful when:

  • You're adding a GSI to an existing table (no backfilling required)

  • Your data naturally has multiple hierarchical dimensions

  • You want your schema to be more self-documenting

One thing to keep in mind: multi-attribute keys only work with GSIs, not with the base table or LSIs. And you still need to think carefully about data distribution: a multi-attribute partition key that results in a few "hot" partitions will cause throttling just like a single-attribute key would.

The Final Schema

Here's what our complete design looks like.

Base Table:

| PK      | SK      | EntityType | CustomerId | ProductId | OrderDate  | Amount | ... |
|---------|---------|------------|------------|-----------|------------|--------|-----|
| c#12345 | c#12345 | customer   |            |           |            |        |     |
| p#001   | p#001   | product    |            |           |            |        |     |
| o#001   | o#001   | order      | c#12345    |           | 2025-03-15 |        |     |
| o#001   | p#001   | orderItem  | c#12345    | p#001     | 2025-03-15 |        |     |
| o#001   | i#001   | invoice    | c#12345    |           | 2025-03-15 | 109.97 |     |

GSI1 (for product-based and invoice-based queries):

| GSI1PK | GSI1SK     | ... |
|--------|------------|-----|
| p#001  | 2025-03-15 | ... |
| i#001  | i#001      | ... |

GSI2 (for customer-based queries):

| GSI2PK  | GSI2SK       | ... |
|---------|--------------|-----|
| c#12345 | i#2025-03-15 | ... |
| c#12345 | p#2025-03-15 | ... |

Access Pattern Summary:

| # | Access Pattern            | Key Condition                                    |
|---|---------------------------|--------------------------------------------------|
| 1 | Get customer by ID        | PK = "c#12345", SK = "c#12345"                   |
| 2 | Get product by ID         | PK = "p#001", SK = "p#001"                       |
| 3 | Get order by ID           | PK = "o#001", SK = "o#001"                       |
| 4 | Get products in order     | PK = "o#001", SK begins_with "p#"                |
| 5 | Get invoice for order     | PK = "o#001", SK begins_with "i#"                |
| 6 | Get orders for product    | GSI1: GSI1PK = "p#001", GSI1SK between dates     |
| 7 | Get invoice by ID         | GSI1: GSI1PK = "i#001", GSI1SK = "i#001"         |
| 8 | Get invoices for customer | GSI2: GSI2PK = "c#12345", GSI2SK between "i#..." |
| 9 | Get products for customer | GSI2: GSI2PK = "c#12345", GSI2SK between "p#..." |

Design Principles to Remember

After walking through this example, let me distill some principles that apply to any DynamoDB design. These are worth bookmarking.

1. Start with access patterns. Never start by drawing an ERD and then trying to fit it into DynamoDB. List your queries first, then design a schema that supports them. This feels backwards if you're used to relational databases, but it's the right approach for DynamoDB.

2. Use composite keys to create item collections. Items that are often queried together should share a partition key. The sort key distinguishes them and controls their order.

3. Add indexes only when necessary. Each GSI duplicates your data and adds cost. Make sure you actually need the access pattern before creating an index. A surprising number of "necessary" indexes turn out to be premature optimization.

4. Embrace denormalization. Storing the same data in multiple places (like customer ID on both orders and order items) is normal and expected. Storage is cheap, round trips are expensive.

5. Use sparse indexes. If a GSI only needs to index certain item types, only add the GSI key attributes to those items. This keeps the index smaller and cheaper.

6. Plan for growth. Choose partition keys that will distribute load evenly as your data grows. A partition key with low cardinality (like "status" with values "active/inactive") will create hot partitions. You'll regret it later.

7. Consider using multi-attribute keys. Especially for GSIs on existing tables, they can simplify your schema and eliminate backfill operations. This is one of those features that, once you use it, you wonder how you lived without it.

8. Remember transactions cost double. DynamoDB transactions (TransactWriteItems, TransactGetItems) consume 2x the capacity units of regular operations. If your access patterns involve frequent transactional writes across multiple items, factor this into your capacity planning and cost estimates.

9. Use TTL for time-bound data. For time-series data, session records, logs, or anything with natural expiration, enable Time to Live. DynamoDB will automatically delete expired items, and these deletes don't consume write capacity. You'll need a TTL attribute (typically a Unix timestamp), so plan for it in your schema.
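Principle 8's cost caveat is easiest to see in a concrete shape. Here's a sketch of a TransactWriteItems payload that creates an order and its order items atomically, following the running example's key conventions (the helper name is mine):

```python
def create_order_transaction(order_id: str, customer_id: str,
                             order_date: str, items: list) -> dict:
    """Parameters for TransactWriteItems: the order plus its order items,
    written atomically. Every item in the transaction consumes 2x the
    write capacity of a plain PutItem."""
    puts = [{
        "Put": {
            "TableName": "ECommerce",
            "Item": {
                "PK": {"S": f"o#{order_id}"},
                "SK": {"S": f"o#{order_id}"},
                "EntityType": {"S": "order"},
                "CustomerId": {"S": f"c#{customer_id}"},
                "OrderDate": {"S": order_date},
            },
        }
    }]
    for product_id, quantity, price in items:
        puts.append({
            "Put": {
                "TableName": "ECommerce",
                "Item": {
                    "PK": {"S": f"o#{order_id}"},
                    "SK": {"S": f"p#{product_id}"},
                    "EntityType": {"S": "orderItem"},
                    "Quantity": {"N": str(quantity)},
                    "Price": {"N": price},
                },
            }
        })
    return {"TransactItems": puts}

# Usage (sketch): either every Put succeeds or none of them do.
# client = boto3.client('dynamodb')
# client.transact_write_items(**create_order_transaction(
#     '001', '12345', '2025-03-15', [('001', 2, '29.99'), ('002', 1, '49.99')]))
```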
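And for principle 9, here's a sketch of a session item carrying a TTL attribute as a Unix timestamp. The attribute name ExpiresAt is my choice (DynamoDB lets you pick any name when enabling TTL), and the session entity is hypothetical, not part of the e-commerce schema above:

```python
import time

def session_item(session_id: str, user_id: str, ttl_days: int = 30) -> dict:
    """A session record with an ExpiresAt attribute in Unix epoch seconds.
    Once TTL is enabled on ExpiresAt, DynamoDB deletes the item some time
    after expiry, at no write-capacity cost."""
    expires_at = int(time.time()) + ttl_days * 24 * 60 * 60
    return {
        "PK": {"S": f"s#{session_id}"},
        "SK": {"S": f"s#{session_id}"},
        "EntityType": {"S": "session"},
        "UserId": {"S": f"c#{user_id}"},
        "ExpiresAt": {"N": str(expires_at)},
    }

# One-time setup to enable TTL on the attribute (sketch):
# client = boto3.client('dynamodb')
# client.update_time_to_live(
#     TableName='ECommerce',
#     TimeToLiveSpecification={'Enabled': True, 'AttributeName': 'ExpiresAt'})
```

Keep in mind that TTL deletion is best-effort and can lag expiry, so queries should still filter out items whose ExpiresAt is in the past.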

Recent Features That Affect Design Decisions

DynamoDB has evolved significantly in the past year or so. Here are some features worth knowing about; they might change how you approach certain design decisions.

Warm Throughput

DynamoDB now exposes a "warm throughput" value that shows how many read and write operations your table can readily handle. More importantly, you can pre-warm your table to handle anticipated traffic spikes. This is useful if you know a traffic surge is coming (say, a product launch or Black Friday) and want to avoid throttling during the critical window.

Configurable Maximum Throughput for On-Demand Tables

On-demand tables now support configurable maximum throughput. This lets you set a ceiling on how much capacity a table can consume, which is helpful for cost control and for protecting downstream services from being overwhelmed.

Multi-Region Strong Consistency

Global tables now support multi-region strong consistency. Previously, global tables only offered eventual consistency: you'd write in one region, and it would eventually replicate to others. With multi-region strong consistency, you can get strong consistency across regions, which is crucial for applications like inventory management or financial transactions. This is a big deal for applications that previously couldn't use global tables due to consistency requirements.

Configurable PITR Periods

Point-in-time recovery (PITR) now lets you configure the recovery period from 1 to 35 days (it used to be fixed at 35 days). This is helpful for compliance scenarios that require shorter data retention. The change came in January 2025, and it's one of those "finally!" features.

Zero-ETL Integrations

DynamoDB now has zero-ETL integrations with Amazon Redshift and SageMaker Lakehouse. Changes to your DynamoDB table are automatically replicated to these analytics services, so you can run complex queries and ML workloads without building ETL pipelines. If you've ever struggled with the "DynamoDB is great for OLTP but how do I run analytics?" question, this is your answer.

Common Questions

Let me anticipate some questions you might have. These come up every time I discuss DynamoDB design.

Why use one table instead of multiple ones?

Because related data stays together. In DynamoDB, there's no join operation: if you need data from multiple tables, you need to make multiple API calls. Single-table design lets you fetch related entities in one query.

That said, multi-table design is perfectly valid when your access patterns don't require cross-entity queries. AWS now explicitly supports both approaches. Don't let anyone tell you it's "wrong."

Is there only one correct design?

No. There are usually multiple valid designs, and the "best" one depends on your specific access patterns, expected data volume, and team familiarity with DynamoDB. A good design satisfies all your access patterns efficiently, but there's rarely only one way to achieve that.

How do I handle new access patterns?

If the new pattern can be supported by your existing keys or indexes, great: you're done. If not, you'll need to add a GSI. With multi-attribute composite keys, you might not need to backfill data; you can use existing attributes as key components.

If your data model fundamentally can't support the new pattern (like needing to filter by an attribute that doesn't exist), you might need to add that attribute and backfill it across existing items. This is annoying but manageable.

Should I use DynamoDB for everything?

No. DynamoDB is excellent for high-scale, low-latency access patterns that you can define in advance. It's not great for ad-hoc analytics, complex reporting, or applications where you don't know your access patterns upfront. If your stakeholders say "we'll figure out the queries later," run towards relational and don't look back.

For analytics on DynamoDB data, consider the zero-ETL integrations with Redshift or exporting to S3 and using Athena.

Is DynamoDB really this complex?

Not really. What's complex is unlearning relational database habits. Once you internalize the "access patterns first" approach, DynamoDB design becomes surprisingly methodical. You're not normalizing and then optimizing, you're designing for your specific queries from the start.

The good news: once your design is done, operating DynamoDB is significantly simpler than managing a relational database cluster. No sharding, no replication configuration, no maintenance windows. Well, none that you manage; AWS handles all of it.

Wrapping Up

DynamoDB database design is fundamentally different from relational database design. Instead of normalizing data and writing queries later, you start with the queries and design your schema to support them.

The key concepts are:

  • Partition keys for data distribution and direct lookups

  • Sort keys for ordering and range queries within a partition

  • Item collections for co-locating related entities

  • GSIs for alternative access patterns

  • Single-table design for data locality (when your patterns benefit from it)

Recent features like multi-attribute composite keys, configurable throughput, and warm throughput make DynamoDB more flexible and easier to operate. But the core design principles remain the same: know your access patterns, structure your data to serve them, and embrace denormalization.
