Transactions in Amazon DynamoDB
Showing that Amazon DynamoDB is ACID-compliant

Guille Ojeda
May 10, 2023
Scenario
We're building an e-commerce app with DynamoDB for the database, pretty similar to the one we built for the DynamoDB Database Design issue. No need to go read that issue (though I think it came out great). Here's how our database works:
Customers are stored with a Customer ID starting with c# (for example c#123) as the PK and SK.
Products are stored with a Product ID starting with p# (for example p#123) as the PK and SK, and with an attribute of type number called 'stock', which contains the available stock.
Orders are stored with an Order ID starting with o# (for example o#123) for the PK and the Product ID as the SK.
When an item is purchased, we need to check that the Product is in stock, decrease the stock by 1 and create a new Order.
Payment, shipping and any other concerns are magically handled by the power of "that's out of scope for this issue" and "it's left as an exercise for the reader".
There are more attributes in all entities, but let's ignore them.
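To make the key design concrete, here's a minimal sketch of what the three item types could look like as plain JavaScript objects. It assumes the key attributes are literally named PK and SK, and the helper names (customerItem, productItem, orderItem) are made up for illustration.

```javascript
// Hypothetical helpers that build items following the key design described above.
// Assumes the key attributes are named PK and SK; 'stock' comes from the text.
const customerItem = (customerId) => ({
  PK: `c#${customerId}`,
  SK: `c#${customerId}`,
});

const productItem = (productId, stock) => ({
  PK: `p#${productId}`,
  SK: `p#${productId}`,
  stock, // number of units available
});

const orderItem = (orderId, productId) => ({
  PK: `o#${orderId}`,
  SK: `p#${productId}`, // the SK of an Order is the Product ID
});

console.log(customerItem(123));
console.log(productItem(111, 1));
console.log(orderItem(123, 111));
```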
Services
DynamoDB: A NoSQL database that supports ACID transactions, just like any SQL-based database.
Solution
The trick here is that we need to read the value of stock and update it atomically. Atomicity is a property of a set of operations, where that set of operations can't be divided: it's either applied in full, or not at all. If we just ran the GetItem and PutItem actions separately, we could have a case where two customers are buying the last item in stock for that product, our scalable backend processes both requests simultaneously, and the events go down like this:
1. Customer123 clicks Buy
2. Customer456 clicks Buy
3. Instance1 receives request from Customer123
4. Instance1 executes GetItem for Product111, receives a stock value of 1, continues with the purchase
5. Instance2 receives request from Customer456
6. Instance2 executes GetItem for Product111, receives a stock value of 1, continues with the purchase
7. Instance1 executes PutItem for Product111, sets stock to 0
8. Instance2 executes PutItem for Product111, sets stock to 0
9. Instance1 executes PutItem for Order0046
10. Instance1 receives a success, returns a success to the frontend.
11. Instance2 executes PutItem for Order0047
12. Instance2 receives a success, returns a success to the frontend.

The process without transactions
The data doesn't look corrupted, right? Stock for Product111 is 0 (it could end up being -1, depends on how you write the code), both orders are created, you received the money for both orders (out of scope for this issue), and both customers are happily awaiting their product. You go to the warehouse to dispatch both products, and find that you only have one in stock. Where did things go wrong?
The problem is that steps 4 and 7 were executed separately, and Instance2 got to read the stock of Product111 (step 6) in between them, and made the decision to continue with the purchase based on a value that hadn't been updated yet, but should have. Steps 4 and 7 need to happen atomically, in a transaction.
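The interleaving above can be reproduced without touching AWS at all. This is a toy in-memory sketch, not DynamoDB code: the Map stands in for the table, and getStock and completePurchase are hypothetical names for the separate read and write steps.

```javascript
// In-memory stand-in for the table: Product111 has one unit left.
const table = new Map([['p#111', { stock: 1 }]]);
const orders = [];

// First half of a purchase: read the stock (plays the role of GetItem).
const getStock = (productId) => table.get(productId).stock;

// Second half: write based on the value read earlier (plays the role of the PutItem calls).
const completePurchase = (productId, stockSeen, orderId) => {
  if (stockSeen <= 0) return false; // decision made on a possibly stale value
  table.set(productId, { stock: stockSeen - 1 });
  orders.push(orderId);
  return true;
};

// Same interleaving as the list above: both reads happen before either write.
const seenByInstance1 = getStock('p#111'); // 1
const seenByInstance2 = getStock('p#111'); // 1, stale by the time it's used
completePurchase('p#111', seenByInstance1, 'o#0046');
completePurchase('p#111', seenByInstance2, 'o#0047');

console.log(table.get('p#111').stock); // 0
console.log(orders.length); // 2 orders for 1 unit of stock
```

Nothing in this code is wrong on its own. The bug only exists in the gap between the read and the write.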
First, install the packages from the AWS SDK for JavaScript v3:
npm install @aws-sdk/client-dynamodb @aws-sdk/lib-dynamodb
This is the code in Node.js to run the steps as a transaction (you should add this to the code imaginary you already has for the service):
const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
const { DynamoDBDocumentClient, TransactWriteCommand } = require('@aws-sdk/lib-dynamodb');

const dynamoDBClient = new DynamoDBClient({ region: 'us-east-1' });
const dynamodb = DynamoDBDocumentClient.from(dynamoDBClient);

//The code imaginary you already has

//This is just some filler code to make this example valid. Imaginary you should already have this solved
const newOrderId = 'o#123'; //Must be unique
const productId = 'p#111'; //Comes in the request
const customerId = 'c#123'; //Comes in the request

//Assumes the table's key attributes are named PK and SK
//A transaction can't include two operations on the same item, so the stock check
//is the ConditionExpression of the Update instead of a separate ConditionCheck
const transactItems = {
  TransactItems: [
    {
      Update: {
        TableName: 'SimpleAwsEcommerce',
        Key: { PK: productId, SK: productId },
        ConditionExpression: 'stock > :zero',
        UpdateExpression: 'SET stock = stock - :one',
        ExpressionAttributeValues: {
          ':zero': 0,
          ':one': 1
        }
      }
    },
    {
      Put: {
        TableName: 'SimpleAwsEcommerce',
        Item: {
          PK: newOrderId,
          SK: productId,
          customerId: customerId
        }
      }
    }
  ]
};

const executeTransaction = async () => {
  try {
    //The document client built with .from() sends command objects
    const data = await dynamodb.send(new TransactWriteCommand(transactItems));
    console.log('Transaction succeeded:', JSON.stringify(data, null, 2));
  } catch (error) {
    //error.name is TransactionCanceledException when the condition fails or there's a conflict
    console.error('Transaction failed:', error.name, error.message);
  }
};

executeTransaction();

//Rest of the code imaginary you already has
Here's how things may happen with these changes, if both customers click Buy at the same time:
1. Customer123 clicks Buy
2. Customer456 clicks Buy
3. Instance1 receives request from Customer123
4. Instance2 receives request from Customer456
5. Instance1 executes a transaction:
   a. Condition check for Product111: stock is greater than 0 (actual value is 1)
   b. Update for Product111, sets stock to 0
   c. Put for Order0046
   d. Transaction succeeds, it's committed.
6. Instance1 receives a success, returns a success to the frontend.
7. Instance2 executes a transaction:
   a. Condition check for Product111: stock is not greater than 0 (actual value is 0)
   b. Transaction fails, it's aborted.
8. Instance2 receives an error, returns an error to the frontend.
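The same sequence can be sketched in memory to show the all-or-nothing behavior. This is a toy model, not how DynamoDB implements it: the Map stands in for the table, and purchase is a hypothetical function that checks the condition and applies both writes in one uninterruptible step.

```javascript
// In-memory stand-in for the table: Product111 has one unit left.
const table = new Map([['p#111', { stock: 1 }]]);

// Toy "transaction": check the condition and apply both writes together,
// or apply nothing at all.
const purchase = (productId, orderId) => {
  const product = table.get(productId);
  if (!(product.stock > 0)) {
    return { ok: false, reason: 'ConditionalCheckFailed' };
  }
  table.set(productId, { ...product, stock: product.stock - 1 });
  table.set(orderId, { productId });
  return { ok: true };
};

const first = purchase('p#111', 'o#0046'); // Instance1
const second = purchase('p#111', 'o#0047'); // Instance2

console.log(first.ok, second.ok); // true false
console.log(table.get('p#111').stock); // 0
console.log(table.has('o#0047')); // false, the second order was never written
```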

The process with transactions
Solution explanation
DynamoDB is so scalable because it's actually a distributed database, where you're presented with a single resource called Table, but behind the scenes there are multiple nodes that store the data and process queries. DynamoDB is highly available (meaning it can continue working if an Availability Zone goes down) because nodes are distributed across availability zones, and data is replicated. You don't need to know this to use DynamoDB, but now that you do, you see that transactions in DynamoDB are actually distributed transactions.
DynamoDB implements distributed transactions using Two-Phase Commit (2PC). This strategy is pretty simple: All nodes are requested to evaluate the transaction to determine whether they're capable of executing it, and only after all nodes report that they're able to successfully execute their part, the central controller sends the order to commit the transaction, and each node does the actual writing, affecting the actual data. For this reason, all operations done in a DynamoDB transaction consume twice as much capacity.
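To get a feel for the doubled cost, here's a rough back-of-the-envelope sketch. It assumes a standard write costs 1 WCU per 1 KB of item size (rounded up) and a transactional write costs twice that. The item sizes are invented for the example, and real billing has more details than this.

```javascript
// Rough write-capacity estimate. Assumes 1 WCU per 1 KB (rounded up) for a
// standard write, and double that for a write inside a transaction.
const standardWcu = (itemSizeBytes) => Math.max(1, Math.ceil(itemSizeBytes / 1024));
const transactionalWcu = (itemSizeBytes) => 2 * standardWcu(itemSizeBytes);

// Hypothetical item sizes for one purchase: a 600-byte Product and a 1,500-byte Order.
const itemSizes = [600, 1500];

const standardTotal = itemSizes.reduce((sum, size) => sum + standardWcu(size), 0);
const transactionalTotal = itemSizes.reduce((sum, size) => sum + transactionalWcu(size), 0);

console.log(standardTotal); // 3
console.log(transactionalTotal); // 6
```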
Transactions can span multiple tables, but they can't be performed on indexes. Also, propagation of the data to Global Secondary Indexes and DynamoDB Streams always happens after the transaction, and isn't part of it.
Transaction isolation (the I in ACID) is achieved through optimistic concurrency control. This means that multiple transactions can be executed concurrently, but if DynamoDB detects a conflict, one of the transactions will be rolled back and the caller will need to retry the transaction.
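Not every failed transaction deserves a retry. A failed condition (no stock) will fail again, while a conflict with another transaction may well succeed next time. A minimal sketch of telling them apart, assuming the error shape of the AWS SDK for JavaScript v3: an error named TransactionCanceledException with a CancellationReasons array holding one Code per operation. The function name and its return values are made up for this example.

```javascript
// Decide what to do with a failed transaction.
// Assumes error.name === 'TransactionCanceledException' and
// error.CancellationReasons = [{ Code: '...' }, ...], one entry per operation.
const classifyTransactionError = (error) => {
  if (error.name !== 'TransactionCanceledException') return 'unknown';
  const codes = (error.CancellationReasons || []).map((reason) => reason.Code);
  if (codes.includes('ConditionalCheckFailed')) return 'condition-failed'; // retrying won't help
  if (codes.includes('TransactionConflict')) return 'retry'; // another request touched the item
  return 'unknown';
};

// Fake errors, shaped like the ones the SDK would throw.
const outOfStock = {
  name: 'TransactionCanceledException',
  CancellationReasons: [{ Code: 'ConditionalCheckFailed' }, { Code: 'None' }],
};
const conflict = {
  name: 'TransactionCanceledException',
  CancellationReasons: [{ Code: 'TransactionConflict' }, { Code: 'None' }],
};

console.log(classifyTransactionError(outOfStock)); // condition-failed
console.log(classifyTransactionError(conflict)); // retry
```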
Discussion
The whole point of this issue (which I've been trying to make for the past couple of weeks) is that SQL databases shouldn't be your default. I'll make one concession though: If all your dev team knows is SQL databases, just go with that unless you have a really strong reason not to.
So far I've shown you that DynamoDB can handle an e-commerce store just fine, including ACID-compliant transactions. This one's gonna blow your mind: You can actually query DynamoDB using SQL! Or more specifically, a SQL-compatible language called PartiQL. Amazon developed PartiQL as an internal tool, and AWS later released it as open source. It can be used on SQL databases, semi-structured data, or NoSQL databases, so long as the engine supports it.
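As an illustration, this is roughly what reading one Product with PartiQL could look like. It's a sketch that assumes the SimpleAwsEcommerce table with key attributes named PK and SK. getProductStatement is a made-up helper that only builds the input, and the commented lines show how it might be sent with ExecuteStatementCommand from @aws-sdk/lib-dynamodb.

```javascript
// Builds the input for a PartiQL SELECT that fetches one Product by its full key.
// Assumes the 'SimpleAwsEcommerce' table and key attributes named PK and SK.
const getProductStatement = (productId) => ({
  Statement: 'SELECT * FROM "SimpleAwsEcommerce" WHERE PK = ? AND SK = ?',
  Parameters: [productId, productId], // one value per ? placeholder, in order
});

console.log(getProductStatement('p#111'));

// With a document client (here called dynamodb) this could be sent as:
//   const { ExecuteStatementCommand } = require('@aws-sdk/lib-dynamodb');
//   const { Items } = await dynamodb.send(new ExecuteStatementCommand(getProductStatement('p#111')));
```

Specifying the full key in the WHERE clause is what keeps this from turning into a Scan.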
With PartiQL you could theoretically change your Postgres database for a DynamoDB database without rewriting any queries. In reality, you need to consider all of these points:
Why are you even changing? It's not going to be easy.
How are you going to migrate all the data?
You need to make sure no queries are triggering a Scan in DynamoDB, because we know those are slow and very expensive. You can use an IAM policy to deny full-table Scans.
Again, why are you even changing?
I'm not saying there isn't a good reason to change, but I'm going to assume it's not worth the effort, and you'll have to prove me otherwise. Remember that replicating the data somewhere else for a different access pattern is a perfectly valid strategy (in fact, that's exactly how DynamoDB GSIs work). We'll discuss it further in a future issue.
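For the IAM policy mentioned in the list above, this is a sketch of what a deny rule for PartiQL full-table scans might look like, written as a JavaScript object so it can be printed as JSON. The account ID and region in the ARN are placeholders. It only covers PartiQL SELECT statements, so blocking the regular Scan API would need its own statement.

```javascript
// Hypothetical policy: deny PartiQL SELECTs that would result in a full table scan.
// The account ID and region in the ARN are placeholders.
const denyFullTableScan = {
  Version: '2012-10-17',
  Statement: [
    {
      Effect: 'Deny',
      Action: ['dynamodb:PartiQLSelect'],
      Resource: ['arn:aws:dynamodb:us-east-1:123456789012:table/SimpleAwsEcommerce'],
      Condition: { Bool: { 'dynamodb:FullTableScan': ['true'] } },
    },
  ],
};

console.log(JSON.stringify(denyFullTableScan, null, 2));
```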
Best Practices
Operational Excellence
Monitor transaction latencies: Monitor latencies of your DynamoDB transactions to identify performance bottlenecks and address them. Use CloudWatch metrics and AWS X-Ray to collect and analyze performance data.
Error handling and retries: Handle errors and implement exponential backoff with jitter for retries in case of transaction conflicts.
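A minimal sketch of exponential backoff with full jitter, where each wait is a random time between 0 and an exponentially growing ceiling. The function names, the 50 ms base and the 2,000 ms cap are arbitrary choices for illustration, and shouldRetry is whatever predicate you use to decide that an error is worth retrying.

```javascript
// Full jitter: wait a random time between 0 and min(cap, base * 2^attempt).
// The random source is injectable so the function can be tested deterministically.
const backoffDelayMs = (attempt, baseMs = 50, capMs = 2000, random = Math.random) =>
  Math.floor(random() * Math.min(capMs, baseMs * 2 ** attempt));

// Runs an async operation, retrying while shouldRetry(error) returns true,
// up to maxAttempts total attempts.
const withRetries = async (operation, shouldRetry, maxAttempts = 5) => {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await operation();
    } catch (error) {
      if (attempt + 1 >= maxAttempts || !shouldRetry(error)) throw error;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
};

// Upper bound of the wait for the first five attempts: 50, 100, 200, 400, 800 ms.
console.log([0, 1, 2, 3, 4].map((attempt) => Math.min(2000, 50 * 2 ** attempt)));
```

The jitter matters because two instances that conflicted once would otherwise retry at the same moment and conflict again.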
Security
Fine-grained access control: Assign an IAM Role to your backend with an IAM Policy that only allows the specific actions that it needs to perform, only on the specific tables that it needs to access. You can even do this per record and per attribute. This is least privilege.
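As a rough example for the purchase flow in this issue, a least-privilege policy could allow only the item-level actions that the transaction uses, only on the one table. This is a sketch under assumptions: purchasePolicy is a made-up helper, the ARN is a placeholder, and dynamodb:ConditionCheckItem is only needed if the transaction includes a standalone ConditionCheck.

```javascript
// Hypothetical least-privilege policy for the purchase transaction.
// Operations inside a transaction are authorized through the underlying
// item-level actions, so those are the ones listed here.
const purchasePolicy = (tableArn) => ({
  Version: '2012-10-17',
  Statement: [
    {
      Effect: 'Allow',
      Action: ['dynamodb:ConditionCheckItem', 'dynamodb:UpdateItem', 'dynamodb:PutItem'],
      Resource: [tableArn], // only this table, nothing else
    },
  ],
});

// Placeholder account ID and region.
const policy = purchasePolicy('arn:aws:dynamodb:us-east-1:123456789012:table/SimpleAwsEcommerce');
console.log(JSON.stringify(policy, null, 2));
```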
Reliability
Consider a Global Table: You can make your DynamoDB table multi-region using a Global Table. Making the rest of your app multi-region is more complicated than that, but at least the DynamoDB part is easy.
Performance Efficiency
Optimize provisioned throughput: If you're using Provisioned Mode, you'll need to set your Read and Write Capacity Units appropriately. You can also set them to auto-scale, but it's not instantaneous. Remember the last issue on using SQS to throttle writes.
Cost Optimization
Optimize transaction sizes: Minimize the number of items and attributes involved in a transaction to reduce consumed read and write capacity units. Remember that transactions consume twice as much capacity, so optimizing the operations in a transaction is doubly important.
Resources
I thought mindset coaching was bull****, until I met Richard Donovan. His advice is easy enough to apply that it overcomes my natural resistance to these things, and it's practical enough to give me benefits now, instead of years down the line. Go follow him, and also watch this interview.
We all have that friend who wants to get in the software industry, and asks us to "teach them everything". We're so passionate about this that we talk to them for 6 hours straight, in an honest effort to help, and then we're surprised when they tell us they didn't understand jack.
There's a better way: the book Code Your Future: A Guide to Career Change and Success in Software Engineering, written by Eduardo Vedes. Too expensive? Send your friend to the free Code Your Future newsletter first.
You know LeetCode challenges, right? Fun if you're bored (and are weird like me), but completely unrelated to the actual job of building software. Check these code challenges instead. Even more fun, and you practice stuff that's actually relevant and useful.