
Microservices vs. Agentic AI (Part 2): Communication, State, Patterns, and Predictability

In Part 1 of this series I talked about the origins and foundational principles distinguishing Microservices from modern Agentic AI. I argued that microservices stem from software engineering needs, decomposing large applications along business domain lines to enhance agility and scalability. Agentic AI, fueled by Large Language Model (LLM) breakthroughs, focuses on decomposing complex tasks to orchestrate autonomous reasoning and action. Their different motivations shape their approaches to decomposition, specialization, and even the focus of autonomy itself (Team/Deployment vs. Operational/Decision-Making).

With that foundation laid, we now shift focus to how these architectural differences play out in runtime behavior. How do services communicate, versus how agents interact and collaborate? How do we manage the persistent records of a business versus the operational context of a reasoning engine? What patterns have emerged to solve common runtime problems on both sides, and how do they compare? And how does the fundamental nature of their core components impact system predictability? Exploring and contrasting these runtime dynamics is necessary to understand the practical trade-offs and design considerations inherent in each approach, which is what architecture is all about.

Interaction Styles of Microservices and Agentic AI

Both microservices and AI agents are distributed systems, and in any distributed system the key question is how the components communicate with each other. This isn't necessarily dictated by what the components do, but rather by the granularity of their responsibilities and the expectations set upon the interactions.

Microservice Communication: Defined APIs and Simple Transport

Microservices interact primarily through well-defined Application Programming Interfaces (APIs). The common mechanisms include:

  • Synchronous Communication: A service makes a request to another service's API and waits for a response. REST over HTTP/S is simple and stateless. For higher performance or scenarios requiring bidirectional streaming, gRPC (using HTTP/2 and Protocol Buffers) is a frequent choice, with the main alternative being WebSockets. This pattern is good for interactions where the caller requires immediate data or confirmation to proceed, though it has the disadvantage of introducing temporal coupling between the services, and can make some failure scenarios harder to recover from.

  • Asynchronous Communication: This is a more complex pattern, but usually preferred in microservice architectures. Services communicate indirectly without waiting for an immediate response, typically via messaging systems. This promotes loose coupling and improves resilience, as components don't need to be available simultaneously. Common patterns and their associated AWS services include:

    • Message Queues (Amazon SQS): One service sends a message to a queue; another service polls the queue and processes the message later. Excellent for decoupling tasks and load leveling (throttling).

    • Publish/Subscribe (Amazon SNS): A service publishes a message (event) to a topic, and multiple subscribing services receive a copy independently. Ideal for broadcasting state changes or events, and for fanning out an event to multiple components. Can be used in conjunction with queues.

    • Event Buses (Amazon EventBridge): A central bus receives events from various sources (other microservices, or even AWS services) and routes them to target services based on defined rules. Facilitates building event-driven architectures.

    • Streaming Platforms (Amazon MSK for Apache Kafka, Amazon Kinesis Data Streams): For handling high-volume, ordered streams of events, often used for real-time data processing or feeding analytics pipelines.
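To make the decoupling concrete, here's a minimal in-memory sketch of the publish/subscribe idea. The `Topic` class and handler names are illustrative, not an SNS API; in production, SNS handles delivery, retries, and fan-out for you:

```python
from typing import Callable

class Topic:
    """Toy stand-in for a pub/sub topic like Amazon SNS."""
    def __init__(self) -> None:
        self._subscribers: list[Callable[[dict], None]] = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: dict) -> None:
        # Every subscriber receives its own independent copy of the event.
        for handler in self._subscribers:
            handler(dict(event))

# Two downstream "services" react to the same event without knowing each other.
orders_seen: list[str] = []
emails_sent: list[str] = []
topic = Topic()
topic.subscribe(lambda e: orders_seen.append(e["order_id"]))  # analytics service
topic.subscribe(lambda e: emails_sent.append(e["order_id"]))  # notification service
topic.publish({"order_id": "o-123", "status": "PLACED"})
```

Note that the publisher never names its consumers; adding a third subscriber requires no change to the publishing service, which is the whole point of the pattern.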

A central philosophy guiding microservice communication is "smart endpoints, dumb pipes." The "smarts," the business logic, resides entirely within the microservices themselves (the endpoints). The communication infrastructure (network protocols and message brokers, the "pipes") should ideally act as simple, reliable transport mechanisms. This principle arose partly as a reaction against older SOA patterns where heavyweight Enterprise Service Buses (ESBs) sometimes contained significant business logic, transformation rules, or complex routing, creating a central point of complexity and potential failure. Intermediaries like an API Gateway (Amazon API Gateway) manage external API exposure, request routing, authentication, and rate limiting. Load Balancers (like AWS Elastic Load Balancer, both Application Load Balancers and Network Load Balancers) distribute traffic across service instances. Service Meshes (like AWS App Mesh or Istio/Linkerd, and to some extent AWS ECS Service Connect) handle network-level concerns like reliable service-to-service communication, discovery, observability, retries, and circuit breaking. These intermediaries primarily provide infrastructure capabilities, aiming to keep application logic out of the communication fabric itself.

Agentic Communication: Context, Tools, and Orchestration

Agentic AI systems exhibit more diverse interaction patterns, reflecting the agent's need to reason, access information, and act upon the world:

  • Agent-Tool Interaction: This is a fundamental part of agentic architectures. An agent, guided by its reasoning process (an LLM call), determines it needs to perform an action or get external information and invokes a predefined Tool. Technically, this is often implemented as a standard API call (REST or RPC) to a function (like an AWS Lambda function) or another service. The agent's LLM often needs to generate the correct parameters for the API call based on the task context and the tool's description, and that's a whole other discussion.

  • Agent-Knowledge Base Interaction: Agents often need external information not present in their training data. Retrieval-Augmented Generation (RAG) is the common pattern. The agent (or a process acting on its behalf) takes the user query or intermediate thought, converts it to an embedding, searches a vector database (like those powered by Amazon OpenSearch Service with k-NN, or Amazon Aurora with the pgvector extension, or specialized services like Amazon Kendra) for relevant text chunks, retrieves those chunks, and then includes them as context within the prompt sent to the LLM to generate a more informed response. This is a specialized information retrieval interaction.

  • Agent-Agent Collaboration: In systems with multiple agents (which is what we usually call Agentic AI), coordinating work and ensuring the right agents are called is critical. This often involves patterns more complex than simple API calls:

    • Orchestrator/Supervisor: A central agent manages the workflow, assigns tasks to worker agents, gathers results, and decides the next steps, often communicating via internal function calls or messages.

    • Broker: A routing agent directs messages or tasks between specialized agents based on content or required skills, typically using asynchronous messaging. This function can also be performed by the Supervisor.

    • Event-Driven: Agents publish events reflecting their state changes or completed actions (e.g. to Amazon EventBridge), and other agents subscribe and react autonomously.

    • Frameworks like LangGraph allow defining these interactions as state machines or graphs, managed by the framework's runtime. Platforms like Amazon Bedrock Agents provide managed orchestration capabilities.
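As a rough sketch of the Orchestrator/Supervisor pattern, here the worker "agents" are plain Python functions standing in for LLM-backed agents (all names are hypothetical; frameworks like LangGraph or Bedrock Agents manage this wiring for you):

```python
def research_agent(task: str) -> str:
    # Stand-in for an LLM-backed agent that gathers information.
    return f"notes on {task}"

def writer_agent(task: str, notes: str) -> str:
    # Stand-in for an LLM-backed agent that produces the final output.
    return f"draft about {task} using {notes}"

class Supervisor:
    """Central agent that decomposes a goal and sequences worker agents."""
    def run(self, goal: str) -> dict:
        # The supervisor assigns tasks, gathers intermediate results,
        # and decides the next step based on them.
        notes = research_agent(goal)
        draft = writer_agent(goal, notes)
        return {"goal": goal, "notes": notes, "draft": draft}

result = Supervisor().run("serverless caching")
```

The key property is that workflow logic lives in the Supervisor, not in the workers: each worker only knows its own task, while the supervisor owns the sequencing and the decision of what happens next.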

Critically, the communication payload in agentic architectures often includes rich context, intermediate reasoning steps, goals, or detailed instructions, not just simple data. The "dumb pipes" philosophy doesn't neatly apply. The agent's reasoning before calling other agents injects intelligence and business logic into the communication. Intermediaries like Orchestrators or frameworks explicitly contain workflow logic. The interaction is less about simple data transport and more about orchestrating an intelligent, context-aware process.

Managing State in Microservices and AI Agents

State is any data that is used for more than one request; it should be persisted across requests and accessible to all nodes (a critical requirement for high availability and scalability). This is true for both microservices and AI agents, but the way these architectures handle state and data persistence is another area of significant divergence.

Microservices: Decentralized Business Data & Eventual Consistency

The mantra here is decentralized data management. Each microservice owns its data, tailored to its needs (sometimes using different data stores like Amazon RDS for relational data, Amazon DynamoDB for NoSQL, etc.). This avoids the coupling and schema evolution nightmares of shared databases, and decouples other services that consume this data from how the data is stored. State primarily represents the authoritative, persistent records of business entities and transactions.

The major challenge this approach introduces is maintaining data consistency across these independent databases when a single business process modifies data owned by multiple services. Because distributed transactions using protocols like two-phase commit are complex and tend to reduce overall availability, the prevalent approach is eventual consistency. This acknowledges that updates across services won't be instantaneous but will converge over time, requiring applications to tolerate temporary inconsistencies (e.g. an order status might briefly differ between the Order service and a Reporting service).

Handling this requires specific patterns:

  • Saga Pattern: This pattern manages distributed business transactions through a sequence of local transactions within each participating service. If any step fails, predefined compensating transactions are executed to undo the effects of preceding successful steps, effectively rolling back the business operation. For example, if booking a flight succeeds but charging the credit card fails, the compensation action is to cancel the flight booking. Sagas can be implemented via:

    • Choreography: Services react to events published by others (e.g. using Amazon SNS or EventBridge). This is highly decoupled but can be hard to track.

    • Orchestration: A central coordinator (potentially implemented using AWS Step Functions) explicitly tells each service what local transaction to execute and handles failure/compensation logic. This is easier to manage but introduces a coordinator dependency.

  • Transactional Outbox Pattern: This pattern ensures that events corresponding to a database change are published reliably. The service atomically commits both the data change and an event record describing the change to its local database within the same transaction. A separate process then reads these event records from the "outbox" table and reliably publishes them to a message broker (like Amazon SNS, EventBridge, or MSK). This prevents the state where the database is updated but the corresponding event notification fails to send.
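A minimal sketch of the Transactional Outbox pattern, using SQLite as the service's local database and a plain list as a stand-in for the message broker (the table names and event shape are made up for illustration):

```python
import json
import sqlite3

# The state change and its outbox event commit in ONE local transaction;
# a separate relay process later publishes events from the outbox table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT, published INTEGER DEFAULT 0)")

def place_order(order_id: str) -> None:
    with db:  # one atomic transaction for both writes
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, "PLACED"))
        db.execute("INSERT INTO outbox (payload) VALUES (?)",
                   (json.dumps({"event": "OrderPlaced", "order_id": order_id}),))

published: list[dict] = []
def relay() -> None:
    # A separate process in real life; publishes pending events, then marks them.
    rows = db.execute("SELECT id, payload FROM outbox WHERE published = 0").fetchall()
    for row_id, payload in rows:
        published.append(json.loads(payload))  # stand-in for an SNS/EventBridge publish
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    db.commit()

place_order("o-42")
relay()
```

Because the order row and the outbox row commit together, you can never end up with an updated database and a lost event; at worst the relay retries and publishes the event more than once, which is why consumers should be idempotent.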

Agentic AI: Memory, Knowledge, and Contextual Coherence

Agentic AI's focus is different. It's less about owning persistent business data and more about managing the operational state and context needed for reasoning and task execution. Key elements include:

  • Memory: This is essential for maintaining context during interactions.

    • Short-Term Memory: Holds information relevant to the current conversation turn or task execution sequence (e.g. recent user messages, agent's intermediate thoughts or plans). This might be managed in memory or stored transiently. Context window limitations of LLMs are a major factor here, and even though they've grown a lot lately (Gemini 2.5 pro has a 1M tokens context window), you still need to be mindful of input tokens and the associated cost.

    • Long-Term Memory: Allows agents to recall information across multiple interactions or sessions (e.g. user preferences, past conversation summaries, learned facts). This requires persistence, potentially using databases like DynamoDB or, increasingly, vector databases for semantic retrieval of relevant past experiences.

  • Knowledge Bases (for RAG): Agents frequently access external information repositories via Retrieval-Augmented Generation. The RAG process typically involves embedding a query, searching a vector database (OpenSearch, Amazon Aurora with pgvector, Kendra) to find relevant document chunks, and injecting these chunks into the LLM prompt as context. This external knowledge isn't state the agent owns, but state it accesses to generate informed responses. Updating these knowledge bases isn't really part of the agentic architecture itself, but it's worth mentioning that they are updated.
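Here's a toy sketch of the retrieval step in RAG, using bag-of-words vectors and cosine similarity in place of a real embedding model and vector database (OpenSearch k-NN, pgvector, etc.); the chunks and query are made-up examples:

```python
import math

def embed(text: str) -> dict[str, float]:
    # Bag-of-words stand-in for a real embedding model.
    vec: dict[str, float] = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

chunks = [
    "SQS queues decouple producers from consumers",
    "DynamoDB is a key-value NoSQL database",
]
query = "how do queues decouple services"

# Retrieve the most similar chunk and inject it as context into the prompt.
best = max(chunks, key=lambda c: cosine(embed(query), embed(c)))
prompt = f"Context: {best}\n\nQuestion: {query}"
```

The structure is the same in production: embed, search, inject. Only the embedding quality and the search infrastructure change.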

The primary "consistency" challenge here is ensuring the agent's internal state (memory) remains coherent and relevant, and that the external knowledge bases are accurate and up-to-date. An agent reasoning with inconsistent memory or outdated knowledge will produce poor results. While multi-agent systems modifying shared state could face distributed consistency issues, patterns for managing inter-agent state consistency seem less standardized than microservice data patterns, likely because it's a much less prevalent pattern (at least according to my own experience, and likely because agents are so new). However, when an agent uses a tool that interacts with a traditional database or microservice, it must engage with that system's consistency mechanisms. For instance, if an agent orchestrates actions via tools that modify multiple backend microservices, implementing or invoking a Saga might be necessary to ensure the overall business operation completes reliably or compensates correctly.

Caching Strategies

Both architectures benefit from caching, and it's interesting to explore how they use it. Microservices use caches like Amazon ElastiCache (Redis, Memcached) primarily to store frequently accessed domain data or API responses, reducing latency and database load. Agentic AI caching focuses on reducing latency and cost (especially LLM token costs). Common targets include prompts (check out Amazon Bedrock Prompt Caching), LLM responses (if inputs are identical and some level of determinism is acceptable), knowledge base lookup results (which are the same idea as caching database responses), and outputs from idempotent tool calls (same idea as caching microservice responses). The performance and cost-saving goals are similar, but the nature of the cached content differs. Still, caching is caching, and there's a lot to learn from 15 years of microservice experience, and 50 or so years of cache experience (the earliest reference I could find is from Structured Computer Organization by Andrew S. Tanenbaum, 1976).
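As an illustration, a minimal response cache for LLM calls might look like this; `call_model` is a stand-in for an actual model invocation (e.g. a Bedrock call), and the normalization is deliberately naive:

```python
import hashlib

call_count = 0
def call_model(prompt: str) -> str:
    # Stand-in for an expensive LLM invocation; counts how often it runs.
    global call_count
    call_count += 1
    return f"answer to: {prompt}"

_cache: dict[str, str] = {}
def cached_call(prompt: str) -> str:
    # Key on a hash of the normalized prompt; only exact repeats hit the cache.
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)
    return _cache[key]

a = cached_call("What is SQS?")
b = cached_call("what is sqs?")  # normalizes to the same key: cache hit
```

This only works if you accept some determinism (identical prompts always get identical answers), which is exactly the trade-off mentioned above; semantic caching (matching similar prompts, not identical ones) is a further refinement.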

Shared and Differing Design Patterns

Both Microservice and Agentic AI architectures, being distributed systems, inevitably face similar fundamental challenges. However, the different nature of the components (deterministic code vs. probabilistic reasoning engines) and the core objectives (application structure vs. task automation) lead to divergent, though sometimes conceptually related, patterns. Let's explore how each architecture addresses a few shared problems, and consider whether they also share solutions.

Dynamic Component Discovery

In any dynamic distributed environment, particularly those leveraging cloud elasticity or container orchestration, component instances are ephemeral. Services or agents are created, destroyed, scaled up or down, and their network locations (like IP addresses and ports) are not fixed. This presents a fundamental problem: how does a component reliably find the correct network address of another component it needs to communicate with at runtime?

The microservice ecosystem relies heavily on Service Discovery mechanisms operating at the network infrastructure level. Typically, this involves a Service Registry (like AWS Cloud Map, Consul, etcd, or integrated Kubernetes service discovery). When a microservice instance starts, it registers its network location and health status with the registry. When another service needs to call it, it queries the registry using a stable logical service name to resolve the current, healthy IP address(es) and port(s). Load balancers often integrate directly with these registries to know where to distribute traffic. The focus is squarely on resolving logical service names to physical network endpoints.
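A service registry can be sketched in a few lines; this toy version (register, mark health, resolve) is a stand-in for what AWS Cloud Map or Consul provide at scale:

```python
import random

class Registry:
    """Toy service registry: logical names map to healthy physical endpoints."""
    def __init__(self) -> None:
        self._instances: dict[str, dict[str, bool]] = {}

    def register(self, service: str, endpoint: str) -> None:
        # An instance registers itself on startup as healthy.
        self._instances.setdefault(service, {})[endpoint] = True

    def mark_unhealthy(self, service: str, endpoint: str) -> None:
        # In real systems a health check does this automatically.
        self._instances[service][endpoint] = False

    def resolve(self, service: str) -> str:
        healthy = [ep for ep, ok in self._instances.get(service, {}).items() if ok]
        if not healthy:
            raise LookupError(f"no healthy instance of {service}")
        return random.choice(healthy)  # trivial client-side load balancing

registry = Registry()
registry.register("orders", "10.0.1.5:8080")
registry.register("orders", "10.0.1.6:8080")
registry.mark_unhealthy("orders", "10.0.1.5:8080")
endpoint = registry.resolve("orders")
```

Note the caller only ever knows the logical name "orders"; which IP it gets, and which instances are even alive, is entirely the registry's concern.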

Agentic systems also need discovery, but often at a different level of abstraction. While an agent might need to discover the network endpoint of a Tool (which could use standard service discovery), a more common challenge is “discovering” the right tool or agent for a specific task. This frequently involves the core LLM's reasoning capabilities. The agent is provided with descriptions of available tools (including their purpose and parameters). Based on its current goal or sub-task, the LLM analyzes these descriptions to semantically determine the most appropriate tool to invoke. Agentic Frameworks might also provide registries or mechanisms for tool/agent lookup, but the selection process often involves this layer of semantic understanding or functional matching rather than just resolving a network address. In multi-agent systems, finding another agent might involve explicit routing by an Orchestrator or Broker, or discovery based on capabilities advertised within the framework.
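To illustrate the difference, here's a sketch of functional tool selection. A real agent delegates this decision to the LLM, which reads the tool descriptions; the keyword-overlap scoring below is a deterministic stand-in for that semantic matching, and the tools themselves are hypothetical:

```python
# Each tool is advertised to the agent by name and natural-language description.
tools = {
    "get_weather": "fetch the current weather forecast for a city",
    "create_ticket": "open a support ticket for a customer issue",
    "query_orders": "look up the status of a customer order",
}

def pick_tool(task: str) -> str:
    # Score each tool by word overlap between the task and its description.
    # An LLM does this semantically; this overlap count is just a stand-in.
    task_words = set(task.lower().split())
    def overlap(item: tuple[str, str]) -> int:
        return len(task_words & set(item[1].lower().split()))
    return max(tools.items(), key=overlap)[0]

chosen = pick_tool("what is the status of order 1234")
```

Only after this functional selection does the system need traditional discovery: resolving `query_orders` to an actual Lambda function or service endpoint.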

Both architectures solve the "finding the right component" problem, but operate at different conceptual levels. Microservices know which component they're looking for, and they primarily rely on network-level service discovery to resolve logical names to physical addresses for potentially identical service instances. AI Agents first need to determine which component they're looking for, relying on semantic reasoning or framework-level capabilities to discover the functionally appropriate tool or specialized agent needed for a specific step in a task; only after that do they face the same discovery problem microservices solve.

Managing Interaction Complexity & Routing

As the number of independent components (services or agents) grows, direct peer-to-peer communication becomes increasingly complex to manage, test, and secure. Systems need ways to route requests efficiently, manage external access, and control internal interaction flows without creating a tangled mess.

In a microservices architecture this is addressed through patterns that manage traffic flow and centralize certain concerns, primarily at the network edge or infrastructure level. The API Gateway pattern provides a single, managed entry point for external clients, handling concerns like request routing to appropriate backend services, authentication/authorization, rate limiting, request and/or response transformation, and sometimes response aggregation. Internally, Load Balancers distribute traffic across available instances based on load and health. Service Meshes can provide more sophisticated internal traffic management, like fine-grained routing rules or traffic splitting for canary releases. These solutions focus on managing network traffic and enforcing policies at architectural boundaries.

Managing interaction complexity in Agentic AI architectures often involves embedding logic within specialized agents or frameworks that control the workflow. The Orchestrator/Supervisor agent centralizes the control logic for a multi-step task or multi-agent collaboration. The Broker agent (which may be a separate agent, or may be the same Supervisor agent) intelligently routes messages or tasks to other agents based on content or required skills. Routing Workflow patterns explicitly define paths for different types of tasks to reach specialized handlers or agents. These patterns manage the logical flow of execution and collaboration, often making decisions based on the state of the task or the content of messages.

Both paradigms use intermediary patterns to manage interaction complexity. However, microservice intermediaries (API Gateway, LB) primarily function as infrastructure components, routing network traffic and enforcing generic policies without deep application awareness. Agentic intermediaries (Orchestrator, Broker) often encapsulate significant application or workflow logic, actively directing the sequence of operations and making content-aware routing decisions. The intelligence sits closer to, or within, the interaction management layer in agentic systems.

Ensuring Reliability & Handling Failure

Partial failures are a fact of life in distributed systems. Network connections can drop, components can crash or become slow, external dependencies can fail. Systems must be designed to tolerate these faults and continue operating reliably, or at least fail gracefully.

In microservices, reliability focuses heavily on infrastructure resilience and handling failures between deterministic code units. Key patterns include: Redundancy (running multiple instances), Health Checks (detecting unhealthy instances for removal from load balancing), Timeouts (preventing indefinite waits on dependencies), Retries (handling transient network errors), Circuit Breakers (stopping requests to failing services to prevent cascading failures and allow recovery), and Bulkheads (isolating resources to limit the blast radius of dependency failures). Asynchronous communication also contributes significantly to resilience by decoupling components. Data consistency (or at least detection of inconsistencies) is also an important aspect, but I already talked about it. The key point is that these patterns primarily address component unavailability or network issues.
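As an example, here's a minimal sketch of the Circuit Breaker pattern (the threshold and cooldown values are illustrative; libraries and service meshes implement more nuanced state machines, including proper half-open probing):

```python
import time

class CircuitBreaker:
    """After `threshold` consecutive failures, open the circuit and fail fast."""
    def __init__(self, threshold: int = 3, cooldown: float = 30.0) -> None:
        self.threshold, self.cooldown = threshold, cooldown
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                # Fail fast: don't even hit the struggling dependency.
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # cooldown elapsed: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success resets the failure count
        return result

breaker = CircuitBreaker(threshold=2, cooldown=60.0)
def flaky():
    raise ConnectionError("dependency down")

states = []
for _ in range(3):
    try:
        breaker.call(flaky)
    except ConnectionError:
        states.append("failed")       # the dependency was actually called
    except RuntimeError:
        states.append("fast-fail")    # breaker open, dependency not called
```

The third call never reaches the failing dependency, which is what prevents cascading failures and gives the dependency room to recover.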

Reliability in Agentic AI is twofold. First, it must handle the same infrastructure failures as microservices, especially when Tools interact with external systems (requiring robust tool design and standard patterns like Retries and Timeouts applied either by the agent logic or within the tool). Second, and uniquely, it must handle AI-specific or cognitive failures: LLM hallucinations, poor reasoning or planning, inability to use a tool correctly, misinterpreting results. Solutions for this include AI-specific patterns like Reflection/Self-Correction (where an agent reviews and attempts to fix its own output or plan based on predefined criteria or checks), Evaluations (external validation of agent performance, either by code or by another LLM call), adaptive Re-planning (finding alternative paths if a step fails), and robust Error Handling within the agent's core logic to catch and manage exceptions from both tools and the LLM. Guardrails act as safety nets against undesirable outputs, and they can be applied either to the final response or to intermediate responses.

The Predictability Divide: Deterministic Logic vs. Probabilistic Reasoning

This difference perhaps most profoundly shapes the operational reality and design philosophy of Generative AI applications in general, not limited to Agents.

Microservices, like most traditional software, are built upon a foundation of determinism. Given a specific input and state, a correctly functioning service is expected to produce the exact same output and undergo the exact same state transition every single time. This predictability is fundamental to how we:

  • Test: Unit tests, integration tests, and end-to-end tests largely rely on asserting that actual outputs match expected outputs for given inputs.

  • Debug: Reproducing a bug typically involves recreating the specific inputs and state conditions that reliably trigger the erroneous behavior.

  • Reason about Reliability: We build confidence by testing predictable components and using patterns (like retries for transient network errors) that manage failures within an otherwise deterministic system.

It's worth noting that distributed systems are never truly 100% deterministic. Unexpected network and infrastructure failures can and will occur, and they will impact the system. These are mostly solved problems, even if the solutions are rather complex (distributed transactions, eventual consistency, etc). Moreover, we can test for these unpredictable failures using chaos engineering, for example with AWS Fault Injection Simulator. In almost any form of traditional software we adapt the application layer to handle these failures and test for them at the infrastructure layer.

Agentic AI: Embracing (and Managing) Non-Determinism

Agentic systems leveraging LLMs introduce inherent non-determinism. This stems primarily from the probabilistic nature of how LLMs generate text: they predict the most likely next token (or word part) based on the preceding context, often employing sampling strategies (like adjusting the 'temperature' parameter) to introduce variability and avoid repetitive outputs. This means:

  • The same prompt might yield slightly different phrasing, sentence structure, or even minor variations in the reasoning path on subsequent runs.

  • In rare cases, it can lead to hallucinations: confident-sounding but factually incorrect or nonsensical outputs.
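The temperature parameter can be illustrated with a softmax over next-token scores: lower temperature concentrates probability on the top token (more deterministic), while higher temperature flattens the distribution (more varied output). The logits here are made-up values:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    # Divide scores by temperature before the softmax; subtract the max
    # for numerical stability.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical model scores for three candidate tokens
cold = softmax_with_temperature(logits, temperature=0.2)  # near-greedy
hot = softmax_with_temperature(logits, temperature=2.0)   # much flatter
```

With temperature 0.2 the top token gets nearly all the probability mass; with temperature 2.0 the second and third tokens get sampled often, which is where both the variety and the occasional odd output come from.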

This non-determinism is not just unavoidable, it's often entirely desirable. Any form of creativity by an LLM is due to the same factors that cause hallucinations, and the whole reason why we build Generative AI applications and use LLMs is because of these factors. However, even if we're explicitly searching for this non-determinism, its presence fundamentally changes how we operate applications:

  • Testing: Exact output matching is often impossible or irrelevant. Testing must shift focus to behavioral validation: Does the agent achieve the intended goal? Does the output satisfy key criteria (checked via rules, another LLM, or human evaluation)? Is the agent robust to slight variations in input phrasing? Testing often requires evaluating performance over multiple runs or using statistical measures, and determining correctness via semantic analysis instead of direct comparison. Equivalence classes for inputs are often impossible to predict just by analysis, and likely impractical even to calculate via statistical observations. For these reasons, significant investment in robust evaluation frameworks becomes necessary, and I promise to write an article about it soon.

  • Debugging: When an agent misbehaves, pinpointing the cause is harder. Was it the prompt, the specific context window contents, retrieved RAG data, the inherent randomness of the model for that query, or a genuine hallucination? Reproducing the error can be difficult, even if you manage to recreate the exact state. Debugging necessitates capturing and analyzing the entire reasoning context: the exact prompt sent to the LLM, the retrieved knowledge snippets, any tool calls and responses, and the intermediate "thoughts" or plans generated by the agent (requiring detailed logging or specialized observability tools). To make matters even more fun, there are no tools that can really observe the internal state of a model, so we can only capture what the model outputs.

  • Reliability: Building reliable agentic systems means managing this uncertainty. It requires good input validation, strong output validation (checking against rules or desired formats, and very robust evals), incorporating self-correction mechanisms (like the Reflection pattern), implementing clear fallback behaviors when high confidence cannot be achieved, and using Guardrails to prevent harmful or undesirable actions. Human oversight (called human-in-the-loop) for critical tasks is often a necessary reliability component, even if it introduces delays and extra work.
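To make the testing shift concrete, here's a sketch of behavioral evaluation over multiple runs. The `agent` function deterministically cycles through phrasings to simulate a model that varies its wording (and sometimes misses the key fact); a real eval would call the actual system and often use another LLM as the judge:

```python
import itertools

# Simulated non-deterministic agent: same question, varying phrasing.
phrasings = itertools.cycle([
    "SQS decouples producers from consumers.",
    "Producers and consumers are decoupled by SQS.",
    "SQS is a queue service.",  # technically true, but misses the key behavior
])

def agent(question: str) -> str:
    return next(phrasings)

def passes(answer: str) -> bool:
    # Behavioral criterion: the answer must mention decoupling, however phrased.
    # Exact string matching would wrongly fail the second phrasing.
    return "decouple" in answer.lower()

# Evaluate over many runs and assert on the pass RATE, not on any single output.
runs = [agent("What does SQS do?") for _ in range(50)]
pass_rate = sum(passes(r) for r in runs) / len(runs)
```

The assertion you'd actually ship is something like `pass_rate >= 0.95` against your real agent, and a regression in that rate, not a changed string, is what fails the build.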

Conclusion: Runtime Realities Shape Design

The runtime behaviors of Microservices and Agentic AI systems present some similarities, but also differences due to the fundamental nature of their components. Microservices communicate via relatively simple transport mechanisms between endpoints containing deterministic business logic, manage persistent business data with established patterns for eventual consistency, and rely on infrastructure-focused resilience patterns. Agentic AI involves more complex, context-rich communication flows often managed by intelligent intermediaries, focuses state management on operational context and memory, requires unique patterns for cognitive reliability, and must contend with inherent non-determinism resulting from its AI core.

In Part 1 we concluded that the different focus (especially the why) of these two architecture patterns results in different solutions to the same problems, driven by their different priorities. Here in Part 2 we found that some solutions apply equally well to both patterns, but in many cases AI Agents add one more layer to the problem, necessitating newer solutions on top of the known patterns. In Part 3 we'll analyze deployment, observability and security, and consider hybrid implementations that combine both patterns.
