Orchestrating Complex Tasks with Microsoft Agent Framework Workflows
In our previous
article, we built
a practical IT Helpdesk agent using the Microsoft Agent
Framework.
We saw how the ChatAgent could intelligently interact with users, use
tools to check system statuses, and manage a conversation's state. It was
a perfect example of a smart, conversational assistant, capable of
understanding user intent and taking immediate, reactive steps.
But real-world business processes are rarely so simple. The agent we built is good at handling in-the-moment tasks, but what happens when a process needs to span several days, like a manager's approval for a new software license? How do you handle a task that must pause, wait for a human decision, and then reliably resume without losing context? What about a multi-step data pipeline where each step must complete successfully before the next one begins, and the entire sequence needs to be auditable?
This is where the conversational ChatAgent paradigm meets its limits,
and a more structured, robust concept is needed. Enter Microsoft Agent
Framework
Workflows.
A ChatAgent handles ad-hoc requests dynamically, deciding what to do based on the input it receives. A Workflow takes a different approach: it defines a fixed, graph-based sequence of steps that execute in a predictable order. This matters when you need to guarantee that specific steps always run, that approvals happen at the right points, and that you can trace exactly what happened after the fact. Workflows are designed for complex, long-running processes where reliability and auditability are requirements, not nice-to-haves.
In this deep dive, we'll move beyond the chat interface and explore the orchestration engine at the heart of the framework. We'll learn how to build, run, and manage these durable constructs, including the all-important "human-in-the-loop" pattern that makes enterprise AI not just functional - and, admittedly, a bit more complex - but trustworthy.
Agents vs. Workflows: Choosing the Right Tool for the Job
Before we dive into writing code, it's important to understand
a fundamental architectural decision within the Microsoft Agent Framework:
when to use a conversational ChatAgent and when to build a structured
Workflow. They are not interchangeable. Each is designed to solve
a different class of problems, and choosing the right tool is the first
step toward building a robust and maintainable system.
ChatAgent: The Conversationalist
As we saw in our previous guide, the ChatAgent is very good at
open-ended, dynamic interaction. Its behavior is primarily steered by the
Large Language Model at its core. You give it a set of instructions,
a collection of tools, and a user prompt, and the agent uses its reasoning
capabilities to decide the best course of action.
The path it takes is not pre-determined, but it emerges from the dialogue.
- Best for:
  - User-facing conversational interfaces (chatbots, copilots).
  - Ad-hoc task execution where the user's intent guides the process.
  - Scenarios where flexibility and natural language understanding are more important than a rigid, repeatable process.
- Driving Logic: The LLM's turn-by-turn reasoning. It plans, executes a tool, observes the result, and re-plans in a continuous loop.
- State Management: The agent's state is encapsulated within the AgentThread, which holds the conversation history. This is ideal for managing the context of a single, continuous interaction but is not inherently designed for long-term persistence across system restarts without additional engineering.
- Key Strength: Flexibility. It can adapt to unexpected user requests and navigate complex conversations without a predefined script.
Workflow: The Structured Orchestrator
A Workflow, by contrast, is a deterministic, graph-based process defined by the developer. If an agent is a smart employee given a goal, a workflow is the documented business process they are required to follow.
The control flow is explicit, defined as a series of nodes (Executors)
connected by edges. While an LLM can be used within a node to perform an
intelligent task, it does not control the overall direction of the
process.
This explicit structure is what - from our point of view - enables the framework’s best enterprise features.
- Best for:
  - Automating established business processes (e.g., expense approvals, user onboarding).
  - Long-running tasks that need to be paused and resumed (e.g., waiting for human input, running a multi-hour data job).
  - Scenarios where auditability, reliability, and a predictable execution path are non-negotiable.
- Driving Logic: A developer-defined execution graph. The flow moves from one Executor to the next based on pre-defined connections and conditional logic.
- State Management: Built for durability. Workflows are designed to be checkpointed - their state can be serialized and saved to persistent storage (like a database) at any point. This allows them to survive restarts and wait indefinitely for external events.
- Key Strength: Reliability. The process is predictable, auditable, and resilient to interruption.
The Best of Both Worlds: Embedding Agents in Workflows
This isn't an "either/or" decision. The most sophisticated solutions often
combine both patterns. A Workflow can orchestrate the high-level
process, and one of its nodes can be a ChatAgent tasked with handling
a specific, agent-like sub-task.
Consider an intelligent document processing pipeline:
- Workflow Node 1 (Executor): A simple function fetches a new PDF from a SharePoint folder. This step is deterministic and reliable.
- Workflow Node 2 (Agent): The PDF is passed to a ChatAgent with the instruction: "You are a legal analyst. Read this document, summarize the key clauses, and extract the names of all involved parties into a JSON object." This step uses the LLM's advanced reasoning for a complex, unstructured task.
- Workflow Node 3 (Executor): The structured JSON output from the agent is then taken and saved into a database. This is another deterministic, reliable step.
Note: We see many projects where people simply use an agent for exactly the process described above. Why not slap a PDF-reading tool and a database-writing tool onto an agent and call it a day? Because LLMs are non-deterministic, and over hundreds or thousands of executions that matters. The more decisions you leave to the LLM, the higher the risk of failure - and it will fail. By using a Workflow to orchestrate the overall process, you ensure that the critical steps (fetching the PDF, saving to the database) are always executed reliably, while still using the power of the ChatAgent for the complex task of document understanding.
This hybrid approach gives you the best of both worlds: the robust, auditable orchestration of a Workflow combined with the flexible reasoning of an Agent for specific, well-contained tasks.
Here is a simple rule of thumb for making the choice:
| Use a ChatAgent when... | Use a Workflow when... |
|---|---|
| The primary interface is a conversation with a user. | You are automating a back-end, multi-step business process. |
| The sequence of steps is unpredictable and LLM-driven. | The sequence of steps is known and should be predictable. |
| The task is relatively short-lived (seconds to minutes). | The process could be long-running (hours or days). |
| You need maximum flexibility to handle diverse inputs. | You need maximum reliability, auditability, and control. |
Note: As always, treat such tables as guidelines, not hard rules.
The Core Concepts of Workflows: Nodes, Edges, and Executors
To build a workflow, you construct an executable directed acyclic graph (DAG). The Microsoft Agent Framework provides a clear and robust set of components for defining the nodes, edges, and data flow of this graph. Understanding these components is essential to designing modular and maintainable automated processes, so let's explore each of them.
Executors: The Nodes of the Graph
An Executor is the fundamental unit of computation in a workflow. It represents a single, self-contained node in the execution graph. Each executor should be designed with a single responsibility: to receive an input, perform a specific operation, and produce an output. This design promotes modularity, testability, and clear separation of concerns.
The framework supports two implementation patterns for executors:
- Class-based Executors: For components with complex logic, internal state, or significant configuration, you can define a class that inherits from agent_framework.Executor. This object-oriented approach is ideal for encapsulating non-trivial business logic.
- Function-based Executors: For stateless, single-purpose transformations, you can define an async function and apply the @executor decorator. This is a lightweight pattern for simple data mapping, filtering, or routing nodes.
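Side by side, the two patterns look roughly like this. This is a sketch: the executor names (GreetExecutor, shout) and the exact WorkflowContext generics are illustrative assumptions, so verify signatures against the official API reference.

```python
# Sketch only: names and exact signatures are illustrative assumptions.
from agent_framework import Executor, WorkflowContext, executor, handler


class GreetExecutor(Executor):
    """Class-based pattern: room for configuration and internal state."""

    def __init__(self, id: str, greeting: str = "Hello") -> None:
        super().__init__(id=id)
        self._greeting = greeting

    @handler
    async def greet(self, name: str, ctx: WorkflowContext[str]) -> None:
        # Emit the transformed message to downstream nodes.
        await ctx.send_message(f"{self._greeting}, {name}!")


# Function-based pattern: a stateless, single-purpose transformation.
@executor(id="shout")
async def shout(text: str, ctx: WorkflowContext[str]) -> None:
    await ctx.send_message(text.upper())
```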
Handlers (@handler): The Execution Entry Point
Within a class-based Executor, the @handler decorator designates the
specific async method that the workflow engine will invoke. This method
serves as the entry point for the node's logic.
For function-based executors, the function itself is implicitly the
handler, making the @executor decorator sufficient. The use of
@handler is a convention to explicitly declare the execution logic
within a class structure.
WorkflowBuilder and Edges: Defining the Graph Topology
A collection of executors is not a workflow until its structure is
defined. The WorkflowBuilder is the fluent API used to define the
graph's topology - the directed edges that dictate the flow of control and
data between executors.
The API provides clear, declarative methods for constructing the graph:
- set_start_executor(executor): Specifies the entry point node for the workflow.
- add_edge(source_executor, destination_executor): Establishes a directed edge. When the source_executor emits a message, the workflow engine routes it as input to the destination_executor.
- add_edge(source_executor, destination_executor, condition=...): The optional condition parameter creates branches in the graph, routing data based on the content of the message itself. This is fundamental for implementing business rules and conditional logic.
This declarative approach separates the orchestration logic (the graph structure) from the business logic (the executor implementations), which is a core tenet of building maintainable systems.
WorkflowContext: The Interface to the Workflow Engine
Executors do not interact with each other directly. Instead, they are
fully decoupled and communicate through the workflow engine via the
WorkflowContext object (ctx), which is passed as an argument to
every handler.
This context object provides two essential methods for controlling data flow:
- ctx.send_message(data): This method is used to emit output from the current executor. The engine intercepts this call and routes the data payload to all downstream nodes connected by outgoing edges. This is the standard mechanism for passing data between nodes.
- ctx.yield_output(data): This method is used by terminal nodes to publish a final result for the entire workflow. Any data passed to yield_output is collected and returned to the external client that initiated the workflow run. A workflow can have multiple terminal nodes and can yield multiple outputs.
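To make these semantics concrete, here is a deliberately tiny, framework-independent simulation of the node/edge/context model - a toy illustration of the routing behavior, not the Agent Framework's actual implementation:

```python
# Toy simulation of executors, edges, and a workflow context -- for
# illustration only; the real framework's engine is far more capable.
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable, Optional


class MiniContext:
    """Stands in for WorkflowContext: the node's interface to the engine."""

    def __init__(self, engine: "MiniWorkflow", node: str) -> None:
        self._engine, self._node = engine, node

    async def send_message(self, data: Any) -> None:
        # Route the payload along every outgoing edge whose condition passes.
        for target, condition in self._engine.edges[self._node]:
            if condition is None or condition(data):
                await self._engine.run_node(target, data)

    async def yield_output(self, data: Any) -> None:
        # Terminal nodes publish final results back to the caller.
        self._engine.outputs.append(data)


class MiniWorkflow:
    """Stands in for WorkflowBuilder plus the engine, in one tiny class."""

    def __init__(self) -> None:
        self.nodes: dict = {}
        self.edges: dict = defaultdict(list)
        self.outputs: list = []
        self.start: Optional[str] = None

    def add_node(self, name: str, fn: Callable[[Any, MiniContext], Awaitable[None]]) -> "MiniWorkflow":
        self.nodes[name] = fn
        return self

    def add_edge(self, source: str, target: str, condition=None) -> "MiniWorkflow":
        self.edges[source].append((target, condition))
        return self

    def set_start(self, name: str) -> "MiniWorkflow":
        self.start = name
        return self

    async def run_node(self, name: str, data: Any) -> None:
        await self.nodes[name](data, MiniContext(self, name))

    async def run(self, data: Any) -> list:
        self.outputs = []
        await self.run_node(self.start, data)
        return self.outputs


# Two nodes: upper-case the text, then yield the final result.
async def shout(text: str, ctx: MiniContext) -> None:
    await ctx.send_message(text.upper())


async def finish(text: str, ctx: MiniContext) -> None:
    await ctx.yield_output(text + "!")


wf = (
    MiniWorkflow()
    .add_node("shout", shout)
    .add_node("finish", finish)
    .set_start("shout")
    .add_edge("shout", "finish")
)
result = asyncio.run(wf.run("hello"))
print(result)  # ['HELLO!']
```

Note how the node functions never reference each other: shout only talks to its context, and the engine decides where the message goes. That decoupling is exactly what lets the real framework re-wire, checkpoint, and visualize a graph without touching executor code.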
Hands-On Part 1: Building a Multi-Step IT Request Workflow
Theory is essential, but the best way to understand the use of workflows is to build one. We'll now create a practical, multi-step workflow that automates the initial processing, enrichment, and logging of an IT support request.
The Goal: Our workflow will be a four-step pipeline:
- Categorize: It will first use an LLM to analyze a raw user request and extract structured details like the problem area and priority.
- Enrich: Next, it will take the extracted category and "enrich" the ticket by looking up the appropriate IT support team from a predefined knowledge base.
- Format: It will then combine all this information into a clean JSON object, ready for an API.
- Create Ticket: Finally, it will simulate a call to an external ticketing system API and yield a final confirmation message.
We will first build and run this entire workflow from the command line to see the end-to-end process. Then, we'll see how the framework's built-in Dev Web UI can provide a visual interface for debugging and interaction.
Prerequisites
Before we begin, ensure your environment is set up with all necessary components, including those for the web UI.
- Install the necessary packages:
- Configure Azure OpenAI Environment Variables: The workflow will use the AzureOpenAIResponsesClient. Ensure the following are set:
  - AZURE_OPENAI_ENDPOINT
  - AZURE_OPENAI_CHAT_DEPLOYMENT_NAME
  - AZURE_OPENAI_RESPONSES_DEPLOYMENT_NAME
  - AZURE_OPENAI_API_KEY
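For the install step above, a minimal setup might look like this. The package names are assumptions based on the framework's PyPI distribution at the time of writing; verify them against the official documentation.

```shell
# Core framework and the Dev Web UI used later in this article.
# Package names are assumptions -- check the official docs if they changed.
pip install agent-framework --pre
pip install agent-framework-devui --pre
```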
Step 1: Define the Data Models and Executors
Our workflow will pass strongly-typed data between nodes using Pydantic models. We will define four distinct executors for each step of our pipeline.
Create a file named executors.py.
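One possible shape for this file follows. This is a hedged reconstruction: the model fields, the team lookup table, the prompt, and the structured-output call are illustrative assumptions, and the exact client/agent API may differ between framework versions.

```python
# executors.py -- hedged sketch; field names, prompts, and the knowledge
# base below are illustrative assumptions, not the article's original code.
import json

from pydantic import BaseModel
from typing_extensions import Never

from agent_framework import Executor, WorkflowContext, executor, handler
from agent_framework.azure import AzureOpenAIResponsesClient


class TicketDetails(BaseModel):
    """Structured details the LLM extracts from the raw request."""
    summary: str
    problem_area: str  # e.g. "Network", "Software", "Hardware", "Access"
    priority: str      # "Low" | "Medium" | "High"


class EnrichedTicket(BaseModel):
    """Ticket details plus the responsible support team."""
    details: TicketDetails
    assigned_team: str


# Illustrative knowledge base mapping problem areas to support teams.
SUPPORT_TEAMS = {
    "Network": "NetOps",
    "Software": "Application Support",
    "Hardware": "Field Services",
    "Access": "Identity & Access Team",
}


class CategorizeExecutor(Executor):
    """Step 1: use the LLM to turn a raw request into structured details."""

    @handler
    async def categorize(self, request: str, ctx: WorkflowContext[TicketDetails]) -> None:
        agent = AzureOpenAIResponsesClient().create_agent(
            name="categorizer",
            instructions="You are an IT triage assistant. Classify the user's request.",
        )
        # Structured output: the exact response_format handling may differ
        # between framework versions -- consult the official docs.
        result = await agent.run(request, response_format=TicketDetails)
        details = TicketDetails.model_validate_json(result.text)
        print(f"[Categorize] area={details.problem_area} priority={details.priority}")
        await ctx.send_message(details)


@executor(id="enrich")
async def enrich_executor(details: TicketDetails, ctx: WorkflowContext[EnrichedTicket]) -> None:
    """Step 2: look up the responsible team in the knowledge base."""
    team = SUPPORT_TEAMS.get(details.problem_area, "General IT Helpdesk")
    print(f"[Enrich] assigned team: {team}")
    await ctx.send_message(EnrichedTicket(details=details, assigned_team=team))


@executor(id="format_ticket")
async def format_ticket_executor(ticket: EnrichedTicket, ctx: WorkflowContext[str]) -> None:
    """Step 3: produce a clean JSON payload for the ticketing API."""
    payload = json.dumps(ticket.model_dump(), indent=2)
    print("[Format] payload ready")
    await ctx.send_message(payload)


@executor(id="create_ticket")
async def create_ticket_executor(payload: str, ctx: WorkflowContext[Never, str]) -> None:
    """Step 4 (terminal): simulate the external ticketing-system API call."""
    print("[Create Ticket] calling ticketing API (simulated)")
    await ctx.yield_output(f"Ticket created successfully:\n{payload}")
```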
Step 2: Build and Run the Workflow from the Console
With our four executors defined, we can now wire them together and run the workflow directly to test the end-to-end logic.
Create a main.py file with the following content.
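A minimal sketch of main.py, assuming executors.py defines a CategorizeExecutor class plus enrich/format/create function executors (those names, the sample request, and the run-result API are assumptions to be checked against the docs):

```python
# main.py -- hedged sketch; executor names and the run-result API
# (events.get_outputs) are assumptions, not the article's original code.
import asyncio

from agent_framework import WorkflowBuilder

from executors import (
    CategorizeExecutor,
    enrich_executor,
    format_ticket_executor,
    create_ticket_executor,
)


async def run_workflow() -> None:
    categorize = CategorizeExecutor(id="categorize")

    # Wire the four nodes into a linear pipeline.
    workflow = (
        WorkflowBuilder()
        .set_start_executor(categorize)
        .add_edge(categorize, enrich_executor)
        .add_edge(enrich_executor, format_ticket_executor)
        .add_edge(format_ticket_executor, create_ticket_executor)
        .build()
    )

    events = await workflow.run("My laptop cannot connect to the office VPN.")
    for output in events.get_outputs():
        print(output)


if __name__ == "__main__":
    asyncio.run(run_workflow())
```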
Now, run this from your terminal: python main.py. You should see log output from each of the four nodes in sequence - categorize, enrich, format, create ticket - followed by the final confirmation message, confirming the workflow executed correctly.
Step 3: Visualizing the Workflow with the DevUI
While running from the console is effective for validation, the framework provides a very nice experience for development and debugging: the Dev Web UI. It allows you to visualize the execution graph, inspect the data flowing between nodes, and interactively run the workflow.
Let's adapt our main.py to launch this UI.
1. Create the Workflow Object: First, we'll define a reusable workflow object outside of our main execution block.
2. Modify main.py to Serve the UI: Replace the run_workflow
function and the main execution block with the code to serve the UI.
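Put together, the adapted main.py might look like this. The serve() helper comes from the optional agent-framework-devui package; the parameter names shown are the documented ones, but verify them against your installed version, and the executor names remain illustrative assumptions.

```python
# main.py, adapted to serve the Dev Web UI -- hedged sketch.
from agent_framework import WorkflowBuilder
from agent_framework.devui import serve

from executors import (
    CategorizeExecutor,
    enrich_executor,
    format_ticket_executor,
    create_ticket_executor,
)

# 1. Create a reusable workflow object at module level.
categorize = CategorizeExecutor(id="categorize")
workflow = (
    WorkflowBuilder()
    .set_start_executor(categorize)
    .add_edge(categorize, enrich_executor)
    .add_edge(enrich_executor, format_ticket_executor)
    .add_edge(format_ticket_executor, create_ticket_executor)
    .build()
)

# 2. Serve the UI instead of running the workflow directly.
if __name__ == "__main__":
    serve(entities=[workflow], port=8091, auto_open=True)
```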
Now, run this updated script: python main.py. A browser window will open
to http://localhost:8091.
First, you'll see your workflow in an interactive graph view.
Workflow Graph View
By clicking on the "Configure and Run" button, you can input a sample user request and execute the workflow.
Run Workflow in DevUI
Upon execution, you can observe the live data flowing through each node, inspect the inputs and outputs, and verify that each step behaves as expected.
Workflow Execution in DevUI
Hands-On Part 2: Adding Conditional Logic for High-Priority Alerts
Linear workflows are a good starting point, but real business processes often require branching. A critical support request should not follow the same path as a routine one. The Microsoft Agent Framework handles this with conditional edges, allowing the workflow to route data based on its content.
The Goal: We will enhance our IT Triage Workflow to handle high-priority tickets differently.
- If a ticket is classified as "High" priority, the workflow will branch.
- Instead of just creating a ticket, it will also trigger a separate "Send Alert" notification (which we'll simulate).
- Low and Medium priority tickets will follow the standard ticket creation path.
This introduces a decision point into our graph, making our automation smarter and more responsive to the urgency of the situation.
Step 1: Update the Executors and Add a Condition
We need a new executor for sending alerts and a function to define the
routing logic. We'll add these to our existing executors.py file.
1. Add the New SendAlertExecutor
This will be a new terminal node for our high-priority branch.
2. Create the Condition Function
This is a simple boolean function that inspects the data flowing through an edge and decides if that path should be taken.
Update your executors.py file with the following additions:
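One possible shape for these additions, assuming the EnrichedTicket model and the executor/WorkflowContext imports already present in executors.py (the alert text and field names are illustrative assumptions):

```python
# Additions to executors.py -- hedged sketch. EnrichedTicket, executor, and
# WorkflowContext are the definitions already in the file.
from typing_extensions import Never


@executor(id="send_alert")
async def send_alert_executor(ticket: EnrichedTicket, ctx: WorkflowContext[Never, str]) -> None:
    """New terminal node for the high-priority branch: simulate paging on-call."""
    print(f"[ALERT] paging {ticket.assigned_team} (simulated)")
    await ctx.yield_output(
        f"HIGH PRIORITY ALERT sent to {ticket.assigned_team}: {ticket.details.summary}"
    )


def is_high_priority(ticket: EnrichedTicket) -> bool:
    """Edge condition: take the alert branch for high-priority tickets."""
    return ticket.details.priority.lower() == "high"


def is_normal_priority(ticket: EnrichedTicket) -> bool:
    """Edge condition: everything else follows the standard path."""
    return not is_high_priority(ticket)
```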
Step 2: Build the Branched Workflow
Now we update main.py to construct the new, non-linear graph. We will
use add_edge with the condition parameter to create the two branches
after the enrich_node.
- The enrich_node now becomes our decision point.
- If is_high_priority returns True, the data flows to send_alert_executor.
- If is_normal_priority returns True, the data flows to the original format_ticket_executor path.
Modify your main.py to reflect this new structure.
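The branched graph construction might look like this - a sketch using add_edge's condition parameter as described above, with executor and condition names as assumptions carried over from executors.py. Only the edges after the enrich node change compared to the linear version.

```python
# Branched graph in main.py -- hedged sketch.
workflow = (
    WorkflowBuilder()
    .set_start_executor(categorize)
    .add_edge(categorize, enrich_executor)
    # Decision point: exactly one of the two conditions is true per ticket.
    .add_edge(enrich_executor, send_alert_executor, condition=is_high_priority)
    .add_edge(enrich_executor, format_ticket_executor, condition=is_normal_priority)
    .add_edge(format_ticket_executor, create_ticket_executor)
    .build()
)
```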
Step 3: Testing Both Branches in the DevUI
Run the updated main.py script. The DevUI will now display your new,
branched graph.
Branched Workflow Graph View
Test Case 1: Normal Priority
- Input: I need to reset my password for the HR portal, I forgot the old one.
- Action: Click "Run".
- Expected Behavior: The LLM should classify this as "Low" or "Medium" priority. You will see the workflow execute the right-hand branch in the DevUI graph: Enrich -> Format -> Create Ticket. The final output will be the standard ticket creation confirmation.
Test Case 2: High Priority
- Input: The entire payment processing service is down! We cannot process any customer credit cards!
- Action: Click "Run".
- Expected Behavior: The LLM will classify this as "High" priority. You will see the workflow execute the left-hand branch: Enrich -> Send Alert. The final output in the UI will be the high-priority alert message.
By adding conditional edges, we have significantly increased the intelligence of our automation. The workflow is no longer a simple pipeline but a dynamic process that can adapt its behavior based on the data it is processing. This is a fundamental pattern for building automations that can handle the complexity and variability of real-world scenarios.
Wrapping Up: From Pipelines to Processes
In this guide, we have moved beyond the conversational paradigm of
a ChatAgent to the structured orchestration capabilities of Workflows.
We began by constructing a simple, linear data processing pipeline,
demonstrating how to chain multiple executors to transform unstructured
user input into enriched, structured data ready for a downstream system.
We then enhanced this design by introducing conditional logic. By adding branching based on the data's content - in our case, ticket priority - we transformed a simple pipeline into an intelligent business process capable of adapting its behavior to different scenarios. This ability to define explicit, auditable, and conditional execution paths is a cornerstone of building reliable enterprise automation.
However, the automations we've built so far are still entirely self-contained and execute from start to finish in a single run. The most critical enterprise processes often don't work this way. They need to persist state across system restarts, pause for extended periods to wait for external events, and, most importantly, incorporate human judgment for critical decisions.
In the next part of this series, we will tackle these advanced, production-critical requirements directly. We will explore:
- Durability and State Management: How to configure workflows with checkpointing to make them durable. This allows a workflow to be paused, its state persisted to a database, and then resumed hours or even days later, ensuring resilience against system interruptions.
- Human-in-the-Loop Integration: We will implement a proper approval
step, where the workflow pauses and waits for an external signal
- simulating a manager's decision - before proceeding. This is the key to building trustworthy AI systems that combine the speed of automation with the oversight of human governance.
See you in the next part!
Official Resources
- Official Documentation on Workflows: The definitive source for deep dives into workflow concepts, advanced patterns, and API references.
- GitHub Repository: Explore the source code, find more examples (including the ones used in this article), and contribute to the project.
- Introductory Blog Post: Read the original announcement from Microsoft for more context on the vision and strategy behind the framework.
Further Reading
- Building Enterprise-Grade AI Agents with Microsoft's Agent Framework: The first article in this series, a perfect starting point for understanding the ChatAgent pattern and the framework's core enterprise features.
- AI Agents From Scratch: A foundational guide to understanding the core concepts of how AI agents work, from planning to tool execution, which provides context for the agentic components you can embed within workflows.
- Langfuse: The Open Source Observability Platform: Observability is critical for production systems. Dive deeper into the tools and techniques for monitoring and debugging complex LLM applications and workflows.
- Building a RAG Pipeline with Azure AI Search: Discover how to power your agents and workflows with internal knowledge by building a retrieval-augmented generation system on Azure, a perfect complement to the tools used in this guide.
- AI Agents with n8n: Explore a different, low-code approach to building agentic workflows, providing a valuable comparison of methodologies for process automation.