Durable Functions best practices and diagnostic tools
This article details some best practices when using Durable Functions. It also describes various tools to help diagnose problems during development, testing, and production use.
Best practices
Use the latest version of the Durable Functions extension and SDK
There are two components that a function app uses to execute Durable Functions. One is the Durable Functions SDK that allows you to write orchestrator, activity, and entity functions using your target programming language. The other is the Durable extension, which is the runtime component that actually executes the code. With the exception of .NET in-process apps, the SDK and the extension are versioned independently.
Staying up to date with the latest extension and SDK ensures your application benefits from the latest performance improvements, features, and bug fixes. Upgrading to the latest versions also ensures that Microsoft can collect the latest diagnostic telemetry to help accelerate the investigation process when you open a support case with Azure.
- See Upgrade durable functions extension version for instructions on getting the latest extension version.
- To ensure you're using the latest version of the SDK, check the package manager of the language you're using.
Adhere to Durable Functions code constraints
The replay behavior of orchestrator code creates constraints on the type of code that you can write in an orchestrator function. An example of a constraint is that your orchestrator function must use deterministic APIs so that each time it’s replayed, it produces the same result.
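The effect of replay can be illustrated with a small, framework-free sketch. It does not use the Durable Functions SDK: `run_with_replay`, `NonDeterminismError`, and both orchestrators are hypothetical names invented for this example, and the counter stands in for any value that differs between executions (the current time, a random number, a GUID). The runner feeds recorded results back to the orchestrator and fails as soon as the code asks for a different step than the history contains.

```python
import itertools


class NonDeterminismError(Exception):
    """Raised when replayed code diverges from its recorded history."""


def run_with_replay(orchestrator, history, activities):
    """Drive a generator-based orchestrator, replaying recorded results first.

    `history` is a list of (activity_name, result) tuples from earlier runs;
    results of newly executed activities are appended to it.
    """
    gen = orchestrator()
    position = 0
    try:
        requested = next(gen)
        while True:
            if position < len(history):
                # Replay: reuse the recorded result, but only if the code
                # asks for the same step that was recorded.
                recorded_name, result = history[position]
                if recorded_name != requested:
                    raise NonDeterminismError(
                        f"history has {recorded_name!r}, code asked for {requested!r}"
                    )
            else:
                # First execution of this step: run it and record the result.
                result = activities[requested]()
                history.append((requested, result))
            position += 1
            requested = gen.send(result)
    except StopIteration as done:
        return done.value


def deterministic_orchestrator():
    first = yield "get_order"
    second = yield "charge_card"
    return [first, second]


# Stand-in for a value that is different every time it is read.
_changing_value = itertools.count()


def nondeterministic_orchestrator():
    # Branching on a changing value means a replay can take a different path.
    step = "charge_card" if next(_changing_value) % 2 == 0 else "send_email"
    result = yield step
    return result
```

Running `deterministic_orchestrator` twice against the same history returns the same result without re-executing any activity, whereas the second run of `nondeterministic_orchestrator` raises `NonDeterminismError`.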
Note
The Durable Functions Roslyn Analyzer is a live code analyzer that guides C# users to adhere to Durable Functions specific code constraints. See Durable Functions Roslyn Analyzer for instructions on how to enable it on Visual Studio and Visual Studio Code.
Familiarize yourself with your programming language's Azure Functions performance settings
With default settings, the language runtime you select may impose strict concurrency restrictions on your functions, for example, allowing only one function to execute at a time on a given VM. These restrictions can usually be relaxed by fine-tuning the concurrency and performance settings of your language. If you're looking to optimize the performance of your Durable Functions application, familiarize yourself with these settings.
Below is a non-exhaustive list of some of the languages that often benefit from fine-tuning their performance and concurrency settings, and their guidelines for doing so.
Guarantee unique Task Hub names per app
Multiple Durable Functions apps can share the same storage account. By default, the name of the app is used as the task hub name, which ensures that accidental sharing of task hubs won't happen. If you need to explicitly configure task hub names for your apps in host.json, you must ensure that the names are unique. Otherwise, the apps will compete for messages, which could result in undefined behavior, including orchestrations getting unexpectedly "stuck" in the Pending or Running state.
The only exception is if you deploy copies of the same app in multiple regions; in this case, you can use the same task hub for the copies.
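A deployment check along these lines can catch accidental sharing before it reaches production. This is a minimal sketch, assuming each app's host.json has already been loaded into a dictionary and that the task hub name is set through `extensions.durableTask.hubName` (the Functions 2.x location); `find_shared_hub_names` and the app names are hypothetical. Apps that rely on the default name have no explicit `hubName` and are skipped, and intentional multi-region copies of the same app would need to be excluded by the caller.

```python
from collections import defaultdict


def find_shared_hub_names(host_configs):
    """Return {hub_name: [app names]} for hub names set by more than one app.

    `host_configs` maps an app name to its parsed host.json content.
    """
    apps_by_hub = defaultdict(list)
    for app_name, config in host_configs.items():
        hub = config.get("extensions", {}).get("durableTask", {}).get("hubName")
        if hub is not None:
            apps_by_hub[hub].append(app_name)
    return {hub: sorted(apps) for hub, apps in apps_by_hub.items() if len(apps) > 1}


# Illustrative input: two apps were configured with the same explicit name.
configs = {
    "orders-app": {"version": "2.0", "extensions": {"durableTask": {"hubName": "SharedHub"}}},
    "billing-app": {"version": "2.0", "extensions": {"durableTask": {"hubName": "SharedHub"}}},
    "reports-app": {"version": "2.0"},  # uses the default task hub name
}
conflicts = find_shared_hub_names(configs)
```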
Follow guidance when deploying code changes to running orchestrators
It's inevitable that functions will be added, removed, and changed over the lifetime of an application. Examples of common breaking changes include changing activity or entity function signatures and changing orchestrator logic. These changes are a problem when they affect orchestrations that are still running. If deployed incorrectly, code changes could cause orchestrations to fail with a non-deterministic error, get stuck indefinitely, suffer degraded performance, and so on. Refer to recommended mitigation strategies when making code changes that may impact running orchestrations.
Keep function inputs and outputs as small as possible
You can run into memory issues if you provide large inputs and outputs to and from Durable Functions APIs.
Inputs and outputs to Durable Functions APIs are serialized into the orchestration history. This means that large inputs and outputs can, over time, greatly contribute to an orchestrator history growing unbounded, which risks causing memory exceptions during replay.
To mitigate the impact of large inputs and outputs to APIs, you may choose to delegate some work to sub-orchestrators. This helps load balance the history memory burden from a single orchestrator to multiple ones, therefore keeping the memory footprint of individual histories small.
That said, the best practice for dealing with large data is to keep it in external storage and to materialize it only inside activities, when needed. With this approach, instead of passing the data itself as inputs or outputs of Durable Functions APIs, you pass a lightweight identifier that allows your activities to retrieve the data from external storage when they need it.
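The identifier-passing approach can be sketched as follows. This is an illustration only, not SDK code: the in-memory dictionary stands in for external storage such as a blob container, and `save_payload`, `load_payload`, and the two activity functions are hypothetical names. The point is that only the short reference string and the small summary would ever appear in the orchestration history.

```python
import json
import uuid

# Stand-in for external storage (for example, a blob container).
_external_store = {}


def save_payload(payload):
    """Persist a large payload externally and return a small reference to it."""
    reference = str(uuid.uuid4())
    _external_store[reference] = json.dumps(payload)
    return reference


def load_payload(reference):
    """Fetch a payload back from external storage by its reference."""
    return json.loads(_external_store[reference])


def produce_report_activity(_input):
    """Activity: creates large data but returns only a reference to it."""
    rows = [{"id": i, "value": i * i} for i in range(10_000)]
    return save_payload(rows)


def summarize_report_activity(reference):
    """Activity: materializes the data here and returns a small result."""
    rows = load_payload(reference)
    return {"count": len(rows), "total": sum(row["value"] for row in rows)}
```

An orchestrator would call the first activity, receive the reference, and pass it to the second, so the 10,000 rows never travel through orchestrator inputs or outputs.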
Keep Entity data small
As with inputs and outputs to Durable Functions APIs, if an entity's explicit state is too large, you may run into memory issues. In particular, an entity's state needs to be serialized and de-serialized from storage on every request, so large states add serialization latency to each invocation. Therefore, if an entity needs to track large data, it's recommended to offload the data to external storage and track a lightweight identifier in the entity that allows you to materialize the data from storage when needed.
Fine tune your Durable Functions concurrency settings
A single worker instance can execute multiple work items concurrently to increase efficiency. However, processing too many work items concurrently risks exhausting resources like CPU capacity, network connections, etc. In many cases, this shouldn't be a concern because scaling and limiting work items are handled automatically for you. That said, if you're experiencing performance issues (such as orchestrators taking too long to finish or getting stuck in the Pending state) or are doing performance testing, you can configure concurrency limits in the host.json file.
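As a rough illustration, the snippet below renders the part of a Functions 2.x host.json file that carries these limits, using the `maxConcurrentActivityFunctions` and `maxConcurrentOrchestratorFunctions` settings under `extensions.durableTask`. The helper name `concurrency_fragment` is hypothetical and the value 10 is a placeholder, not a recommendation; suitable values depend on your workload and hosting plan.

```python
import json


def concurrency_fragment(max_activities, max_orchestrators):
    """Build the host.json portion that caps concurrent work items per worker."""
    return {
        "version": "2.0",
        "extensions": {
            "durableTask": {
                "maxConcurrentActivityFunctions": max_activities,
                "maxConcurrentOrchestratorFunctions": max_orchestrators,
            }
        },
    }


# Print the fragment so it can be merged into an existing host.json file.
print(json.dumps(concurrency_fragment(10, 10), indent=2))
```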
Note
This is not a replacement for fine-tuning the performance and concurrency settings of your language runtime in Azure Functions. The Durable Functions concurrency settings only determine how much work can be assigned to a given VM at a time; they do not determine the degree of parallelism with which that work is processed inside the VM. The latter requires fine-tuning the language runtime performance settings.
Use unique names for your external events
As with activity functions, external events have an at-least-once delivery guarantee. This means that, under certain rare conditions (which may occur during restarts, scaling, crashes, etc.), your application may receive duplicates of the same external event. Therefore, we recommend that external events contain an ID that allows them to be manually de-duplicated in orchestrators.
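A minimal sketch of ID-based de-duplication is shown below. It is plain Python rather than SDK code, and the event shape (an `id` field plus an `approver` field) and the function name are assumptions made for the example. In a real orchestrator, the set of seen IDs would be an ordinary local variable, which is rebuilt from history on each replay.

```python
def process_approval_events(events):
    """Apply each event once, skipping redelivered copies with the same ID."""
    seen_ids = set()
    approvers = []
    for event in events:
        event_id = event["id"]
        if event_id in seen_ids:
            # Duplicate delivery of an event that was already handled.
            continue
        seen_ids.add(event_id)
        approvers.append(event["approver"])
    return approvers


# The second event is a redelivered copy of the first.
received = [
    {"id": "evt-1", "approver": "alice"},
    {"id": "evt-1", "approver": "alice"},
    {"id": "evt-2", "approver": "bob"},
]
unique_approvers = process_approval_events(received)
```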
Note
The MSSQL storage provider consumes external events and updates orchestrator state transactionally, so in that backend there should be no risk of duplicate events, unlike with the default Azure Storage storage provider. That said, it is still recommended that external events have unique names so that code is portable across backends.
Invest in stress testing
As with anything performance related, the ideal concurrency settings and architecture of your app ultimately depend on your application's workload. Therefore, it's recommended that you invest in a performance testing harness that simulates your expected workload, and use it to run performance and reliability experiments for your app.
Avoid sensitive data in inputs, outputs, and exceptions
Inputs and outputs (including exceptions) to and from Durable Functions APIs are durably persisted in your storage provider of choice. If those inputs, outputs, or exceptions contain sensitive data (such as secrets, connection strings, or personally identifiable information), then anyone with read access to your storage provider's resources can obtain them. To deal with sensitive data safely, fetch it within activity functions from either Azure Key Vault or environment variables, and never pass it directly to orchestrators or entities. Doing so helps prevent sensitive data from leaking into your storage resources.
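The sketch below shows the environment-variable variant of this guidance. It is illustrative only: `call_payment_api_activity`, the request fields, and the `PAYMENT_API_KEY` setting name are hypothetical, and the actual service call is omitted. The orchestrator passes the name of the setting rather than its value, so the secret is resolved inside the activity and never appears in the activity's input or output.

```python
import os


def call_payment_api_activity(request):
    """Activity: resolves the secret locally and returns only non-sensitive data."""
    # Only the setting name travels through the orchestration; the value is
    # read here, inside the activity.
    api_key = os.environ[request["api_key_setting"]]

    # The authenticated call to the external service would go here (omitted).
    authenticated = bool(api_key)

    # Return only data that is safe to persist in orchestration history.
    return {"order_id": request["order_id"], "authenticated": authenticated}
```

A call such as `call_payment_api_activity({"order_id": "42", "api_key_setting": "PAYMENT_API_KEY"})` keeps both the input and the output free of the key itself.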
Note
This guidance also applies to the CallHttp orchestrator API, which also persists its request and response payloads in storage. If your target HTTP endpoints require authentication, which may involve sensitive data, it is recommended that you either implement the HTTP call yourself inside an activity or use the built-in managed identity support offered by CallHttp, which does not persist any credentials to storage.
Tip
Similarly, avoid logging data that contains secrets, because anyone with read access to your logs (for example, in Application Insights) would be able to obtain those secrets.
Diagnostic tools
There are several tools available to help you diagnose problems.
Durable Functions and Durable Task Framework Logs
Durable Functions Extension
The Durable extension emits tracking events that allow you to trace the end-to-end execution of an orchestration. These tracking events can be found and queried using the Application Insights Analytics tool in the Azure portal. The verbosity of tracking data emitted can be configured in the logger (Functions 1.x) or logging (Functions 2.0) section of the host.json file. See configuration details.
Durable Task Framework
Starting in v2.3.0 of the Durable extension, logs emitted by the underlying Durable Task Framework (DTFx) are also available for collection. See details on how to enable these logs.
Azure portal
Diagnose and solve problems
Azure Function App Diagnostics is a useful resource in the Azure portal for monitoring and diagnosing potential issues in your application. It also provides suggestions to help resolve problems based on the diagnosis. See Azure Function App Diagnostics.
Durable Functions Orchestration traces
The Azure portal provides orchestration trace details to help you understand the status of each orchestration instance and trace the end-to-end execution. When you look at the list of functions inside your Azure Functions app, you'll see a Monitor column that contains links to the traces. You need to have Application Insights enabled for your app to get this information.
Durable Functions Monitor Extension
This is a Visual Studio Code extension that provides a UI for monitoring, managing, and debugging your orchestration instances.
Roslyn Analyzer
The Durable Functions Roslyn Analyzer is a live code analyzer that guides C# users to adhere to Durable Functions specific code constraints. See Durable Functions Roslyn Analyzer for instructions on how to enable it on Visual Studio and Visual Studio Code.
Support
For questions and support, you may open an issue in one of the GitHub repos below. When reporting a bug in Azure, including information such as affected instance IDs, time ranges in UTC showing the problem, the application name (if possible), and the deployment region will greatly speed up investigations.