Workload management and automation
This article helps you understand the workload management and automation capability within the FinOps Framework and how to implement that in the Microsoft Cloud.
Definition
Workload management and automation refers to running resources only when necessary and at the level or capacity needed for the active workload.
Tag resources based on their up-time requirements. Review your resource usage patterns and determine whether resources can be scaled down or even shut down (to stop billing) during off-peak hours. Also consider cheaper alternatives to reduce costs.
An effective workload management and automation plan can significantly reduce costs by adjusting configuration to match supply to demand dynamically, ensuring the most effective utilization.
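To make the tagging idea concrete, here's a minimal Python sketch of an up-time tag schema. The tag values and the `should_be_running` helper are illustrative assumptions, not an Azure convention; define a schema that fits your organization.

```python
from datetime import datetime

# Hypothetical up-time tag values mapped to (start hour, end hour, weekdays only).
# These profiles are illustrative, not a standard.
UPTIME_PROFILES = {
    "always": (0, 24, False),
    "business-hours": (8, 18, True),   # weekdays, 08:00-18:00
    "never": (0, 0, False),            # keep stopped; start manually
}

def should_be_running(uptime_tag: str, now: datetime) -> bool:
    """Return True if a resource with this up-time tag should be running now."""
    start, end, weekdays_only = UPTIME_PROFILES.get(uptime_tag, UPTIME_PROFILES["always"])
    if weekdays_only and now.weekday() >= 5:  # Saturday or Sunday
        return False
    return start <= now.hour < end

# A Tuesday at 09:00 falls inside business hours; a Saturday does not.
print(should_be_running("business-hours", datetime(2024, 4, 2, 9)))   # True
print(should_be_running("business-hours", datetime(2024, 4, 6, 9)))   # False
```

An automation job can read this tag on a schedule and stop any resource whose profile says it shouldn't be running.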
Getting started
When you first start working with a service, consider the following points:
- Can the service be stopped (and if so, does billing stop)?
- If the service can't be stopped, review alternatives to determine whether any of them can be stopped to halt billing.
- Pay close attention to noncompute charges that might continue to accrue while a resource is stopped so you're not surprised. Storage is a common example: it continues to be billed even when the compute resource that used it is no longer running.
- Does the service support serverless compute?
- Serverless compute tiers can reduce costs when not active. Examples include Azure SQL Database, Azure SignalR Service, Azure Cosmos DB, Azure Synapse Analytics, and Azure Databricks.
- Does the service support autostop or autoshutdown functionality?
- Some services support autostop natively, like Microsoft Dev Box, Azure DevTest Labs, Azure Lab Services, and Azure Load Testing.
- If you use a service that supports being stopped, but not autostopping, consider using a lightweight flow in Power Automate or Logic Apps.
- Does the service support autoscaling?
- If the service supports autoscaling, configure it to scale based on your application's needs.
- Autoscaling can work with autostop behavior for maximum efficiency.
- To avoid unnecessary costs, consider automatically stopping nonproduction resources outside of work hours and starting them manually when needed.
- Avoid automatically starting nonproduction resources that aren't used every day.
- If you choose to autostart, be aware of vacations and holidays where resources might get started automatically but not be used.
- Consider tagging manually stopped resources. To ensure all resources are stopped, save a query in Azure Resource Graph or a view in the All resources list, and pin it to the Azure portal dashboard.
- Consider architectural models such as containers and serverless to only use resources when they're needed, and to drive maximum efficiency in key services.
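The points above can be sketched as a simple decision function for an autostop job. This is an illustrative Python sketch under stated assumptions, not an Azure API; note that it never returns "start", because nonproduction resources are started manually per the guidance above.

```python
def stop_action(env: str, is_work_hours: bool, idle_minutes: int,
                idle_threshold: int = 120) -> str:
    """Decide what an autostop job should do with a resource.

    Returns "stop", "keep", or "skip". Never returns "start": per the
    guidance above, nonproduction resources are started manually.
    """
    if env == "production":
        return "skip"        # never autostop production workloads
    if not is_work_hours:
        return "stop"        # off-peak: stop nonproduction resources
    if idle_minutes >= idle_threshold:
        return "stop"        # idle too long, even during work hours
    return "keep"

print(stop_action("production", False, 500))  # skip
print(stop_action("dev", True, 30))           # keep
print(stop_action("dev", False, 0))           # stop
```

The `env` tag value and the two-hour idle threshold are assumptions; adjust both to your own tagging scheme and up-time expectations.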
Building on the basics
At this point, you've set up autoscaling and autostop behaviors. As you move beyond the basics, consider the following points:
- Automate scaling or stopping for resources that don't support it natively or that have more complex requirements.
- Consider using automation services, like Azure Automation or Azure Functions.
- Assign an "Env" or Environment tag to identify which resources are for development, testing, staging, production, etc.
- Prefer assigning tags at a subscription or resource group level. Then enable tag inheritance in Azure Policy and Cost Management to cover resources that don't emit tags with their usage data.
- Consider setting up automated scripts to stop resources with specific up-time profiles (for example, stop developer VMs during off-peak hours if they haven't been used in the last two hours).
- Document up-time expectations based on specific tag values and what happens when the tag isn't present.
- Use Azure Policy to track compliance with the tag policy.
- Use Azure Policy to enforce specific configuration rules based on environment.
- Consider using "override" tags to bypass the standard policy when needed. To ensure accountability, track the costs and report them to stakeholders.
- Consider establishing and tracking KPIs for low-priority workloads, like development servers.
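Putting these ideas together, here's a hedged Python sketch of a scheduled cleanup job. The inventory, tag names (`env`, `override`), and idle threshold are placeholders: in practice you'd query Azure Resource Graph for the inventory and stop resources through the Azure SDK or an Automation runbook.

```python
# Placeholder inventory; in practice, query Azure Resource Graph and stop
# resources through the Azure SDK or an Azure Automation runbook.
resources = [
    {"name": "vm-dev-01", "tags": {"env": "dev"}, "idle_minutes": 180},
    {"name": "vm-dev-02", "tags": {"env": "dev", "override": "keep-running"},
     "idle_minutes": 400},
    {"name": "vm-prod-01", "tags": {"env": "production"}, "idle_minutes": 999},
]

def resources_to_stop(resources, idle_threshold=120):
    """Return names of nonproduction resources idle past the threshold,
    skipping any with an override tag (track those costs separately)."""
    to_stop = []
    for r in resources:
        tags = r["tags"]
        if tags.get("env") == "production":
            continue                  # production is out of scope
        if "override" in tags:
            continue                  # policy bypass; report costs to stakeholders
        if r["idle_minutes"] >= idle_threshold:
            to_stop.append(r["name"])
    return to_stop

print(resources_to_stop(resources))  # ['vm-dev-01']
```

Note how the override tag exempts a resource from the job while leaving it visible for cost reporting, matching the accountability guidance above.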
Learn more at the FinOps Foundation
This capability is a part of the FinOps Framework by the FinOps Foundation, a non-profit organization dedicated to advancing cloud cost management and optimization. For more information about FinOps, including useful playbooks, training and certification programs, and more, see the Workload management and automation capability article in the FinOps Framework documentation.
Related content
Related products:
- Azure Advisor
- Azure Monitor
- Azure Resource Graph
- Azure pricing calculator
- Cost Management
- Azure Policy