Deploy observability resources and set up logs
Observability provides visibility into every layer of your Azure IoT Operations configuration. It gives you insight into the actual behavior of issues, which increases the effectiveness of site reliability engineering. Azure IoT Operations offers observability through custom curated Grafana dashboards that are hosted in Azure. These dashboards are powered by Azure Monitor managed service for Prometheus and by Container Insights. This guide shows you how to set up Azure Managed Prometheus and Grafana and enable monitoring for your Azure Arc cluster.
Complete the steps in this article before deploying Azure IoT Operations to your cluster.
Prerequisites
- An Arc-enabled Kubernetes cluster.
- Azure CLI installed on your cluster machine. For instructions, see How to install the Azure CLI.
- Helm installed on your cluster machine. For instructions, see Install Helm.
- Kubectl installed on your cluster machine. For instructions, see Install Kubernetes tools.
Create resources in Azure
Register providers with the subscription where your cluster is located.
Note
This step only needs to be run once per subscription. To register resource providers, you need permission to do the
/register/action
operation, which is included in subscription Contributor and Owner roles. For more information, see Azure resource providers and types.az account set -s <SUBSCRIPTION_ID> az provider register --namespace Microsoft.AlertsManagement az provider register --namespace Microsoft.Monitor az provider register --namespace Microsoft.Dashboard az provider register --namespace Microsoft.Insights az provider register --namespace Microsoft.OperationalInsights
Install Azure CLI extensions for Metrics collection for Azure Arc-enabled clusters and Azure Managed Grafana.
az extension add --name k8s-extension az extension add --name amg
Create an Azure Monitor workspace to enable metric collection for your Azure Arc-enabled Kubernetes cluster.
az monitor account create --name <WORKSPACE_NAME> --resource-group <RESOURCE_GROUP> --location <LOCATION> --query id -o tsv
Save the Azure Monitor workspace ID from the output of this command. You use the ID when you enable metrics collection in the next section.
Create an Azure Managed Grafana instance to visualize your Prometheus metrics.
az grafana create --name <GRAFANA_NAME> --resource-group <RESOURCE_GROUP> --query id -o tsv
Save the Grafana ID from the output of this command. You use the ID when you enable metrics collection in the next section.
Create a Log Analytics workspace for Container Insights.
az monitor log-analytics workspace create -g <RESOURCE_GROUP> -n <LOGS_WORKSPACE_NAME> --query id -o tsv
Save the Log Analytics workspace ID from the output of this command. You use the ID when you enable metrics collection in the next section.
Enable metrics collection for the cluster
Update the Azure Arc cluster to collect metrics and send them to the previously created Azure Monitor workspace. You also link this workspace with the Grafana instance.
az k8s-extension create --name azuremonitor-metrics --cluster-name <CLUSTER_NAME> --resource-group <RESOURCE_GROUP> --cluster-type connectedClusters --extension-type Microsoft.AzureMonitor.Containers.Metrics --configuration-settings azure-monitor-workspace-resource-id=<AZURE_MONITOR_WORKSPACE_ID> grafana-resource-id=<GRAFANA_ID>
Enable Container Insights logs for logs collection.
az k8s-extension create --name azuremonitor-containers --cluster-name <CLUSTER_NAME> --resource-group <RESOURCE_GROUP> --cluster-type connectedClusters --extension-type Microsoft.AzureMonitor.Containers --configuration-settings logAnalyticsWorkspaceResourceID=<LOG_ANALYTICS_WORKSPACE_ID>
Once these steps are completed, you have both Azure Monitor and Grafana set up and linked to your cluster for observability and metric collection.
Deploy OpenTelemetry Collector
Define and deploy an OpenTelemetry (OTel) Collector to your Arc-enabled Kubernetes cluster.
Create a file called
otel-collector-values.yaml
and paste the following code into it to define an OpenTelemetry Collector:mode: deployment fullnameOverride: aio-otel-collector image: repository: otel/opentelemetry-collector tag: 0.107.0 config: processors: memory_limiter: limit_percentage: 80 spike_limit_percentage: 10 check_interval: '60s' receivers: jaeger: null prometheus: null zipkin: null otlp: protocols: grpc: endpoint: ':4317' http: endpoint: ':4318' exporters: prometheus: endpoint: ':8889' resource_to_telemetry_conversion: enabled: true add_metric_suffixes: false service: extensions: - health_check pipelines: metrics: receivers: - otlp exporters: - prometheus logs: null traces: null telemetry: null extensions: memory_ballast: size_mib: 0 resources: limits: cpu: '100m' memory: '512Mi' ports: metrics: enabled: true containerPort: 8889 servicePort: 8889 protocol: 'TCP' jaeger-compact: enabled: false jaeger-grpc: enabled: false jaeger-thrift: enabled: false zipkin: enabled: false
In the
otel-collector-values.yaml
file, make a note of the following values that you use in theaz iot ops create
command when you deploy Azure IoT Operations on the cluster:- fullnameOverride
- grpc.endpoint
- check_interval
Save and close the file.
Deploy the collector by running the following commands:
kubectl get namespace azure-iot-operations || kubectl create namespace azure-iot-operations helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts helm repo update helm upgrade --install aio-observability open-telemetry/opentelemetry-collector -f otel-collector-values.yaml --namespace azure-iot-operations
Configure Prometheus metrics collection
Configure Prometheus metrics collection on your cluster.
Create a file named
ama-metrics-prometheus-config.yaml
and paste the following configuration:apiVersion: v1 data: prometheus-config: |2- scrape_configs: - job_name: otel scrape_interval: 1m static_configs: - targets: - aio-otel-collector.azure-iot-operations.svc.cluster.local:8889 - job_name: aio-annotated-pod-metrics kubernetes_sd_configs: - role: pod relabel_configs: - action: drop regex: true source_labels: - __meta_kubernetes_pod_container_init - action: keep regex: true source_labels: - __meta_kubernetes_pod_annotation_prometheus_io_scrape - action: replace regex: ([^:]+)(?::\\d+)?;(\\d+) replacement: $1:$2 source_labels: - __address__ - __meta_kubernetes_pod_annotation_prometheus_io_port target_label: __address__ - action: replace source_labels: - __meta_kubernetes_namespace target_label: kubernetes_namespace - action: keep regex: 'azure-iot-operations' source_labels: - kubernetes_namespace scrape_interval: 1m kind: ConfigMap metadata: name: ama-metrics-prometheus-config namespace: kube-system
Apply the configuration file by running the following command:
kubectl apply -f ama-metrics-prometheus-config.yaml
Deploy dashboards to Grafana
Azure IoT Operations provides a sample dashboard designed to give you many of the visualizations you need to understand the health and performance of your Azure IoT Operations deployment.
Complete the following steps to install the Azure IoT Operations curated Grafana dashboards.
Clone or download the azure-iot-operations repository to get the sample Grafana Dashboard json file locally: https://github.com/Azure/azure-iot-operations.
Sign in to the Grafana console. You can access the console through the Azure portal or use the
az grafana show
command to retrieve the URL.az grafana show --name <GRAFANA_NAME> --resource-group <RESOURCE_GROUP> --query url -o tsv
In the Grafana application, select the + icon.
Select Import dashboard.
Browse to the sample dashboard directory in your local copy of the Azure IoT Operations repository, azure-iot-operations > samples > grafana-dashboard, then select the
aio.sample.json
dashboard file.When the application prompts, select your managed Prometheus data source.
Select Import.