Monitoring and logging
From: Developing big data solutions on Microsoft Azure HDInsight
There are several ways to monitor an HDInsight cluster and its operation. This section covers:
- Using the Azure cluster status page
- Accessing the Hadoop status portals
- Accessing the Hadoop-generated log files
- Accessing Azure storage metrics
- Considerations
Using the Azure cluster status page
The page for an HDInsight cluster in the Azure web management portal displays rudimentary information for the cluster. This includes a dashboard showing information such as the number of map and reduce jobs executed over the previous one or four hours, a range of settings and information about the cluster, and a list of the linked resources such as storage accounts. The cluster status page also contains monitoring information such as the accumulated, maximum, and minimum data for the storage containers and running applications.
The configuration section of the portal page enables you to turn Hadoop services for this cluster on and off, and to establish a remote desktop (RDP) connection to the cluster's head node. In addition, the portal page contains a link to open the management page for the cluster.
The management page contains three sections. The Hive Editor provides a convenient way to experiment with HiveQL commands and statements such as queries, and view the results. It may prove useful if you are just exploring some data or developing parts of a more comprehensive solution. The Job History section displays a list of jobs you have executed, and some basic information about each one. The File Browser section allows you to view the files stored in the cluster’s Azure blob storage.
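If you want to run the same kind of HiveQL experiment from a script rather than the Hive Editor, the Azure PowerShell HDInsight cmdlets can submit a query directly. The following is a minimal sketch, not a definitive implementation; the cluster name is a hypothetical placeholder, and it assumes the Azure PowerShell module is installed and your subscription has already been configured in the session (for example, with Add-AzureAccount).

```powershell
$clusterName = "my-hdinsight-cluster"   # hypothetical cluster name

# Select the cluster that subsequent Hive cmdlets will target.
Use-AzureHDInsightCluster -Name $clusterName

# Run an exploratory HiveQL query and write the results to the console.
# hivesampletable is the sample table provisioned with HDInsight clusters.
Invoke-AzureHDInsightHiveJob -Query "SELECT * FROM hivesampletable LIMIT 10;"
```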
Accessing the Hadoop status portals
Hadoop exposes status and monitoring information through two web portals installed on the cluster. You can access these portals remotely, or through the Hadoop Name Node Status and Hadoop YARN Status links on the desktop of the cluster's head node server. The Hadoop YARN Status portal provides a wide range of information generated by the YARN resource manager, including details of each node in the cluster, the applications (jobs) that are executing or have finished, job scheduler details, the current configuration of the cluster, and access to log files and metrics.
The portal also exposes a set of metrics that indicate in great detail the status and performance of each job. These metrics can be used to monitor and fine tune jobs, and to locate errors and issues with your solutions.
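Much of the information shown in this portal is also available programmatically through the standard YARN ResourceManager REST API, which is useful when you want to capture metrics as part of a monitoring script. The sketch below is intended to run in a remote desktop session on the head node; the endpoint is an assumption (8088 is the Hadoop default ResourceManager port, and the host name and port on your cluster's configuration may differ).

```powershell
$rmBase = "http://headnodehost:8088/ws/v1/cluster"   # assumed endpoint

# Overall cluster metrics: running applications, available memory, and so on.
$metrics = Invoke-RestMethod -Uri "$rmBase/metrics"
$metrics.clusterMetrics

# Per-application details for finished jobs, useful when tuning a job or
# investigating a failure.
$apps = Invoke-RestMethod -Uri "$rmBase/apps?states=FINISHED"
$apps.apps.app | Select-Object name, state, finalStatus, elapsedTime
```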
Accessing the Hadoop-generated log files
HDInsight stores its log files in both the cluster file system and in Azure storage. You can examine the log files in the cluster by opening a remote desktop connection to the cluster and browsing the file system, or by using the Hadoop YARN Status portal on the remote head node server. You can examine the log files in Azure storage using any of the tools that can access and download data from Azure storage; examples are AzCopy, CloudXplorer, and the Visual Studio Server Explorer. You can also use PowerShell and the Azure Storage Client libraries, or the Azure .NET SDKs, to access data in Azure blob storage.
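As a simple illustration of the storage-based approach, the following PowerShell sketch lists and downloads log files from the cluster's blob container using the Azure storage cmdlets. The storage account name, key, container, and log prefix shown here are hypothetical placeholders; substitute the values for your own cluster.

```powershell
$ctx = New-AzureStorageContext -StorageAccountName "mystorageaccount" `
                               -StorageAccountKey "<storage-key>"

# List the blobs under an assumed log prefix in the cluster's container.
$logBlobs = Get-AzureStorageBlob -Container "mycontainer" -Context $ctx |
            Where-Object { $_.Name -like "app-logs/*" }   # hypothetical prefix

# Download each log blob for local inspection.
New-Item -ItemType Directory -Path "C:\HDInsightLogs" -Force | Out-Null
foreach ($blob in $logBlobs) {
    Get-AzureStorageBlobContent -Container "mycontainer" -Blob $blob.Name `
        -Destination "C:\HDInsightLogs" -Context $ctx -Force | Out-Null
}
```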
For a list of suitable tools and technologies for accessing Azure storage see Appendix A - Tools and technologies reference. For examples of accessing Azure storage from custom tools see Building custom clients in the section Consuming and visualizing data from HDInsight of this guide.
Accessing Azure storage metrics
Azure storage can be configured to log storage operations and access. You can use these logs, which contain a wealth of information, for capacity monitoring and planning, and for auditing requests to storage. The information includes latency details, enabling you to monitor and fine tune performance of your solutions.
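Storage logging and metrics are not enabled by default. The following sketch shows one way to turn them on for the blob service using the Azure PowerShell cmdlets, assuming a storage context ($ctx) created as in the earlier example; the 14-day retention period is an arbitrary illustration.

```powershell
# Enable logging of read, write, and delete operations for the blob service.
Set-AzureStorageServiceLoggingProperty -ServiceType Blob `
    -LoggingOperations Read,Write,Delete `
    -RetentionDays 14 -Context $ctx

# Capture hourly per-service and per-API request and latency metrics as well.
Set-AzureStorageServiceMetricsProperty -ServiceType Blob -MetricsType Hour `
    -MetricsLevel ServiceAndApi -RetentionDays 14 -Context $ctx
```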
You can use the .NET SDK for Hadoop to examine the log files generated for the Azure storage that holds the data for an HDInsight cluster. The HDInsight Log Analysis Toolkit is a command-line tool with utilities for downloading and analyzing Azure storage logs. For more information see Microsoft .NET SDK For Hadoop. A series of blogs from the Azure storage team also contains useful information, examples, and a case study—see posts tagged “analytics - logging & metrics” for more details.
Considerations
When implementing monitoring and logging for your solutions, consider the following points:
- As with any remote service or application, managing and monitoring its operation may appear to be more difficult than for a locally installed equivalent. However, remote management and monitoring technologies are widely available, and are an accepted part of almost all administration tasks. In many cases the extension of these technologies to cloud-hosted services and applications is almost seamless.
- Establish a monitoring and logging strategy that can provide useful information for detecting issues early, debugging problematic jobs and processes, and for use in planning. For example, as well as collecting runtime data and events, consider measuring overall performance, cluster load, and other factors that will be useful in planning for data growth and future requirements. The YARN portal in HDInsight, accessible remotely, can provide a wide range of information about performance and events for jobs and for the cluster as a whole.
- Configure logging and manage the log files for all parts of the process, not just the jobs within Hadoop. For example, monitor and log data ingestion and data export where the tools support this, or consider changing to a tool that can provide the required support for logging and monitoring. Many tools and services, such as SSIS and Azure storage, will need to be configured to provide an appropriate level of logging.
- Consider maintaining data lineage tracking by adding an identifier to each log entry, or through other techniques. This allows you to trace back the original source of the data and the operation, and follow it through each stage to understand its consistency and validity.
- Consider how you can collect logs from the cluster, or from more than one cluster, and collate them for purposes such as auditing, monitoring, planning, and alerting. You might use a custom solution to access and download the log files on a regular basis, and combine and analyze them to provide a dashboard-like display with alerting capabilities for security or failure detection. Such utilities could be created using PowerShell, the HDInsight SDKs, or code that accesses the Azure Service Management API, as in the sketch after this list.
- Consider whether a dedicated monitoring solution or service would be beneficial. A management pack for HDInsight is available for use with Microsoft System Center (see the Microsoft Download Center for more details). In addition, you can use third-party tools such as Chukwa and Ganglia to collect and centralize logs. Many companies offer services to monitor Hadoop-based big data solutions; examples include Centerity, Compuware APM, Sematext SPM, and Zettaset Orchestrator.
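As an illustration of the kind of custom collation utility mentioned above, the following PowerShell sketch downloads the storage analytics logs from the special $logs container for a set of (hypothetical) cluster storage accounts and flags any request whose status is not Success. It is a minimal sketch, not a production tool, and the parsing assumes the documented semicolon-delimited analytics log format, in which the request status is the fourth field.

```powershell
$accounts = @(
    @{ Name = "clusterstorage1"; Key = "<key1>" },   # hypothetical accounts
    @{ Name = "clusterstorage2"; Key = "<key2>" }
)

foreach ($acct in $accounts) {
    $ctx = New-AzureStorageContext -StorageAccountName $acct.Name `
                                   -StorageAccountKey $acct.Key

    # Storage analytics logs are written to the special $logs container.
    Get-AzureStorageBlob -Container '$logs' -Context $ctx | ForEach-Object {
        $local = Join-Path "C:\StorageLogs\$($acct.Name)" ($_.Name -replace '/', '\')
        New-Item -ItemType Directory -Path (Split-Path $local) -Force | Out-Null
        Get-AzureStorageBlobContent -Container '$logs' -Blob $_.Name `
            -Destination $local -Context $ctx -Force | Out-Null

        # Flag any logged request whose status is not Success. Fields (v1.0
        # format): 0 = version, 1 = start time, 2 = operation, 3 = status.
        Get-Content $local | ForEach-Object {
            $fields = $_ -split ';'
            if ($fields[3] -ne 'Success') {
                Write-Output ("{0}: {1} {2} -> {3}" -f `
                    $acct.Name, $fields[1], $fields[2], $fields[3])
            }
        }
    }
}
```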
The following table illustrates how monitoring and logging considerations apply to each of the use cases and models described in this guide.
| Use case | Considerations |
| --- | --- |
| Iterative data exploration | In this model you are typically experimenting with data, and do not have a long-term plan for its use or for the techniques you will discover for finding useful information in it. Monitoring is therefore unlikely to be a significant concern. However, you may need to use the logging features of HDInsight to help discover the optimum techniques for processing the data as you refine your investigation, and to debug jobs. |
| Data warehouse on demand | In this model you are likely to have established a regular process for uploading, processing, and consuming data, so you should consider implementing a monitoring and logging strategy that can detect issues early and assist in resolving them. If you intend to delete and recreate the cluster on a regular basis, this will typically require a custom solution that uses tools running on the cluster or on-premises, rather than a commercial monitoring service. |
| ETL automation | In this model you may be performing scheduled data transfer operations, so it is vital to establish a robust monitoring and logging mechanism to detect errors and measure performance. |
| BI integration | This model is usually part of an organization's core business functions, so it is vital to design a strategy that incorporates robust monitoring and logging features, and that can detect failures early as well as providing ongoing data for forward planning. Monitoring for security purposes, alerting, and auditing are likely to be important business requirements in this model. |