Automating cluster management in a .NET application
From: Developing big data solutions on Microsoft Azure HDInsight
When you need to integrate cluster management into an application or service, you can use the .NET SDK for HDInsight to provision and delete clusters as required. Adding the Microsoft Azure HDInsight NuGet package to a project makes classes and interfaces in the Microsoft.WindowsAzure.Management.HDInsight namespace available, and you can use these to provision and manage HDInsight clusters.
Managing clusters from a .NET application, like many of the techniques used to initiate jobs, requires an Azure management certificate to authenticate the request. To obtain a certificate, you can:
- Use the makecert command in a Visual Studio command line to create a certificate and upload it to your subscription in the Azure management portal as described in Create and Upload a Management Certificate for Azure.
- Use the Get-AzurePublishSettingsFile and Import-AzurePublishSettingsFile Windows PowerShell cmdlets to generate, download, and install a new certificate from your Azure subscription as described in the section How to: Connect to your subscription of the topic How to install and configure Azure PowerShell. If you want to use the same certificate on more than one client computer, you can copy the Azure publishsettings file to each computer and use the Import-AzurePublishSettingsFile cmdlet to import it.
After you have created and installed your certificate, it will be stored in the Personal certificate store on your computer. You can view the details by using the certmgr.msc console.
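As an alternative to the certmgr.msc console, you can inspect the store from code. The following is a minimal sketch that lists the certificates in the Personal store so that you can confirm the friendly name of your management certificate. It assumes the certificate was installed for the current user rather than for the local machine; the class name ListCertificates is only illustrative.

```csharp
using System;
using System.Security.Cryptography.X509Certificates;

class ListCertificates
{
    static void Main()
    {
        // ASSUMPTION: the certificate is in the current user's Personal ("My") store.
        var store = new X509Store(StoreName.My, StoreLocation.CurrentUser);
        store.Open(OpenFlags.ReadOnly);
        try
        {
            // Print the details needed to identify each certificate.
            foreach (X509Certificate2 cert in store.Certificates)
            {
                Console.WriteLine("{0} | {1} | expires {2}",
                    cert.FriendlyName, cert.Thumbprint, cert.NotAfter);
            }
        }
        finally
        {
            store.Close();
        }
    }
}
```

The friendly name shown here is the value you can later use to select the certificate in your application.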
To create a cluster programmatically, you must create an instance of the ClusterCreateParameters class, specifying the following information:
- A globally unique name for the cluster.
- The geographical region where you want to create the cluster.
- The default Azure storage account to be used by the cluster.
- The access key for the storage account.
- The blob container in the storage account to be used by the cluster.
- The number of data nodes to be created in the cluster.
- The credentials to be used for administrative access to the cluster.
- The version of HDInsight to be used.
After you have created the initial ClusterCreateParameters object, you can optionally customize the default HDInsight configuration settings by using the following properties:
- AdditionalStorageAccounts: Use this property to enable the cluster to access additional Azure storage accounts if required.
- CoreConfiguration: Specify a ConfigValuesCollection object that contains custom Hadoop configuration settings as key/value pairs.
- HdfsConfiguration: Specify a ConfigValuesCollection object that contains custom HDFS configuration settings as key/value pairs.
- HiveConfiguration: Specify a ConfigValuesCollection object that contains custom Hive configuration settings as key/value pairs.
- HiveMetastore: Specify a custom Azure SQL Database instance in which to store Hive metadata.
- MapReduceConfiguration: Specify a ConfigValuesCollection object that contains custom map/reduce configuration settings as key/value pairs.
- OozieConfiguration: Specify a ConfigValuesCollection object that contains custom Oozie configuration settings as key/value pairs.
- OozieMetastore: Specify a custom Azure SQL Database instance in which to store Oozie metadata.
- YarnConfiguration: Specify a ConfigValuesCollection object that contains custom YARN configuration settings as key/value pairs.
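The following sketch illustrates how custom settings might be added to a ClusterCreateParameters object before the cluster is created. It assumes that ConfigValuesCollection accepts KeyValuePair<string, string> entries through an Add method, which you should verify against the version of the SDK you are using. The class and method names (ClusterCustomization, ApplyCustomSettings) are hypothetical, and the property values are illustrative only.

```csharp
using System.Collections.Generic;
using Microsoft.WindowsAzure.Management.HDInsight;
using Microsoft.WindowsAzure.Management.HDInsight.ClusterProvisioning;

static class ClusterCustomization
{
    // Adds illustrative Hadoop settings to an existing ClusterCreateParameters object.
    // ASSUMPTION: ConfigValuesCollection exposes Add(KeyValuePair<string, string>);
    // check this against your SDK version before relying on it.
    public static void ApplyCustomSettings(ClusterCreateParameters clusterInfo)
    {
        // Core Hadoop setting (core-site): I/O buffer size in bytes.
        clusterInfo.CoreConfiguration.Add(
            new KeyValuePair<string, string>("io.file.buffer.size", "131072"));

        // HDFS setting (hdfs-site): block size in bytes.
        clusterInfo.HdfsConfiguration.Add(
            new KeyValuePair<string, string>("dfs.blocksize", "268435456"));
    }
}
```

You would call a method like this after setting the required properties and before passing the object to the client that creates the cluster.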
When you are ready to create the cluster, use a locally stored Azure management certificate to create an HDInsightCertificateCredential object, and then pass this object to the static Connect method of the HDInsightClient class to connect to Azure and obtain a client object that implements the IHDInsightClient interface. This interface provides the CreateCluster method, which creates an HDInsight cluster synchronously, and the CreateClusterAsync method, which creates the cluster asynchronously.
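If you do not want to block the calling thread while the cluster is provisioned, the asynchronous method can be used along the following lines. This is a sketch only: it assumes that CreateClusterAsync returns a Task<ClusterDetails>, and that you have already obtained a client object and populated a ClusterCreateParameters object. The AsyncProvisioning class and CreateAsync method are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Management.HDInsight;
using Microsoft.WindowsAzure.Management.HDInsight.ClusterProvisioning;

static class AsyncProvisioning
{
    // ASSUMPTION: CreateClusterAsync returns Task<ClusterDetails>.
    public static async Task<ClusterDetails> CreateAsync(
        IHDInsightClient client, ClusterCreateParameters clusterInfo)
    {
        // Start provisioning without blocking the caller.
        Task<ClusterDetails> pending = client.CreateClusterAsync(clusterInfo);
        Console.WriteLine("Cluster creation started ...");

        // Other work could be done here while Azure provisions the cluster.
        ClusterDetails cluster = await pending;
        Console.WriteLine("Created cluster: {0}.", cluster.ConnectionUrl);
        return cluster;
    }
}
```

In a console application you could call AsyncProvisioning.CreateAsync(client, clusterInfo).Wait() from the Main method; in a service or UI application you would normally await the result instead.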
The following code example shows a simple console application that creates an HDInsight cluster using an existing Azure storage account and container. The example is deliberately kept simple by including the credentials in the code so that you can copy and paste it while you are experimenting with HDInsight. In a production system you must protect credentials, as described in “Securing credentials in scripts and applications” in the Security section of this guide.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Security.Cryptography.X509Certificates;
using Microsoft.WindowsAzure.Management.HDInsight;
using Microsoft.WindowsAzure.Management.HDInsight.ClusterProvisioning;

namespace ClusterMgmt
{
    class Program
    {
        static void Main(string[] args)
        {
            string subscriptionId = "subscription-id";
            string certFriendlyName = "certificate-friendly-name";
            string clusterName = "unique-cluster-name";
            string storageAccountName = "storage-account-name";
            string storageAccountKey = "storage-account-key";
            string containerName = "container-name";
            string userName = "user-name";
            string password = "password";
            string location = "Southeast Asia";
            int clusterSize = 4;

            // Get the certificate object from the current user's Personal
            // certificate store, using the friendly name to identify it.
            X509Store store = new X509Store(StoreName.My, StoreLocation.CurrentUser);
            store.Open(OpenFlags.ReadOnly);
            X509Certificate2 cert = store.Certificates.Cast<X509Certificate2>()
                .FirstOrDefault(item => item.FriendlyName == certFriendlyName);
            store.Close();
            if (cert == null)
            {
                Console.WriteLine("Certificate '{0}' was not found.", certFriendlyName);
                return;
            }

            // Create an HDInsightClient object.
            HDInsightCertificateCredential creds =
                new HDInsightCertificateCredential(new Guid(subscriptionId), cert);
            var client = HDInsightClient.Connect(creds);

            // Supply cluster information.
            ClusterCreateParameters clusterInfo = new ClusterCreateParameters()
            {
                Name = clusterName,
                Location = location,
                DefaultStorageAccountName = storageAccountName + ".blob.core.windows.net",
                DefaultStorageAccountKey = storageAccountKey,
                DefaultStorageContainer = containerName,
                UserName = userName,
                Password = password,
                ClusterSizeInNodes = clusterSize,
                Version = "3.0"
            };

            // Create the cluster. This call blocks until provisioning completes.
            Console.WriteLine("Creating the HDInsight cluster ...");
            ClusterDetails cluster = client.CreateCluster(clusterInfo);
            Console.WriteLine("Created cluster: {0}.", cluster.ConnectionUrl);
            Console.WriteLine("Press a key to end.");
            Console.ReadKey();
        }
    }
}
Note that this example uses a pre-existing Azure storage account and container, which must be hosted in the same geographical region as the cluster (in this case, Southeast Asia).
To delete a cluster, you can use the DeleteCluster method of the HDInsightClient class.
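A deletion call might look like the following sketch. It assumes that DeleteCluster accepts the cluster name as a string and that you have already connected and obtained a client object in the same way as when creating a cluster. The ClusterCleanup class and Delete method are hypothetical names.

```csharp
using System;
using Microsoft.WindowsAzure.Management.HDInsight;

static class ClusterCleanup
{
    // ASSUMPTION: DeleteCluster takes the name of the cluster to remove.
    public static void Delete(IHDInsightClient client, string clusterName)
    {
        Console.WriteLine("Deleting cluster {0} ...", clusterName);
        client.DeleteCluster(clusterName);
        Console.WriteLine("Delete request for cluster {0} completed.", clusterName);
    }
}
```

Deleting the cluster does not remove the storage account or container it used, so the data remains available to a cluster you create later.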
For more information about using the .NET SDK for HDInsight to provision and delete HDInsight clusters, see HDInsight SDK Reference Documentation.