Automating cluster management with PowerShell


From: Developing big data solutions on Microsoft Azure HDInsight

You can use Windows PowerShell to create an HDInsight cluster by executing PowerShell commands interactively, or by creating a PowerShell script that can be executed when required.

Before you use PowerShell to work with HDInsight, you must configure the PowerShell environment to connect to your Azure subscription. To do this, first download and install the Azure PowerShell module, which is available through the Microsoft Web Platform Installer. For more details, see How to install and configure Azure PowerShell.
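
As a rough sketch of the connection step, the following commands sign in to Azure and select the subscription to work with once the Azure PowerShell module is installed. It assumes the same service management cmdlets used elsewhere in this topic, and "subscription-name" is a placeholder rather than a real subscription.

```powershell
# Sign in interactively; this caches the subscriptions available to your account.
Add-AzureAccount

# List the cached subscriptions to find the one you want to use.
Get-AzureSubscription | Select-Object SubscriptionName

# Make that subscription the current one for the cmdlets that follow.
# "subscription-name" is a placeholder; replace it with one of the names listed above.
Select-AzureSubscription -SubscriptionName "subscription-name"
```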

Creating a cluster with the default configuration

To create an HDInsight cluster with the default settings for Hadoop services, use the New-AzureHDInsightCluster cmdlet and specify the following configuration settings:

  • A globally unique name for the cluster.
  • The geographical region where you want to create the cluster.
  • The Azure storage account to be used by the cluster.
  • The access key for the storage account.
  • The blob container in the storage account to be used by the cluster.
  • The number of data nodes to be created in the cluster.
  • The credentials to be used for administrative access to the cluster.
  • The version of HDInsight to be used.

If you do not intend to use an existing Azure storage account, you can create a new one with a globally unique name using the New-AzureStorageAccount cmdlet, and then create a new blob container with the New-AzureStorageContainer cmdlet. Many Azure services require a globally unique name. You can determine if a specific name is already in use by an Azure service by using the Test-AzureName cmdlet.
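
The name check can be sketched as follows. This is a minimal example that assumes you want to test a storage account name; the name shown is a placeholder, and the messages are illustrative only.

```powershell
# Placeholder name; replace it with the storage account name you want to use.
$storageAccountName = "storageaccountname"

# With the -Storage switch, Test-AzureName returns $true when the name is already taken.
if (Test-AzureName -Storage -Name $storageAccountName) {
    Write-Host "The name '$storageAccountName' is already in use. Choose another name."
}
else {
    Write-Host "The name '$storageAccountName' is available."
}
```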

The following code example creates an Azure storage account and an HDInsight cluster in the Southeast Asia region (note that each command should be on a single, unbroken line). The example is deliberately kept simple by including the credentials in the script so that you can copy and paste the code while you are experimenting with HDInsight. In a production system you must protect credentials, as described in “Securing credentials in scripts and applications” in the Security section of this guide.

$storageAccountName = "unique-storage-account-name"
$containerName = "container-name"
$clusterName = "unique-cluster-name"
$userName = "user-name"
$password = ConvertTo-SecureString "password" -AsPlainText -Force
$location = "Southeast Asia"
$clusterNodes = 4

# Create a storage account.
Write-Host "Creating storage account..."
New-AzureStorageAccount -StorageAccountName $storageAccountName -Location $location
$storageAccountKey = Get-AzureStorageKey $storageAccountName | %{ $_.Primary }
$destContext = New-AzureStorageContext -StorageAccountName $storageAccountName -StorageAccountKey $storageAccountKey  

# Create a Blob storage container.
Write-Host "Creating container..."
New-AzureStorageContainer -Name $containerName -Context $destContext

# Create a cluster.
Write-Host "Creating HDInsight cluster..."
$credential = New-Object System.Management.Automation.PSCredential ($userName, $password)
New-AzureHDInsightCluster -Name $clusterName -Location $location -DefaultStorageAccountName "$storageAccountName.blob.core.windows.net" -DefaultStorageAccountKey $storageAccountKey -DefaultStorageContainerName $containerName -ClusterSizeInNodes $clusterNodes -Credential $credential -Version 3.0
Write-Host "Finished!"

Notice that this script uses the ConvertTo-SecureString cmdlet to convert the plain-text password to a SecureString, so that it is held in encrypted form in memory. The password and the user name are passed to the New-Object cmdlet to create a PSCredential object for the cluster credentials. Notice also that the access key for the storage account is obtained using the Get-AzureStorageKey cmdlet.
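
One simple way to keep the password out of the script while you experiment is to prompt for it. The following sketch assumes Windows PowerShell 3.0 or later (for the -Message parameter) and is not necessarily the approach described in the Security section of this guide; the user name and prompt text are placeholders.

```powershell
# Placeholder administrator name for the cluster.
$userName = "user-name"

# Prompt for the password instead of embedding it in the script.
# Get-Credential returns a PSCredential whose password is already a SecureString.
$credential = Get-Credential -UserName $userName -Message "Enter the HDInsight cluster administrator password"
```

The resulting $credential variable can be passed to the -Credential parameter of New-AzureHDInsightCluster in place of the PSCredential object created with New-Object.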

Creating a cluster with a customized configuration

The previous example creates a new HDInsight cluster with default configuration settings. If you require a more customized cluster configuration, you can use the New-AzureHDInsightClusterConfig cmdlet to create a base configuration for a cluster with a specified number of nodes. You can then use the following cmdlets to define the settings you want to apply to your cluster:

  • Set-AzureHDInsightDefaultStorage: Specify the storage account and blob container to be used by the cluster.
  • Add-AzureHDInsightStorage: Specify an additional storage account that the cluster can use.
  • Add-AzureHDInsightMetastore: Specify a custom Azure SQL Database instance to host Hive and Oozie metadata.
  • Add-AzureHDInsightConfigValues: Add specific configuration settings for HDFS, MapReduce, Hive, Oozie, or other Hadoop technologies in the cluster.

After you have added the required configuration settings, you can pass the cluster configuration variable returned by New-AzureHDInsightClusterConfig to the New-AzureHDInsightCluster cmdlet to create the cluster.
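
The following sketch shows how these cmdlets might be chained together. It assumes two existing storage accounts (the second account name is a hypothetical placeholder) and uses parameter names that should be checked against the cmdlet reference for the module version you have installed.

```powershell
$clusterName = "unique-cluster-name"
$location = "Southeast Asia"
$clusterNodes = 4
$storageAccountName = "storage-account-name"
$containerName = "container-name"
$extraStorageAccountName = "second-storage-account-name"   # hypothetical additional account

# Look up the access keys for both storage accounts.
$storageAccountKey = Get-AzureStorageKey $storageAccountName | %{ $_.Primary }
$extraStorageAccountKey = Get-AzureStorageKey $extraStorageAccountName | %{ $_.Primary }

# Prompt for the cluster administrator credentials.
$credential = Get-Credential -Message "Enter the HDInsight cluster administrator credentials"

# Build the configuration: base settings, default storage, then an additional storage account.
$config = New-AzureHDInsightClusterConfig -ClusterSizeInNodes $clusterNodes |
    Set-AzureHDInsightDefaultStorage -StorageAccountName "$storageAccountName.blob.core.windows.net" -StorageAccountKey $storageAccountKey -StorageContainerName $containerName |
    Add-AzureHDInsightStorage -StorageAccountName "$extraStorageAccountName.blob.core.windows.net" -StorageAccountKey $extraStorageAccountKey

# Create the cluster from the completed configuration.
$config | New-AzureHDInsightCluster -Name $clusterName -Location $location -Credential $credential
```

Each configuration cmdlet accepts the configuration object from the pipeline and returns the updated object, which is why the calls can be chained before the result is passed to New-AzureHDInsightCluster.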

You can also specify a folder to store shared libraries and upload these so that they are available for use in HDInsight jobs. Examples include UDFs for Hive and Pig, or custom SerDe components for use with Avro. For more information, see the section “Create cluster with custom Hadoop configuration values and shared libraries” in the topic Microsoft .NET SDK For Hadoop on the CodePlex website.

For more information about using PowerShell to manage an HDInsight cluster, see the HDInsight PowerShell Cmdlets Reference Documentation.

Deleting a cluster

When you have finished with the cluster, you can use the Remove-AzureHDInsightCluster cmdlet to delete it. If you have also finished with the storage account, you can use the Remove-AzureStorageAccount cmdlet to delete it after the cluster has been deleted.

The following code example shows a PowerShell script to delete an HDInsight cluster and the storage account it was using.

$storageAccountName = "storage-account-name"
$clusterName = "cluster-name"

# Delete HDInsight cluster.
Write-Host "Deleting $clusterName HDInsight cluster..."
Remove-AzureHDInsightCluster -Name $clusterName

# Delete storage account.
Write-Host "Deleting $storageAccountName storage account..."
Remove-AzureStorageAccount -StorageAccountName $storageAccountName
