HDInsight Storm Topology Submission Via VNet

1. Introduction

To submit a Storm topology to an HDInsight cluster, you can RDP to the cluster's head node and run the storm command there. This is not always convenient. It is, however, possible to submit a topology from outside the cluster: create the HDInsight Storm cluster in an Azure Virtual Network (VNet), and submit the topology from a machine that is connected to that VNet.

Because a VNet can connect Azure VMs, other Azure services, on-premises infrastructure and developer machines, this approach lets you submit topologies from any of them.

To showcase the idea, I’ll walk through using an Azure VM to submit a Storm topology via a VNet.

2. Step-by-step Instructions

1) Create a cloud-only VNet in the Azure portal. You can use the “QUICK CREATE” button.

2) Create a VM in the new VNet; this is the machine from which you will submit the Storm topology. Use the “FROM GALLERY” option, and on page 4 of the wizard select the VNet you just created.

3) Create a Storm cluster in the same VNet. You need to use “custom create” and specify the VNet name in the Region/Virtual Network section.

4) Find the FQDN of the active head node of the HDInsight Storm cluster by using the cluster's REST API.

The following PowerShell script retrieves the FQDN of the active head node:

function Get-ActiveFQDN(
    [Parameter( Position=0, Mandatory=$true )]
    [String]
    $ClusterDnsName,
    [Parameter( Position=1, Mandatory=$true )]
    [String]
    $Username,
    [Parameter( Position=2, Mandatory=$true )]
    [String]
    $Password)
{
    # Public endpoint of the cluster, e.g. <clusterdnsname>.azurehdinsight.net
    $DnsSuffix = ".azurehdinsight.net"
    $ClusterFQDN = $ClusterDnsName + $DnsSuffix
    # Authenticate with the cluster login (HTTP user) credentials
    $webclient = New-Object System.Net.WebClient
    $webclient.Credentials = New-Object System.Net.NetworkCredential($Username, $Password)
    # The cluster availability endpoint reports which head node is currently active
    $Url = "https://" + $ClusterFQDN + "/clusteravailability/status"
    $Response = $webclient.DownloadString($Url)
    $JsonObject = $Response | ConvertFrom-Json
    # Emit the internal FQDN of the active head node (printed when run at the console)
    Write-Output $JsonObject.LeaderDnsName
}

Running Get-ActiveFQDN <clusterdnsname> <username> <password> prints something like this:

headnode1.<clusterdnsname>.b1.internal.cloudapp.net
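If PowerShell is not convenient on the submitting machine, the same lookup can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: it assumes the cluster gateway accepts HTTP Basic authentication with the cluster login, and that the status document contains a LeaderDnsName field, which is what the PowerShell script above relies on. The function names are hypothetical.

```python
import base64
import json
import urllib.request

DNS_SUFFIX = ".azurehdinsight.net"


def status_url(cluster_dns_name: str) -> str:
    """Build the same cluster availability URL the PowerShell script uses."""
    return "https://" + cluster_dns_name + DNS_SUFFIX + "/clusteravailability/status"


def parse_leader(response_text: str) -> str:
    """Extract the active head node FQDN from the JSON status document."""
    return json.loads(response_text)["LeaderDnsName"]


def get_active_fqdn(cluster_dns_name: str, username: str, password: str) -> str:
    """Query the status endpoint over HTTPS and return the active head node FQDN.

    Assumption: the gateway accepts a pre-emptive Basic Authorization header.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        status_url(cluster_dns_name),
        headers={"Authorization": "Basic " + token},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_leader(response.read().decode("utf-8"))
```

For example, get_active_fqdn("<clusterdnsname>", "<username>", "<password>") should return the same head node FQDN that Get-ActiveFQDN prints. Splitting URL building and JSON parsing out of the network call keeps those two pieces easy to check without a live cluster.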

5) RDP to the Azure VM you created. Copy the Storm binaries from the HDInsight head node (c:\apps\dist\storm-xxx or %STORM_HOME%) to the VM (this walkthrough assumes you copy them to c:\storm), then install a Java 1.7 runtime (either Oracle or OpenJDK works).

6) On the Azure VM, make sure the following environment variable and Storm configuration setting are correctly set:

Environment variable:

    JAVA_HOME = "<your java installation path>"

storm.yaml (c:\storm\conf\storm.yaml), with nimbus.host set to the FQDN returned in step 4:

    nimbus.host: headnode1.<clusterdnsname>.b1.internal.cloudapp.net
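Since nimbus.host has to match the active head node found in step 4, it can help to script this edit. The sketch below uses hypothetical helpers (set_nimbus_host and update_storm_yaml are illustrative names, not part of Storm). It does a plain text substitution rather than a full YAML parse, and assumes nimbus.host is a top-level key written on a single line, as in the snippet above.

```python
import re

# Matches a top-level "nimbus.host: ..." line; [^\r\n]* leaves any CRLF ending intact.
NIMBUS_HOST_LINE = re.compile(r"^nimbus\.host:[^\r\n]*", re.MULTILINE)


def set_nimbus_host(storm_yaml: str, fqdn: str) -> str:
    """Return storm.yaml text with nimbus.host set to fqdn (text edit, not a YAML parser)."""
    line = "nimbus.host: " + fqdn
    if NIMBUS_HOST_LINE.search(storm_yaml):
        # Replace the first existing entry; a lambda avoids backslash handling in the replacement.
        return NIMBUS_HOST_LINE.sub(lambda _match: line, storm_yaml, count=1)
    # No entry yet: append one and keep the file newline-terminated.
    if storm_yaml and not storm_yaml.endswith("\n"):
        storm_yaml += "\n"
    return storm_yaml + line + "\n"


def update_storm_yaml(path: str, fqdn: str) -> None:
    """Rewrite the storm.yaml at path in place without translating existing line endings."""
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(set_nimbus_host(text, fqdn))
```

For example, update_storm_yaml(r"c:\storm\conf\storm.yaml", fqdn), where fqdn is the value printed in step 4. Rerunning it after a head node failover points the client at the new active node.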

7) On the Azure VM, submit a Storm topology with storm.cmd. For example, to submit the WordCount sample from storm-starter:

C:\storm\bin>storm jar ..\contrib\storm-starter\storm-starter-<version>-jar-with-dependencies.jar storm.starter.WordCountTopology wordcountSampleTopology
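The same submission can be scripted, for example to resubmit after rebuilding a jar. The sketch below assembles the argument list that mirrors the command above and runs it with subprocess. It assumes the c:\storm layout from step 5 and the JAVA_HOME variable from step 6; the helper names are hypothetical, and the jar path is whichever storm-starter build you copied over.

```python
import os
import subprocess
from pathlib import PureWindowsPath


def build_submit_command(storm_home: str, jar_path: str, main_class: str, *topology_args: str) -> list:
    """Mirror 'storm jar <jar> <main class> <args...>' using bin\\storm.cmd under storm_home."""
    storm_cmd = str(PureWindowsPath(storm_home) / "bin" / "storm.cmd")
    return [storm_cmd, "jar", jar_path, main_class, *topology_args]


def submit_topology(command: list) -> None:
    """Run the submission, failing early if JAVA_HOME (set in step 6) is missing."""
    if not os.environ.get("JAVA_HOME"):
        raise RuntimeError("JAVA_HOME is not set; see step 6")
    # check=True raises CalledProcessError if storm.cmd exits with a non-zero code.
    subprocess.run(command, check=True)
```

Usage on the VM would look like submit_topology(build_submit_command(r"c:\storm", r"c:\storm\contrib\storm-starter\storm-starter-<version>-jar-with-dependencies.jar", "storm.starter.WordCountTopology", "wordcountSampleTopology")). Passing a list rather than a single string avoids quoting problems with paths that contain spaces.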

You can then monitor and manage the topology from the Storm UI: on the Azure VM, open Internet Explorer and browse to an address like this:

https://headnode1.<clusterdnsname>.b1.internal.cloudapp.net:8772/

That's all it takes. Enjoy storming!