Submitting a map/reduce job from a .NET application

patterns & practices Developer Center

From: Developing big data solutions on Microsoft Azure HDInsight

You can use the classes in the Microsoft Azure HDInsight NuGet package to submit map/reduce jobs to an HDInsight cluster from a .NET application. After adding this package, you can use the MapReduceJobCreateParameters class to define a map/reduce job by specifying the path of the .jar file and the name of the class to run. You can then add any arguments the job requires, such as the paths for the source data and the output directory.

Next, you must add code to load your Azure management certificate and use it to create a credential for the HDInsight cluster. Pass this credential to the Connect method of the JobSubmissionClientFactory class to connect to the cluster and obtain a job submission client object that implements the IJobSubmissionClient interface. Then call the client object’s CreateMapReduceJob method to submit the job you defined earlier. When you submit a job, its unique job ID is returned.

You can leave the job to run, or write code that waits for it to finish and displays progress by examining the StatusCode property (a JobStatusCode value) of a JobDetails object retrieved using the job ID. In the following code example the client application checks the job status every ten seconds until the job has completed or failed.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Threading;
using System.IO;
using System.Security.Cryptography.X509Certificates;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Management.HDInsight;
using Microsoft.Hadoop.Client;

namespace MRClient
{
  class Program
  {
    static void Main(string[] args)
    {
      // Azure variables.
      string subscriptionID = "subscription-id";
      string certFriendlyName = "certificate-friendly-name";
      string clusterName = "cluster-name";

      // Define the MapReduce job.
      MapReduceJobCreateParameters mrJobDefinition = new MapReduceJobCreateParameters()
      {
        JarFile = "wasb:///mydata/jars/mymapreducecode.jar",
        ClassName = "mymapreduceclass"
      };
      mrJobDefinition.Arguments.Add("wasb:///mydata/source");
      mrJobDefinition.Arguments.Add("wasb:///mydata/Output");

      // Get the certificate object from certificate store
      // using the friendly name to identify it.
      X509Store store = new X509Store();
      store.Open(OpenFlags.ReadOnly);
      X509Certificate2 cert = store.Certificates.Cast<X509Certificate2>()
        .First(item => item.FriendlyName == certFriendlyName);
      JobSubmissionCertificateCredential creds = new JobSubmissionCertificateCredential(
        new Guid(subscriptionID), cert, clusterName);

      // Create a Hadoop client to connect to HDInsight.
      var jobClient = JobSubmissionClientFactory.Connect(creds);

      // Run the MapReduce job.
      JobCreationResults mrJobResults = jobClient.CreateMapReduceJob(mrJobDefinition);

      // Wait for the job to complete.
      Console.Write("Job running...");
      JobDetails jobInProgress = jobClient.GetJob(mrJobResults.JobId);
      while (jobInProgress.StatusCode != JobStatusCode.Completed 
        && jobInProgress.StatusCode != JobStatusCode.Failed)
      {
        Console.Write(".");
        jobInProgress = jobClient.GetJob(jobInProgress.JobId);
        Thread.Sleep(TimeSpan.FromSeconds(10));
      }
      // The job has finished; report whether it completed or failed.
      Console.WriteLine("!");
      Console.WriteLine("Job finished with status: " + jobInProgress.StatusCode);
      Console.WriteLine("Press a key to end.");
      Console.Read();
    }
  }
}
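
The polling loop in the listing above runs until the job reports Completed or Failed, so it would wait indefinitely if the job never reached either state. The following is a minimal sketch of the same polling idea with an overall timeout. The JobPolling class and WaitForCompletion method are hypothetical names introduced here for illustration and are not part of the HDInsight package; the status source is passed in as a delegate so that the helper does not depend on the SDK.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of the HDInsight SDK).
public static class JobPolling
{
    // Calls getStatus repeatedly until isFinished returns true for the
    // latest status, or until the timeout elapses. Returns the last status
    // observed; 'finished' reports whether a finished state was reached.
    public static TStatus WaitForCompletion<TStatus>(
        Func<TStatus> getStatus,
        Func<TStatus, bool> isFinished,
        TimeSpan pollInterval,
        TimeSpan timeout,
        out bool finished)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        TStatus status = getStatus();
        while (!isFinished(status) && DateTime.UtcNow < deadline)
        {
            Thread.Sleep(pollInterval);
            status = getStatus();
        }
        finished = isFinished(status);
        return status;
    }
}

// Possible use with the client from the listing (timeout value is an assumption):
//   bool finished;
//   JobStatusCode last = JobPolling.WaitForCompletion(
//       () => jobClient.GetJob(mrJobResults.JobId).StatusCode,
//       s => s == JobStatusCode.Completed || s == JobStatusCode.Failed,
//       TimeSpan.FromSeconds(10), TimeSpan.FromMinutes(30), out finished);
```

If finished is false when the method returns, the job was still running when the timeout expired, and the application can decide whether to keep waiting or report the problem.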

Notice the variables declared at the start of the Main method in the listing, which are required to configure the Hadoop client. These are the unique ID of the subscription in which the cluster is defined (which you can view in the Azure management portal), the friendly name of the Azure management certificate to be loaded (which you can view in certmgr.msc), and the name of your HDInsight cluster.
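
The listing looks up the certificate with First, which throws an InvalidOperationException when no certificate has the given friendly name, and it leaves the store open. The following sketch shows one way to make the lookup more forgiving. The CertificateLookup class and FindByFriendlyName method are hypothetical names used only for illustration. It opens the current user's personal store explicitly, which is the same store that the parameterless X509Store constructor in the listing uses.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography.X509Certificates;

// Hypothetical helper (not part of the HDInsight SDK).
public static class CertificateLookup
{
    // Returns the first certificate in the current user's personal store
    // whose friendly name matches exactly, or null if none is found.
    public static X509Certificate2 FindByFriendlyName(string friendlyName)
    {
        X509Store store = new X509Store(StoreName.My, StoreLocation.CurrentUser);
        store.Open(OpenFlags.ReadOnly);
        try
        {
            return store.Certificates.Cast<X509Certificate2>()
                .FirstOrDefault(c => string.Equals(
                    c.FriendlyName, friendlyName, StringComparison.Ordinal));
        }
        finally
        {
            // Close the store even if enumeration fails.
            store.Close();
        }
    }
}
```

The calling code can then check for null and display a clear message (for example, that the management certificate is not installed for the current user) before attempting to create the JobSubmissionCertificateCredential.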

The Microsoft Azure HDInsight NuGet package also includes a StreamingMapReduceJobCreateParameters class, which you can use to submit a streaming map/reduce job that uses .NET executable assemblies to implement the mapper and reducer for the job.
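
As a rough illustration, a streaming job might be defined and submitted as follows. This is a sketch that requires the Microsoft Azure HDInsight NuGet package and an already connected client. The StreamingJobExample class, the executable names, and the storage paths are placeholders chosen for illustration. The property names (Input, Output, Mapper, Reducer, Files) and the CreateStreamingJob method are given as they commonly appear in samples for this package, so verify them against the version you have installed.

```csharp
using Microsoft.Hadoop.Client;

// Hypothetical wrapper class used only for illustration.
public static class StreamingJobExample
{
    // Defines a streaming map/reduce job and submits it through a client
    // obtained from JobSubmissionClientFactory.Connect.
    public static JobCreationResults Submit(IJobSubmissionClient jobClient)
    {
        StreamingMapReduceJobCreateParameters streamingJobDefinition =
            new StreamingMapReduceJobCreateParameters()
            {
                // Placeholder paths and executable names.
                Input = "/mydata/source",
                Output = "/mydata/StreamingOutput",
                Mapper = "mymapper.exe",
                Reducer = "myreducer.exe"
            };

        // The executables must be available in cluster storage so that
        // they can be distributed to the nodes that run the job.
        streamingJobDefinition.Files.Add("/mydata/apps/mymapper.exe");
        streamingJobDefinition.Files.Add("/mydata/apps/myreducer.exe");

        // Submit the job; the result carries the job ID used for polling.
        return jobClient.CreateStreamingJob(streamingJobDefinition);
    }
}
```

The returned JobCreationResults object can be used in the same way as in the main listing, passing its JobId to GetJob to monitor progress.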
