Uploading data with the Microsoft .NET Framework

patterns & practices Developer Center

From: Developing big data solutions on Microsoft Azure HDInsight

The .NET API for Hadoop WebClient is a component of the .NET SDK for HDInsight that you can add to a project using NuGet. The library includes a range of classes that enable integration with HDInsight and with the Azure blob storage container that hosts the HDFS folder structure HDInsight uses.

One of these classes is the WebHDFSClient class, which you can use to upload local files to Azure storage for processing by HDInsight. The WebHDFSClient class enables you to treat blob storage like an HDFS volume, navigating blobs within an Azure blob container as if they were directories and files.
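Blob storage has no real directories; the "folders" you navigate are '/'-separated prefixes of blob names, and HDInsight jobs usually address the same files through wasb:// URIs. The sketch below illustrates that mapping. It assumes that a path such as /data/sales.csv is stored as the blob named data/sales.csv and uses the standard wasb://container@account.blob.core.windows.net/path URI form. The BlobPaths class and its methods are hypothetical helpers, not part of the .NET SDK for HDInsight.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the HDInsight SDK.
public static class BlobPaths
{
    // Converts an HDFS-style path ("/data/sales.csv") into the blob name that
    // holds the file (assumed to be "data/sales.csv"). The "directory" is
    // simply the prefix of the blob name before the last '/'.
    public static string ToBlobName(string hdfsPath)
    {
        if (string.IsNullOrEmpty(hdfsPath))
            throw new ArgumentException("Path must not be empty.", nameof(hdfsPath));
        return hdfsPath.TrimStart('/');
    }

    // Builds the wasb:// URI an HDInsight job could use to address the same
    // file in the given storage account and container.
    public static string ToWasbUri(string account, string container, string hdfsPath)
    {
        return "wasb://" + container + "@" + account + ".blob.core.windows.net/"
               + ToBlobName(hdfsPath);
    }
}
```

For example, ToWasbUri("storage-account-name", "container-name", "/data/sales.csv") returns wasb://container-name@storage-account-name.blob.core.windows.net/data/sales.csv.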

The following code example shows how you can use the WebHDFSClient class to upload locally stored data files to Azure blob storage. The example is deliberately kept simple by including the credentials in the code so that you can copy and paste it while you are experimenting with HDInsight. In a production system you must protect credentials, as described in “Securing credentials in scripts and applications” in the Security section of this guide.

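```csharp
using System;
using System.IO;
using System.Threading.Tasks;

using Microsoft.Hadoop.WebHDFS;
using Microsoft.Hadoop.WebHDFS.Adapters;

namespace DataUploader
{
  class Program
  {
    static void Main(string[] args)
    {
      // Block the console thread until the asynchronous upload finishes.
      // GetAwaiter().GetResult() rethrows the original exception instead of
      // wrapping it in an AggregateException as Wait() does.
      UploadFiles().GetAwaiter().GetResult();
      Console.WriteLine("Upload complete!");
      Console.WriteLine("Press a key to end");
      Console.ReadKey();
    }

    private static async Task UploadFiles()
    {
      // Local folder that contains the files to upload.
      var localDir = new DirectoryInfo(@".\data");

      // Credentials are hard-coded for experimentation only.
      // Protect these values in a production system.
      var hdInsightUser = "user-name";
      var storageName = "storage-account-name";
      var storageKey = "storage-account-key";
      var containerName = "container-name";

      // Target "directory" (blob name prefix) within the container.
      var blobDir = "/data/";

      // The BlobStorageAdapter lets WebHDFSClient address the blob container
      // as if it were an HDFS file system.
      var hdfsClient = new WebHDFSClient(hdInsightUser,
              new BlobStorageAdapter(storageName, storageKey, containerName, false));

      // Remove any blobs already stored under the target path.
      await hdfsClient.DeleteDirectory(blobDir);

      // Upload each local file to a blob with the same name under blobDir.
      foreach (var file in localDir.GetFiles())
      {
        Console.WriteLine("Uploading " + file.Name + " to " + blobDir + file.Name + " ...");
        await hdfsClient.CreateFile(file.FullName, blobDir + file.Name);
      }
    }
  }
}
```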

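Note that the code uses the DeleteDirectory method to delete all existing blobs in the specified path, and then uses the CreateFile method to upload each file in the local data folder. All of the methods provided by the WebHDFSClient class are asynchronous, so a client application can remain responsive while large volumes of data are uploaded to Azure. In this console example the Main method simply waits for the upload to finish, and the files are uploaded one at a time because the loop awaits each CreateFile call before starting the next.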
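Because the upload methods return tasks, you are not limited to awaiting each file in turn. As a minimal sketch, assuming you want several uploads in flight at once, you can start one task per file, cap concurrency with a SemaphoreSlim, and await them all with Task.WhenAll. ParallelUploader, UploadAllAsync, and the maxConcurrency default of 4 are hypothetical choices for illustration. The helper takes the upload operation as a delegate so that it does not depend on the exact signature of any SDK method.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the HDInsight SDK) that runs several
// uploads concurrently instead of awaiting each one in turn.
public static class ParallelUploader
{
    // upload: copies one local file to one remote path, for example a lambda
    //         that calls WebHDFSClient.CreateFile.
    // maxConcurrency: how many uploads may run at the same time.
    public static async Task UploadAllAsync(
        IEnumerable<string> localPaths,
        string blobDir,
        Func<string, string, Task> upload,
        int maxConcurrency = 4)
    {
        if (maxConcurrency < 1)
            throw new ArgumentOutOfRangeException(nameof(maxConcurrency));

        using (var throttle = new SemaphoreSlim(maxConcurrency))
        {
            // Start one task per file; each task waits for a free slot
            // before it begins uploading.
            var tasks = localPaths.Select(async localPath =>
            {
                await throttle.WaitAsync();
                try
                {
                    var remotePath = blobDir + Path.GetFileName(localPath);
                    await upload(localPath, remotePath);
                }
                finally
                {
                    throttle.Release();
                }
            }).ToList();

            // Completes when every upload has finished. If any upload failed,
            // the await rethrows an exception once all tasks are done.
            await Task.WhenAll(tasks);
        }
    }
}
```

Assuming CreateFile returns a Task, as its use with await in the example suggests, the foreach loop could be replaced with a call such as await ParallelUploader.UploadAllAsync(localDir.GetFiles().Select(f => f.FullName), blobDir, (l, r) => hdfsClient.CreateFile(l, r)); (this also needs using System.Linq). Whether concurrent uploads are faster depends on your network bandwidth and storage account limits, so treat maxConcurrency as a value to measure rather than a recommendation.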
