Building custom clients
From: Developing big data solutions on Microsoft Azure HDInsight
In some cases you may want to implement a custom solution to consume and visualize the big data processing results generated by HDInsight, instead of using an existing or third-party tool. For example, using custom code to consume data from HDInsight is common in scenarios where you need to integrate big data processing into an existing application or service, or where you just want to explore data by using simple scripts. This section of the guide provides some examples of custom clients built using PowerShell and the .NET Framework.
Building custom clients with Windows PowerShell scripts
The Azure module for Windows PowerShell includes a range of cmdlets that you can use to access data generated by HDInsight. You can use these cmdlets to consume data by querying Hive tables or by downloading output files from Azure blob storage.
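Whichever client downloads the output files, the next step is usually to parse them. As a language-neutral illustration (written in Python rather than PowerShell so that it runs without an Azure subscription), the following minimal sketch parses rows in the text format that Hive uses by default, where fields are separated by the Ctrl-A character (`\x01`) and NULL values are written as `\N`. The function name `parse_hive_rows` and the sample data are hypothetical; a table created with an explicit `FIELDS TERMINATED BY` clause would need a different delimiter.

```python
from typing import Iterable, Iterator, List, Optional

HIVE_DEFAULT_DELIMITER = "\x01"  # Ctrl-A, Hive's default field separator for text output
HIVE_NULL_MARKER = "\\N"         # Hive writes NULL values as the two characters \N


def parse_hive_rows(
    lines: Iterable[str],
    delimiter: str = HIVE_DEFAULT_DELIMITER,
) -> Iterator[List[Optional[str]]]:
    """Yield each non-empty line as a list of field values, mapping \\N to None."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines, such as one left by a trailing newline
        yield [None if field == HIVE_NULL_MARKER else field
               for field in line.split(delimiter)]


# Hypothetical sample standing in for the contents of a downloaded output file.
sample = "2014-01-01\x01Seattle\x0142\n2014-01-02\x01\\N\x0117\n"
rows = list(parse_hive_rows(sample.splitlines()))
for row in rows:
    print(row)
```

All values are returned as strings; converting them to numbers or dates is left to the caller, because the output file itself carries no column type information.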
Considerations for using Windows PowerShell scripts
The following table describes specific considerations for using PowerShell in the HDInsight use cases and models described in this guide.
Use case | Considerations |
---|---|
Iterative data exploration | For one-time analysis or iterative exploration of data, PowerShell provides a flexible, easy to use scripting framework that you can use to upload data and scripts, initiate jobs, and consume the results. |
Data warehouse on demand | Data warehouses are usually queried by reporting clients such as Excel or SQL Server Reporting Services. However, PowerShell can be useful as a tool to quickly test queries. |
ETL automation | The target of the ETL processes is typically a relational database. While you may use PowerShell to upload source data to Azure blob storage and to initiate the HDInsight jobs that encapsulate the ETL process, it’s unlikely that PowerShell would be an appropriate tool to consume the results. |
BI integration | In an enterprise BI solution, users generally use established tools such as Excel or SQL Server Reporting Services to visualize data. However, in a similar way to the data warehouse scenario, you may use PowerShell to test queries against Hive tables. |
In addition, consider the following:
- You can run PowerShell scripts interactively in a Windows command line window or in a PowerShell-specific command line console. Additionally, you can edit and run PowerShell scripts in the Windows PowerShell Integrated Scripting Environment (ISE), which provides IntelliSense and other user interface enhancements that make it easier to write PowerShell code.
- You can schedule the execution of PowerShell scripts using Windows Task Scheduler, SQL Server Agent, or other tools as described in Building end-to-end solutions using HDInsight.
- Before you use PowerShell to work with HDInsight you must configure the PowerShell environment to connect to your Azure subscription. To do this you must first download and install the Azure PowerShell module, which is available through the Web Platform Installer. For more details see How to install and configure Azure PowerShell.
Building custom clients with the .NET Framework
When you need to integrate big data processing into an application or service, you can use .NET Framework code to consume the results of jobs executed in HDInsight. The .NET SDK includes numerous classes for writing custom code that interacts with HDInsight. The following examples demonstrate some common scenarios:
- Using the Microsoft Hive ODBC Driver in a .NET client
- Using LINQ To Hive in a .NET client
- Retrieving job output files with the .NET Framework
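When retrieving job output files, note that a job's output in blob storage is usually a folder rather than a single file: Hadoop writes one `part-*` file per task, alongside marker files such as `_SUCCESS`. The following minimal sketch (in Python for illustration only, not the .NET SDK) shows the merge step that a custom client might perform after downloading such a folder. It assumes tab-separated key/value lines, which is the default for MapReduce text output; the function name `read_job_output` and the sample folder contents are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict


def read_job_output(folder: Path) -> Dict[str, str]:
    """Merge every part-* file in a downloaded output folder into one dict."""
    results: Dict[str, str] = {}
    # Sort for a deterministic order; marker files such as _SUCCESS do not match part-*.
    for part in sorted(folder.glob("part-*")):
        with part.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.rstrip("\r\n")
                if not line:
                    continue
                # Split on the first tab only, so values may themselves contain tabs.
                key, _, value = line.partition("\t")
                results[key] = value
    return results


# Hypothetical local copy of a job output folder, created here so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "_SUCCESS").write_text("", encoding="utf-8")
    (folder / "part-r-00000").write_text("apple\t3\nbanana\t5\n", encoding="utf-8")
    (folder / "part-r-00001").write_text("cherry\t7\n", encoding="utf-8")
    totals = read_job_output(folder)

print(totals)
```

If the same key can appear in more than one part file, the later file wins in this sketch; a real client would decide whether to sum, append, or reject duplicates.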
Considerations for using the .NET Framework
The following table describes specific considerations for using the .NET Framework to implement custom client applications in the HDInsight use cases and models described in this guide.
Use case | Considerations |
---|---|
Iterative data exploration | For one-time analysis or iterative exploration of data, writing a custom client application may be an inefficient way to consume the data unless the team analyzing the data have existing .NET development skills and plan to implement a custom client for a future big data processing solution. |
Data warehouse on demand | In some cases a big data solution consists of a data warehouse based on HDInsight and a custom application that consumes data from the data warehouse. For example, the goal of a big data project might be to incorporate data from an HDInsight-based data warehouse into an ASP.NET web application. In this case, using the .NET Framework libraries for HDInsight is an appropriate choice. |
ETL automation | The target of the ETL processes is typically a relational database. You might use a custom .NET application to upload source data to Azure blob storage and initiate the HDInsight jobs that encapsulate the ETL process. |
BI integration | In an enterprise BI solution, users generally use established tools such as Excel or SQL Server Reporting Services to visualize data. However, you may use the .NET libraries for HDInsight to integrate big data into a custom BI application or business process. |
More information
For information on using PowerShell with HDInsight see HDInsight PowerShell Cmdlets Reference Documentation.
For information on using the HDInsight SDK see HDInsight SDK Reference Documentation and the incubator projects on the CodePlex website.