Your First HDInsight Cluster – Step by Step
Small Bites of Big Data from AZURECAT
Big Data Tech Training Series #1
Cindy Gross | Murshed Zaman
Sometimes it is just hard to get started. Have you been putting off your first foray into Hadoop? Are you not sure where to begin? Let’s get really basic.
Prerequisites:
- Azure subscription (free trials available)
- Install Azure PowerShell Tools and Windows Azure HDInsight PowerShell
- Read Get Started with Windows Azure Cmdlets
Log on to the Windows Azure Portal at https://manage.windowsazure.com.
Create a storage account. In the portal choose the STORAGE icon on the left (https://manage.windowsazure.com/#Workspaces/StorageExtension/storage), then click on +NEW at the bottom. That opens a QUICK CREATE window. Enter a unique name for your storage account, such as sqlcatwomanrules; only lower case letters and numbers are allowed. Choose a location that is available to HDInsight (as of November 2013 that’s East US, West US, and North Europe). Do NOT choose an affinity group. If you choose to “Enable Geo-Replication” there will be an extra charge. It’s probably not necessary for a demo/test account, as you have a limited amount of credit in the trial subscription.
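If you want to check a candidate storage account name before typing it into the portal, a minimal sketch in Python might look like the following. It covers only the character rule mentioned above, plus a 3 to 24 character length limit that is an assumption here and is not stated in this post. The function name is made up for illustration, and the check cannot tell you whether the name is already taken.

```python
import re

# Assumed rule: 3 to 24 characters, lower case ASCII letters and digits only.
# The length limit is an assumption; the post only states the character rule.
_STORAGE_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_name(name):
    """Return True if the name passes the assumed format rule.

    This is a local format check only; uniqueness is still decided
    by the portal when you create the account.
    """
    return _STORAGE_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("sqlcatwomanrules", "SqlCatWoman", "sql-cat"):
        print(candidate, is_valid_storage_name(candidate))
```

Names with upper case letters or punctuation fail the check, which matches what the QUICK CREATE window rejects.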
Now click on the HDInsight icon just below the storage icon. Choose QUICK CREATE. Enter a unique name for your HDInsight cluster. For a demo choose 4 data nodes. Enter a password that contains upper and lower case letters, a number, and a special character. Choose the storage account you created above. Once you click on “CREATE HDINSIGHT CLUSTER” it will take several minutes for the cluster to be deployed.
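The password rule is easy to trip over, so here is a rough Python sketch that reports which of the four character classes a candidate password is missing. Treating "special character" as anything in Python's string.punctuation is an assumption, the function name is hypothetical, and the portal may enforce further rules (such as a minimum length) that are not covered here.

```python
import string


def password_problems(password):
    """Return a list describing which required character classes are missing.

    An empty list means the password meets the four rules named in the
    post; any additional portal rules are not checked.
    """
    checks = {
        "an upper case letter": any(c.isupper() for c in password),
        "a lower case letter": any(c.islower() for c in password),
        "a number": any(c.isdigit() for c in password),
        # Assumption: "special character" means ASCII punctuation.
        "a special character": any(c in string.punctuation for c in password),
    }
    return [what for what, present in checks.items() if not present]


if __name__ == "__main__":
    for candidate in ("Hadoop#2013", "hadoop2013"):
        missing = password_problems(candidate)
        print(candidate, "ok" if not missing else "missing: " + ", ".join(missing))
```

Use it only on throwaway examples; do not paste a real password into a script you plan to save or share.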
Once it completes you are ready to use your cluster!
If you won’t be using the cluster right away, go ahead and delete it (look for the icon at the bottom of the portal) to save compute time and money. You can easily recreate it when you need it.
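Look for more blog posts soon on customizing your cluster with CUSTOM CREATE or PowerShell and on automating deployment and jobs with PowerShell. In the meantime, see if you can get Invoke-Hive working from PowerShell for some simple Hive commands. Select your cluster first with Use-AzureHDInsightCluster (replace yourclustername with the cluster name you chose above), then run a query such as:
Use-AzureHDInsightCluster "yourclustername"
Invoke-Hive "select * from hivesampletable limit 10"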
Big Data Technical Series: