How to create an HDInsight Interactive Query cluster with an additional storage account?

Federico Sardo 91 Reputation points
2023-02-14T15:59:33.2366667+00:00

Hi community,

I am new to HDInsight and am asking for help with the following situation:

Preconditions:

  • I have a Data Lake Storage Gen2 account (hierarchical namespace enabled) with my business data (CSV and Parquet files).

I need to create two clusters: Interactive Query and Spark. I read that clusters need a primary storage account, so I created a new Azure Storage account and a container. So far so good. One question here: which is the better approach for primary storage, Azure Storage or Azure Data Lake Storage Gen2?

The other question (and this is the most important one): how do I link my existing data lake so that I can create Hive tables from it? I was not able to do it from here:

User's image

Regards

Federico

Azure HDInsight
An Azure managed cluster service for open-source analytics.

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 84,531 Reputation points Microsoft Employee
    2023-02-15T06:18:47.7033333+00:00

    Hello @Federico Sardo

    Thanks for the question and for using the MS Q&A platform.

    Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics, built on Azure Blob Storage.

    Case 1: Access the default storage account created while creating the HDInsight cluster.

    There are several ways you can access the files in Data Lake Storage Gen2 from an HDInsight cluster.

    • Using the fully qualified name. With this approach, you provide the full path to the file that you want to access: abfs://<containername>@<accountname>.dfs.core.windows.net/<file.path>/
    • Using the shortened path format. With this approach, you replace the path up to the cluster root with: abfs:///<file.path>/
    • Using the relative path. With this approach, you only provide the relative path to the file that you want to access: /<file.path>/
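As a small illustration of the three path formats listed above, here is a minimal Python sketch that builds each form from a container name, an account name, and a file path. The function name `abfs_paths` and the sample values are hypothetical and only serve to show how the strings relate to each other; the shortened and relative forms resolve against the cluster's default storage only.

```python
def abfs_paths(container: str, account: str, file_path: str) -> dict:
    """Build the three path forms for a location on the cluster's default
    Data Lake Storage Gen2 storage (illustrative helper, not an HDInsight API)."""
    rel = file_path.strip("/")  # normalise leading/trailing slashes
    return {
        # Full path: container, account and DFS endpoint spelled out.
        "fully_qualified": f"abfs://{container}@{account}.dfs.core.windows.net/{rel}/",
        # Shortened form: everything up to the cluster root is replaced by abfs:///
        "shortened": f"abfs:///{rel}/",
        # Relative form: only the path below the cluster root.
        "relative": f"/{rel}/",
    }


if __name__ == "__main__":
    # "data", "mylake" and "sales/2023" are made-up sample values.
    for form, path in abfs_paths("data", "mylake", "sales/2023").items():
        print(f"{form}: {path}")
```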

    Case 2: Access the additional storage account

    If you want to access data residing on external storage, you will have to add that storage as additional storage in the HDInsight cluster.

    Steps to add storage accounts to the existing clusters via Ambari UI:

    Step 1: From a web browser, navigate to https://CLUSTERNAME.azurehdinsight.net, where CLUSTERNAME is the name of your cluster.

    Step 2: Navigate to HDFS --> Config --> Advanced, then scroll down to Custom core-site.

    Step 3: Select Add Property and enter your storage account name and key in the following manner:

     Key => fs.azure.account.key.(storage_account).blob.core.windows.net  
     Value => (Storage Access Key)  
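To make the key/value pattern from Step 3 concrete, the following Python sketch assembles the property name and value for a given storage account. It is only an illustration of the naming pattern shown above: the helper name `account_key_property`, the account name `mystorage`, and the key placeholder are assumptions, and the sketch does not call Ambari or change any cluster setting.

```python
def account_key_property(storage_account: str, access_key: str) -> tuple:
    """Return the (key, value) pair to enter under Custom core-site,
    following the pattern fs.azure.account.key.(storage_account).blob.core.windows.net."""
    if not storage_account:
        raise ValueError("storage_account must not be empty")
    name = f"fs.azure.account.key.{storage_account}.blob.core.windows.net"
    return name, access_key


if __name__ == "__main__":
    # "mystorage" and the key text are placeholders, not real credentials.
    key, value = account_key_property("mystorage", "<storage-access-key>")
    print("Key   =>", key)
    print("Value =>", value)
```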
    

    236739-image.png

    
    

    Step 4: Observe the keys that begin with fs.azure.account.key. The account name will be a part of the key as seen in this sample image:

    236740-image.png
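The check in Step 4 can also be sketched in code: given the core-site properties as a plain dictionary, pick out the keys that begin with fs.azure.account.key. and read the account name from each one. The dictionary below is made-up sample data and `storage_accounts` is a hypothetical helper; how you export the properties from your cluster is left open here.

```python
PREFIX = "fs.azure.account.key."


def storage_accounts(core_site: dict) -> list:
    """List the storage account names found in core-site property keys
    that start with fs.azure.account.key. (illustrative helper)."""
    accounts = []
    for name in core_site:
        if name.startswith(PREFIX):
            host = name[len(PREFIX):]               # e.g. mystorage.blob.core.windows.net
            accounts.append(host.split(".", 1)[0])  # account name is the first label
    return sorted(accounts)


if __name__ == "__main__":
    # Sample properties; names and values are placeholders.
    sample = {
        "fs.defaultFS": "<default-storage-uri>",
        "fs.azure.account.key.mystorage.blob.core.windows.net": "<key-1>",
        "fs.azure.account.key.otherstorage.blob.core.windows.net": "<key-2>",
    }
    print(storage_accounts(sample))
```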

    
    

    For more details, refer to "Use Azure Data Lake Storage Gen2 with Azure HDInsight clusters" and "Add additional storage accounts to HDInsight".

    Hope this helps. Do let us know if you have any further queries.


    Please don’t forget to Accept Answer wherever the information provided helps you; this can be beneficial to other community members.

