Spark cluster to read Hive on a different HDI cluster

2021-02-10T17:09:14.947+00:00

I have two different HDI clusters, say Cluster A and Cluster B. One HDInsight cluster (Cluster A) is a Spark cluster and the other (Cluster B) is provisioned with Hive. I need to run Spark processing on Cluster A and connect to the Hive instance on Cluster B. I have done the required configuration, such as setting spark.datasource.hive.warehouse.metastoreUri on Cluster A, but I am still unable to read Hive tables from the Spark cluster (Cluster A). Is there any specific configuration needed?
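For reference, the kind of setting I mean looks roughly like this (a sketch only; the host name below is a placeholder, not my actual value):

```python
from pyspark.sql import SparkSession

# Sketch of the configuration on Cluster A; in my case the same key is set
# cluster-wide in Custom spark2-defaults via Ambari rather than per session.
spark = (
    SparkSession.builder
    .appName("cluster-a-reads-cluster-b-hive")
    # Hive metastore of Cluster B (placeholder URI)
    .config("spark.datasource.hive.warehouse.metastoreUri",
            "thrift://hn0-clusterb.example.internal:9083")
    .getOrCreate()
)
```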
Thanks for the response

Azure HDInsight

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 85,346 Reputation points Microsoft Employee
    2021-02-11T04:07:25.453+00:00

    Hello @Nallaperumal, Natarajan (Cognizant),

    Welcome to the Microsoft Q&A platform.

    Yes, Apache Hive Warehouse Connector (HWC) is required to integrate Apache Spark and Apache Hive clusters.

    The Apache Hive Warehouse Connector (HWC) is a library that allows you to work more easily with Apache Spark and Apache Hive. It supports tasks such as moving data between Spark DataFrames and Hive tables, and directing Spark streaming data into Hive tables. The Hive Warehouse Connector acts as a bridge between Spark and Hive, and it supports Scala, Java, and Python as development languages.
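    For example, once the HWC assembly jar and its Python bindings are available to Spark, reading a Hive table looks roughly like this (a sketch; the database and table names are placeholders):

    ```python
    # Launch pyspark with the HWC library, e.g.:
    #   pyspark --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar \
    #           --py-files /usr/hdp/current/hive_warehouse_connector/pyspark_hwc-<version>.zip
    from pyspark_llap import HiveWarehouseSession

    # Build an HWC session on top of the existing SparkSession; it reads the
    # HiveServer2/metastore settings from the Spark configuration.
    hive = HiveWarehouseSession.session(spark).build()

    hive.showDatabases().show()   # databases visible through HWC
    hive.setDatabase("default")   # placeholder database name
    df = hive.executeQuery("SELECT * FROM hivesampletable")  # runs on the LLAP cluster
    df.show(10)
    ```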


    Reference: [Integrate Apache Spark and Apache Hive with Hive Warehouse Connector in Azure HDInsight](https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/apache-hive-warehouse-connector).

    Hope this helps. Do let us know if you have any further queries.



1 additional answer

  1. 2021-02-26T10:57:43.397+00:00

    @PRADEEPCHEEKATLA-MSFT - Thanks for the response. We have provisioned an Interactive Query cluster, and Spark is now able to connect to the LLAP cluster and read Hive tables. I set the spark2-defaults configuration in the Spark cluster from the values of the LLAP cluster (staging dir, HiveServer2 JDBC URL, HiveServer2 metastore URI, etc.), as sketched below. What if we have more than one Hive instance (i.e., multiple LLAP clusters) and one Spark cluster?
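    Roughly, the values we copied over from the LLAP cluster are of this shape (a sketch with placeholder host names and paths; shown here as per-application conf, which is presumably also one way to point different jobs at different LLAP clusters instead of baking a single cluster into spark2-defaults):

    ```python
    from pyspark.sql import SparkSession

    # HWC-related settings taken from the target LLAP cluster (all values are
    # placeholders). In our setup the same keys live in Custom spark2-defaults.
    spark = (
        SparkSession.builder
        .appName("spark-to-llap")
        .config("spark.sql.hive.hiveserver2.jdbc.url",
                "jdbc:hive2://zk0-llap.example.internal:2181/;"
                "serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive")
        .config("spark.datasource.hive.warehouse.metastoreUri",
                "thrift://hn0-llap.example.internal:9083")
        .config("spark.datasource.hive.warehouse.load.staging.dir", "/tmp")
        .config("spark.hadoop.hive.llap.daemon.service.hosts", "@llap0")
        .config("spark.hadoop.hive.zookeeper.quorum",
                "zk0-llap.example.internal:2181,zk1-llap.example.internal:2181")
        .getOrCreate()
    )
    ```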
