204 questions with Azure HDInsight tags

0 answers

Linking Blob storage with azure-hdinsight table

Hi all, I am new to Azure Data Lake. I need to store images, each of which refers to an advertisement (possibly of string type). I have stored the images in an Azure storage account, and the advertisements are stored in azure-hdinsight…

Azure HDInsight
An Azure managed cluster service for open-source analytics.
204 questions
asked 2020-08-19T06:34:03.86+00:00
Archana Vaidya 1 Reputation point
commented 2020-09-10T19:58:25.96+00:00
HimanshuSinha-msft 19,376 Reputation points Microsoft Employee
1 answer

Security Recommendations for Azure Data and Analytics Services

I am working on securing data and analytics services on Azure. I want to know which security controls I can apply after the services are created and which I can apply only during service creation. Below are the recommendations I have found so far. Could…

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,630 questions
Azure HDInsight
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
2,047 questions
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,028 questions
asked 2020-09-02T20:20:43.98+00:00
Akash Verma 21 Reputation points
commented 2020-09-09T11:11:32.767+00:00
PRADEEPCHEEKATLA-MSFT 84,531 Reputation points Microsoft Employee
1 answer

Link HDInsight Cluster in VSCode to Show Hive Tables

Dear all, I am using the latest release of VSCode, 1.48.2, to connect to my HDInsight Spark cluster (HDI 3.6, Spark 2.3). It successfully lists the Hive databases available in my cluster when I browse to the 'Hive Databases' section inside the…

Azure HDInsight
asked 2020-08-31T09:12:53.597+00:00
Christoph Kiefer 141 Reputation points
commented 2020-09-08T04:52:52.977+00:00
PRADEEPCHEEKATLA-MSFT 84,531 Reputation points Microsoft Employee
1 answer One of the answers was accepted by the question author.

HDInsight Spark Cluster Customization with Bootstrapping and Custom Action Scripts

Hello all, we use both bootstrapping (via ARM templates) and action scripts to provision our HDInsight Spark cluster (HDI 3.6, Spark 2.3). We face several challenges (in no particular order): First, some of the bootstrapping statements are not…

Azure HDInsight
asked 2020-08-27T18:29:32.387+00:00
Christoph Kiefer 141 Reputation points
commented 2020-09-03T05:35:49.9+00:00
PRADEEPCHEEKATLA-MSFT 84,531 Reputation points Microsoft Employee
2 answers

Networking Issue on Azure HDInsight Spark Cluster with ESP

Dear all, we are encountering a networking/DNS issue on our Azure HDInsight Spark cluster. The cluster is joined to our AAD (i.e., it is an ESP-enabled cluster). It is created automatically by a PS runbook and an ARM template file. This…

Azure HDInsight
asked 2020-08-24T09:42:23.157+00:00
Christoph Kiefer 141 Reputation points
commented 2020-09-01T11:05:27.24+00:00
PRADEEPCHEEKATLA-MSFT 84,531 Reputation points Microsoft Employee
0 answers

Configure HDFS Storage for Zeppelin Notebooks on HDInsight Spark Clusters with ESP

Dear All We followed this step-by-step tutorial to configure HDFS storage for Zeppelin notebooks on our ESP-enabled HDInsight Spark Cluster (HDI 3.9, Spark 2.3):…

Azure HDInsight
asked 2020-08-28T09:55:45.47+00:00
Christoph Kiefer 141 Reputation points
commented 2020-09-01T08:18:44.543+00:00
PRADEEPCHEEKATLA-MSFT 84,531 Reputation points Microsoft Employee
1 answer One of the answers was accepted by the question author.

HDInsight Zeppelin Notebook Not Working

Hi all, we are running an ESP-enabled HDInsight Spark cluster in Azure. We have no clue why some of our domain users are not able to use Zeppelin notebooks (using the pyspark interpreter in our case). This is the very simple code that results in…

Azure HDInsight
asked 2020-08-12T11:46:21.087+00:00
Christoph Kiefer 141 Reputation points
accepted 2020-08-28T05:44:19.06+00:00
Christoph Kiefer 141 Reputation points
0 answers

How to alter Kafka topic config in ESP-enabled HDInsight

Hi all, I am not able to alter topic config using the kafka-configs.sh binary. I am passing a JAAS config file, and I am sure the user has sufficient permissions. But I always get "org.apache.zookeeper.KeeperException$NoAuthException:…

Azure HDInsight
asked 2020-08-14T15:37:30.83+00:00
mvidya 1 Reputation point
commented 2020-08-24T21:40:37.467+00:00
HimanshuSinha-msft 19,376 Reputation points Microsoft Employee
1 answer

AD integration pass-through for HDInsight

Hi, I know that HDInsight with the ESP feature enables AD integration when connecting to the cluster. Also, access to the underlying Hive tables can be controlled using Apache Ranger. But I would like to know if the access permission on the storage account or data lake…

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
1,411 questions
Azure HDInsight
asked 2020-07-20T10:50:21.813+00:00
Hari GS 21 Reputation points
commented 2020-08-17T22:01:24.007+00:00
HimanshuSinha-msft 19,376 Reputation points Microsoft Employee
1 answer One of the answers was accepted by the question author.

Unable to run spark.sql queries on hive table using spark-shell (Class org.apache.hadoop.fs.adl.HdiAdlFileSystem not found)

I am trying to run the query below in spark-shell on an HDInsight cluster: val df=spark.sql("select * from hivesampletable") But it repeatedly gives the error below, irrespective of the query: 2020-08-10 07:29:21 WARN ObjectStore:568 -…

Azure HDInsight
asked 2020-08-10T09:00:42.843+00:00
Surya Agarwal 21 Reputation points
commented 2020-08-12T04:11:55.933+00:00
PRADEEPCHEEKATLA-MSFT 84,531 Reputation points Microsoft Employee
1 answer

Proper Cores/Executors Configuration in HD-Insight

For this cluster I have this configuration. What is the best way to configure cores and executors so that a Spark job runs efficiently? Is this configuration OK? Thanks!

Azure HDInsight
asked 2020-08-04T15:19:31.557+00:00
Joaquin Chemile 41 Reputation points
commented 2020-08-10T03:41:14.82+00:00
PRADEEPCHEEKATLA-MSFT 84,531 Reputation points Microsoft Employee
2 answers

Is there any kind of Power BI connector for Hadoop?

I've been testing some visualization tools, and next is Power BI. Some tools made me use Apache Drill, but Power BI seems to have many connectors. Is there a way to connect natively to Hadoop (not HDInsight), or an easy workaround?

Azure HDInsight
asked 2020-08-05T00:43:32.19+00:00
Geiber Arturo Ugalde Gutierrez 1 Reputation point
commented 2020-08-10T03:40:20.1+00:00
PRADEEPCHEEKATLA-MSFT 84,531 Reputation points Microsoft Employee
1 answer

Using Spark action in HDInsight Hue

I have created a Spark 2.4 cluster using HDInsight in Azure. I installed Hue on it using script actions and completed the necessary steps for SSH tunneling and connecting to the Hue UI. However, in the Hue UI I can see only Pig and Hive…

Azure HDInsight
Azure Databricks
asked 2020-07-23T09:32:26.923+00:00
Abhijeet Bane 1 Reputation point
commented 2020-08-04T11:17:42.693+00:00
PRADEEPCHEEKATLA-MSFT 84,531 Reputation points Microsoft Employee
1 answer

What are Recommended Solutions to work with RStudio and HDInsight Spark Cluster

Dear all, we are currently implementing in-house / on-prem machine learning solutions in R (RStudio). We are in the process of moving our data to the cloud by means of a sophisticated ingestion process through Apache NiFi. Currently the data lands…

Azure HDInsight
Azure R Server for HDInsight
An Azure service that provides predictive analytics, machine learning, and statistical modeling for big data.
13 questions
asked 2020-07-01T20:37:32.087+00:00
Christoph Kiefer 141 Reputation points
commented 2020-07-27T08:36:26.973+00:00
Christoph Kiefer 141 Reputation points
1 answer

Not able to edit core-site.xml file

Hello, I am using the free version of Azure. I am trying to add an entry to core-site.xml for HDInsight configuration, which is located at /etc/hadoop/conf. But that file is read-only. How can I change the permissions of that file so that I can…

Azure HDInsight
asked 2020-07-02T19:18:07.353+00:00
Jitendra7337 1 Reputation point
commented 2020-07-20T11:32:50.92+00:00
PRADEEPCHEEKATLA-MSFT 84,531 Reputation points Microsoft Employee
2 answers One of the answers was accepted by the question author.

How to change Azure HDInsight Hadoop to an Azure private endpoint

I have already created an Azure HDInsight Hadoop cluster, but on a public network. Now I want to move it to a private network, using Azure Private Link with a private endpoint. How can I change "https://<CLUSTERNAME>.azurehdinsight.net" to "…

Azure HDInsight
asked 2020-05-08T23:16:35.12+00:00
KranthiPakala-MSFT 46,437 Reputation points Microsoft Employee
answered 2020-07-20T03:35:32.467+00:00
Zhenyu Zhou 1 Reputation point Microsoft Employee
0 answers

java.io.IOException: Stream is closed! Error in HDInsight with ADLS Gen 2

I am currently using the Hail library with PySpark to perform various operations on genomic data in ADLS Gen 2 with an HDInsight 4.0, Spark 2.4 cluster. I have been in touch with the development team regarding this error I have been getting when running a…

Azure Data Lake Storage
Azure HDInsight
Azure Data Lake Analytics
asked 2020-06-29T14:55:04.947+00:00
EagleByte 1 Reputation point
commented 2020-07-10T17:49:55.523+00:00
HimanshuSinha-msft 19,376 Reputation points Microsoft Employee
3 answers

Unable to access Azure Blob storage from HDInsight cluster

Hi, I have spun up an HDInsight Spark cluster and am trying to access Blob storage from the cluster as follows, but I get an exception: hdfs dfs -ls wasbs://deploy@nisumstorageaccount2.blob.core.windows.net/ 20/06/16 19:28:39 ERROR…

Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.
2,578 questions
Azure HDInsight
asked 2020-06-17T00:52:46.397+00:00
azureuser586 1 Reputation point
commented 2020-07-10T05:44:31.297+00:00
PRADEEPCHEEKATLA-MSFT 84,531 Reputation points Microsoft Employee
1 answer One of the answers was accepted by the question author.

Unable to change spark.executor.heartbeatInterval parameter

I am trying to run a Jupyter Notebook on HDInsight with Spark; after some time (observed: 15, 17, 30 minutes), execution fails with the error message: Error with 400 StatusCode: "requirement failed: Session isn't active." According to Stack Overflow…

Azure HDInsight
asked 2020-06-22T06:27:09.24+00:00
Kurmann Simon (Helbling Technik) 21 Reputation points
accepted 2020-06-24T06:17:19.367+00:00
Kurmann Simon (Helbling Technik) 21 Reputation points
1 answer

Configuring YARN alerts in Azure Monitor

Hi all, are there any custom alerts to monitor HDInsight YARN memory from Azure Monitor?

Azure HDInsight
asked 2020-06-13T18:29:29.653+00:00
Revanth Thavidishetty 1 Reputation point
commented 2020-06-17T17:47:23.807+00:00
Revanth Thavidishetty 1 Reputation point