203 questions with the Azure HDInsight tag

1 answer

Migration from AWS EMR to Azure

We are trying to move our Spark steps code from an AWS EMR cluster to Azure. We are using the add-steps option with command-runner.jar in EMR. Each step launches a Python script that reads a large text file from S3 storage and manipulates it with Spark. Example…

Azure HDInsight
An Azure managed cluster service for open-source analytics.
203 questions
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
2,045 questions
asked 2023-03-05T09:37:29.0066667+00:00
BoazD 0 Reputation points
commented 2023-03-13T05:27:50.4933333+00:00
PRADEEPCHEEKATLA-MSFT 84,456 Reputation points Microsoft Employee
3 answers. One of the answers was accepted by the question author.

Can I use a Student subscription in Azure to create an HDInsight Spark cluster?

Hi all, I am trying to create a Spark cluster in HDInsight (the name of the resource is Azure HDInsight) with my Student subscription. I have tried Googling but couldn't find any clues in the Microsoft documentation. My $100 credit is unused, but when I go…

Azure HDInsight
An Azure managed cluster service for open-source analytics.
203 questions
asked 2020-11-28T18:43:18.027+00:00
Pablo J 21 Reputation points
edited the question 2023-03-08T07:37:59.0566667+00:00
PRADEEPCHEEKATLA-MSFT 84,456 Reputation points Microsoft Employee
1 answer. One of the answers was accepted by the question author.

How to create an HDInsight Interactive Query cluster with an additional storage account?

Hi community, I am new to HDInsight and am asking for help with the following situation. Preconditions: I have a Data Lake Storage Gen2 account (hierarchical namespace enabled) with my business data (CSV and Parquet files). I need to create 2 clusters. Interactive…

Azure HDInsight
An Azure managed cluster service for open-source analytics.
203 questions
asked 2023-02-14T15:59:33.2366667+00:00
Federico Sardo 91 Reputation points
accepted 2023-03-02T18:02:13.48+00:00
Federico Sardo 91 Reputation points
0 answers

Configuration-related exception while trying to run a Spark app in an HDInsight 5.0 cluster

I am migrating from HDInsight 4.0 to 5.0. Locally, it works. However, when I run Spark jobs in the HDInsight cluster, I get the error below. Any idea why "spark.nonjvm.error.forwarding.enabled" is registered multiple times? Command to run spark…

Azure HDInsight
An Azure managed cluster service for open-source analytics.
203 questions
asked 2023-02-22T22:08:19.9266667+00:00
Ben Asmare 0 Reputation points
commented 2023-03-01T07:11:15.49+00:00
PRADEEPCHEEKATLA-MSFT 84,456 Reputation points Microsoft Employee
1 answer

How to execute Hive queries in Synapse Spark

Hello! I am replacing an HDI cluster with Azure Synapse. My current HDI Spark cluster executes some Hive queries for data transformation. Is it possible to execute the same Hive queries in an Azure Synapse Spark pool? Thanks, DR

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,621 questions
Azure HDInsight
An Azure managed cluster service for open-source analytics.
203 questions
asked 2023-02-03T02:56:53.8133333+00:00
Dharmesh Rathod 0 Reputation points
commented 2023-02-20T06:50:03.16+00:00
PRADEEPCHEEKATLA-MSFT 84,456 Reputation points Microsoft Employee
1 answer. One of the answers was accepted by the question author.

Differences between HDInsight and Azure Databricks?

I know that HDInsight offers several cluster types, whereas Databricks offers only Spark clusters. I believe there must be some significant differences that will influence which one to choose for an implementation. [Note: As we migrate from MSDN,…

Azure HDInsight
An Azure managed cluster service for open-source analytics.
203 questions
asked 2020-05-08T23:12:21.613+00:00
KranthiPakala-MSFT 46,437 Reputation points Microsoft Employee
edited the question 2023-02-13T18:52:26.1066667+00:00
HimanshuSinha-msft 19,376 Reputation points Microsoft Employee
1 answer

Does Azure have a service to migrate data from AWS MSK to Kafka on Azure HDInsight?

I am looking for a way to migrate data from AWS MSK to Azure Kafka. Is there any service available to do that, and what are its pre-migration prerequisites?

Azure Migrate
A central hub of Azure cloud migration services and tools to discover, assess, and migrate workloads to the cloud.
744 questions
Azure HDInsight
An Azure managed cluster service for open-source analytics.
203 questions
asked 2020-11-18T13:57:36.94+00:00
Sarvesh Pandey 141 Reputation points
commented 2023-01-29T01:35:24.1133333+00:00
Srihareendra Bodduluri 1 Reputation point Microsoft Employee
1 answer

Azure HDInsight

What is the Resource Provider connection in Azure HDInsight? (In the portal, when deploying an HDInsight cluster, it gives 2 options: the first is Inbound, which has no Private Link checkbox, and the other is Outbound, which has a Private Link checkbox.) I want to know about both with…

Azure HDInsight
An Azure managed cluster service for open-source analytics.
203 questions
asked 2022-12-20T11:56:08.33+00:00
Ishan Kapoor 1 Reputation point
commented 2022-12-22T17:00:48.407+00:00
BhargavaGunnam-MSFT 28,606 Reputation points Microsoft Employee
0 answers

How to fix an error in a pipeline with an HDI activity?

When I try to run a pipeline with a Hive activity, I get the error "Response status code indicates server error: 500 (InternalServerError)" with code 2300. I couldn't find that error in the solution guide, so I don't really know where to go from here. …

Azure HDInsight
An Azure managed cluster service for open-source analytics.
203 questions
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,015 questions
asked 2022-11-18T19:31:19.24+00:00
Valeria Ortiz Cervantes 1 Reputation point
commented 2022-11-28T22:50:03.61+00:00
BhargavaGunnam-MSFT 28,606 Reputation points Microsoft Employee
0 answers

Convert the result of collect_list into JSON using Spark with Scala

Please find the sample below. After using the code below: val df4 = df3.groupBy("shop_id").agg(collect_list(map($"variant_id",$"variants1")) as ("variants")) I got data like: …

Azure HDInsight
An Azure managed cluster service for open-source analytics.
203 questions
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
2,045 questions
asked 2022-11-11T15:55:04.023+00:00
vijendra singh 1 Reputation point
commented 2022-11-12T00:52:35.043+00:00
vijendra singh 1 Reputation point
1 answer. One of the answers was accepted by the question author.

HDInsight startup yields linked service error: The storage connection string is invalid.

I'm getting an error when trying to run the demo Spark word count in Data Factory using HDInsight and a Spark activity. All services were successfully created and tested, but when the Spark pipeline is triggered, the following error is displayed: …

Azure HDInsight
An Azure managed cluster service for open-source analytics.
203 questions
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,015 questions
asked 2022-09-23T15:42:24.073+00:00
Scoot-3223 91 Reputation points
accepted 2022-11-08T14:37:39.45+00:00
Scoot-3223 91 Reputation points
0 answers

JSON SerDe Hive query failing with HDInsight version 4.0

Hi All, I have a hive query which is as follows: ADD JAR ${hiveconf:JsonSerde}; set hive.execution.engine=tez; DROP TABLE IF EXISTS Test; CREATE EXTERNAL TABLE Test( Results array< struct<requestid:string,result:array< …

Azure HDInsight
An Azure managed cluster service for open-source analytics.
203 questions
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,015 questions
asked 2022-10-13T14:30:10.807+00:00
Priya Jha 871 Reputation points
commented 2022-11-07T23:21:30.393+00:00
Phelipe Oberst 1 Reputation point
1 answer

How can I make user-defined parameters required inside a pipeline?

If I have a parameter that I defined, how can I make it required like this?

Azure HDInsight
An Azure managed cluster service for open-source analytics.
203 questions
Active Directory Federation Services
An Active Directory technology that provides single-sign-on functionality by securely sharing digital identity and entitlement rights across security and enterprise boundaries.
1,219 questions
asked 2022-10-28T18:26:08.86+00:00
N2120 81 Reputation points
answered 2022-10-31T18:19:01.84+00:00
N2120 81 Reputation points
1 answer

HDInsight cluster creation: error on Configuration + pricing

Hi, I have been trying to create an HDInsight cluster from Canada, but it fails at the configuration step. I subscribed to PAYG and tried selecting different nodes, but none of them works; it gives me the error "You have reached your…

Azure HDInsight
An Azure managed cluster service for open-source analytics.
203 questions
asked 2022-10-10T23:29:36.653+00:00
ANJU RANI 1 Reputation point
commented 2022-10-17T06:48:35.18+00:00
PRADEEPCHEEKATLA-MSFT 84,456 Reputation points Microsoft Employee
1 answer

How do I find Hive server information?

I'm trying to create a pipeline to copy data from CSV to a Databricks table. To do so, I believe I need to set up a Hive linked service. However, I'm not sure where I can find the necessary information to fill out the linked service (LS) form; we had a…

Azure HDInsight
An Azure managed cluster service for open-source analytics.
203 questions
asked 2022-09-14T14:27:33.613+00:00
Peter Ott 1 Reputation point
commented 2022-10-10T07:00:21.857+00:00
PRADEEPCHEEKATLA-MSFT 84,456 Reputation points Microsoft Employee
0 answers

Azure HDInsight Spark job is failing with Logger Error

Hello Team, our jobs have recently been failing with this error: ERROR RawSocketSender [MdsLoggerSenderThread]: org.fluentd.logger.sender.RawSocketSender java.net.SocketException: Broken pipe (Write failed). All these PySpark jobs were running fine…

Azure HDInsight
An Azure managed cluster service for open-source analytics.
203 questions
asked 2022-09-28T16:10:09.787+00:00
Aishwarya Gopikrishnan 1 Reputation point
commented 2022-10-04T16:32:49.827+00:00
MartinJaffer-MSFT 26,051 Reputation points
1 answer. One of the answers was accepted by the question author.

Real Case Scenarios

Hello, where can I find case scenarios or real-life use cases of, for example, cloud models or high availability and scalability? What I mean is, for example, hybrid cloud is used by banks because they want to control the database and security. …

Azure SQL Database
Azure Functions
An Azure service that provides an event-driven serverless compute platform.
4,567 questions
Azure HDInsight
An Azure managed cluster service for open-source analytics.
203 questions
Azure Role-based access control
An Azure service that provides fine-grained access management for Azure resources, enabling you to grant users only the rights they need to perform their jobs.
709 questions
Microsoft Entra ID
A Microsoft Entra identity service that provides identity management and access control capabilities. Replaces Azure Active Directory.
20,355 questions
asked 2022-09-28T09:43:19.933+00:00
Rawan Ghalayini 21 Reputation points
commented 2022-10-03T07:20:57.96+00:00
ShaktiSingh-MSFT 14,276 Reputation points Microsoft Employee
1 answer. One of the answers was accepted by the question author.

How to use a UA managed identity in a Data Factory on-demand HDInsight linked service

When creating an on-demand HDInsight linked service, details are missing on how to configure a user-assigned managed identity instead of a service principal. Steps are shown for adding a UA managed identity to the data factory, but what values…

Azure HDInsight
An Azure managed cluster service for open-source analytics.
203 questions
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
10,015 questions
asked 2022-09-21T22:36:31.707+00:00
Scoot-3223 91 Reputation points
commented 2022-09-23T01:37:38.683+00:00
BhargavaGunnam-MSFT 28,606 Reputation points Microsoft Employee
2 answers

Spark DataFrame writing issue in Azure from Spark: One of the request inputs is not valid

I am able to read data from Azure Blob Storage, but when writing back to Azure Storage, it throws the error below. I am running this program on my local machine. Can someone please help me with this? Program: val config = new SparkConf(); …

Azure HDInsight
An Azure managed cluster service for open-source analytics.
203 questions
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
2,045 questions
asked 2022-08-25T04:24:33.64+00:00
Abdul Hafiz A.G A ID(RITM0203509) 11 Reputation points
answered 2022-09-14T00:02:23.46+00:00
Junjie Cao 1 Reputation point Microsoft Employee
0 answers

How to add a subqueue in YARN

I already have queues set up in YARN on HDInsight; they were set up with the Ambari UI. I have a queue for Sqoop that takes up 70% of the cluster. However, I have a few huge Sqoop jobs with a lot of mappers that take up 100% of the queue and block…

Azure HDInsight
An Azure managed cluster service for open-source analytics.
203 questions
asked 2022-08-10T19:52:34.937+00:00
Vamsi Anamaneni 1 Reputation point
commented 2022-09-08T19:06:11.337+00:00
HimanshuSinha-msft 19,376 Reputation points Microsoft Employee