Spark jobs not running in a notebook
I am currently running the "1.Reading Data - CSV" notebook from the "Read and write data in Azure Databricks" module on Microsoft learn. When I tried to run the cell "# A reference to our tab-separated-file", the Spark jobs…
Upsert data into SQL from delta table
Hello Team, we have a scenario where we have to get the data from the lake, process it, and then store it in a SQL database. This is what we are doing: read the entity from the Lake Store that is in the delta table _staging, do a merge between the delta table and…
Issue accessing delta table in Data Lake Gen2 storage account with Databricks cluster (latest stable version)
Recently, I have been encountering an issue where the Databricks cluster cannot access an unmanaged delta table whose parquet files are stored in the Azure Data Lake Gen2 storage account. The issue is that it cannot read/update from the…
Install third party libraries in Azure Databricks
Hello, I am trying to install the library "pythonnet" in Azure Databricks. I tried installing it through PyPI, through the Python Wheel option, and also the JAR option. None of these work for me. I need to connect a Databricks notebook to Azure…
I/O operations with Azure Databricks REST Jobs API
I have experienced problems with the delivery of arguments via the Jobs API. I've outlined the problems in detail on Stack Overflow: https://stackoverflow.com/questions/62758094/i-o-operations-with-azure-databricks-rest-jobs-api I would…
Machine Learning Model Deployment
I am new to ML models and am researching using Azure Databricks and MLflow to train a model. My question is: once the model is created, is there a way to host the model so that it can be downloaded and inferenced remotely? I am looking for options other than…
Azure Web Application with computationally intensive tasks in Dask and TensorFlow
Hello, I'm developing a data analysis tool for the processing of data from Hydrogen-Deuterium exchange mass spectrometry. We would like to accompany our publication with a deployment of the code on Microsoft Azure so that other researchers can quickly…
Spark Connector in ADF
Hi, I have created a Spark connector to connect to Azure Databricks. In the copy activity the source is the Spark connector and the sink is Azure SQL DB. In the Spark connector query, CreatedDate is being converted to String and throwing an error, whereas it is a timestamp…
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7.0 failed 4 times, most recent failure: java.lang.NoClassDefFoundError: Could not initialize class
Hi, I am getting this error despite defining the class. When I execute the notebook the first time it works fine, but when I execute the same notebook without any code change it starts throwing this error. As per the error, the class is not defined, but trust me the class…
Databricks Scala: data frame column encoding from UTF-8 to Windows-1252
Hi, I am working with Databricks where I have the data in parquet and I am generating smaller files out of it. I have a column which is a string containing various characters, and I have to encode this string value to Windows-1252 or Windows…
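A minimal sketch of the re-encoding step described above, in plain Python. On Databricks the same function could be wrapped in a UDF and applied to the string column; the function and sample values below are illustrative, not from the post.

```python
# 'cp1252' is Python's codec name for Windows-1252. Characters with no
# Windows-1252 mapping are replaced with '?' instead of raising an error;
# use errors="strict" instead if unmappable characters should fail loudly.

def to_windows_1252(value: str) -> bytes:
    return value.encode("cp1252", errors="replace")

sample = "Café – naïve"   # accented characters and en dash both map in cp1252
encoded = to_windows_1252(sample)
print(encoded)
```

Decoding the result with `bytes.decode("cp1252")` round-trips the original string for any characters that exist in the Windows-1252 code page.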
Third party Python package installed on Databricks cluster gives different results than other Python stacks
We get a Python package developed by a third party. The package implements a standard mathematical model, no machine learning, no randomization. The model turned out to return incorrect results when installed on a Databricks cluster. We tried different…
Databricks Notebook Activity parameter problem
I feel this is a bug but not sure if it is with ADF or Databricks. I am running a notebook using ADF notebook activity. My notebook has a widget for which I pass the value from ADF. As I need to manually enter the parameter name while configuring…
Spark SQL: How to get the 5th column from a Spark SQL query
Hi, I have a headerless file which I am reading with spark.read to create a data frame. Now I want to get the value of the 5th column from the file. The file is comma separated. How do I achieve this? I know it is possible in T-SQL but not sure how to…
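A sketch of the indexing idea, assuming a headerless comma-separated file. When Spark reads such a file it auto-names the columns _c0, _c1, …, so the 5th column can be selected with `df.select("_c4")` or, position-based, `df.select(df.columns[4])`. The plain-Python version below shows the same zero-based indexing on illustrative data:

```python
import csv
import io

# Illustrative data, not from the post: two rows, five comma-separated fields.
data = io.StringIO("a,b,c,d,e\n1,2,3,4,5\n")
rows = list(csv.reader(data))

# Index 4 is the 5th column, because indexing is zero-based.
fifth_column = [row[4] for row in rows]
print(fifth_column)  # → ['e', '5']
```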
SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException:
Hi, I am running this code but this is throwing this error: SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException:
Azure Databricks - Split column based on special characters in Databricks
I have a column in my csv file that possibly has values in the formats below: "Q1_1__Value_-_10_counts" "Value_10_counts" "Q1_1__1__value_yes" These have to be split as below, respectively: "Value_-_10_counts" …
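A hedged sketch of the splitting rule: assuming the intent is "strip a leading question prefix like `Q1_1__`", one way to express it is a regex substitution. The pattern below is a guess from the one fully visible input/output pair; the remaining expected outputs are truncated in the post, so the rule may need adjusting. In Spark the same pattern could be applied with `regexp_replace` from `pyspark.sql.functions`.

```python
import re

def strip_question_prefix(value: str) -> str:
    # Remove a leading "Q<digits>_<digits>__" prefix, if present;
    # values without the prefix pass through unchanged.
    return re.sub(r"^Q\d+_\d+__", "", value)

print(strip_question_prefix("Q1_1__Value_-_10_counts"))  # → Value_-_10_counts
print(strip_question_prefix("Value_10_counts"))          # unchanged: no prefix
```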
More convenient service to read avro files from Azure Data Lake Gen2
Hi, I have to read lots of avro files created by an Event Hub Capture in a Data Lake Gen2. Data must be filtered, processed and then applied to train a machine learning model. I'm considering Azure Databricks and the Azure Machine Learning service…
Azure IoT - Query Data from IoT Files
Hello, I am using Azure (Azure Databricks, IoT Hub) to stream unstructured data from IoT devices (i.e. wind turbines), in the form of thousands of files with millions of data points captured over a period of 10 years. How do I extract a variety of metadata…
File(filePath).exists does not work in Azure Databricks
Hi, how do I find whether a file exists in a path in the data lake? Regards, Rajaniesh
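A sketch under stated assumptions: `java.io.File(filePath).exists` checks the driver's local filesystem, not the data lake, which is likely why it "does not work". `dbutils.fs.ls` sees DBFS/ADLS paths instead, and it raises an exception for a missing path, so a common pattern is to wrap it in try/except. The listing function is injected as a parameter here so the helper can be exercised off-cluster; on Databricks you would pass `dbutils.fs.ls`.

```python
import os

def path_exists(path, ls=os.stat):
    # ls is any function that raises for a missing path.
    # On Databricks: path_exists("abfss://.../myfile", ls=dbutils.fs.ls)
    # Locally: the os.stat default works for both files and directories.
    try:
        ls(path)
        return True
    except Exception:
        return False
```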
Accessing dataframe created in Scala from Python command
Is there a way to create a Spark dataframe in a Scala command and then access it in Python, without explicitly writing it to disk and re-reading? In Databricks I can do, in Scala, dfFoo.createOrReplaceTempView("temp_df_foo") and then in…
Standard Configuration Components of Azure Databricks
Hello, could you please tell me the standard configuration components of Azure Databricks. What are the Azure components (storage?) required for the configuration of Azure Databricks? Thank you. Sincerely, Kenjiro Majima