Error when triggering Databricks notebook via Data Factory while reading XML file

Vinay5 46 Reputation points
2021-04-30T14:12:54.36+00:00

I am reading an XML file in Spark on Databricks, using the command below.

val readxml = spark.read.format("xml").option("rowTag","rty").option("inferschema","true").load("fielpath/"+ ip +"")

This works fine when I run the code in the Databricks notebook, but when I run the same code via an ADF pipeline, I get the error below.

java.lang.ClassNotFoundException: Failed to find data source: xml. Please find packages at http://spark.apache.org/third-party-projects.html

I have already attached and installed the XML library on the cluster, and I have also ensured that the Scala version is the same in the Databricks linked service and the cluster.

Could someone please assist?
Thank you.

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

1 answer

  1. Vinay5 46 Reputation points
    2021-05-03T16:08:53.72+00:00

    Hi Himanshu,

    I was able to resolve this issue by adding the missing XML library to the Notebook activity in the ADF pipeline (Append libraries).
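
    For anyone hitting the same error, here is a minimal sketch of how a library appended to the Notebook activity might appear in the pipeline JSON, assuming the missing library is spark-xml pulled from Maven. The activity name, linked service name, notebook path, and the Maven coordinates and version are illustrative assumptions, not values from the original pipeline. The Scala suffix in the coordinates (_2.12 here) should match the Scala version of the cluster that the linked service uses.

    ```json
    {
        "name": "RunXmlNotebook",
        "description": "Illustrative sketch: names, path, and library version are assumptions",
        "type": "DatabricksNotebook",
        "linkedServiceName": {
            "referenceName": "AzureDatabricksLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "notebookPath": "/path/to/notebook",
            "libraries": [
                {
                    "maven": {
                        "coordinates": "com.databricks:spark-xml_2.12:0.12.0"
                    }
                }
            ]
        }
    }
    ```

    Libraries listed here are installed on the cluster that runs the activity, so `spark.read.format("xml")` should resolve in the same way it does in the interactive notebook.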

    Thank you

