How to fix a library installation failure when installing the Maven library with coordinates 'com.databricks:spark-xml_2.12:0.14.0'

Dathathraiah 0 Reputation points
2024-10-27T08:34:14.07+00:00

How do I fix the issue below, which occurs when installing the Maven library with coordinates 'com.databricks:spark-xml_2.12:0.14.0' for an Azure Databricks notebook activity in an ADF pipeline?

The run failed with this error message:

Library installation failed for library due to user error for maven { coordinates: "com.databricks:spark-xml_2.12:0.14.0" }
Error messages: Library installation attempted on the driver node of cluster 1027-075217-qq4l2fsi and failed. Please refer to the following error message to fix the library or contact Databricks support.
Error code: DRIVER_LIBRARY_INSTALLATION_FAILURE.
Error message: Library resolution failed because unresolved dependency: com.databricks:spark-xml_2.12:0.14.0: not found

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

2 answers
  1. Vinodh247 22,951 Reputation points MVP
    2024-10-27T13:38:54.6066667+00:00

    Hi Dathathraiah,

    Thanks for reaching out to Microsoft Q&A.

    The error you’re encountering indicates that the com.databricks:spark-xml_2.12:0.14.0 library couldn’t be found or resolved in the Maven repository Databricks is accessing.

    Here are steps to troubleshoot and potentially resolve this:

    Verify the Maven Repository Availability:

    • Databricks may not have access to the repository hosting spark-xml_2.12:0.14.0. Check if the library is hosted on Maven Central or another repository. You can manually download the library .jar file from Maven Central or other repository sources if it’s available.
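One way to check availability is to build the direct download URL for the artifact and open it in a browser or request it from a machine with internet access. The sketch below assumes the standard Maven repository layout (group ID dots become slashes, then artifact ID, version, and `artifact-version.jar`) and uses the Maven Central base URL `https://repo1.maven.org/maven2`. The function names are illustrative, not part of any Databricks or ADF API.

```python
import urllib.error
import urllib.request


def maven_jar_url(coordinates, repo_base="https://repo1.maven.org/maven2"):
    """Build the .jar URL for 'group:artifact:version' in a standard Maven layout."""
    group_id, artifact_id, version = coordinates.split(":")
    group_path = group_id.replace(".", "/")
    base = repo_base.rstrip("/")
    return f"{base}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"


def jar_exists(url, timeout=10):
    """Send a HEAD request; True if the repository answers with HTTP 200.

    Needs network access, so it is defined here but not called.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.URLError:
        return False


print(maven_jar_url("com.databricks:spark-xml_2.12:0.14.0"))
```

If the URL resolves from your workstation but the cluster still reports "not found", the cluster's outbound network access to the repository is the more likely cause than a missing version.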

    Specify Repository Configuration:

    • Sometimes, Databricks clusters need additional configuration to access non-default Maven repositories. If the library isn’t on Maven Central, specify the repository when you install the library:
      • Go to the cluster configuration in Databricks and open the Libraries tab.
      • Choose Install new > Maven. Alongside Coordinates, the dialog has an optional Repository field where you can enter the repository URL.
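For the ADF notebook activity, the equivalent setting lives in the activity's library list. The sketch below builds that entry as a Python dictionary and prints it as JSON. It assumes the Maven library shape used by the Databricks Libraries API (`coordinates`, plus optional `repo` and `exclusions`); the helper name `maven_library` is hypothetical, and the repository URL shown is only the Maven Central default, so substitute the repository that actually hosts your artifact.

```python
import json


def maven_library(coordinates, repo=None, exclusions=None):
    """Return one library entry; 'repo' and 'exclusions' are added only if given."""
    spec = {"coordinates": coordinates}
    if repo:
        spec["repo"] = repo
    if exclusions:
        spec["exclusions"] = list(exclusions)
    return {"maven": spec}


libraries = [
    maven_library(
        "com.databricks:spark-xml_2.12:0.14.0",
        repo="https://repo1.maven.org/maven2/",  # assumption: replace with your repository
    )
]

# JSON fragment to compare against the notebook activity's "libraries" property.
print(json.dumps({"libraries": libraries}, indent=2))
```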

    Try an Alternative Version:

    • If the library isn’t available in version 0.14.0, you could try another version such as 0.13.0 or the latest available version for Spark XML.

    Install the Jar File Directly:

    • If you can download the .jar file manually, you can upload it to your Databricks workspace or DBFS and then attach it as a library to your cluster in ADF:
      • Use dbfs cp to copy the .jar to the Databricks File System.
      • Attach the .jar file as a library in your Databricks notebook activity.
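Once the .jar is on DBFS, the notebook activity can reference it by path instead of by Maven coordinates. A minimal sketch of that library entry follows; the DBFS folder `dbfs:/FileStore/jars` is an assumed upload location, the helper name `jar_library` is hypothetical, and the file name follows the usual `artifact-version.jar` convention.

```python
import json


def jar_library(dbfs_dir, coordinates):
    """Return a 'jar' library entry pointing at artifact-version.jar under dbfs_dir."""
    _, artifact_id, version = coordinates.split(":")
    return {"jar": f"{dbfs_dir.rstrip('/')}/{artifact_id}-{version}.jar"}


# Assumption: the .jar was copied to dbfs:/FileStore/jars beforehand.
entry = jar_library("dbfs:/FileStore/jars", "com.databricks:spark-xml_2.12:0.14.0")
print(json.dumps({"libraries": [entry]}, indent=2))
```

Because this bypasses Maven resolution entirely, any transitive dependencies the library needs would have to be uploaded and attached the same way.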

    Check for Dependency Conflicts:

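    • Conflicts between Spark and Scala versions might cause issues. The _2.12 suffix in the artifact name means the build targets Scala 2.12, so the cluster runtime must also use Scala 2.12. Also confirm that the Spark version on your cluster (e.g., the Spark 2.x or 3.x series) is one that this spark-xml release supports.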
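The Scala part of this check can be done mechanically, since Scala libraries conventionally encode the Scala binary version as a suffix on the artifact ID. The sketch below extracts that suffix and compares it with a cluster Scala version string. The function names are illustrative, and the example cluster versions are made-up inputs; read the real value from your cluster's runtime details.

```python
import re


def scala_binary_version(coordinates):
    """Return the Scala suffix of the artifact ID (e.g. '2.12'), or None if absent."""
    artifact_id = coordinates.split(":")[1]
    match = re.search(r"_(\d+\.\d+)$", artifact_id)
    return match.group(1) if match else None


def matches_cluster(coordinates, cluster_scala_version):
    """True if the artifact has no Scala suffix or its suffix matches the cluster."""
    expected = scala_binary_version(coordinates)
    if expected is None:
        return True
    return (cluster_scala_version == expected
            or cluster_scala_version.startswith(expected + "."))


coords = "com.databricks:spark-xml_2.12:0.14.0"
print(scala_binary_version(coords))
print(matches_cluster(coords, "2.12.15"))  # example input, not a real cluster reading
```

A mismatch here usually shows up as runtime class-loading errors rather than a "not found" resolution failure, so treat this as a secondary check.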

    If none of these resolve the issue, another option is to file a support ticket with Databricks, as they may provide more specific guidance on the DRIVER_LIBRARY_INSTALLATION_FAILURE error.


  2. Amira Bedhiafi 26,101 Reputation points
    2024-10-27T14:59:24.77+00:00

    Have you checked whether com.databricks:spark-xml_2.12:0.14.0 exists in the Maven repository your cluster resolves against? The latest version of spark-xml may be different, and 0.14.0 might not be available there or may have been deprecated. You can search for the package on Maven Central or in the Databricks documentation to confirm the correct version.

    Sometimes, certain packages are hosted in specific repositories (e.g., the Databricks Maven repository). You can add the repository link in the Maven configuration within the Databricks notebook activity in ADF. For Databricks-specific libraries, you can try adding:

    Repository URL: https://databricks.jfrog.io/artifactory/maven/

    If the library is unavailable through Maven, try downloading the jar file directly from a reliable source or building it manually. Once you have the jar file, upload it to Databricks and manually attach it to the cluster.

