Azure Databricks fails to install GeoSpark libraries from Maven

Anuj, Singh (Cognizant) 50 Reputation points
2024-04-15T06:24:17.8033333+00:00

Hi Team, I am trying to add two GeoSpark Maven libraries (org.datasyslab:geospark:1.3.1 and org.datasyslab:geospark-sql_2.3:1.3.1) to my Azure Databricks interactive cluster running Runtime Version 14.3 LTS.

However, I am getting the error below:

Library installation attempted on the driver node of cluster 0311-204237-y518gt4u and failed.
Please refer to the following error message to fix the library or contact Databricks support.
Error Code: DRIVER_LIBRARY_INSTALLATION_FAILURE.
Error Message: Library resolution failed because unresolved dependency: org.datasyslab:geospark-sql_2.3:1.3.1: not found unresolved dependency: org.datasyslab:geospark:1.3.1: not found

Azure Databricks

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 89,466 Reputation points Microsoft Employee
    2024-05-22T04:42:50.06+00:00

    @Anuj, Singh (Cognizant) - I'm glad you were able to resolve your issue with the help of support. I'm posting the solution here so that others who hit the same problem can easily reference it. Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'm reposting your solution in case you'd like to accept the answer.

    Ask: Azure Databricks fails to install GeoSpark libraries from Maven.

    Solution: The issue was resolved through a support ticket. Here is the resolution:

    The OP's cluster VM could not reach the public Maven repository to download the package and its dependencies, so the installation failed. The cause was a firewall restriction on the Databricks VNET: most likely all outbound VNET traffic is forwarded through a firewall, and that firewall does not allow the VNET to communicate with the public Maven repository.

    Support recommended the following:

    • Check with your networking team. If it is indeed a firewall restriction as described above, ask them to allow the public Maven repository in the firewall.
    • Alternatively, download the JAR file from the Maven repository, upload it to your Databricks DBFS file system, and install it from there.
    • The OP followed the recommended installation steps for the Sedona package (install Sedona from an init script), downloaded the JAR files manually, and uploaded them to a Volume, since DBFS has been deprecated by Databricks. With that, the library installed successfully.
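For anyone who prefers to script the "install from uploaded JARs" route instead of using the cluster UI, here is a minimal sketch of the request body. It assumes the Databricks Libraries API install endpoint (`POST /api/2.0/libraries/install`) and its `jar` library type. The Volume path, the JAR file names, and the `build_install_payload` helper are hypothetical placeholders, not taken from the OP's setup. The sketch only builds the JSON; sending it with your workspace URL and token is left out.

```python
import json


def build_install_payload(cluster_id, jar_paths):
    """Build a request body that asks a cluster to install uploaded JARs.

    Assumption: the body shape follows the Libraries API install call,
    {"cluster_id": ..., "libraries": [{"jar": <path>}, ...]}.
    """
    if not jar_paths:
        raise ValueError("at least one JAR path is required")
    return {
        "cluster_id": cluster_id,
        "libraries": [{"jar": path} for path in jar_paths],
    }


if __name__ == "__main__":
    # Hypothetical cluster ID and Volume paths; replace with your own.
    payload = build_install_payload(
        "<your-cluster-id>",
        [
            "/Volumes/main/default/libs/geospark-1.3.1.jar",
            "/Volumes/main/default/libs/geospark-sql_2.3-1.3.1.jar",
        ],
    )
    print(json.dumps(payload, indent=2))
```

Because the JARs are read from the Volume, this route does not need outbound access to Maven, which is why it works behind a restrictive firewall. Note that transitive dependencies are not resolved automatically this way, so each required JAR has to be uploaded and listed.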

    If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue.


    Please don’t forget to Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you, this can be beneficial to other community members.


1 additional answer

  1. PRADEEPCHEEKATLA-MSFT 89,466 Reputation points Microsoft Employee
    2024-04-15T07:06:28.4366667+00:00

    @Anuj, Singh (Cognizant) - Thanks for the question and for using the MS Q&A platform.

    The error message you are seeing indicates that Databricks is unable to download the Geospark library from Maven. This could be due to a network issue or a problem with the Maven repository.

    Here are some steps you can take to resolve this issue:

    1. Check your network connectivity: Ensure that your network connection is stable and that you are able to access the internet. You can try pinging the Maven repository to see if you are able to connect to it.
    2. Check the Maven repository: Verify that the Maven repository is up and running and that the GeoSpark library is available. You can try downloading the library manually from the repository to see if it is accessible.
    3. Check the Databricks cluster configuration: Ensure that the cluster is configured to use the correct Maven repository and that the repository is accessible from the cluster. You can check the cluster configuration by going to the cluster settings in the Databricks workspace.
    4. Try using a different Maven repository: If the issue persists, you can try using a different Maven repository to see if that resolves the issue. You can configure the cluster to use a different repository by updating the Maven settings in the cluster configuration.
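The connectivity check in step 1 can be sketched as a small probe run from a notebook cell on the cluster. This is only an illustration: the host names below are assumed to be the Maven Central endpoints, and `can_reach` is a hypothetical helper. It uses a TCP connection on port 443 rather than ping, because firewalls often block ICMP even when HTTPS is allowed.

```python
import socket


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Assumed Maven Central host names; adjust if your cluster uses a mirror.
    for host in ("repo1.maven.org", "repo.maven.apache.org"):
        status = "reachable" if can_reach(host) else "NOT reachable"
        print(f"{host}:443 is {status}")
```

A successful TCP connection does not prove that the firewall lets the HTTPS download through (for example, when it filters by host name or inspects TLS), but a failure here is a strong hint that outbound traffic to the repository is blocked.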

    As a repro, I installed the Maven libraries (org.datasyslab:geospark:1.3.1 and org.datasyslab:geospark-sql_2.3:1.3.1) on a cluster with Databricks Runtime Version 14.3 LTS (includes Apache Spark 3.5.0, Scala 2.12), and they installed successfully without any issue.

    If you are still experiencing the same issue, please share the steps you are following to install the libraries. You can also download the JAR file directly from the Maven site, https://mvnrepository.com/artifact/org.datasyslab, then upload it manually and install it.

    Hope this helps. Do let us know if you have any further queries.
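If you go the manual download route, the direct JAR link can be derived from the Maven coordinate. The sketch below assumes the standard Maven repository layout (`group/path/artifact/version/artifact-version.jar`) and uses Maven Central's `https://repo1.maven.org/maven2` as the base URL. The `maven_jar_url` helper is a hypothetical name. It only builds the URL and does not check that the artifact exists there.

```python
def maven_jar_url(coordinate, base="https://repo1.maven.org/maven2"):
    """Turn 'groupId:artifactId:version' into a JAR URL (standard layout)."""
    # Raises ValueError if the coordinate does not have exactly three parts.
    group_id, artifact_id, version = coordinate.split(":")
    group_path = group_id.replace(".", "/")
    return f"{base}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"


if __name__ == "__main__":
    for coord in (
        "org.datasyslab:geospark:1.3.1",
        "org.datasyslab:geospark-sql_2.3:1.3.1",
    ):
        print(maven_jar_url(coord))
```

Download the JARs from a machine that does have internet access, then upload them to the workspace.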


    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

