Serialization exception after changing from reading the attached ADLS account to reading the same ADLS account through a linked service

Jose Gonzalez Gongora 25 Reputation points Microsoft Employee
2024-11-08T22:49:30.5733333+00:00

Currently we are running a Synapse notebook that reads from the "Primary ADLS Gen2 account" and parses the data to JSON. This job has been running stably for a while now. We need to run the same logic but read from a different ADLS account instead, so to achieve this we use a linked service with the following Spark configuration:

spark.conf.set(f"spark.storage.synapse.$baseUrl%s.linkedServiceName","lnkAdls_keybased")
spark.conf.set(f"fs.azure.account.auth.type.$baseUrl%s", "SAS")
spark.conf.set(f"fs.azure.sas.token.provider.type.$baseUrl%s", "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedSASProvider")

But after changing where the job reads the data from, it started to fail with the following stack trace:

Caused by: java.io.NotSerializableException: com.microsoft.vegas.vfs.SecureVegasFileSystem
Serialization stack:
    - object not serializable (class: com.microsoft.vegas.vfs.SecureVegasFileSystem, value: 3.3.09)
    - field (class: $iw, name: fileSys, type: class org.apache.hadoop.fs.FileSystem)
    - object (class $iw, $iw@3ccef33c)
    - field (class: $iw, name: $iw, type: class $iw)
    - object (class $iw, $iw@310c8203)
    ...

After searching around, it looks like this issue is related to the use of the Hive metastore by Spark SQL, but that's only my best guess, and I don't know what to do with this information. Is there a way to disable the use of Hive at the notebook level? Or is there something else I can check in order to figure out a solution or workaround?

Azure Synapse Analytics

Accepted answer
  1. Vinodh247 23,581 Reputation points MVP
    2024-11-10T11:13:27.4466667+00:00

    The java.io.NotSerializableException you're encountering is due to the SecureVegasFileSystem object not being serializable. The issue arises when Spark attempts to serialize objects that aren't designed for serialization, typically when they are captured in task closures or broadcast variables. Your stack trace shows a field fileSys of type org.apache.hadoop.fs.FileSystem inside a $iw REPL wrapper, which suggests a file system handle declared in a notebook cell is being pulled into a closure that Spark tries to serialize.

    Possible Solutions:

    Avoid Serialization of Non-Serializable Objects:

    • Ensure that objects like SecureVegasFileSystem aren't inadvertently serialized. In particular, avoid referencing such objects inside closures that Spark serializes and ships to the executors (see the sketch below).
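
    For illustration, here is a minimal Scala sketch of the pattern the stack trace points at (a cell-level fileSys handle captured by a closure) and one way to avoid it. The names baseUrl, containerName and pathsRdd are assumptions, not taken from the original post:

        import java.net.URI
        import scala.collection.JavaConverters._
        import org.apache.hadoop.conf.Configuration
        import org.apache.hadoop.fs.{FileSystem, Path}

        // Hypothetical root of the linked ADLS account (baseUrl and containerName
        // are assumed to be defined elsewhere in the notebook).
        val abfssRoot = s"abfss://$containerName@$baseUrl/"

        // Anti-pattern (sketch): a FileSystem created in a notebook cell and then used
        // inside a transformation. The REPL's $iw wrapper drags the non-serializable
        // handle into the task closure, producing NotSerializableException.
        // val fileSys = FileSystem.get(new URI(abfssRoot), spark.sparkContext.hadoopConfiguration)
        // pathsRdd.filter(p => fileSys.exists(new Path(p)))

        // Possible fix: ship only serializable values (strings) and open the FileSystem
        // on the executors, once per partition. Adjust the filter to whichever
        // settings your job actually needs.
        val fsConfEntries: Map[String, String] =
          spark.sparkContext.hadoopConfiguration.iterator().asScala
            .map(e => e.getKey -> e.getValue)
            .filter { case (k, _) => k.startsWith("fs.azure") }
            .toMap

        val existingPaths = pathsRdd.mapPartitions { paths =>
          val conf = new Configuration()
          fsConfEntries.foreach { case (k, v) => conf.set(k, v) }
          val fs = FileSystem.get(new URI(abfssRoot), conf)
          paths.filter(p => fs.exists(new Path(p)))
        }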

    Use Serializable Wrappers:

    • If you must use non-serializable objects, consider encapsulating them in a serializable wrapper that carries only serializable state (such as URIs and configuration values) and re-creates the object lazily on the executors (see the sketch below).
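
    A minimal sketch of such a wrapper, assuming a Scala notebook; the class name and fields are illustrative only. Only plain strings are serialized, and the file system handle is re-created lazily wherever it is first used:

        import java.net.URI
        import org.apache.hadoop.conf.Configuration
        import org.apache.hadoop.fs.FileSystem

        // Serializable wrapper: the FileSystem is @transient, so it is never shipped;
        // each executor rebuilds it on first access from the serializable fields.
        class FileSystemHolder(rootUri: String, confEntries: Map[String, String]) extends Serializable {
          @transient lazy val fs: FileSystem = {
            val conf = new Configuration()
            confEntries.foreach { case (k, v) => conf.set(k, v) }
            FileSystem.get(new URI(rootUri), conf)
          }
        }

    Tasks that receive a FileSystemHolder only carry the strings across the wire; calling holder.fs inside a closure rebuilds the handle locally instead of serializing a SecureVegasFileSystem instance.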

    Configure Spark to Use Hive Metastore Properly:

    • Misconfigurations related to the Hive metastore can lead to serialization problems. Review your Spark and Hive configurations to ensure they're correctly set up; the snippet below shows how to inspect the current values.
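
    To see which catalog implementation and metastore-related settings the current session is actually using, you can read the values from the notebook, for example:

        // Read-only checks of the session's catalog/metastore configuration.
        println(spark.conf.get("spark.sql.catalogImplementation"))   // "hive" or "in-memory"
        println(spark.conf.get("spark.sql.warehouse.dir"))
        spark.catalog.listDatabases().show(truncate = false)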

    Disable Hive Support in Spark:

    • If Hive support isn't essential for your operations, you can disable it by setting spark.sql.catalogImplementation to in-memory. This change can prevent Spark from attempting to serialize Hive-related objects. Note that this is a static configuration, so it must be in place before the Spark session starts (see the sketch below).
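
    Because spark.sql.catalogImplementation is a static configuration, setting it with spark.conf.set in a running notebook is rejected. One option (a sketch, assuming the Synapse session-configuration magic is available in your workspace) is to put it in the first cell of the notebook:

        %%configure -f
        {
            "conf": {
                "spark.sql.catalogImplementation": "in-memory"
            }
        }

    Alternatively, the same property can typically be applied at the Spark pool level through an Apache Spark configuration.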

    Next Steps:

    Review Your Code:

    • Identify and refactor any code segments where non-serializable objects might be serialized.

    Adjust Spark Configurations:

    • Modify your Spark configurations to prevent unnecessary serialization of non-serializable objects.

    Consult the Spark documentation on closures and task serialization for further guidance.

    By implementing these strategies, you should be able to resolve the serialization exception and achieve stable operation when reading from the linked ADLS account.


0 additional answers
