Notebook fails in pipeline due to error 6002, user error

NANDARAJ M 0 Reputation points
2024-06-11T08:08:46.39+00:00

**The notebooks run fine when executed interactively, but they fail when used in a pipeline. The error occurs when writing data to CSV, but the reason for the failure is unknown.**

```
Operation on target Notebook 3 failed:
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
Cell In[7], line 1
----> 1 df = spark.read.load('abfss://files@storagefortoday.dfs.core.windows.net/data/*.csv', format='csv', header=True)
      2 #display(df.limit(10))

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py:300, in DataFrameReader.load(self, path, format, schema, **options)
    298 self.options(**options)
    299 if isinstance(path, str):
    301 elif path is not None:
    302     if type(path) != list:

   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:169, in capture_sql_exception.<locals>.deco(*a, **kw)
    167 def deco(*a: Any, **kw: Any) -> Any:
    168     try:
    170     except Py4JJavaError as e:
    171         converted = convert_exception(e.java_exception)

    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

: java.nio.file.AccessDeniedException: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, GET, https://storagefortoday.dfs.core.windows.net/files?upn=false&resource=filesystem&maxResults=5000&directory=data&timeout=90&recursive=false, AuthorizationPermissionMismatch, "This request is not authorized to perform this operation using this permission. RequestId:da437775-401f-0018-45cc-bb4518000000 Time:2024-06-11T06:53:08.2745986Z"
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.checkException(AzureBlobFileSystem.java:1443)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.listStatus(AzureBlobFileSystem.java:516)
    at org.apache.hadoop.fs.Globber.listStatus(Globber.java:128)
    at org.apache.hadoop.fs.Globber.doGlob(Globber.java:291)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:202)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:2124)
    at org.apache.spark.deploy.SparkHadoopUtil.globPath(SparkHadoopUtil.scala:316)
    at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$3(DataSource.scala:738)
    at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:393)
    at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
    at scala.util.Success.$anonfun$map$1(Try.scala:255)
    at scala.util.Success.map(Try.scala:213)
    at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
    at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
    at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
    at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)

    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.completeExecute(AbfsRestOperation.java:231)
    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.lambda$execute$0(AbfsRestOperation.java:191)
    at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfInvocation(IOStatisticsBinding.java:464)
    at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:189)
    at org.apache.hadoop.fs.azurebfs.services.AbfsClient.listPath(AbfsClient.java:302)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:1158)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:1128)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:1110)
    at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.listStatus(AzureBlobFileSystem.java:513)
    ... 20 more
```

**The above error is shown when the pipeline is triggered.**
Azure Synapse Analytics

2 answers
  1. Amira Bedhiafi 19,626 Reputation points
    2024-06-11T08:57:34.91+00:00

    Based on this old thread, the error you're encountering is due to insufficient permissions.

    It means that your Synapse workspace does not have the necessary permissions on the storage account.

    Specifically, you need to enable the "All Networks" option and verify that the "Storage Blob Data Contributor" role has been added. The Synapse workspace requires this role to access storage accounts.

    Assign the "Storage Blob Data Contributor" role in Synapse to grant access to a specific storage account.

    Go to Access Control(IAM) -> +Add -> Storage Blob Contributor. If the following roles are already assigned. Assign yourself to the storage blob data owner role on storage account.
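    If you prefer to script the assignment, here is a minimal sketch using the Azure SDK for Python (azure-identity and azure-mgmt-authorization). The subscription ID, resource group, and principal object ID below are placeholders rather than values from this thread, and the built-in role GUID for "Storage Blob Data Contributor" should be verified in your tenant. When the notebook runs inside a pipeline, the principal to grant is typically the Synapse workspace managed identity rather than your own user.

    ```python
    # Sketch only: assign "Storage Blob Data Contributor" on the storage account.
    # <subscription-id>, <resource-group> and <principal-object-id> are placeholders.
    import uuid

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.authorization import AuthorizationManagementClient
    from azure.mgmt.authorization.models import RoleAssignmentCreateParameters

    subscription_id = "<subscription-id>"
    scope = (
        f"/subscriptions/{subscription_id}"
        "/resourceGroups/<resource-group>"
        "/providers/Microsoft.Storage/storageAccounts/storagefortoday"
    )
    # Object ID of the identity that needs access: your user for interactive runs,
    # or the Synapse workspace managed identity for pipeline runs.
    principal_id = "<principal-object-id>"

    client = AuthorizationManagementClient(DefaultAzureCredential(), subscription_id)

    # Built-in role definition ID for "Storage Blob Data Contributor" (verify in your tenant).
    role_definition_id = (
        f"/subscriptions/{subscription_id}/providers/Microsoft.Authorization"
        "/roleDefinitions/ba92f5b4-2d11-453d-a403-e96b0029c9fe"
    )

    client.role_assignments.create(
        scope=scope,
        role_assignment_name=str(uuid.uuid4()),  # each assignment needs a new GUID name
        parameters=RoleAssignmentCreateParameters(
            role_definition_id=role_definition_id,
            principal_id=principal_id,
            principal_type="ServicePrincipal",  # use "User" when assigning to yourself
        ),
    )
    ```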

    Documentation: https://video2.skills-academy.com/en-us/azure/synapse-analytics/security/how-to-grant-workspace-managed-identity-permissions


  2. Harishga 5,910 Reputation points Microsoft Vendor
    2024-06-14T06:16:06.24+00:00

    Hi @NANDARAJ M
    Based on the error message you provided, it appears that you are facing an AccessDeniedException error while attempting to access a CSV file in Azure Synapse Analytics from your notebook. This error typically indicates that the account running the notebook does not have the necessary permissions to perform the operation.

    If you are still having trouble accessing a CSV file in Azure Synapse Analytics using your notebook, even after assigning the Storage Blob Data Contributor role, it could be due to the specific restrictions of a vendor account in Azure. 

    To troubleshoot the issue, you can take the following steps:

    • Review the vendor account permissions to see if there are any specific limitations or additional steps required for vendor accounts to be granted access to Azure resources.
    • Validate role assignments to ensure they have propagated correctly; it can take a while for new permissions to take effect (a scripted check is sketched after this list).
    • Check network policies to see whether the "All Networks" option is enabled. If not, it could prevent access to the storage account from the Synapse workspace.
    • Inspect resource-level permissions to check whether there are any explicit deny assignments, or whether the storage account has additional conditional access policies applied besides the IAM roles.
    • Consult Azure Support or your account manager for insights specific to your account setup, since vendor accounts can have unique configurations.
    • Use the Azure portal for diagnostics; it provides tools that can help identify permission issues. Use the storage account's "Access Control (IAM)" blade to review effective permissions.
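    As a quick way to validate the role assignments mentioned above, here is a minimal sketch using the Azure SDK for Python (azure-identity and azure-mgmt-authorization); the subscription ID, resource group, and principal object ID are placeholders, not values from this thread:

    ```python
    # Sketch only: list the roles a given principal holds at the storage-account scope,
    # to confirm that "Storage Blob Data Contributor" has actually propagated.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.authorization import AuthorizationManagementClient

    subscription_id = "<subscription-id>"
    # Your user object ID, or the Synapse workspace managed identity for pipeline runs.
    principal_id = "<principal-object-id>"
    scope = (
        f"/subscriptions/{subscription_id}"
        "/resourceGroups/<resource-group>"
        "/providers/Microsoft.Storage/storageAccounts/storagefortoday"
    )

    client = AuthorizationManagementClient(DefaultAzureCredential(), subscription_id)

    # Assignments visible at this scope include ones inherited from the
    # resource group and the subscription.
    for assignment in client.role_assignments.list_for_scope(
        scope, filter=f"principalId eq '{principal_id}'"
    ):
        role = client.role_definitions.get_by_id(assignment.role_definition_id)
        print(role.role_name, "->", assignment.scope)
    ```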

    Reference:
    https://video2.skills-academy.com/en-us/azure/synapse-analytics/troubleshoot/troubleshoot-synapse-studio-and-storage-connectivity
    https://github.com/MicrosoftDocs/azure-docs/issues/70324
    https://stackoverflow.com/questions/72490005/errorcode6002-in-azure-synapse-analytics-pipeline
    https://video2.skills-academy.com/en-us/azure/synapse-analytics/security/connectivity-settings
    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful". And if you have any further queries, do let us know.
