How to read a Parquet data file in a Jupyter notebook with the R kernel

Question

I am very new to Azure ML. I have a registered data asset named "test" (version 2) whose data source is "workspaceblobstore", and it contains a Parquet file named "userdata1.parquet". I want to read this file in a Jupyter notebook with the R kernel and then convert it to an R data frame for further processing.

from azureml.core import Workspace, Dataset

# Placeholder workspace details
subscription_id = 'abc'
resource_group = 'abcd'
workspace_name = 'xyz'

workspace = Workspace(subscription_id, resource_group, workspace_name)

# Registered data asset "test", version 2
dataset = Dataset.get_by_name(workspace, name='test', version=2)

# userdata1.parquet is a Parquet file, so use the Parquet reader
# rather than from_delimited_files (which expects CSV/TSV text)
web_path = 'https://abc/UI/09-17-2022_125003_UTC/userdata1.parquet'
sample_ds = Dataset.Tabular.from_parquet_files(path=web_path)

But I am getting the following DataValidationError:

 Error Code: ScriptExecution.StreamAccess.Authentication
 Failed Step: e9512c41-05cf-492e-a7a2-26fc17daf578
 Error Message: ScriptExecutionException was caused by StreamAccessException. StreamAccessException was caused by AuthenticationException. Identity authentication failed for 'AzureBlob GetReference' operation at 'https://abc/UI/09-17-2022_125003_UTC/userdata1.parquet' with '403: AuthorizationPermissionMismatch'. Please make sure the compute or login identity has 'Storage Blob Data Reader' or 'Storage Blob Data Owner' role in the storage IAM. This request is not authorized to perform this operation using this permission. | session_id=5b6a2eb3-f70b-46a6-b57a-d657bc4eb4bf

Can anyone please tell me how to proceed from here? Any help would be appreciated.

Accepted Answer

Hi

Thanks for reaching out to Microsoft Q&A.

The error message states the cause: the identity making the request (your login, or the compute's identity) does not have a role that permits reading data from the blob storage. To fix this, have one of the IAM roles named in the error assigned to that identity on the storage account:

Identity authentication failed for 'AzureBlob GetReference' operation at 'https://abc/UI/09-17-2022_125003_UTC/userdata1.parquet' with '403: AuthorizationPermissionMismatch'. Please make sure the compute or login identity has 'Storage Blob Data Reader' or 'Storage Blob Data Owner' role in the storage IAM.
This request is not authorized to perform this operation using this permission.
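Separately from the permission problem, userdata1.parquet is a Parquet file, so it should be loaded with a Parquet reader (for example Dataset.Tabular.from_parquet_files) rather than a delimited-text reader. If you are unsure of a file's format, a minimal stdlib-only sketch like the one below checks for the 4-byte magic number PAR1 that the Parquet format writes at both the start and the end of every file. The function name looks_like_parquet and the demo file names are hypothetical, and this is only a quick heuristic, not full validation of the file.

```python
import os
import tempfile

# Parquet files begin and end with these 4 magic bytes
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path: str) -> bool:
    """Heuristic check: True if the file starts and ends with b"PAR1"."""
    # Smallest plausible layout: header magic (4) + footer length (4) + footer magic (4)
    if os.path.getsize(path) < 12:
        return False
    with open(path, "rb") as f:
        header = f.read(4)
        f.seek(-4, os.SEEK_END)  # jump to the last 4 bytes
        footer = f.read(4)
    return header == PARQUET_MAGIC and footer == PARQUET_MAGIC


if __name__ == "__main__":
    # Demo with hypothetical throwaway files in a temporary directory
    with tempfile.TemporaryDirectory() as tmp:
        fake_parquet = os.path.join(tmp, "fake.parquet")
        with open(fake_parquet, "wb") as f:
            f.write(PARQUET_MAGIC + b"\x00" * 8 + PARQUET_MAGIC)
        csv_file = os.path.join(tmp, "data.csv")
        with open(csv_file, "w") as f:
            f.write("id,name\n1,a\n")
        print(looks_like_parquet(fake_parquet))  # True
        print(looks_like_parquet(csv_file))      # False
```

A real downloaded copy of userdata1.parquet should return True; a CSV or a truncated download will return False.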

Please Upvote and Accept as answer if the reply was helpful.
