Unable to access Azure Blob storage from HDInsight cluster

azureuser586 1 Reputation point
2020-06-17T00:52:46.397+00:00

Hi,

I have spun up an HDInsight Spark cluster and am trying to access Blob storage from the cluster as follows, but I am getting an exception:

hdfs dfs -ls wasbs://deploy@nisumstorageaccount2.blob.core.windows.net/

20/06/16 19:28:39 ERROR azure.AzureNativeFileSystemStore: Service returned StorageException when checking existence of container deploy in account nisumstorageaccount2.blob.core.windows.net
com.microsoft.azure.storage.StorageException: 
        at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:87)
        at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:209)
        at com.microsoft.azure.storage.blob.CloudBlobContainer.exists(CloudBlobContainer.java:769)
        at com.microsoft.azure.storage.blob.CloudBlobContainer.exists(CloudBlobContainer.java:756)
        at org.apache.hadoop.fs.azure.StorageInterfaceImpl$CloudBlobContainerWrapperImpl.exists(StorageInterfaceImpl.java:233)
        at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.connectUsingAnonymousCredentials(AzureNativeFileSystemStore.java:863)
        at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1088)
        at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:543)
        at org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1358)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
        at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:352)
        at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:250)
        at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:233)
        at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:104)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:177)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)
Caused by: java.net.UnknownHostException: nisumstorageaccount2.blob.core.windows.net
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:607)
        at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:681)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
        at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
        at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
        at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1570)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498)
        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:352)
        at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:115)
        ... 22 more
ls: org.apache.hadoop.fs.azure.AzureException: No credentials found for account nisumstorageaccount2.blob.core.windows.net in the configuration, and its container deploy is not accessible using anonymous credentials. Please check if the container exists first. If it is not publicly available, you have to provide account credentials.

Please help.

Thanks.

Tags: Azure Blob Storage, Azure HDInsight

3 answers

  1. PRADEEPCHEEKATLA-MSFT 89,466 Reputation points Microsoft Employee
    2020-06-17T04:50:40.563+00:00

    @azureuser586-9670, Welcome to the Microsoft Q&A platform.

    Have you attached the storage account to the HDInsight cluster?

    Note: Accessing data in an external storage account that is not configured in the HDInsight cluster is not allowed.

    If you want to access data residing in external storage, you will have to add that storage account as additional storage to the HDInsight cluster.

    Steps to add storage accounts to the existing clusters via Ambari UI:

    Step 1: From a web browser, navigate to https://CLUSTERNAME.azurehdinsight.net, where CLUSTERNAME is the name of your cluster.

    Step 2: Navigate to HDFS -->Config -->Advanced, scroll down to Custom core-site

    Step 3: Select Add Property and enter your storage account name and key in following manner

     Key => fs.azure.account.key.(storage_account).blob.core.windows.net
     Value => (Storage Access Key)
    

    [Image: 10206-hdi-addstorage.jpg]

    Step 4: Observe the keys that begin with fs.azure.account.key. The account name will be part of the key, as seen in this sample image:

    [Image: 10284-hdi-addstorageverify.jpg]

    Reference: Add additional storage accounts to HDInsight
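    For illustration, the property name entered in Step 3 follows a fixed pattern. Here is a minimal Python sketch that builds the Custom core-site entry; the helper name and the sample key value are hypothetical, and the account name is taken from the question:

```python
# Build the Custom core-site property for attaching an additional Azure Blob
# storage account to an HDInsight cluster. The helper name is illustrative.
def core_site_entry(storage_account: str, access_key: str) -> tuple[str, str]:
    # Property name pattern: fs.azure.account.key.<account>.blob.core.windows.net
    name = f"fs.azure.account.key.{storage_account}.blob.core.windows.net"
    return name, access_key

name, value = core_site_entry("nisumstorageaccount2", "<storage-access-key>")
print(name)  # fs.azure.account.key.nisumstorageaccount2.blob.core.windows.net
```

    The same name/value pair is what you would paste into Add Property in the Ambari Custom core-site section.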

    Hope this helps. Do let us know if you have any further queries.


    Do click on "Accept Answer" and Upvote on the post that helps you; this can be beneficial to other community members.


  2. PRADEEPCHEEKATLA-MSFT 89,466 Reputation points Microsoft Employee
    2020-06-18T08:07:51.913+00:00

    @azureuser586-9670, If you are running the command against the default storage account, it should return the items in the container.

    Make sure the container named "deploy" exists in the default storage account. Also, note what the error message says: "ls: org.apache.hadoop.fs.azure.AzureException: No credentials found for account nisumstorageaccount2.blob.core.windows.net in the configuration, and its container deploy is not accessible using anonymous credentials. Please check if the container exists first. If it is not publicly available, you have to provide account credentials."
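    One more clue from the stack trace: the `Caused by: java.net.UnknownHostException` line means the blob endpoint's DNS name did not resolve at all, which typically indicates the storage account does not exist, was deleted, or the name is misspelled. A quick sketch of that check in Python (the function names are illustrative; only the endpoint pattern is fixed):

```python
import socket

def blob_host(storage_account: str) -> str:
    """Blob service endpoint host name for a given storage account."""
    return f"{storage_account}.blob.core.windows.net"

def endpoint_resolves(storage_account: str) -> bool:
    """True if the endpoint resolves in DNS. False suggests the storage
    account does not exist or the name is misspelled."""
    try:
        socket.gethostbyname(blob_host(storage_account))
        return True
    except socket.gaierror:
        return False
```

    If the endpoint does not resolve, no amount of credential configuration will help; verify the storage account name in the Azure portal first.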

    Repro: I have created a container named "deploy" and it contains three files in it.

    [Image: 10270-hdi-defaultstorage.png]

    And I am able to access the files from the default storage account.

    hdfs dfs -ls wasbs://deploy@cheprasparkhdistorage.blob.core.windows.net/

    [Image: 10341-hdi-defaultstorageaccess.png]
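    For reference, a `wasbs://` URI packs both pieces into the authority as `<container>@<account>.blob.core.windows.net`. A small standard-library sketch that pulls the two parts out of the repro URI above:

```python
from urllib.parse import urlparse

# wasbs URIs have the form wasbs://<container>@<account>.blob.core.windows.net/<path>
uri = "wasbs://deploy@cheprasparkhdistorage.blob.core.windows.net/"
parsed = urlparse(uri)

container = parsed.username  # the part before the '@' is the container
host = parsed.hostname       # the storage account's blob endpoint (lowercased)
print(container, host)       # prints: deploy cheprasparkhdistorage.blob.core.windows.net
```

    This makes it easy to double-check that the container name and the storage account endpoint in your URI match what actually exists in your subscription.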

    Hope this helps. Do let us know if you have any further queries.


  3. PRADEEPCHEEKATLA-MSFT 89,466 Reputation points Microsoft Employee
    2020-07-03T07:28:33.3+00:00

    Hello @azureuser586-9670,

    Apologies for the delayed response.

    Have you resolved the issue, or are you still unable to access the storage container?

    Repro: I have created HDInsight cluster with storage account named “cheprahdistorage” and container named "deploy".

    And I’m able to see all the files using the command below.

    hdfs dfs -ls wasbs://deploy@cheprahdistorage.blob.core.windows.net/

    [Image: 11402-hdi-storage.png]

    Hope this helps. Do let us know if you have any further queries.


    Do click on "Accept Answer" and Upvote on the post that helps you; this can be beneficial to other community members.

