RDMA Traffic Test Failed

Richelle Lanuza 96 Reputation points
2021-04-21T05:25:28.317+00:00

Hi,

I ran the RDMA test with DiskSpd from C:\Windows\System32. What does "RDMA traffic test FAILED" mean?

Error Prompted:

ERROR: RDMA traffic test FAILED: Please check
ERROR: a) physical switch port configuration for Priority Flow Control.
ERROR: b) job owner has write permission at 172.16.xx.xx \C$

Can anyone elaborate on these concerns?

Thank you,
Rich

Azure Stack HCI

Accepted answer
  1. Trent Helms - MSFT 2,541 Reputation points Microsoft Employee
    2021-04-21T18:34:14.683+00:00

    Hi @RichelleLanuza-2661,

    Test-RDMA should work for both iWARP and RoCE. As I understand it, the script uses DiskSpd to generate a synthetic workload that is carried over an SMB connection, and the test passes only if that connection is established as an RDMA connection. Also, be sure to run the tool as a user with local admin rights on each node, since this is required to access C$.
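
    To illustrate the C$ requirement, here is a minimal sketch that probes write access to a node's admin share by creating and deleting a temporary file. The function name and the probe file name are hypothetical, and it assumes that writing to the root of C$ is representative of what the test script needs.

    ```powershell
    # Hypothetical helper: returns $true if the current user can write to \\<node>\C$.
    function Test-AdminShareWrite {
        param([Parameter(Mandatory)][string]$ComputerName)

        # Build a unique probe path so an existing file is never overwritten.
        $probe = '\\{0}\C$\rdma-write-probe-{1}.tmp' -f $ComputerName, [guid]::NewGuid()
        try {
            New-Item -Path $probe -ItemType File -ErrorAction Stop | Out-Null
            Remove-Item -Path $probe -ErrorAction Stop
            return $true
        }
        catch {
            Write-Warning ('Cannot write to \\{0}\C$: {1}' -f $ComputerName, $_.Exception.Message)
            return $false
        }
    }
    ```

    You would call it once per node, for example Test-AdminShareWrite -ComputerName '<node-ip>', from the same account that runs Test-RDMA.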

    A few questions to get a better understanding of your environment.

    1. Are you setting this up in Windows Server 2019 or an Azure Stack HCI 20H2 environment?
    2. What model of NICs are you using?
    3. Is RDMA enabled and set for iWARP on all storage NICs? (some NICs support both iWARP and RoCE)
    4. Are you using dedicated storage NICs (i.e. no virtual switch on top of the physical NICs)? I assume this is true because you are using a switchless config, but I want to be sure.

    Some things you could check are:

    1. Ensure the RDMA/NIC settings completely match across all cluster nodes.
    2. Ensure the driver and firmware versions on the NICs match and are up to date on each cluster node.
    3. Ensure that your storage NICs are each on their own separate VLAN/subnet.
    4. Check the SMB Client Connectivity logs to see if there are any useful errors regarding RDMA.
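
    As a rough sketch of the first check, the following compares the NetworkDirect-related advanced properties across nodes and reports any that differ. Both function names are hypothetical, and it assumes PowerShell remoting is enabled and that the storage adapters have the same names on every node.

    ```powershell
    # Hypothetical helper: collects NetworkDirect-related properties from each node.
    # Invoke-Command adds a PSComputerName property to every returned object.
    function Get-RdmaSettingRow {
        param([Parameter(Mandatory)][string[]]$ComputerName)
        Invoke-Command -ComputerName $ComputerName -ScriptBlock {
            Get-NetAdapterAdvancedProperty |
                Where-Object RegistryKeyword -like '*NetworkDirect*' |
                ForEach-Object {
                    [pscustomobject]@{
                        Adapter = $_.Name
                        Keyword = $_.RegistryKeyword
                        Value   = $_.RegistryValue -join ','
                    }
                }
        }
    }

    # Hypothetical helper: returns one row per adapter/keyword pair whose value
    # is not identical on all nodes.
    function Get-SettingMismatch {
        param([Parameter(Mandatory)][object[]]$Rows)
        $Rows | Group-Object -Property Adapter, Keyword | ForEach-Object {
            $distinct = @($_.Group | Select-Object -ExpandProperty Value | Sort-Object -Unique)
            if ($distinct.Count -gt 1) {
                [pscustomobject]@{
                    Adapter = $_.Group[0].Adapter
                    Keyword = $_.Group[0].Keyword
                    Values  = ($_.Group | ForEach-Object { "$($_.PSComputerName)=$($_.Value)" }) -join '; '
                }
            }
        }
    }
    ```

    With placeholder node names, usage would look like Get-SettingMismatch -Rows (Get-RdmaSettingRow -ComputerName 'NODE1','NODE2'). An empty result suggests the settings match on the nodes queried.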

    Beyond this, it may be worth opening a support ticket with your hardware vendor first, as the vast majority of RDMA is handled by the hardware. If they determine the issue is in the OS, you could open a support ticket with us and we'd be glad to assist in confirming your setup.

    I hope this information is helpful.

    Thanks so much, Rich, and I hope you have a wonderful day!
    Trent


5 additional answers

  1. Trent Helms - MSFT 2,541 Reputation points Microsoft Employee
    2021-04-23T15:14:58.397+00:00

    Hi Rich,

    Those QLogic cards support both iWARP and RoCE RDMA. Since you want to use iWARP, please check the following:

    1. Ensure the latest firmware and drivers are loaded for these cards.
    2. Ensure iWARP is listed in the "NetworkDirect Technology" field in the output from Get-NetAdapterAdvancedProperty for every RDMA NIC. If you need to change this property, you can do so by using the command Set-NetAdapterAdvancedProperty "NAMEOFNIC" -RegistryKeyword "*NetworkDirectTechnology" -RegistryValue 1
    3. Ensure the "NetworkDirect Functionality" field in the output from Get-NetAdapterAdvancedProperty for every RDMA NIC is set to 'Enabled'. If you need to change this property, you can do so by using the command Set-NetAdapterAdvancedProperty "NAMEOFNIC" -RegistryKeyword "*NetworkDirect" -RegistryValue 1
    4. Ensure the output of Get-NetAdapterRdma shows 'Enabled' next to each of the RDMA NICs.
    5. Check that any BIOS settings related to RDMA on each machine (if there are any) are set for iWARP. Some manufacturers have BIOS settings that control how the NICs operate, and these settings must match the OS settings for RDMA to function properly.
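
    Steps 2 through 4 can be reviewed on a node with something like the following sketch. It only reads settings and changes nothing. The exact display names and values shown depend on the NIC driver.

    ```powershell
    # List the NetworkDirect-related advanced properties for every adapter.
    Get-NetAdapterAdvancedProperty |
        Where-Object RegistryKeyword -like '*NetworkDirect*' |
        Format-Table Name, DisplayName, DisplayValue, RegistryKeyword, RegistryValue

    # Show whether RDMA is enabled on each RDMA-capable adapter.
    Get-NetAdapterRdma | Format-Table Name, InterfaceDescription, Enabled
    ```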

    If all of this is true, but the Test-RDMA script is still failing, check the SMB Client Connectivity logs to look for any issues or errors that may indicate RDMA session setups are failing. Again, this may need engagement from the hardware vendor if the RDMA connections still will not establish. You can also look at the information here - https://video2.skills-academy.com/en-us/windows-server/networking/technologies/conv-nic/cnic-app-troubleshoot
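
    One way to pull RDMA-related entries from that log is sketched below. The log name is my understanding of how the SMB Client Connectivity channel is named, and the event count and the 'RDMA' text filter are arbitrary choices, so adjust them as needed.

    ```powershell
    # Read recent SMB client connectivity events and keep those that mention RDMA.
    Get-WinEvent -LogName 'Microsoft-Windows-SmbClient/Connectivity' -MaxEvents 200 |
        Where-Object { $_.Message -match 'RDMA' } |
        Select-Object TimeCreated, Id, LevelDisplayName, Message |
        Format-List
    ```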

    As for your additional questions, you should still be able to run Test-RDMA with your cluster in maintenance mode. If you bring your cluster online while the RDMA connections cannot be established, the connections should fall back to standard SMB connections. The difference is that standard SMB connections use the Windows TCP stack, so they won't be quite as performant as RDMA, but they should still be fine. You may also notice some warnings in the event logs indicating that standard SMB connections were used instead of RDMA, but these are expected when RDMA isn't in use.
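
    To see whether the current SMB connections are using RDMA-capable interfaces, a sketch like this may help. It assumes there is active SMB traffic between the nodes, since connections are listed only while they exist.

    ```powershell
    # Show RDMA capability for each active SMB multichannel connection.
    Get-SmbMultichannelConnection |
        Format-Table ServerName, ClientIpAddress, ServerIpAddress, ClientRdmaCapable, ServerRdmaCapable

    # Show which local interfaces the SMB client considers RDMA capable.
    Get-SmbClientNetworkInterface |
        Format-Table FriendlyName, InterfaceIndex, RdmaCapable, RssCapable
    ```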

    I hope this information is helpful!
    Trent

