How to fix SMB network connection issues in a 2-node S2D cluster with back-to-back Mellanox adapters

SG MUHAMMAD HASHIM 0 Reputation points
2024-04-03T08:19:54.5966667+00:00

Hello!

We are setting up a 2-node S2D cluster on on-premises infrastructure. The nodes use Mellanox adapters connected back to back with DAC cables, which is a switchless configuration for the SMB connections.

We configured IP addresses on the 2 NIC ports on each node, and the cluster validation report was successful. We installed DCB and changed a few DCB-related settings for RDMA (RoCE v2). After configuring the cluster and enabling S2D on the 2 nodes, we are facing an issue with the SMB connections: the cluster network shows as Partitioned.

Please suggest a fix. Below is the event description retrieved from the cluster events:

"Cluster network 'Cluster Network 2' is partitioned. Some attached failover cluster nodes cannot communicate with each other over the network. The failover cluster was not able to determine the location of the failure. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges."

Azure Stack HCI

1 answer

  1. Trent Helms - MSFT 2,536 Reputation points Microsoft Employee
    2024-04-05T14:38:20.6433333+00:00

    Hi @SG MUHAMMAD HASHIM ,

    Can you confirm if you are installing Azure Stack HCI version 22H2 or 23H2?

    With 23H2, the entire deployment and cluster creation process is automated. Manually deploying a cluster on 23H2 is not supported. As such, the network is configured during the deployment using Network ATC, so there are generally very few changes that need to be made prior to the deployment. Instead, you provide the necessary information in the deployment wizard which configures your nodes for you accordingly.

    Given you seem to be manually creating the cluster, my assumption is that you are deploying 22H2. In this scenario, I would recommend using Network ATC to configure all of the host network settings. Doing so configures all of the required QoS/DCB/RDMA settings for the network adapters based on the intent types applied to them. It will also automatically assign IP addresses to each of your storage NICs so you don't need to worry about managing IPs or VLANs. This is especially useful in switchless environments where the storage networks are completely separated from the physical network. But if you need to change a setting on an adapter that Network ATC manages, you can do so by setting an override. More information on Network ATC can be found in the links below.

    Deploy host networking with Network ATC

    Manage Network ATC

    A common issue I have seen in switchless environments is that customers put both of their storage adapters on the same IP subnet and VLAN, which won't work. In a back-to-back setup each cable is its own isolated link, so each link needs its own subnet. The cluster assumes that all IPs on the same IP subnet can talk to each other, so if it can't reach a particular IP in that range, it shows the network as 'partitioned'.

    Hope this helps!
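
The subnet rule described in the answer can be checked before building the cluster. The following is a minimal Python sketch, not part of the original answer: the node names, adapter names, and IP addresses are hypothetical, and the function `find_shared_subnets` is an illustrative helper, not a Microsoft tool. It flags any node whose storage adapters share an IP subnet, which is the layout the answer says leads to a 'partitioned' network in a switchless setup.

```python
import ipaddress
from collections import defaultdict


def find_shared_subnets(adapters):
    """Return subnets that are used by more than one adapter on the same node.

    adapters: list of (node, adapter_name, cidr) tuples, e.g.
              ("Node1", "SMB1", "10.0.1.1/24").
    Result:   dict mapping (node, subnet) to the list of adapter names
              that share that subnet on that node.
    """
    by_subnet = defaultdict(list)
    for node, name, cidr in adapters:
        # ip_interface keeps the host address; .network gives the subnet.
        network = ipaddress.ip_interface(cidr).network
        by_subnet[(node, str(network))].append(name)
    # Keep only subnets that appear on two or more adapters of one node.
    return {key: names for key, names in by_subnet.items() if len(names) > 1}


# Hypothetical layout with both storage links in one subnet (problematic).
same_subnet = [
    ("Node1", "SMB1", "10.0.1.1/24"),
    ("Node1", "SMB2", "10.0.1.2/24"),
    ("Node2", "SMB1", "10.0.1.3/24"),
    ("Node2", "SMB2", "10.0.1.4/24"),
]

# Hypothetical layout with one subnet per back-to-back link.
separate_subnets = [
    ("Node1", "SMB1", "10.0.1.1/24"),
    ("Node1", "SMB2", "10.0.2.1/24"),
    ("Node2", "SMB1", "10.0.1.2/24"),
    ("Node2", "SMB2", "10.0.2.2/24"),
]

if __name__ == "__main__":
    print(find_shared_subnets(same_subnet))
    print(find_shared_subnets(separate_subnets))
```

An empty result means every storage adapter on a node sits in its own subnet. The sketch only compares subnets; it does not check VLAN IDs or cabling, which would still need to be verified on the nodes themselves.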
