Hi,
Thank you for your suggestions and the literature links.
The first one unfortunately doesn't cover Switch Embedded Teaming (SET).
The second one contains the following note:
"Only teams where each of the team members is connected to a different external Hyper-V switch are supported."
However, the whole section seems to be written for Windows Server 2012 R2 and doesn't mention SET on the host explicitly, so it doesn't seem to cover the scenario above. Of course I could be reading the document wrong.
I did some performance testing with ctsTraffic.exe comparing the teamed setup with the setup that only had one vmNIC (and therefore only one VF) available and got some interesting results.
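For anyone wanting to reproduce this, a test along these lines can be set up with ctsTraffic roughly as follows. The port numbers, connection count, and transfer size below are illustrative placeholders, not my exact parameters:

```shell
# On the VM under test: two ctsTraffic server instances on different ports
ctsTraffic.exe -listen:* -port:4444 -pattern:pushpull
ctsTraffic.exe -listen:* -port:4445 -pattern:pushpull

# On each client VM: connect with the push/pull pattern
# (-transfer is bytes per connection; 0x500000000 = 20 GiB)
ctsTraffic.exe -target:<server-ip> -port:4444 -pattern:pushpull -connections:8 -transfer:0x500000000
```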
I used the teamed/unteamed VM I wanted to compare as the ctsTraffic server (running two instances of the program on different ports with a push/pull pattern) and two other VMs (on different hosts) as clients. Hardware and host setup were identical on all hosts. Running this against the teamed VM gave slower results, both for traffic from a single client VM (somewhat expected due to the teaming overhead) and, much worse, from two client VMs, which is exactly where the teamed solution was supposed to shine. The single-vmNIC VM, on the other hand, gave great results for single-client traffic: if my math is correct, it squeezed 40 GB of data (20 GB read, 20 GB write) through that line in just over 18 s, while the teamed VM needed over 26 s. With two client VMs the time pretty much doubled to 40 s for the non-teamed VM, while it spiked to 75 s for the teamed VM. I repeated the tests a couple of times, but the results stayed similar.
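For reference, converting those times to aggregate throughput (read + write combined, assuming decimal GB, and assuming each client moves the same 40 GB in the two-client runs):

```shell
# aggregate throughput in Gbit/s = gigabytes * 8 / seconds
awk 'BEGIN { printf "1 client, 1 vmNIC:  %.1f Gbps\n", 40 * 8 / 18 }'   # 17.8
awk 'BEGIN { printf "1 client, teamed:   %.1f Gbps\n", 40 * 8 / 26 }'   # 12.3
awk 'BEGIN { printf "2 clients, 1 vmNIC: %.1f Gbps\n", 80 * 8 / 40 }'   # 16.0
awk 'BEGIN { printf "2 clients, teamed:  %.1f Gbps\n", 80 * 8 / 75 }'   # 8.5
```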
I also did some SMB performance testing using diskspd.exe, which showed no performance advantage for the teamed solution. Thanks to SMB Multichannel that's not too surprising, but even after removing one of the vmNICs (leaving only one vmNIC on the VM) the results remained identical. That made me wonder, so I checked CPU utilization: it spiked to 100% with two transfers running in parallel (about 80% for system, 15-20% for interrupt processing). I thought this would be different when using SR-IOV, but it isn't - so either something is wrong or, more likely, I misunderstood the concept. So CPU utilization is the limiting factor here (as widely published), making this approach infeasible for pushing throughput beyond around 7.5 Gbps. I guess Guest RDMA inside the VM would be the next logical step, but unfortunately I haven't gotten it to work on Windows Server 2016 (it always results in bluescreens on the host).
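The diskspd run was along these lines; the block size, duration, thread/queue counts, and the UNC path below are just example values, not my exact parameters:

```shell
# 30 s mixed read/write test against an SMB share
# -c20G: create a 20 GB test file, -b512K: 512 KiB blocks
# -t4: 4 threads, -o8: 8 outstanding I/Os per thread, -w50: 50% writes
# -Sh: disable software and hardware caching, -L: capture latency stats
diskspd.exe -c20G -b512K -d30 -t4 -o8 -w50 -Sh -L \\server\share\testfile.dat
```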
I've also just checked resiliency by deactivating the host NICs in the middle of a transfer. Here, too, the solution with only one vmNIC performed better: there was a short drop in transmission speed when deactivating the primary NIC, but it went back to 7.5 Gbps almost instantly. In the same test on the VM with the team, the transfer speed dropped to around 4 Gbps and stayed there until the transfer completed.
My takeaway from this is that I'll stick with a single SR-IOV vmNIC per VM when using Switch Embedded Teaming on the host. At least until I get Guest RDMA to work...
Any ideas on the above? Any wrong assumptions or stupid ideas in my tests above?
Best wishes,
Jens