Contoso Labs-Fabric Choices (Network)

Article
03/07/2014

In the prior post, we discussed how our major decision revolved around storage. The choices we made there would dictate the cost of the rest of the solution, so we had to make a decision there first, and carefully at that. Once we knew we'd be using Scale-Out File Servers and a converged fabric, we had to evaluate what that meant for our network.

Speeds and Feeds

Our first inclination was to get excited by all the possibilities and performance that the SOFS+SMB 3.0 stack could enable. How awesome is your life when you technologies like SMB Multichannel and SMB Direct (RDMA) at your disposal? Once our budget numbers came back, we realized we had to scale back our ambitions a little. While SMB Direct is amazing, it requires special RDMA-enabled 40GbE, 10GbE, or Infiniband cards. That presented a few serious problems.

Dual port, RDMA-capable cards are expensive. Well over $500 a piece for a 10GbE example, with limited options because it is newer technology. Do the math for one per 288 nodes, and you see a tremendous cost.
High speed NICs require drastically more capable and expensive switching and routing infrastructure. 48-port 10GbE switches cost 2-4x more than their 1Gbe counterparts. SFP+ cables often cost 5-10x or more of an equivalent length CAT-6 cable. When you need a thousand of them? Ouch.
On top of that, because our compute nodes are Gen6 servers, we learned that it's actually frighteningly easy to overwhelm the PCIe bus in some RDMA scenarios. Our network could be TOO FAST FOR OLDER PCIe. What a great problem to have.

We decided to do the math to see if we could get away with using the existing connectivity in our scenario. The compute nodes currently have four 1GbE ports each. Frankly, that's not very much. 4Gbit of throughput, split between management, storage, AND tenant traffic is almost criminally low. We had to debate long and hard about whether we could get away with it. After long deliberation, we decided that our users and workload could manage with this limited throughput, since there won't be heavy stress being generated during steady-state. Our worst case scenario is boot storms, which we think we can manage. However, we're acknowledging there's serious risk here. Until we get users on and load test, we can't be sure what kind of user base we can support. Just like our customers, sometimes we need to learn useful things by doing them and seeing what happens.

In all honesty: This is not an architecture we'd recommend in any production environment meant for serious workloads. It does not provide enough room for I/O growth, and could too easily produce bottlenecks. On the flip side, we think it's an excellent dev/lab design that could make good use of older hardware, and provide a good experience. Keep that in mind as you read more of our architecture in the future.

Comments

Anonymous
January 01, 2003
@Tristan

Yes, definitely. I'll start covering the network equipment on Friday, through next week. The actual design of and configuration of the L2/L3 parts of the network will come later on, and it much more detail.
Anonymous
January 01, 2003
@Istvan Szarka

The absolute smallest set of hosts you could use would be 2-3. At minimum you'd need one host for the network virtualization gateways, and one for the fabric AND tenant VMs. The easier one would be 3...splitting tenant and fabric VMs between hosts, plus the aforementioned NV gateway host. The fabric VM host would have to be generously provisioned as well.

As for the business side of things, that depends entirely on how you'd run and market your business. For a service provider who is offering some serious levels of support and service to customers, you could manage capacity very ad-hoc and stay small but profitable. There is definitely no set answer to that question.
Anonymous
March 18, 2014
Are you going to discuss the network equipment that you selected?
Anonymous
March 22, 2014
A 288 node dev/lab is a pretty big playground :)

It makes me be very curious about two things:
1. Can you simulate an IaaS cloud in a much smaller lab, only with a couple of Hyper-V hosts, in order to learn this?
2. What would be the minimum number of servers to start a real cloud providing business?

This blog series is awesome so far!
Anonymous
October 09, 2014
Hi, can you explain to me if there is any specific featureset is needed on the fabric Switches? Stuff like Will a L2 Switch do, Need for BGP Capable Switches (assuming I have a dedicated NVGRE Gateway which is the Internet Access Point)? VLAN Capability? I'm planning a even smaller dev lab and I'm especially confused about the BGP Part. Thanks for your help

Share via

Contoso Labs-Fabric Choices (Network)

Speeds and Feeds

Comments

Additional resources