Nov 14, 2016

Spine and Leaf architecture

As virtualization, cloud computing, and distributed computing (Hadoop, for example) become more popular in the data center, a shift away from the traditional three-tier networking model is taking place as well.
The traditional core-aggregation-access model is efficient for traffic that travels “North-South”, that is, traffic entering and leaving the data center. This kind of traffic is typically a web service of sorts (HTTP/S, Exchange, and SharePoint, for example) with a lot of remote client/server communication, and the architecture is usually built for redundancy and resiliency against failure. However, the Spanning Tree Protocol (STP) typically blocks half of the critical network links to prevent loops, leaving them to sit idle as backups, which means 50% of your maximum bandwidth is wasted until something fails. Here is an example:

[Diagram: traditional three-tier core-aggregation-access topology with redundant links blocked by STP]
This type of architecture is still widely used for service-oriented traffic that travels North-South. However, traffic patterns are changing with the workloads common in today’s data centers: East-West traffic, or server-to-server traffic. Take a look at the diagram above. If a server connected to the left-most access switch needs to communicate with a server connected to the right-most access switch, what path does it take? It travels all the way up to the core switch and back down again (access to aggregation to core to aggregation to access, four switch hops). That is not an efficient path; it adds latency and consumes bandwidth on every link it crosses. If a cluster of servers (which can number in the hundreds or even thousands) is performing a resource-intensive calculation in parallel, the last thing you want to introduce is unpredictable latency or a lack of bandwidth. You can have extremely powerful servers performing these calculations, but if they can’t talk to each other efficiently because of a bottleneck in your network architecture, that is wasted capital expenditure.
So how do you design for this shift from North-South to East-West traffic? One way is to build a Spine and Leaf architecture, also known as a Distributed Core. This architecture has two main components: spine switches and leaf switches. You can think of the spine switches as the core, but instead of one large, chassis-based switching platform, the spine is composed of many high-throughput, high-port-density Layer 3 switches. You can think of the leaf switches as your access layer; they provide connection points for servers as well as uplinks to the spine switches. Now, here is the important part of this architecture: every leaf switch connects to every spine switch in the fabric. That matters because no matter which leaf switch a server is connected to, its traffic always crosses the same number of devices to reach another server (unless that server is on the same leaf). This keeps latency at a predictable level, because a payload only has to hop to a spine switch and then to one other leaf switch to reach its destination.
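
To make the predictable-hop-count point concrete, here is a minimal Python sketch; the switch names and counts are made up for illustration. It models the full mesh between the two tiers and checks that every pair of leaves is exactly two switch hops apart:

    # Hypothetical fabric: 4 spines, 8 leaves (illustrative numbers only).
    from itertools import combinations

    spines = [f"spine{i}" for i in range(1, 5)]
    leaves = [f"leaf{i}" for i in range(1, 9)]

    # Full mesh between tiers: every leaf has one uplink to every spine.
    links = {(leaf, spine) for leaf in leaves for spine in spines}

    def hops(src_leaf, dst_leaf):
        """Switch hops between servers attached to two leaves."""
        if src_leaf == dst_leaf:
            return 0  # same leaf: traffic never leaves the switch
        # Any spine reaches both leaves, so the path is leaf -> spine -> leaf.
        assert any((src_leaf, s) in links and (dst_leaf, s) in links
                   for s in spines)
        return 2

    # No "far" racks: every leaf pair is the same distance apart.
    assert all(hops(a, b) == 2 for a, b in combinations(leaves, 2))
    print("every leaf pair is exactly 2 hops apart")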

Before you design an architecture like this, you need to know your current and future needs. For example, if you have 100 servers today and will eventually scale up to 500, you need to make sure your fabric can accommodate that growth. Two variables determine your maximum scale: the number of uplinks on a leaf switch and the number of ports on your spine switches. The number of uplinks on a leaf switch determines how many spine switches you can have in your fabric (remember: every leaf switch has to connect to every spine switch!), and the number of ports on a spine switch determines how many leaf switches you can have; this is why spine switches need high port density. Let’s take the example of 100 servers today with a need to scale to 1000 servers in the future. If we use a 24-port 10Gbps switch at the leaf layer, with 20 ports for servers and 4 ports for uplinks, we can have a total of 4 spine switches. If each spine switch has 64 10Gbps ports, we can scale out to a maximum of 64 leaf switches: 64 leaf switches x 20 servers per switch = 1280 servers in this fabric. Keep in mind this is a theoretical maximum, and you will need to account for connecting the fabric to the rest of the data center. Regardless, this design allows seamless scaling without re-architecting the fabric: you can start with 5 leaf switches and 4 spine switches to meet the current need of 100 servers, then add leaf switches as more servers come online.
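
As a sanity check on that arithmetic, here is a small back-of-the-napkin calculator in Python; the function name is mine, and the port counts are just the example values above:

    def fabric_capacity(leaf_ports, leaf_uplinks, spine_ports):
        """Return (max spines, max leaves, max servers) for a leaf-spine fabric."""
        server_ports = leaf_ports - leaf_uplinks  # leaf ports left for servers
        max_spines = leaf_uplinks                 # one uplink per spine switch
        max_leaves = spine_ports                  # one spine port per leaf switch
        return max_spines, max_leaves, max_leaves * server_ports

    # 24-port leaves (20 server ports + 4 uplinks), 64-port spines:
    print(fabric_capacity(24, 4, 64))  # -> (4, 64, 1280)
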
Another factor to keep in mind when designing your fabric is the oversubscription ratio. This ratio is calculated at the leaf switches and is defined as the maximum throughput of active southbound connections (down to servers) divided by the maximum throughput of active northbound connections (uplinks). If you have 20 servers, each connected with a 10Gbps link, and 4 10Gbps uplinks to your spine switches, you have a 5:1 oversubscription ratio (200Gbps/40Gbps). It is unlikely that all servers will communicate at 100% throughput 100% of the time, so some oversubscription is acceptable. With that in mind, work with the server team to figure out what ratio is acceptable for your purpose.
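
That calculation is easy to script as well; a quick Python sketch using the same example numbers (link speeds in Gbps):

    from math import gcd

    def oversubscription(servers, server_gbps, uplinks, uplink_gbps):
        """Reduce southbound/northbound throughput to a ratio string."""
        south = int(servers * server_gbps)  # max southbound throughput
        north = int(uplinks * uplink_gbps)  # max northbound (uplink) throughput
        g = gcd(south, north)
        return f"{south // g}:{north // g}"

    print(oversubscription(20, 10, 4, 10))  # -> "5:1" (200Gbps/40Gbps)
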
Advantages:
  1. You can use low-cost 1U or 2U spine switches instead of expensive chassis-based core switches.
  2. You can start small and expand the spine/leaf network by adding more switches when required, without discarding the existing setup.
  3. Several networking vendors make specialized leaf/spine switches.
  4. A Distributed Core network can be configured for maximum redundancy/resiliency: even if a spine switch fails, the result is performance degradation rather than a service outage.
  5. Distributed Core networks can achieve higher throughput/bandwidth and connect more servers than core-aggregation-access networks.
  6. Leaf/spine networks handle both East-West traffic (server to server: cloud computing, Hadoop, etc.) and North-South traffic (web content, email, etc.) efficiently; the traditional model suits mainly the latter, and its expansion is limited.
  7. Leaf/spine networks can be implemented with standards-based protocols, even in a multi-vendor setup, though some vendors have also developed their own proprietary protocols/fabrics.
  8. Distributed Core networks enable containerized (and expandable) data centers.
  9. Networks can scale up/down/out massively and quickly.

