This post explains why I believe rack servers are a better choice than blade servers in datacenters.
Both blade and rack servers have their merits, and the choice comes down to a few key attributes that make them different. In this note I explain why I prefer rack servers, and briefly point out solutions to watch going forward.
We cannot consider servers without considering where they will be located. A typical enterprise has datacenter facilities that were set up at different points in time. These facilities have different characteristics because datacenter best practices have changed over the years.
Any datacenter is constrained by physical parameters such as floor space, cooling and power. The general consensus in the industry seems to be that power and cooling are the largest cost of any datacenter at around 70%, while floor space is just a bit over 10%.
Most datacenters are balanced to cool down (get rid of) the power they are designed to consume. Usually this capacity is evenly distributed across the floor space, based on a watt per square meter or watt per rack assumption.
Filling up racks with blades has previously caused hotspots in datacenters, forcing operators to add cooling. It can also cause issues with power distribution, and in some cases requires a redesign of the power setup as well.
We already experienced this in one of our datacenters where we previously used blades. After experiencing hotspots, we moved to rack servers to distribute the heat and power more evenly in the datacenter.
Buying servers to optimize floor space means optimizing around 10% of your datacenter cost (as long as we don’t use prime-location real estate for our datacenters). Buying servers to optimize power and cooling means optimizing around 70% of your datacenter cost.
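To put rough numbers on that, here is a small sketch. The 70% and 10% shares are the ones quoted above; the 20% saving within a category is a purely hypothetical figure chosen for illustration, and the function name is mine.

```python
def total_cost_saving(category_share: float, category_saving: float) -> float:
    """Fraction of total datacenter cost saved by trimming one cost category."""
    return category_share * category_saving


# Cost shares from the text above.
FLOOR_SPACE_SHARE = 0.10
POWER_COOLING_SHARE = 0.70

# Hypothetical: assume we manage to cut either category by 20%.
ASSUMED_SAVING = 0.20

floor = total_cost_saving(FLOOR_SPACE_SHARE, ASSUMED_SAVING)
power = total_cost_saving(POWER_COOLING_SHARE, ASSUMED_SAVING)

print(f"floor space:   {floor:.1%} of total cost")
print(f"power/cooling: {power:.1%} of total cost")
```

Under these assumptions the same relative improvement is worth seven times more when it is applied to power and cooling than when it is applied to floor space.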
With that said, experience shows that in practice the amount of compute power you can put in a rack is not significantly different between rack and blade servers, due to the power distribution and cooling limitations above. Usually these factors limit a rack to at most 2 blade chassis. If each chassis holds 16 compute blades, that gives us 32 compute nodes per rack. With rack servers, even when reserving 4U for other equipment, we can fit 38 1U compute nodes in a typical 42U rack.
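The node counts above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes 1U rack servers (which is what the figure of 38 implies), and the function names are made up for this example.

```python
def rack_server_nodes(rack_units: int, reserved_units: int, server_height_u: int = 1) -> int:
    """Number of rack servers that fit after reserving space for other equipment."""
    return (rack_units - reserved_units) // server_height_u


def blade_nodes(chassis_per_rack: int, blades_per_chassis: int) -> int:
    """Number of blades in a rack that is limited to a given number of chassis."""
    return chassis_per_rack * blades_per_chassis


# Figures from the text: 42U rack, 4U reserved, 2 chassis with 16 blades each.
rack_nodes = rack_server_nodes(rack_units=42, reserved_units=4)
blade_count = blade_nodes(chassis_per_rack=2, blades_per_chassis=16)

print(f"rack servers per rack: {rack_nodes}")
print(f"blades per rack:       {blade_count}")
```

Changing `server_height_u` to 2 shows how quickly the picture shifts if the workload needs 2U servers, so the comparison is worth redoing with your own numbers.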
It is not uncommon to use more floor space and get better Power Usage Effectiveness (PUE) in return.
Another aspect is that many datacenters I know have a large number of racks that are not filled. If they choose to make the most of the existing rack space, that is easier with rack servers, which are more flexible with regard to the number of U available in each rack.
How we think of scale units might also affect the choice of rack servers over blades. Blade chassis are most efficient, in both space and power, when they are filled up, but in reality we often end up with chassis that are not full. Over time our vendor preferences might change, or requirements might change to something not suitable for that blade chassis.
Open vs Proprietary solutions
Most blade server solutions use proprietary technology from the respective vendor. This means a blade-based solution also requires more proprietary training to support and operate.
Being on a proprietary blade fabric will limit selection of features to what the blade vendor makes available. In some cases, they might even have their own reasons for what to offer, especially if there are competing standards.
Limited memory capacity is less of a challenge now than it used to be, as current-generation blades can handle larger amounts of memory. However, there are still challenges with blades compared to rack servers when it comes to disk configurations and I/O ports. I/O ports in blade setups are usually configured at the blade chassis level and then distributed to the individual blades. Because the ports are shared with the other blades in the fabric, each blade could have less capacity available than a rack server.
A commonly stated benefit of blades is less cabling; however, this is not always beneficial with modern workloads. In a hyper-converged fabric setup where storage and compute nodes are scaled out and distributed, combined with workloads adopting a microservices architecture, east-west traffic in the datacenter is becoming the driver for network design (ECMP, Clos). Rack servers scale networking better in this case, and are also better suited for integrating storage and compute in the same unit. Yes, there is more cabling, but there is also more throughput dedicated to each node.
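The effect of sharing ports at the chassis level can be illustrated with a back-of-the-envelope calculation. All port counts and speeds below are hypothetical and not taken from any specific product; real chassis and server configurations vary widely.

```python
def per_node_gbps(uplink_count: int, uplink_gbps: float, nodes_sharing: int) -> float:
    """Worst-case bandwidth per node when every node sharing the uplinks is busy."""
    return uplink_count * uplink_gbps / nodes_sharing


# Hypothetical blade chassis: 8 x 25 Gbit/s uplinks shared by 16 blades.
blade_gbps = per_node_gbps(uplink_count=8, uplink_gbps=25, nodes_sharing=16)

# Hypothetical rack server: 2 x 25 Gbit/s ports dedicated to one server.
rack_gbps = per_node_gbps(uplink_count=2, uplink_gbps=25, nodes_sharing=1)

print(f"blade (shared uplinks):     {blade_gbps} Gbit/s per node")
print(f"rack server (own ports):    {rack_gbps} Gbit/s per node")
```

With these assumed numbers the rack setup needs 32 cables for 16 servers instead of 8 for the chassis, but each node gets four times the worst-case throughput.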
With rack servers we therefore have more options for how we want to build our cloud fabric. Blade servers can easily give a very compute-intensive, high-density setup. However, to build a “cloud” setup we need more balanced servers, with harmony between CPU, memory, disk and I/O capacity. I believe this is easier and more cost-efficient with rack servers.
Operations and management
In addition to the more proprietary requirements for managing the blades and blade chassis, we also need to take into consideration another level of failure with blades. Blades depend on their blade chassis, and the chassis is an intelligent device that connects the blades to the external world. It is true that blade chassis failures are very rare, but they do occur. Another challenge is that the blade chassis itself needs management and servicing. This will usually require all workloads running on the chassis to be evacuated to a different chassis during the service.
Because of this, we also need intelligent placement of workloads that require high availability, so that they are not placed in the same blade chassis.
In a cloud-based model a key principle is to reduce complexity, as complexity gets more difficult to manage as you scale. Using blade fabrics adds complexity, and as such rack servers are favorable for operations and management. For workloads to be distributed across failure and upgrade domains, with blades we need to take the chassis into consideration in addition to the rack. With rack servers, we have one less component to care about with regard to failure and upgrade domains.
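As an illustration of the extra bookkeeping, here is a minimal placement sketch. It is not how any particular scheduler works; the `Node` type, the greedy strategy and all names are assumptions made for this example. Replicas prefer distinct racks, and are never allowed to share a blade chassis. A rack server has no chassis, so only the rack matters for it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    name: str
    rack: str
    chassis: Optional[str] = None  # None means a standalone rack server


def enclosure(node: Node) -> Tuple[str, str, str]:
    """Smallest shared failure/upgrade domain a node lives in."""
    if node.chassis is not None:
        return ("chassis", node.rack, node.chassis)  # shared with other blades
    return ("server", node.rack, node.name)  # a rack server stands alone


def place_replicas(nodes: List[Node], replicas: int) -> List[Node]:
    """Greedy anti-affinity: spread over racks first, never reuse an enclosure."""
    chosen: List[Node] = []
    used_racks = set()
    used_enclosures = set()
    # Pass 1 only accepts unused racks; pass 2 allows reusing a rack.
    for allow_rack_reuse in (False, True):
        for node in nodes:
            if len(chosen) == replicas:
                return chosen
            if enclosure(node) in used_enclosures:
                continue
            if not allow_rack_reuse and node.rack in used_racks:
                continue
            chosen.append(node)
            used_racks.add(node.rack)
            used_enclosures.add(enclosure(node))
    if len(chosen) == replicas:
        return chosen
    raise ValueError("not enough independent enclosures for the requested replicas")


inventory = [
    Node("b1", rack="r1", chassis="c1"),
    Node("b2", rack="r1", chassis="c1"),
    Node("b3", rack="r1", chassis="c2"),
    Node("s1", rack="r2"),
    Node("s2", rack="r2"),
]
print([n.name for n in place_replicas(inventory, 3)])
```

Note that `b2` is never a valid companion for `b1`, because they share a chassis, while `s1` and `s2` only share a rack. With rack servers alone, the `enclosure` step disappears and the scheduler only has to track racks.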
A different example of this complexity with blades is cross-vendor compatibility. Blade fabrics are optimized for their own vendor's technologies, and the added complexity of integrating technologies across vendors could potentially also bring additional security risks.
Benefits of blades
I will briefly look at some of the strengths of blades. These days blades and rack servers are very similar with regard to compute power. In all practical terms they are also very similar in maximum memory configurations (due to the existence of larger RAM modules).
Blades are quicker to rack up in the datacenter. Even though each vendor has a different way of configuring their blade fabric, they are very efficient to configure, as long as you use one vendor.
Blades deliver a very compact setup that could save floor space, as long as you have the power and cooling in place to support such a dense setup. This is because rack servers are a minimum of 1RU, whereas blade fabrics can bring each server down to 0.5RU.
Another benefit of blades is that they are easy to replace when broken. However, over time we might have trouble getting the right blades if we want to keep the blades in a given chassis similar. For the sake of operational stability and simplicity, the blades in a chassis are usually kept similar and configured with the same firmware versions etc.
Blades are often described as more cost-efficient; however, real data does not seem to support this. Rack servers could be more capex-efficient, and I have often seen them come out around 10% cheaper in capex. Interestingly, the power and cooling requirements for blade servers could be higher as well, and might require you to upgrade the HVAC implementation in the datacenter. Because rack servers are where the volume is, their prices will be lower. Rack servers are also more replaceable across vendors, and can be scaled in more flexible units.
The future of the servers?
The largest cloud players are embracing “open” hardware specifications for servers; however, the traditional server players seem slow to adapt. That makes sense, as they most likely see it as a threat. Even so, more and more of the traditional server vendors now have product lines that support Open Compute and similar initiatives.
There is a lot of ongoing research on datacenter hardware, driven especially by large cloud providers like Google, Amazon, Facebook and Microsoft. Research at this level requires a lot of investment; however, much of it is open sourced, and you can participate. This is called the Open Compute initiative. With Open Compute you can find open-sourced designs and specifications for the servers that cloud providers like Facebook, Microsoft and others use in their cloud datacenters. These are high-density compute nodes that could be thought of as a combination of the best of rack and the best of blades. Open Compute is also looking at specifying storage and network equipment.
The industry is moving beyond traditional rack and blade servers, looking into new industry-standard architectures that truly disaggregate components with different lifecycles, such as power, compute, storage and network. Most of these designs are built around a new rack design, integrating at the rack level. Examples are Intel Rack Scale Design, an open standard from Intel, and Open Compute.
Some further reading: