Scale-out storage is a network-attached storage (NAS) architecture in which the total amount of disk space can be expanded as needed, even if some of the new drives reside in other storage arrays. When a given array reaches its storage limit, another array can be added to expand the system's capacity. Scale-out storage differs conceptually from the older scale-up approach.
In a scale-up storage system, capacity is added to a single array: drives or shelves are attached to the existing controller as the need arises. The main advantages of the scale-out approach are cost containment and more efficient use of hardware resources.
Before scale-out storage became popular, enterprises often purchased storage arrays much larger than needed, to ensure that plenty of disk space would be available for future expansion. If that expansion never occurred, or turned out smaller than expected, much of the originally purchased disk space went to waste. With the scale-out architecture, the initial investment can be more modest; if the storage requirement grows beyond expectations, new arrays can be added as needed, without limit.
In theory, scale-out storage appeals because the data center can start small and add capacity and performance as needed. But do these theoretical advantages apply to the use cases in which storage is most commonly deployed: databases and virtualization?
The problem is that scale-out storage systems are more expensive to build, implement, and maintain. There are many use cases for scale-out storage, but it is best suited to situations where meeting a high capacity demand takes precedence over performance.
In scale-up architectures, all the performance and capacity potential of the storage system is provided in a single controller unit, typically up front. Current scale-out architectures provide performance and capacity as storage nodes (servers with internal capacity) are added to the infrastructure. Each architecture has its ideal use case, depending on performance and capacity demands. As stated above, the appeal of a scale-out storage system is that performance and capacity can be added incrementally as needed.
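The difference in how the two architectures grow can be sketched with a toy model. All numbers below (controller IOPS, per-node IOPS, capacities) are hypothetical, chosen only to illustrate that scale-up fixes performance up front while scale-out grows both dimensions together:

```python
# Illustrative sketch (all figures hypothetical): how capacity and
# performance grow under the two architectures as units are added.

SCALE_UP_CONTROLLER_IOPS = 500_000   # assumed: all performance delivered up front
SHELF_CAPACITY_TB = 100              # assumed capacity per added shelf

NODE_IOPS = 100_000                  # assumed per-node performance
NODE_CAPACITY_TB = 50                # assumed per-node capacity

def scale_up(shelves: int) -> tuple[int, int]:
    """Performance is fixed by the controller; only capacity grows per shelf."""
    return SCALE_UP_CONTROLLER_IOPS, shelves * SHELF_CAPACITY_TB

def scale_out(nodes: int) -> tuple[int, int]:
    """Performance and capacity grow together, one node at a time."""
    return nodes * NODE_IOPS, nodes * NODE_CAPACITY_TB

for units in (1, 2, 4):
    print(f"{units} unit(s): scale-up (IOPS, TB) = {scale_up(units)}, "
          f"scale-out (IOPS, TB) = {scale_out(units)}")
```

Under these assumed numbers, a single scale-up controller out-performs a small scale-out cluster, while the cluster only catches up after several nodes are purchased; the capacity-per-purchase trade-off moves in the opposite direction.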
Should we scale performance and capacity at the same time?
Performance and capacity operate on different vectors and are not necessarily linked together.
In a scale-up architecture, all the performance is delivered with the unit up front, while capacity is added to the system as needed. Performance can't necessarily be scaled, but it is delivered in its entirety up front and is essentially a fixed cost with no surprises.
One side effect of scale-out storage is that the nodes typically need to be homogeneous: each node needs a similar processor chipset and must use SSDs of exactly the same size. A scale-up system can intermix SSDs of different sizes, and even different types, as new flash technology becomes available.
Is Scale-up performance really a big issue?
While scale-out advocates often cite scale-up's lack of performance scaling, the reality is that the overwhelming majority of applications can't push current scale-up systems to their limits. Additionally, some scale-up systems support a periodic controller-unit upgrade: as processing technology continues to advance, the head can be upgraded to deliver more performance to the existing storage shelves. As a result, there actually is some performance scaling capability in scale-up systems.
Some scale-up vendors can add a scale-out design to their architecture if the need ever arises. It is hard to imagine processing technology falling behind storage I/O performance, but if it did, this is the ideal way to scale: scale up completely first, then start scaling out once performance demands exceed the capabilities of the current processors.
Is Scale-out cheap or at least cheaper than Scale-up?
In storage there are two hard costs to be concerned with. The first is the initial purchase cost. In theory, this should favor a scale-out storage system, since it can start small. But current scale-out designs either need an initial cluster created or need to deliver high availability (HA) within each node. Counting on the cluster for HA requires purchasing potentially more performance and capacity than the customer needs, because more nodes are required initially. Building HA into each node adds expense per node, likely bringing the cost in line with a scale-up storage system.
A case could be made that a single storage node can be delivered less expensively than a scale-up controller unit. But that requires choosing the first option: nodes with no built-in HA, relying on a cluster quorum instead. Buying the multiple nodes needed to form that quorum eliminates the price advantage, and it leads to node sprawl, because nodes then have to be added to address performance issues, not capacity issues.
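The quorum requirement puts a concrete floor on the initial node count. A minimal sketch of the standard strict-majority rule (the node counts are illustrative; real clustering software may impose additional minimums):

```python
# Sketch of why cluster-based HA sets a floor on node count:
# a quorum requires a strict majority of nodes to remain up.

def quorum_size(n_nodes: int) -> int:
    """Minimum number of nodes that must agree (strict majority)."""
    return n_nodes // 2 + 1

def tolerable_failures(n_nodes: int) -> int:
    """Node failures the cluster survives while still holding quorum."""
    return n_nodes - quorum_size(n_nodes)

# A 2-node cluster tolerates zero failures, so a customer who only
# needs one node's worth of capacity must still buy at least three
# nodes to survive the loss of one.
for n in (2, 3, 5):
    print(f"{n} nodes -> {tolerable_failures(n)} failure(s) tolerated")
```

This is why "start with one node" rarely holds in practice: the smallest HA-capable cluster is three nodes, which is the over-purchase the article describes.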
At a minimum, the initial cost difference between the scale-up and scale-out implementation types may be a wash. When implementation time, or time to data, is factored into the equation, scale-up systems have a clear advantage: it simply takes longer to install more pieces and get those pieces working together.
The second cost, incremental cost, is an area where scale-out storage should have an advantage. But again, the limits of current scale-out designs tell a different story. The only way a scale-out all-flash system has a cost advantage is if the expansion is driven by performance rather than capacity. As mentioned earlier, though, the overwhelming majority of flash vendors and customers report that they can't exceed the performance of a single box, so the scenario that would justify a scale-out deployment will probably not occur in most data centers.
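A back-of-the-envelope model makes the incremental-cost argument concrete. All prices and capacities below are hypothetical: the point is only that a scale-up shelf adds drives and an enclosure, while a scale-out node also carries server hardware the customer may not need when the driver is capacity alone:

```python
# Back-of-the-envelope sketch (all prices hypothetical) of the
# incremental cost of adding capacity to each architecture.

SHELF_PRICE = 20_000   # assumed: drives + enclosure only
NODE_PRICE = 35_000    # assumed: drives + CPU, RAM, networking, licensing
SHELF_TB = 100         # assumed usable TB per added shelf
NODE_TB = 100          # assumed usable TB per added node

def cost_per_tb(price: float, tb: float) -> float:
    """Incremental dollars per usable terabyte added."""
    return price / tb

print("scale-up  $/TB:", cost_per_tb(SHELF_PRICE, SHELF_TB))
print("scale-out $/TB:", cost_per_tb(NODE_PRICE, NODE_TB))
```

Under these assumptions the shelf wins on dollars per terabyte; the node only pays for itself if the extra compute it brings is actually needed for performance.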
A theoretical advantage of scale-out is how simple it is to expand; "like adding Lego blocks" is the common analogy. In practice, current scale-out systems don't actually snap together: they are a series of individual servers with clustering software that must be carefully networked together for maximum performance and availability. This makes initial implementation more complex and turns ongoing upgrades into something that must be carefully planned.
Scale-up architectures are actually relatively simple. All the capabilities, at least from a performance perspective, are delivered up front; there is nothing to "click" in. Capacity can be added incrementally, either by inserting drives into an existing shelf or by adding shelves to the existing storage controller. Adding shelves also requires planning, but the capacity per shelf is high, and as long as the scale-up array supports non-disruptive upgrades, no downtime should result.
Scale-up storage, while carrying the disadvantage of buying all the performance capability up front, has the dual advantage of more incremental capacity expansion and a less complex back-end infrastructure. And leveraging data-in-place storage controller upgrades can largely eliminate the lack of performance scalability.
source: Storage Switzerland, EMC, Dell, and 1010data