Saturday, August 18, 2012

Grid-based Storage

When I introduced the Storage Evolution, I shared some topics I talk about when meeting with a customer: Virtualization, Deduplication, Grid and Encryption. That list has grown to include Information Lifecycle Management (ILM), Convergence, Business Continuity, Clouds and Big Data. It’s about time I start writing about more of these.

The world of storage is changing – fast. When I started consulting, I installed clusters and supercomputers. My specialty was IBM’s SP supercomputer (like Deep Blue, the one that played the Russian chess champion Garry Kasparov). My wife asked if they wore capes.

The supercomputer market fell apart over a decade ago, displaced by grid-based systems. What used to cost millions of dollars was swept away by inexpensive commodity Intel-based servers, usually running Linux and grid software, and supercomputers started to become extinct. What used to be a scale-up model became a modular, massively parallel model, which in turn became a highly distributed model.


When you deploy an Internet-facing app today, you are not building one large web server, but many web servers, backed by application servers, backed by database servers. A load balancer distributes the incoming traffic among the servers to maximize throughput and minimize response time. This, again, is a scale-out, not a scale-up, model.
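To make the load-balancing idea concrete, here is a minimal sketch of one common policy, least connections: each new request goes to whichever server currently has the fewest in-flight requests. The class name, method names and server names are all made up for illustration; real load balancers add health checks, weights and much more.

```python
class LeastConnectionsBalancer:
    """Toy load balancer: route each request to the least-busy server."""

    def __init__(self, servers):
        # In-flight request count per server (server names are hypothetical).
        self.active = {name: 0 for name in servers}

    def acquire(self):
        # Pick the server with the fewest in-flight requests.
        # Ties go to the server that was added first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call this when the request finishes.
        self.active[server] -= 1

    def add_server(self, name):
        # Scaling out: a new server joins the pool with no load.
        self.active[name] = 0


lb = LeastConnectionsBalancer(["web1", "web2", "web3"])
assigned = [lb.acquire() for _ in range(6)]  # spreads evenly across the pool
lb.add_server("web4")                        # grow the farm
next_server = lb.acquire()                   # the idle newcomer is picked
```

The point of the sketch is the last three lines: growing the farm is just adding another entry to the pool, and the new server starts taking traffic immediately.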

This is the basis of the grid (or farm, if you prefer). Many nodes are load-balanced, distributed and protected so there is no single point of failure. Server and desktop virtualization is protected N+1 (one more host than the workload needs, so any single host can fail) and can easily scale. We repeat this model throughout the enterprise.

Storage is no different.

EMC has Avamar, Isilon, Atmos and Centera; HP has LeftHand and 3PAR; IBM has XIV and SONAS; NetApp has Cluster-Mode. There are others, like Hadoop, which distributes the data across nodes with different functions. There are also products that go halfway, such as EMC VMAX and IBM SVC; these effectively partition the data among redundant controllers, but the data and throughput are siloed rather than completely distributed.

Why Grids

Why grids? Mainly because we need them. Imagine storage that scales with you. Instead of head-swapping to a larger controller, you simply add more nodes. You now have a system that scales not just at the spindle level, but also at the controller level.
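Here is a rough sketch of how "simply add more nodes" can work without reshuffling everything. It uses consistent hashing, a well-known technique from distributed systems; I am not claiming any of the products named above place data this way. The class, the node names and the block names are invented for the example.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Map a string to a position on the ring (MD5 used only for spreading).
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring mapping data blocks to storage nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes   # virtual points per node, smooths the spread
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Place the node at many pseudo-random positions on the ring.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key):
        # A key belongs to the first node point clockwise from its hash.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["node1", "node2", "node3", "node4"])
blocks = [f"block-{i}" for i in range(10_000)]
before = {b: ring.node_for(b) for b in blocks}

ring.add_node("node5")  # scale out by one node
moved = [b for b in blocks if ring.node_for(b) != before[b]]
print(f"{len(moved) / len(blocks):.0%} of blocks moved to the new node")
```

With four nodes growing to five, you would expect roughly a fifth of the blocks to move, and every one of them lands on the new node. The other four nodes give up a share of their data but exchange nothing among themselves, which is what lets both capacity and controller horsepower grow together.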

As solid-state (flash) storage becomes more prevalent in the data center, the bottleneck moves from the spindles to the controllers. As storage efficiencies such as compression, deduplication, thin provisioning, WAFL, etc. continue to grow in use, they put more strain on the controller and its processing power. Snapshots, mirroring, replication and NAS all add overhead. The result is that the controller, which has long been overpowered, is starting to strain.

We’ve been lucky. For most of my customers, as throughput has grown, so has capacity and spindle count. We’ve mostly been keeping pace. FlashCache, FAST Cache, auto-tiering and SSD have helped control this, but we’re getting close to the breaking point. Storage efficiency software combined with low-latency solid state is pushing controllers to the brink.

We’re growing beyond a dual-controller solution. When IBM came out with real-time compression, they stated you need a lot of free CPU on the V7000/SVC. If EMC or IBM adds dedupe, we’ll see what it does to controller loads. NetApp Cluster-Mode arrived just in time for two of my larger customers. Whatever the cause, we’re running out of bandwidth within the controllers.

This isn’t a terrible worry. We’ve been moving toward grids for some time. We have to. Without them, we’ll eventually run out of gas. Today, I can architect any solution, but I may have to partition the data: multiple controllers, multiple SVC nodes, or multiple VMAX engines. It’s not the best solution; we end up with silos of data. Where do we want to go? A single, centrally managed, intelligent grid: one set of data that self-balances, self-tiers and distributes itself elegantly, without external add-on tools.

This is where we’re heading. It will be a better world. We will be able to start small and grow performance and capacity the way we are used to in the virtualization cluster, the web farm and the VDI farm. We’ll grow our storage the same way. We’ll be much better for it.
