Retooling the Datacenter: Proof of Concepts and Bakeoffs

If you’ve got the time, we’ve got the gear.™

Storage is the slowest resource in the datacenter. We measure things in milliseconds, while other components are measured in microseconds or nanoseconds. We’re increasingly asked to push more and more data through pipes faster and faster. When storage fails, screens go blue, kernels panic, and things get ugly. Some call this a resume-generating event.

For this reason, storage professionals as a general rule are a conservative bunch. We resist change. We want things to be safe and mature with low risk and no bugs. We avoid the bleeding edge. Some wait months or years after new products are released before they consider deploying them.

This approach doesn’t always serve us well.

A few years back, I had the privilege of attending the initial launch of Cisco’s Nexus product line, traveling out to Palo Alto for a partner summit. The technology looked revolutionary. Taking FibreChannel concepts and applying them to the Ethernet world yielded amazing features: non-blocking Ethernet with guaranteed delivery, virtual output queues, FCoE, OTV and many others on the horizon that we weren’t privy to at product launch. The distributed line cards were interesting. The 1000v confused us until we realized how it could take the fright out of mixing ESX and VLANs (a network administrator’s nightmare), not to mention QoS, configuration consistency across the cluster and statistics. Vendors were developing Converged Network Adaptors (CNAs) and Unified Target Adaptors (UTAs), marrying a NIC with an HBA. But the thing that really mattered: the TCO was less than SAN and IP network gear sold separately.

What else really mattered? The economy sucked, budgets were tight. Anyplace you could save money made people interested. However, not everyone was sold on it.

Other SAN and IP switch vendors balked at the technology, saying it wasn’t ready for prime time. “Where’s multihop?” they cried while rushing to release products of their own. Others exclaimed it would never take off. I have to seriously ask them, using my best Vizzini voice, “You fell victim to one of the classic blunders, the most famous of which is never get involved in a protocol war against Ethernet, but only slightly less well-known is this: never go against Cisco when network convergence is on the line!” In years gone by, I deployed Token-Ring, FDDI and ATM. Ethernet took them all down. I’ve seen TCP/IP take over telephony and video conferencing. FibreChannel giving ground to FCoE is only a matter of time. (By the way, multihop has arrived.)

So three years ago we architected and sold FCoE with Nexus, and our competition didn’t get it.

Then NetApp came out with FCoE UTAs, followed by EMC. Now we no longer needed dedicated FC ports in a Nexus switch. Then Cisco surprised us all by coming out with Intel servers with the launch of UCS. Now we had this distributed blade system with the brains in a UCS fabric module: a souped-up Nexus 5000 switch. Suddenly Cisco’s master plan to take over the datacenter became apparent. The TCO was amazing. Others balked again.

So we sold UCS while our competition didn’t get it.

Then EMC, Cisco and VMware formed Acadia, which begot VCE and vBlocks. NetApp followed suit with FlexPods. Suddenly we could create or collapse a datacenter into one rack or just a few, depending on the size of deployment. Again people didn’t get it. It was new. It was different. It was fully qualified without a ton of interoperability testing. Meanwhile HP and IBM cobbled together so many products it looked like an integration nightmare.

So we sold vBlocks and FlexPods, while our competition didn’t get it.

But we didn’t stop there. There was a different set of requirements growing, taking business continuity to the next level. Instead of a 5-20 minute recovery time, our customers were asking for zero downtime. So we architected Long-Distance VMotion (LDVM) with EMC VPlex and NetApp MetroClusters, Cisco Nexus, load balancers and firewalls. This took advantage of a different use of Nexus gear, Overlay Transport Virtualization (OTV), which routes Layer 2 over Layer 3, much the same way that FibreChannel was routed between datacenters using Transit VSANs and Inter-VSAN Routing. This is what happens when the SAN and IP development teams get together. Later they would give us FabricPath.

We tried to do this with IBM SVC, but IBM wouldn’t go there. They stopped at 10 km, the campus boundary. So with EMC and NetApp, we now offered a different solution, one that could move VMs between metro-distance datacenters without shutting anything down. (IBM introduced an enhanced stretched metro cluster in December.)

We sold LDVM while our competition didn’t get it. Like I said, storage professionals are a conservative bunch.

The Proof of Concept

New technology on the bleeding edge doesn’t have to leave you bloody. I would never recommend taking something new straight to production. I avoid risk like the best of them. But new technology can be successfully deployed by use of the Proof of Concept (PoC). This lets us put it in, kick the tires, work out the kinks and see that it works as advertised.

We have been successfully deploying all of these solutions, safely, by use of proof of concepts. When the technology is at its newest, the early adopter phase, we plug a PoC into a customer’s development gear. It has a chance to shine or fail, without impacting production workloads. By the time development has moved on to testing, QA, staging and finally production, it has all the kinks out. And it works!

The PoC doesn’t always have to be used in the traditional sense: do I want to buy this? It can also be used to fully test new technologies in your datacenter in a pre-production sense. When that new TSM with DB2 version came out, we used PoCs to test the migration at each customer to make sure the migration didn’t break anything. The PoC model works with new technology making deployments more successful. You can test it then deploy it and reap the money saving benefits without getting blood on your hands.

These technologies are still a bit radical for some people, and the traditional use of the PoC is still requested. Do vBlocks/FlexPods really work? In it goes, the VMs get loaded and, voilà, testing commences. Most roll into production and never have a chance to leave. They are purchased at the conclusion of the trial.

We’ve been deploying these for years now. Most of these technologies are now mature, no longer in the early adopter stage. As new features or technologies roll out, we will safely test them, first in our lab, then in a PoC.

Bakeoffs

As a general rule, I’m not a fan of bakeoffs. Bakeoffs are when you pit two vendors against each other in a grudge match. A lot of time and effort go into setting everything up before the test. I am, however, confident that the solutions I design will perform well. I architect everything on the pre-sales side to be right-sized: balancing head size, spindle count, capacity and projected growth over the life of the system.

Each vendor optimizes for different results and philosophies. Some offer rigidity in favor of being more fully qualified and tested. Others offer flexibility, with less thorough testing of all combinations and software releases. Both approaches are valid and map quite well to different IT organizations’ philosophies. One size does not fit all.

The issue I have with most bakeoffs is: most of the time everyone passes the test. The architectural choices often determine the speed of the solution, not the vendor. The buying decision usually falls to the bottom line.

Of the vendors I sell, each solution can be architected to work well. Whether it’s EMC, IBM or NetApp, I can make it work for you. It’s really not that complicated if you know what you’re doing. Head size is a matter of performance sized for today and growth over 3-7 years. Spindle count will be dictated by throughput, with a cushion (overhead). Once desired capacity is known, drive size is determined knowing how many spindles you need to hit throughput optimizing for the capacity target.
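
As a rough illustration of that sizing order (throughput first, then capacity), here is a minimal Python sketch. Everything in it is an assumption made up for the example: the function names, the 20% cushion, the 150 IOPS per spindle, the 70% usable fraction after RAID and spares, and the list of drive sizes. It is not a vendor sizing tool, and real designs weigh more factors than this.

```python
def spindles_for_throughput(required_iops, iops_per_spindle, cushion_pct=20):
    """Spindle count dictated by throughput, plus a cushion (overhead).

    Integer math is used so the result rounds up cleanly.
    """
    needed = required_iops * (100 + cushion_pct)
    per_spindle = iops_per_spindle * 100
    return -(-needed // per_spindle)  # ceiling division


def pick_drive_size(capacity_gb, spindles, drive_sizes_gb, usable_pct=70):
    """Smallest drive size that meets the capacity target on that many spindles.

    usable_pct is a hypothetical allowance for RAID, spares and formatting.
    Returns None if no listed drive size is large enough.
    """
    for size in sorted(drive_sizes_gb):
        if spindles * size * usable_pct >= capacity_gb * 100:
            return size
    return None


# Hypothetical workload: 9,000 IOPS and 30 TB usable.
spindles = spindles_for_throughput(required_iops=9000, iops_per_spindle=150)
drive_gb = pick_drive_size(capacity_gb=30000, spindles=spindles,
                           drive_sizes_gb=[300, 600, 900])
print(spindles, drive_gb)
```

The point of the ordering is that the spindle count is fixed by performance before capacity is considered, so the drive size falls out of the capacity target rather than the other way around.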

So when I help someone decide what solution will work best for them, optimizing the value comes into play quickly. If I’m not offering the best value, my competition is. Always offering the best value is a secret to my success.

Doing Amazing Things

Where I’m going with all of this is that some of these newest and coolest technologies actually save money. Business as usual often doesn’t. Convergence reaps rewards for your CapEx and OpEx. Taking your datacenter to the active-active model isn’t as out of reach as you might expect, and it isn’t that different to manage.

Some of our smallest customers are reaping the rewards. Some of our largest see the benefits. This isn’t a Fortune 100 sized solution, it’s often a money saving one. We’re saving money for customers large and small. We’re improving availability for all types. It’s an amazing time to be in this field.

So next time you think these solutions are too new, know that people have been putting them in for years with us. Next time you think availability paradigms haven’t changed, understand that people are reaping the benefits today. It’s a rapidly changing landscape, and picking the right partner often can be key. Some of us have been deploying these solutions for years now, while others are just jumping into the pool and offering them.

Is your technology partner seasoned? If you have new technology you want to deploy, is there a proven safe way to get to where you want to go? Can your partner deliver in all the areas they need to: network, storage and compute?

By the end of this year, we will have deployed many more vBlock, FlexPod and Long-Distance VMotion solutions: safely and successfully. Can your partner do that?

If you’re not sure or you don’t trust them, drop a line and we will put in a PoC. If you’ve got the time, we’ve got the gear.™


Tuesday, April 24, 2012
