Monday, January 17, 2011

I’ve Got The Remote Replication, Single-Storage Image MPIO Blues

There aren’t many customers I meet who don’t want some form of replication to a disaster recovery/colocation facility. What used to be financially out of reach has, over the last 10-15 years, come down to be affordable for most businesses. Remote replication, coupled with VMware or one of the other hypervisors providing server virtualization, has made recovery quick, easy and within budget.

So as I look at some of the new storage systems being released lately, I’m scratching my head. Why would an affordable small to medium business mid-tier storage system provide only FibreChannel-based replication – today?

When I started doing SANs, they were SCSI, SSA and ESCON. None of these were that scalable in terms of connectivity. FibreChannel really opened up SANs to consolidate and centralize storage. And it was necessary. Without FC, we had many storage islands. We had clusters that needed terminators pulled. We would not have the VMware farms of today. FC started out expensive enough by itself, but iSCSI arrived as a low-cost alternative and really brought the price of SANs and FC down to the street prices we see today.

So when I look at the latest crop of storage from vendors (and you know who you are), I am floored when replication is still FC based, expecting dark fiber, DWDM, CWDM or FCIP to extend from one data center to another (and I’ve done all of these). To put it another way, everybody has TCP/IP connectivity between datacenters. Most of these same storage subsystems have iSCSI interfaces. So how hard can it be to send that same replication traffic over TCP/IP? The interface is already there! Let me do the IP QoS engineering. I’m good at it.
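
To be concrete about the QoS part: all the storage side has to do is mark its replication flows so the WAN routers can classify them; I’ll handle the queueing. Here is a minimal sketch in Python of what that marking looks like at the socket level. The target address, the port and the DSCP value are ones I made up for illustration, not anything a particular array ships with.

    import socket

    # Hypothetical replication endpoint and DSCP marking (illustration only).
    REPLICA_TARGET = ("192.0.2.10", 3260)   # documentation address, iSCSI-style port
    DSCP_AF41 = 34                          # a marking I'd reserve for replication traffic

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    # The IP TOS byte carries the DSCP in its upper six bits, hence the shift by two.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_AF41 << 2)

    # From here the array would just stream its replication payload over TCP, and the
    # routers between data centers can queue and police it on the DSCP value alone.
    sock.connect(REPLICA_TARGET)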

So imagine you are the Storage Solution Architect at a large and influential $9B reseller, and you walk into a customer and are asked what each vendor has to offer. They want to replicate storage from one site to another and they have a budget under $200K. Guess what, those FC-only replication solutions are knocked out. As much as I love and believe in that solution, as much as roadmaps say they might someday offer … they’re outta there.

So I may be head over heels about some of these new storage solutions. I may really believe in their functionality, their ease of use, their pizzazz, but when the rubber hits the road, they just don’t cut the mustard.

There are places they fit and there are customers that either have the infrastructure or can afford it. For those that don’t, I have the remote replication blues.

Single-Storage Image

Vendors, if you are still reading, I have another message for you. Customers crave long-distance vMotion. Don’t wait for Gartner or IDC to tell you on a quadrant or chart. They are screaming for it.

I am currently evaluating hardware reference architectures from several vendors to accomplish this goal. Many fall short in one way or another, but wouldn’t take a lot of effort to fix (at least from my seat).

What I really want is a single-storage image: a stretched storage subsystem between two different geographic locations within metropolitan mirroring distance, that is, 200 km circuit distance (although 400 km would be nice). I want a single WWNN across the storage cluster or grid. I’d prefer to have two nodes at location A and two nodes at location B, sharing one WWNN and appearing as one storage subsystem. I want volumes writable on both sides, kind of like EMC vPlex, NetApp MetroCluster or IBM split-node SVC. Each of these has some limitation I run into.

I want to vMotion from A to B. I want to change MPIO active/passive paths to be primary to the local controller at site A, passive to site B, and flip them when I vMotion. I want to rely on VMFS to take care of my locking and coherency, which it does already. I want at least 100 km of distance. I want mirroring to be internal and transparent between the nodes of each site. I want Cisco IVR with an isolated fault domain (transit VSAN) in between sites. I really don’t have all of this yet.
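
None of the stock path-selection policies know anything about sites, so here is a toy model in Python of the behavior I’m asking for: tag each path with the site of the controller behind it, and when the VM lands on a host at the other site, flip which paths are active. The site labels and path identifiers are invented for illustration; this is the logic I want the MPIO layer to do for me, not something you can load on a host today.

    # Toy model of site-aware active/passive path selection (names are invented).
    from dataclasses import dataclass

    @dataclass
    class Path:
        path_id: str        # e.g. a "vmhba2:C0:T1:L4" style identifier
        site: str           # which data center owns the controller behind this path
        state: str = "standby"

    def flip_paths(paths, host_site):
        """Mark paths local to the host's site active, everything else standby."""
        for p in paths:
            p.state = "active" if p.site == host_site else "standby"
        return paths

    paths = [
        Path("vmhba2:C0:T0:L4", site="A"),
        Path("vmhba2:C0:T1:L4", site="A"),
        Path("vmhba3:C0:T2:L4", site="B"),
        Path("vmhba3:C0:T3:L4", site="B"),
    ]

    # VM starts on a host at site A: reads and writes stay local.
    flip_paths(paths, host_site="A")

    # After a vMotion to site B, flip the preference so I/O stops tromboning.
    flip_paths(paths, host_site="B")
    for p in paths:
        print(p.path_id, p.state)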

I can get vMotion working, but there’s still a tromboning of reads or writes (because the MPIO paths aren’t flippable). I don’t get enough node pairs on each side. I need a quorum tie breaker. I need over 10 km. I need IVR and ISLs. I’m close, but I’m not there yet.
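
The quorum piece is the same problem every split-node cluster has: when the inter-site link drops, somebody has to decide which site keeps serving the volumes, and it can’t be either site voting for itself. Here is a back-of-the-napkin sketch of a third-site tie breaker, again in Python with invented names, just to show how simple the rule is.

    # Minimal majority-vote tie breaker: two storage sites plus a third-site witness.
    # Site and witness names are hypothetical; this only illustrates the rule.

    def keeps_serving(site, reachable):
        """A site stays writable only if it can still see a majority of the three voters."""
        voters = {"A", "B", "W"}                # two storage sites plus the witness
        votes = {site} | (reachable & voters)   # itself plus whoever it can still reach
        return len(votes) >= 2                  # majority of three

    # Inter-site link is cut; only site A can still reach the witness:
    print(keeps_serving("A", {"W"}))     # True:  A plus the witness form a majority
    print(keeps_serving("B", set()))     # False: B is alone and must stop serving I/O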

XIV seems close with its grid, but can I split the grid between locations? I think the latency (with XIV only) would kill the performance.
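
The latency worry isn’t hand-waving either; the physics is easy to work out. Light in fiber covers roughly 5 microseconds per kilometer one way, and a mirrored write typically burns about two round trips, so here is the rough arithmetic for the distances I keep throwing around (both numbers are rules of thumb I’m assuming, not measurements from any particular array).

    # Rough synchronous-write penalty from distance alone (rule-of-thumb numbers).
    US_PER_KM = 5               # ~5 microseconds per km of fiber, one way
    ROUND_TRIPS_PER_WRITE = 2   # a mirrored SCSI write costs roughly two round trips

    for circuit_km in (10, 100, 200, 400):
        rtt_us = circuit_km * US_PER_KM * 2
        added_ms = rtt_us * ROUND_TRIPS_PER_WRITE / 1000
        print(f"{circuit_km:>4} km circuit: ~{added_ms:.1f} ms added per mirrored write")

At roughly 4 ms of extra service time per write at 200 km, a grid that spreads every volume across every node is going to feel it on just about every I/O.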

All that being said, the technology is doable with what I’ve got, and I am designing it and building it. I may not have everything: the distance isn’t what I want here, I have only one node at each location in that solution there, yadda yadda. I can do NFS (but that comes with its own issues). With FC on ESX, I likely need to do this all with ALUA. (I can’t load custom MPIO drivers, so there goes Open System Hyper Swap.)

So now you see why I’ve got the Single-Storage Image MPIO blues.

Long Distance vMotion – The Holy Grail

Forget all the clouds. It’s winter here in Minnesota and it gets cloudy for weeks at a time with cold and snow. CIOs may be speaking of clouds, but IT managers, CTOs and Enterprise Architects want long-distance vMotion. Two local sites, one shared subnet, one shared single-storage image and a VMware grid with nodes at each of the two locations. Add SRDF/A, GlobalMirror or SnapMirror and I have 3-site geographic replication.

Give the people what they want: VM mobility.

Everyone I know is asking for it. People are scratching their heads. Until they get it, they will all have the Remote Replication, Single-Storage Image MPIO Blues.
