Tuesday, July 12, 2011

Long-Distance vMotion: Part 3

This is the third part in a multi-part series on the topic of Long-Distance vMotion. Part 1 introduced us to disaster recovery as practiced today and laid the foundation to build upon. Part 2 built out the Long-Distance vMotion architecture with a few different approaches.

There are some limitations and challenges that we must consider when designing LDVM or other workload mobility technologies. If it were too easy, everyone would be doing it. The first area we’ll address is commonly referred to as the traffic trombone.

Traffic Trombone

Understanding what I mean by a traffic trombone requires a bit of visualization. When we have one site, everything is local: my storage is connected within my datacenter, and my other subnets are routed by a local router. The path to these devices is very short, measured in meters, so latency is very small.

When we migrate VMs to another datacenter, the VMs move, but their network and storage traffic continues to flow back to the original router and storage controllers unless we take steps to prevent it. Every packet or read/write travels back to the original datacenter, gets serviced there, and then returns to the new datacenter where the VM is now running. That back-and-forth path is what we refer to as tromboning, hence the traffic trombone. (My PowerPoint presentation drives this home.) I'll address this in two parts: network and storage.
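
To get a feel for what that detour costs, here is a quick back-of-the-envelope sketch. The numbers are my own assumptions, not anything from VMware tooling: roughly 5 microseconds of propagation delay per kilometer of fiber, and a tromboned packet or I/O crossing the inter-site link once in each direction.

    # Rough estimate of the latency penalty from a tromboned traffic path.
    # Assumptions (mine): ~5 us of propagation delay per km of fiber, and
    # each request crosses the inter-site link out and back once.

    def trombone_penalty_ms(distance_km, round_trips=1):
        """Extra latency in milliseconds for traffic that hairpins to the original site."""
        us_per_km = 5.0                         # approximate fiber propagation delay
        one_way_us = distance_km * us_per_km
        return round_trips * 2 * one_way_us / 1000.0   # out and back, converted to ms

    if __name__ == "__main__":
        for km in (10, 100, 1000):
            print(f"{km:>5} km apart -> ~{trombone_penalty_ms(km):.2f} ms extra per round trip")

Even under these simple assumptions, datacenters 1000 km apart add on the order of 10 ms per tromboned round trip, before any queuing or device latency, which chatty applications and synchronous storage traffic will definitely feel.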