What started as a basic VM test has taken on a life of its own; it looks like we'll be walking the virtualization talk, pushing a VM host out near the edge of my production network. Wish me luck ...
I wear many hats. One of them involves the care and feeding of a campus network. My organization is in the midst of planning a summer upgrade to fiber, Internet2, etc., via a zippy connection to CEN. While there eventually will be much rejoicing, we're spending a bunch of time redesigning our edge components to make sure our world keeps working when we cut over from our current last-mile providers.
To keep things under control on our existing ~17 Mbps in and ~3.5 Mbps out lines, we rely on a Squid proxy running on Debian with a 2.6 kernel and a homegrown Perl-based "Exiler" traffic-shaping tool running on top of BSD 4. Exiler sits transparently on our default route, while the Squid box is a sidestep off our core switch, where the loop-de-loop hops for cache misses annoyingly soak up supervisor module CPU cycles under heavy traffic loads. Damn you, iTunes U and your legitimate giant files.
So why am I writing about this in a virtualization blog? Stay with me.
1. We hope to incorporate Squid and Exiler in our new scheme, but we're facing a number of design and performance "unknowns" related to system resources under high bandwidth utilization, specifically NIC I/O for the shaper and disk I/O for the proxy. 1 Gbps is a bit denser than 17 Mbps, and both existing boxes are older, lightweight P4s.
2. We'd like to run both apps on the same box, but we don't want to deal with migration/rewrite issues, since one runs on Linux and the other on BSD.
3. We'd like to get the proxy "loop" off the core.
4. We need to improve redundancy at our edge.
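To put rough numbers on the first item, here is a quick back-of-the-envelope sketch in Python. It uses only the link speeds mentioned above and assumes decimal megabits with no protocol overhead. The function names are mine, purely for illustration. Treat the results as ceilings: the real load on the proxy's disks depends on how much traffic is cacheable and on the hit rate, neither of which we know yet.

```python
# Back-of-the-envelope sizing using the link speeds quoted in the post.
# Assumptions: decimal units (1 Gbps = 1000 Mbps), no protocol overhead.

CURRENT_IN_MBPS = 17.0    # approximate existing inbound capacity
NEW_LINK_MBPS = 1000.0    # planned 1 Gbps connection


def scale_factor(old_mbps, new_mbps):
    """How many times larger the new link is than the old one."""
    return new_mbps / old_mbps


def mbps_to_megabytes_per_sec(mbps):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8.0


print(f"New link is about {scale_factor(CURRENT_IN_MBPS, NEW_LINK_MBPS):.1f}x the current inbound line")
print(f"Current line at full rate: {mbps_to_megabytes_per_sec(CURRENT_IN_MBPS):.3f} MB/s")
print(f"New link at full rate:     {mbps_to_megabytes_per_sec(NEW_LINK_MBPS):.1f} MB/s")
```

Even as an upper bound, the jump from a couple of megabytes per second to something over a hundred is why NIC and disk I/O on a pair of older P4s top our list of unknowns.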
Our first thought: Let's build out a decent Xen test box, p-to-v our BSD and Debian instances, and beat the heck out of the configs till we get it right. After tweaking VM configs and collecting lessons learned, we'd build a couple of new, appropriately sized Linux boxes. Everyone ends up happy, with aggregate time savings thanks to the snapshots, quick tear-down, and adjust-on-the-fly fun of VM testing. Heck, this is why x86 virtualization started catching on in the first place.
But ... we'd still be left with two (transparent) hops and two potential single points of failure in our outbound route. Our design included failover paths from DMZ to redundant edge routers, but all caching and shaping functionality would be kaput if either box had a really bad day.
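Why do two inline boxes bother us? A rough sketch of the arithmetic, with the loud caveat that the availability figures below are made-up placeholders and not measurements of our gear. The helper names are hypothetical too, and the redundant-pair formula assumes independent failures and a failover that always works.

```python
# All availability figures here are hypothetical placeholders, not measurements.

def series_availability(*parts):
    """Availability of a path where every part must be up (parts in series)."""
    total = 1.0
    for availability in parts:
        total *= availability
    return total


def redundant_pair_availability(availability):
    """Availability of two identical, independent units where either one suffices."""
    return 1.0 - (1.0 - availability) ** 2


shaper = 0.99   # hypothetical availability of the shaper box
proxy = 0.99    # hypothetical availability of the proxy box

print(f"Shaper and proxy in series: {series_availability(shaper, proxy):.4f}")
print(f"Redundant pair of one unit: {redundant_pair_availability(0.99):.4f}")
```

The point is only the direction of the effect: chaining boxes in series drags the path's availability below that of either box, while a working second unit pushes it back up.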
Many hours, one pot of coffee, and a few dry-erase stains later, we've ended up looking at a possible virtualization solution for production if everything plays out as hoped. Our biggest concerns are now ... network and disk I/O performance (sound familiar?) for guest instances under load. We'll be using speedy, dedicated local drives to keep the design straightforward and costs low. We're hoping that the intrahost virtualized network will yield some benefit for VM-to-VM hand-offs vs. a real-world wire hop.
We're still going ahead with VM testing first, of course. We have the luxury of time and ready labor to try a couple of hypervisor platforms for performance and functionality comparisons.
I've laid out VMs in production and the test lab for app hosting, LDAP, file services, and traditional network goodies like DHCP, DNS, and NTP. A quick scan of VMware's appliance roster reveals 167 hits for Web cache offerings, so we know it's been done before. Heck, one of 'em even bundles a bandwidth shaper.
I've never played with VMs at the edge of a production network before, so I guess we'll be walking the talk.
And that talk includes all the usual VM-goodness of production flexibility, redundancy via some flavor of live migration to a second host box, and a streamlined process for incorporating additional VMs or virt appliances (IDS, content management, etc.) as needed without modifying the physical infrastructure.
I hope it works out.