Building Blocks for Solid Networks
By Paul Venezia on Sep 07, 2010
The network is a datacenter’s foundation. In most infrastructures, the datacenter core is constructed differently from the LAN core. Here are the ground rules for building reliable networks.
The term “network” applies to everything from LAN to SAN to WAN. All these variations require a network core, so let’s start there. The size of the organization will determine the size and capacity of the core. In most infrastructures, the datacenter core is constructed differently from the LAN core. If we take a hypothetical network that has to serve the needs of a few hundred or a thousand users in a single building, with a datacenter in the middle, it’s not uncommon to find that there are big switches in the middle and aggregation switches at the edges.
Ideally, the core is composed of two modular switching platforms that carry data from the edge over gigabit fiber, located in the same room as the server and storage infrastructure. Two gigabit fiber links to a closet of, say, 100 switch ports are sufficient for most business purposes. If they’re not, you’re likely better off bonding multiple 1Gbit links rather than upgrading to 10G for those closets. As 10G drops in price, this will change, but for now it’s far cheaper to bond several 1Gbit ports than to add 10G capability to both the core and the edge.
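To put rough numbers on that trade-off, here is a minimal sketch of the oversubscription arithmetic for a single closet. It assumes every edge port runs at 1Gbit, which the text does not state, and the function name is purely illustrative.

```python
def oversubscription_ratio(edge_ports: int, port_gbps: float,
                           uplinks: int, uplink_gbps: float) -> float:
    """Worst-case edge demand divided by uplink capacity back to the core."""
    return (edge_ports * port_gbps) / (uplinks * uplink_gbps)


# Assumption: a 100-port closet where every edge port is 1Gbit.
two_bonded = oversubscription_ratio(100, 1.0, uplinks=2, uplink_gbps=1.0)
four_bonded = oversubscription_ratio(100, 1.0, uplinks=4, uplink_gbps=1.0)
two_ten_gig = oversubscription_ratio(100, 1.0, uplinks=2, uplink_gbps=10.0)

print(f"2 x 1Gbit uplinks:  {two_bonded:.0f}:1")
print(f"4 x 1Gbit uplinks:  {four_bonded:.0f}:1")
print(f"2 x 10Gbit uplinks: {two_ten_gig:.0f}:1")
```

A high ratio is normal at the edge because desktop ports are idle most of the time. The point of the calculation is only to show how much each added 1Gbit link buys before 10G becomes worth the cost.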
In the likely event that VoIP will be deployed, it may be beneficial to implement small modular switches at the edge as well, allowing PoE (Power over Ethernet) modules to be installed in the same switch as the non-PoE ports. Alternatively, deploying trunked PoE ports to each user is also a possibility. This allows a single port to be used for VoIP and desktop access tasks.
In the familiar hub-and-spoke model, the core connects to the edge aggregation switches with at least two links, either connecting to the server infrastructure with direct copper runs or through server aggregation switches in each rack. This decision must be determined site by site, due to the distance limitations of copper cabling.
Either way, it’s cleaner to deploy server aggregation switches in each rack and run only a few fiber links back to the core than try to shoehorn everything into a few huge switches. In addition, using server aggregation switches will allow redundant connections to redundant cores, which will eliminate the possibility of losing server communications in the event of a core switch failure. If you can afford it and your layout permits it, use server aggregation switches.
Regardless of the physical layout method, the core switches need to be redundant in every possible way: redundant power, redundant interconnections, and redundant routing protocols. Ideally, they should have redundant control modules as well, but you can make do without them if you can’t afford them.
Core switches will be responsible for switching nearly every packet in the infrastructure, so they need to be balanced accordingly. It’s a good idea to make ample use of HSRP (Hot Standby Router Protocol) or VRRP (Virtual Router Redundancy Protocol). These allow two discrete switches to effectively share a single IP and MAC address, which is used as the default route for a VLAN. In the event that one core fails, those VLANs will still be accessible.
Finally, proper use of STP (Spanning Tree Protocol) is essential to proper network operation. A full discussion of gateway redundancy and spanning tree is beyond the scope of this guide, but correct configuration of both will have a significant effect on the resiliency and proper operation of any Layer-3 switched network.
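As a rough illustration of the shared gateway idea, the sketch below models how a VRRP group picks the switch that answers for the virtual address: the highest priority wins, with the higher interface address as the tie-breaker. It also derives the well-known VRRP virtual MAC for IPv4 from the group ID. The switch names, addresses, and priorities are invented for the example, and real implementations add timers, preemption settings, and advertisements that are left out here.

```python
from dataclasses import dataclass, replace
from ipaddress import IPv4Address
from typing import List, Optional


@dataclass(frozen=True)
class CoreSwitch:
    name: str
    priority: int            # higher priority is preferred
    address: IPv4Address     # the switch's own interface address
    alive: bool = True


def vrrp_virtual_mac(vrid: int) -> str:
    """IPv4 VRRP virtual MAC: 00:00:5e:00:01 followed by the group ID."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be between 1 and 255")
    return f"00:00:5e:00:01:{vrid:02x}"


def elect_master(switches: List[CoreSwitch]) -> Optional[CoreSwitch]:
    """Pick the live switch with the highest priority, then highest address."""
    candidates = [s for s in switches if s.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda s: (s.priority, int(s.address)))


# Hypothetical pair of cores serving the gateway for one VLAN.
core_a = CoreSwitch("core-a", 110, IPv4Address("10.0.10.2"))
core_b = CoreSwitch("core-b", 100, IPv4Address("10.0.10.3"))

print(elect_master([core_a, core_b]).name)                        # core-a
print(elect_master([replace(core_a, alive=False), core_b]).name)  # core-b
print(vrrp_virtual_mac(10))                                       # 00:00:5e:00:01:0a
```

Hosts on the VLAN keep the same default gateway IP and MAC throughout, which is why a core failure does not require any change on the servers or desktops.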
Going Virtual
The ability for virtualization hosts to migrate virtual servers across a virtualization farm absolutely requires stable and fast central storage. This can be FC, iSCSI, or even NFS in most cases, but the key is that all the host servers can access a reliable central storage network.
Networking virtualization hosts isn’t like networking a normal server, however. While a normal server might have a front-end and a back-end link, a virtualization host might have six or more Ethernet interfaces. One reason is performance: A virtualization host pushes more traffic than a normal server due to the simple fact that as many as dozens of virtual machines are running on a single host. The other reason is redundancy: With so many VMs on one physical machine, you don’t want one failed NIC to take a whole bunch of virtual servers offline at once.
To combat this problem, virtualization hosts should be constructed with at least two dedicated front-end links, two back-end links, and ideally a single management link. If this infrastructure will service hosts that live in semi-secure networks (such as a DMZ), then it may be reasonable to add physical links for those networks as well, unless you’re comfortable passing semi-trusted packets through the core as a VLAN. Physical separation is still the safest bet and less prone to human error. If you can physically separate that traffic by adding interfaces to the virtualization hosts, then do so.
Each pair of interfaces should be bonded using some form of link aggregation, such as LACP (Link Aggregation Control Protocol), which is defined in IEEE 802.3ad, or a static or vendor-specific bundle. Either should suffice, though your switch may support only one form or the other. Bonding these links establishes load-balancing as well as failover protection at the link level, and is an absolute requirement, especially since you’d be hard-pressed to find a switch that doesn’t support it.
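To see how a bonded pair spreads traffic, consider the sketch below. It assumes a simple hash over the addresses and ports of each flow, which is a common approach, but the exact hash inputs vary by switch and host platform and the CRC used here is only a stand-in. The function name and sample addresses are hypothetical.

```python
import zlib


def pick_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              active_links: int) -> int:
    """Map a flow to one member link of the bundle (0-based index).

    Every packet of the same flow hashes to the same link, so packets
    stay in order, but one flow can never use more than one link.
    """
    if active_links < 1:
        raise ValueError("bundle has no active links")
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % active_links


# Hypothetical flows from VMs on one host to a database server.
flows = [("10.0.20.11", "10.0.30.5", 40000 + i, 5432) for i in range(8)]

for flow in flows:
    print(flow, "-> link", pick_link(*flow, active_links=2))

# If one link in the pair fails, every flow moves to the survivor.
assert all(pick_link(*flow, active_links=1) == 0 for flow in flows)
```

The per-flow behavior is worth keeping in mind: a bonded pair raises the aggregate throughput of many conversations and survives a link failure, but it does not double the speed of any single conversation.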
In addition to bonding these links, the front-end bundle should be trunked with 802.1Q. This allows multiple VLANs to exist on a single logical interface and makes deploying and managing virtualization farms significantly simpler. You can then deploy virtual servers on any VLAN or mix of VLANs on any host without worrying about virtual interface configuration. You also don’t need to add physical interfaces to the hosts just to connect to a different VLAN.
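What makes this work is a four-byte tag inserted into each Ethernet frame on the trunk, carrying a 12-bit VLAN ID alongside a 3-bit priority. The sketch below builds and parses that tag to show how little is involved. The function names are illustrative, and the code handles only the tag itself, not a full frame.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks a frame as 802.1Q tagged


def build_dot1q_tag(vlan_id: int, priority: int = 0, dei: int = 0) -> bytes:
    """Return the 4-byte 802.1Q tag for the given VLAN and priority."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("usable VLAN IDs run from 1 to 4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority is a 3-bit field (0 to 7)")
    if dei not in (0, 1):
        raise ValueError("DEI is a single bit")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_dot1q_tag(tag: bytes) -> dict:
    """Split a 4-byte 802.1Q tag back into its fields."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return {"priority": tci >> 13, "dei": (tci >> 12) & 1, "vlan_id": tci & 0x0FFF}


print(build_dot1q_tag(100).hex())              # 81000064
print(build_dot1q_tag(100, priority=5).hex())  # 8100a064
print(parse_dot1q_tag(build_dot1q_tag(200)))
```

Because the VLAN travels inside each frame, the virtual switch on the host can hand a virtual server to any VLAN simply by tagging its traffic, with no change to the physical cabling.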
The virtualization host storage links don’t necessarily need to be either bonded or trunked unless your virtual servers will be communicating with a variety of back-end storage arrays. In most cases, a single storage array will be used, and bonding these interfaces will not necessarily result in performance improvements on a per-server basis.
However, if you require significant back-end server-to-server communication, such as front-end Web servers and back-end database servers, it’s advisable to dedicate that traffic to a specific set of bonded links. They will likely not need to be trunked, but bonding those links will again provide load-balancing and redundancy on a host-by-host basis.
While a dedicated management interface isn’t truly a requirement, it can certainly make managing virtualization hosts far simpler, especially when modifying network parameters. Modifying links that also carry the management traffic can easily result in a loss of communication to the virtualization host.
So if you’re keeping count, you can see how you might have seven or more interfaces in a busy virtualization host. Obviously, this increases the number of switchports required for a virtualization implementation, so plan accordingly. The increasing popularity of 10G networking – and the dropping cost of 10G interfaces – may enable you to drastically reduce the cabling requirements so that you can simply use a pair of trunked and bonded 10G interfaces per host with a management interface. If you can afford it, do it.
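The tally is easy to run for your own farm. This sketch counts interfaces per host for the layout described above and for a converged 10G layout, then multiplies by a host count. The 12-host farm is an invented figure, and the class and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class HostNicPlan:
    front_end: int = 2         # bonded and trunked links for VM traffic
    storage: int = 2           # back-end links to central storage
    server_to_server: int = 2  # optional bonded pair for back-end traffic
    management: int = 1        # dedicated management link

    def total(self) -> int:
        return (self.front_end + self.storage
                + self.server_to_server + self.management)


def switchports_needed(plan: HostNicPlan, hosts: int) -> int:
    """Each host interface consumes one switchport."""
    return plan.total() * hosts


gigabit_plan = HostNicPlan()
# Converged design: one trunked, bonded 10G pair plus management.
ten_gig_plan = HostNicPlan(front_end=2, storage=0, server_to_server=0, management=1)

hosts = 12  # hypothetical farm size
print(gigabit_plan.total(), "ports per host,", switchports_needed(gigabit_plan, hosts), "in total")
print(ten_gig_plan.total(), "ports per host,", switchports_needed(ten_gig_plan, hosts), "in total")
```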
Wide Area Networking
When organizations have multiple locations, connecting those locations with fast and reliable links can have a significant impact on the users at those sites. Unfortunately, no tried-and-true method of WAN interconnectivity can be applied to every organization. The approach you use depends on the services available at the main datacenter and the remote office site.
In an ideal situation, both sites are served by a single carrier that can drop in fiber links at each location. This will provide the highest bandwidth and lowest latency of any solution, and will probably be cheaper to boot. If this option is available to you, be sure to treat the link as untrusted and use a VPN across the pipe to encrypt the traffic – which, after all, will be flowing across someone else’s network.
Without the same carrier on both ends, you’ll need alternative connection methods. The most popular of these is MPLS (multiprotocol label switching). This is somewhat related to the traditional frame-relay networking model, but generally offers higher bandwidth at a lower cost.
An MPLS network is composed of various links from remote sites into the provider’s cloud. In addition, the main site is linked to the same cloud, usually with enough bandwidth to ensure that all edge sites can fully saturate their link without overwhelming the link to the datacenter. Unlike frame-relay, however, the main site is not a hub site. This means that traffic between remote sites does not pass through the main site; it simply flows through the provider’s cloud.
For example, you might have three remote sites with a single T1 or fractional T3 connection each, and the main site with either a fiber handoff or a fractional T3 as well. All these sites are connected via routing protocols running on the MPLS routers, sharing information with the provider’s routed network. This means that even though the MPLS provider is passing the packets, they all appear to be part of your internal network, using your internal network numbering, exactly like other WAN models. Nonetheless, it’s still a good idea to encrypt the traffic passing over an MPLS network to ensure security.
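The sizing rule for the main site can be checked with simple arithmetic. The sketch below uses the three-site T1 example and the standard line rates of 1.544Mbit/s for a T1 and 44.736Mbit/s for a full T3. The 10Mbit/s fractional T3 at the main site and the site names are assumptions made for the example.

```python
T1_MBPS = 1.544    # standard T1 line rate
T3_MBPS = 44.736   # standard full T3 line rate


def required_hub_mbps(remote_links_mbps) -> float:
    """Bandwidth the main site needs if every remote site saturates its link."""
    return sum(remote_links_mbps)


def hub_can_absorb(hub_mbps: float, remote_links_mbps) -> bool:
    """True when the main-site link covers the worst-case remote demand."""
    return hub_mbps >= required_hub_mbps(remote_links_mbps)


# Hypothetical remote offices, each on a single T1.
remote_sites = {"site-a": T1_MBPS, "site-b": T1_MBPS, "site-c": T1_MBPS}
hub_link_mbps = 10.0  # assumed fractional T3 at the main site

needed = required_hub_mbps(remote_sites.values())
print(f"Worst-case demand from remote sites: {needed:.3f} Mbit/s")
print("Main-site link is sufficient:", hub_can_absorb(hub_link_mbps, remote_sites.values()))
```

Rerunning the check whenever a remote link is upgraded is a cheap way to catch the point at which the main-site circuit becomes the bottleneck.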
In many cases, MPLS providers will assume management control over this network, and take responsibility for the configuration and maintenance of the routers running the network. If you don’t have the skills necessary to manage an MPLS network in-house, this is a very low-cost way to implement a WAN. It’s also a double-edged sword, however, because modifications to the network may take a while to be completed, and you’ll always have a part of your own network that you cannot access or modify.
Another, less popular method of WAN connectivity is to use a traditional VPN over Internet circuits. This is by far the cheapest method, but is also the most problematic from a performance and reliability standpoint. It’s constructed by adding a cable, DSL, or T1 circuit to each remote office and terminating VPN connections from those sites to the main site. There is no guarantee or SLA (service-level agreement) possible for this type of connection, and the performance of the connection is subject to the vagaries of Internet communication. It may be a viable solution for smaller offices, but not for larger remote sites.
But there’s another use for VPN connections of this sort: adding redundancy to remote site connections. Should the main link fail for one reason or another, the VPN connection becomes your temporary failover solution. Moreover, this scenario requires an Internet circuit at each remote office, which can do double-duty as an Internet gateway for that site. This can reduce the traffic across the WAN and lower bandwidth requirements for each office, decreasing the overall cost of the network. There is a caveat, however – if you’re running some form of Internet use monitoring, you will need to add those sites to the monitoring scheme in order to see that traffic.
Securing The Network
Network security is the first line of defense for any organization. The low-hanging fruit here is the ubiquitous firewall, which is an obvious necessity, but security extends throughout the network to DMZ networks, IPSes (intrusion prevention systems), and so on.
In most cases, the network edge is represented by one or more firewalls behind one or more Internet circuits. Firewall implementation and configuration is quite mature these days and doesn’t warrant too much attention, but a few architectural points are worth making.
For starters, smaller companies can get away with terminating client VPN connections directly on their production firewall, while larger companies should dedicate a VPN concentrator to this task. Generally, a multipurpose firewall offers only a subset of the management, monitoring, and control capabilities of a dedicated concentrator, and the concentrator may be simpler to manage in the long run. Yes, a properly sized firewall can terminate plenty of VPN connections, but a dedicated VPN appliance is the better bet for clients, while the production firewall can be used to terminate any site-to-site VPNs.
Client VPNs take on two main forms: a client-based IPSec VPN or an SSL VPN. The former is generally more robust, but can run into problems when clients are at sites that restrict certain types of traffic, which prevents proper operation of an IPSec client. SSL VPNs, on the other hand, communicate over port 443, which is also used for standard SSL Web communications. An SSL VPN leverages the fact that most sites will pass TCP port 443 through unfettered, and thus can be used in places that may not permit an IPSec VPN. Users don’t need a client pre-installed on their computer to access an SSL VPN. Instead, they simply use a Web browser to initiate the VPN, which may require a small on-the-fly client download.
In practice, most VPN appliances offer both forms of client VPN connection, and it’s a good idea to enable and configure both in order to permit your remote clients every possible method of connection no matter where they happen to be.
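The reason for offering both comes down to which outbound traffic a remote site permits. The sketch below is a simplified model: it treats an IPSec client as needing UDP ports 500 and 4500 (IKE and NAT traversal) and an SSL VPN as needing only TCP port 443. Real deployments have more variations, and the egress policies shown are invented for the example.

```python
# Simplified requirements; an IPSec client behind NAT is assumed.
IPSEC_NEEDS = {("udp", 500), ("udp", 4500)}
SSL_NEEDS = {("tcp", 443)}


def usable_vpn_types(allowed_egress: set) -> list:
    """Return the client VPN types that can connect through this egress policy."""
    usable = []
    if IPSEC_NEEDS <= allowed_egress:
        usable.append("ipsec")
    if SSL_NEEDS <= allowed_egress:
        usable.append("ssl")
    return usable


# Hypothetical remote locations and what they let out.
hotel_wifi = {("tcp", 80), ("tcp", 443)}
home_broadband = {("tcp", 80), ("tcp", 443), ("udp", 500), ("udp", 4500)}

print("hotel:", usable_vpn_types(hotel_wifi))      # ['ssl']
print("home:", usable_vpn_types(home_broadband))   # ['ipsec', 'ssl']
```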
It’s also a good idea to implement several VPN group types and organize VPN users into these groups. That way you can impose restrictions on internal resource access on a group basis, which is a far better method than putting every user in the same VPN group and leaving the doors wide open. Unlike the LAN and WAN, you have no control over where a remote user might be, and spending a little time ensuring that they have access to what they need and no more can pay off in spades, should a laptop be stolen or left unattended.
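A group-based policy of this kind can be pictured as a table mapping each VPN group to the internal networks it may reach. The sketch below is only a model of the idea, not the configuration syntax of any VPN appliance, and the group names and subnets are invented.

```python
from ipaddress import ip_address, ip_network

# Hypothetical VPN groups and the internal subnets each may reach.
VPN_GROUPS = {
    "finance": [ip_network("10.10.20.0/24")],
    "developers": [ip_network("10.10.30.0/24"), ip_network("10.10.31.0/24")],
    "it-admins": [ip_network("10.10.0.0/16")],
}


def is_allowed(group: str, destination: str) -> bool:
    """True if the group's policy covers the destination address.

    Unknown groups get no access, so the default is to deny.
    """
    dest = ip_address(destination)
    return any(dest in network for network in VPN_GROUPS.get(group, []))


print(is_allowed("finance", "10.10.20.15"))     # True
print(is_allowed("finance", "10.10.30.8"))      # False
print(is_allowed("it-admins", "10.10.30.8"))    # True
print(is_allowed("contractors", "10.10.20.15")) # False
```

With a table like this, a stolen laptop belonging to a finance user exposes only the finance subnet rather than the whole internal network.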
As far as internal network security goes, it’s a good idea for a network of any reasonable size to deploy an IDS (intrusion detection system) sensor in critical locations to flag known attack patterns. Specifically, a sensor monitoring the traffic just behind the firewall and another watching the server VLAN would be a good start.
The network is the circulatory system of any organization, and it needs to be designed, built, and managed correctly to ensure that the rest of the critical systems can function properly. The tips and ideas presented here can go a long way toward making all the trains run on time now and in the future.