Learn about some innovative techniques for troubleshooting hard-to-find problems and validating the reliability of 10GbE-enabled networks.
The development of 10 Gigabit Ethernet (10GbE) opens the door for unprecedented capacity, speed, and distance for Ethernet-based networks. A Dell'Oro study shows that spending on 10GbE will grow more than five times from 2004 levels, exceeding $2 billion by 2008.
The growth in gigabit connectivity within the enterprise is building a huge demand for high-speed Ethernet in the backbone both within and between campuses and to connect server farms within the corporate data center.
Using 10GbE in the backbone is the easiest way to scale because it leverages the technology of the hundreds of millions of nodes that run Ethernet today. It also enables Ethernet to match the speed of the fastest technology currently used in the WAN backbone (OC-192). Ethernet can serve as the end-to-end Layer-2 transport technology for LANs, MANs, and WANs.
What is a 10GbE Network?
The technology behind 10GbE networks was ratified in IEEE 802.3ae. The familiar IEEE 802.3 Ethernet medium access control (MAC) protocol, frame format, and frame size are used in 10GbE networks as well. The physical layer (PHY) interface comes in several flavors including LAN, the most common interface type, and WAN, which can interface to existing SONET/SDH networks.
Why Test 10GbE Networks?
The widespread adoption of 10GbE is inevitable and will impact every network worldwide. Although there are many benefits to these high-speed networks, associated risks exist anytime hardware or software is upgraded.
It is essential to independently validate the functionality, robustness, scalability, and performance of these new devices in the lab before installing them in production networks. By performing a series of tests, you can help ensure that access to key revenue-generating systems is not jeopardized when the 10GbE capacity is added.
10GbE Network Testing Challenges
The two main challenges in testing 10GbE networks are scalability and reliability. As a consequence of the higher bandwidth now available, transmissions from more subscribers will be carried over a single link. The network must be able to scale to handle this large number of clients.
The second test challenge involves reliability. Because more critical traffic will be passing over relatively fewer high-speed links, network outages will have greater impact and must be mitigated.
In addition to addressing these challenges, the tests that are performed on lower speed networks still apply since the underlying Ethernet technology is the same. For example, performance tests that measure throughput, loss, and delay still are essential. These tests need to be performed under both realistic and stress conditions.
Measuring 10GbE Performance
Although it may be stating the obvious, to stress test a 10GbE device, you must be able to send full line-rate IP traffic to the device for sustained periods of time. However, it is not enough to simply blast random packets at the device under test (DUT). These packets need to be crafted to recreate the realistic conditions that will be experienced in a live network.
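As a rough guide to what full line rate means in frames per second, the arithmetic can be sketched as follows. The sketch assumes the 10GbE LAN PHY data rate of 10 Gb/s and the standard 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, and minimum interframe gap); the function name is illustrative only.

```python
# Per-frame overhead on the wire that is not part of the Ethernet frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte minimum interframe gap.
WIRE_OVERHEAD_BYTES = 20
LINE_RATE_BPS = 10_000_000_000  # assumed: 10GbE LAN PHY data rate


def max_frames_per_second(frame_bytes: int, line_rate_bps: int = LINE_RATE_BPS) -> float:
    """Theoretical maximum frame rate for a frame size that includes the FCS."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return line_rate_bps / bits_on_wire


if __name__ == "__main__":
    for size in (64, 512, 1518):
        print(f"{size:>5} bytes: {max_frames_per_second(size):,.0f} frames/s")
```

For minimum-size 64-byte frames this works out to roughly 14.88 million frames per second, which is the rate a traffic generator has to sustain to stress the device with small packets.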
A large number of clients will be transmitting data over a single 10GbE link. How can this be simulated in the lab? A traffic generator can be used to create multiple flows of data, with each of these flows representing traffic coming from a single subscriber. These flows can be grouped into streams of data, with each stream containing packets with similar characteristics such as packet size.
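The flow and stream model described above could be represented along these lines. This is a minimal sketch, not the interface of any particular traffic generator: the `Flow` class, the address plan, and the frame sizes are all assumptions made for illustration.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class Flow:
    """One emulated subscriber (hypothetical structure)."""
    subscriber_id: int
    src_ip: IPv4Address
    frame_bytes: int


def build_flows(count, base_ip="10.0.0.1", sizes=(64, 512, 1518)):
    """Create one flow per subscriber, each with its own source address."""
    base = IPv4Address(base_ip)
    return [Flow(i, base + i, sizes[i % len(sizes)]) for i in range(count)]


def group_into_streams(flows):
    """Group flows that share a characteristic (here, frame size) into streams."""
    streams = {}
    for flow in flows:
        streams.setdefault(flow.frame_bytes, []).append(flow)
    return streams


if __name__ == "__main__":
    streams = group_into_streams(build_flows(3000))
    for size, members in sorted(streams.items()):
        print(f"stream of {size}-byte frames: {len(members)} flows")
```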
Converged triple-play networks carry many types of traffic, such as video, voice, and data. This means that there will be a mixture of big and small packet sizes as well as a mix of constant and bursty traffic.
Data and voice packets have different characteristics. Both types of traffic profiles should be used to test whether the device's buffers can handle these traffic profiles.
A variety of packet sizes is simultaneously present in the Internet. This mixture of packets is commonly referred to as an Imix. Some packet sizes occur more frequently than others:
• 40 Bytes: typically TCP SYN packets that are sent at the beginning of a TCP session.
• 576 Bytes: TCP packets from old implementations that use this maximum segment size (MSS).
• 1,500 Bytes: Packets corresponding to the maximum transmission unit (MTU) size of an Ethernet connection. Most data transferred on the Internet consists of full-size Ethernet frames.
Table 1 presents a suggested Imix composition.
Table 1. A Complete Imix
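To show how a mix of this kind can be turned into test traffic, here is a minimal sketch that computes the average packet size of a mix and draws a random sequence of sizes from it. The 7:4:1 weighting is an assumption (a commonly cited simple mix) and is not taken from Table 1; substitute the proportions you actually intend to test.

```python
import random

# Assumed weights (packets per 12 sent). A commonly cited simple mix,
# NOT the composition from the article's Table 1.
IMIX = {40: 7, 576: 4, 1500: 1}


def average_packet_bytes(mix):
    """Weighted mean packet size of the mix."""
    total_weight = sum(mix.values())
    return sum(size * weight for size, weight in mix.items()) / total_weight


def sample_packet_sizes(mix, count, seed=0):
    """Draw a reproducible random sequence of packet sizes following the mix."""
    rng = random.Random(seed)
    return rng.choices(list(mix), weights=list(mix.values()), k=count)


if __name__ == "__main__":
    print(f"average packet size: {average_packet_bytes(IMIX):.1f} bytes")
    print(sample_packet_sizes(IMIX, 12))
```

Seeding the generator keeps the packet sequence repeatable from one test run to the next, which makes results easier to compare.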
Once the traffic has been adequately characterized, measurements need to be made. Some of the most important measurements include throughput, latency, packet loss, and jitter.
These measurements are especially critical for video and voice over IP (VoIP) traffic that is carried over 10GbE networks. If latency, packet loss, or jitter becomes too large, the client will perceive a lower quality of experience (QoE), resulting in poor customer satisfaction.
The Internet Engineering Task Force (IETF) has defined Request for Comments (RFC) 2544 as the Benchmarking Methodology for Network Interconnect Devices. This document describes how to measure and report performance characteristics so network devices from different vendors can be compared and evaluated. Test tools commonly use a binary search to find the highest rate at which the device forwards traffic without losing frames. Many test instruments provide an automated set of scripts for testing devices against this RFC.
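The binary search for throughput can be sketched as follows. This is an illustration under stated assumptions, not a complete RFC 2544 implementation: `run_trial` is a hypothetical hook that offers traffic at a given rate for one trial and returns the number of frames lost, and `simulated_dut` is a stand-in for real hardware.

```python
import math
from typing import Callable


def find_throughput(run_trial: Callable[[float], int],
                    line_rate_fps: float,
                    resolution_fps: float = 1000.0) -> float:
    """Binary-search the highest offered rate (frames/s) with zero frame loss."""
    if run_trial(line_rate_fps) == 0:
        return line_rate_fps          # device forwards at full line rate
    low, high = 0.0, line_rate_fps    # low: loss-free so far; high: known lossy
    while high - low > resolution_fps:
        mid = (low + high) / 2
        if run_trial(mid) == 0:
            low = mid                 # no loss: search the upper half
        else:
            high = mid                # loss seen: search the lower half
    return low


def simulated_dut(capacity_fps: float) -> Callable[[float], int]:
    """Hypothetical device that drops everything above its capacity (1 s trials)."""
    def run_trial(rate_fps: float) -> int:
        return math.ceil(max(0.0, rate_fps - capacity_fps))
    return run_trial


if __name__ == "__main__":
    rate = find_throughput(simulated_dut(9_000_000), line_rate_fps=14_880_952)
    print(f"zero-loss throughput is about {rate:,.0f} frames/s")
```

The search would normally be repeated for each frame size of interest, since a device's zero-loss rate often differs between small and large frames.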
Quality of Service
A high-level definition of quality of service (QoS) is the assurance that traffic meets certain performance criteria. This may mean different things depending on the type of traffic as well as the service level agreement (SLA) that is in place for a particular customer.
How QoS is configured also can differ depending on the traffic that is being sent. For Ethernet networks, one key way to signal the importance of the traffic is through the user priority field in the virtual LAN (VLAN) tag. IEEE 802.1p defines how this field can be used to specify up to eight priority levels.
When combined with 4,096 possible VLAN IDs, a total of 32,768 streams could result on a single interface. Lab tests should be conducted by transmitting each of these possible combinations to ensure that the DUT can correctly handle a large number of flows (Figure 1). Real-time statistics can be gathered on each of the streams; the results should be sortable so that it is easy to determine which streams are dropping frames.
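The 32,768 combinations can be enumerated directly from the layout of the VLAN tag, in which a 3-bit priority, a 1-bit canonical format indicator (CFI), and a 12-bit VLAN ID share one 16-bit field. The sketch below uses illustrative function names. Note that it covers the full 12-bit range as the text describes, although IEEE 802.1Q reserves VLAN IDs 0 and 4095.

```python
from itertools import product

VLAN_IDS = range(4096)   # 12-bit VLAN ID field
PRIORITIES = range(8)    # 3-bit user priority field


def tag_control_info(priority: int, vlan_id: int, cfi: int = 0) -> int:
    """Pack priority, CFI, and VLAN ID into the 16-bit tag control field."""
    if priority not in PRIORITIES or vlan_id not in VLAN_IDS or cfi not in (0, 1):
        raise ValueError("field value out of range")
    return (priority << 13) | (cfi << 12) | vlan_id


def all_streams():
    """Yield (vlan_id, priority, tag_control_info) for every combination."""
    for vlan_id, priority in product(VLAN_IDS, PRIORITIES):
        yield vlan_id, priority, tag_control_info(priority, vlan_id)


if __name__ == "__main__":
    streams = list(all_streams())
    print(f"{len(streams)} streams")
    print(f"VLAN 100, priority 5 -> 0x{tag_control_info(5, 100):04X}")
```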
Figure 1. VLAN QoS Configuration
At the IP layer, the Differentiated Services code point (DSCP) field specifies the traffic priority. Again, various DSCP values should be used to classify the packets into high, medium, low, and best-effort traffic.
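A minimal sketch of how DSCP values map onto the IP header and onto the four classes mentioned above follows. The DSCP field occupies the upper six bits of the IPv4 type-of-service byte; the assignment of particular code points to the high, medium, and low classes is an assumption for illustration, since each network defines its own policy.

```python
# Assumed class-to-DSCP mapping for illustration; real deployments choose their own.
DSCP_BY_CLASS = {
    "high": 46,        # EF (Expedited Forwarding)
    "medium": 34,      # AF41
    "low": 10,         # AF11
    "best_effort": 0,  # default
}


def dscp_to_tos_byte(dscp: int, ecn: int = 0) -> int:
    """Place a 6-bit DSCP value in the upper six bits of the ToS byte."""
    if not 0 <= dscp < 64 or not 0 <= ecn < 4:
        raise ValueError("field value out of range")
    return (dscp << 2) | ecn


def classify(tos_byte: int) -> str:
    """Map a received ToS byte back to the assumed traffic class."""
    dscp = tos_byte >> 2
    for name, value in DSCP_BY_CLASS.items():
        if value == dscp:
            return name
    return "unclassified"


if __name__ == "__main__":
    for name, dscp in DSCP_BY_CLASS.items():
        print(f"{name:>12}: DSCP {dscp:2d} -> ToS 0x{dscp_to_tos_byte(dscp):02X}")
```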
If QoS mechanisms are properly enabled in the DUT, the high-priority traffic streams should lose the fewest packets. A test tool must be able to distinguish between the different types of traffic in real time so that any problems can be identified.
10GbE Reliability
Ensuring a highly available and reliable 10GbE network infrastructure is critical. Traffic from thousands of end stations should be forwarded to its correct destinations without any data corruption. There are several elements that must be tested to ensure that the traffic will reach its destination correctly.
• Data must not be corrupted in any way. The integrity of the packet's header and payload must be preserved as the packet traverses the network.
Most routers can detect errors in the header by examining the header checksum. However, they may not be able to detect bit errors within the packet payload. Payload errors can lead to degraded voice or video quality. Your test tool must check for bit errors within the payload of each packet that is received.
• Traffic must arrive at its intended destination. It is possible that a switching error could occur as the device's switching fabric becomes overloaded with many packets delivered at 10GbE speeds.
Packets that arrive at an incorrect destination are both a security and a performance concern. A test tool that can examine the packets to determine if any are being incorrectly forwarded is invaluable in identifying these errors.
• An essential security feature on Ethernet switches is to keep a VLAN's traffic on its assigned VLAN. It is necessary to ensure that traffic does not leak into the wrong VLAN.
Traffic should be tested on as many VLANs as possible; the IEEE defines a maximum of 4,096 VLANs. Although many of today's devices only simultaneously support a subset of VLANs in the field, it still is important for lab tests to stress the device with the full range of VLAN IDs.
• If packets take different network paths to the destination, or if they go through different queues and buffers in the network device, there is a possibility that some packets might arrive in a different order than they were transmitted. Both sequence errors (packets arriving out of order) and misordered packets (packets arriving out of order, independent of frame loss) should be identified.
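The four checks listed above can be combined in a single receive-side routine, sketched below. Everything here is an assumption made for illustration: the frame layout (a flow ID, a sequence number, and a known payload embedded by the transmitter), the class names, and the simple rule that a frame is misordered when its sequence number is lower than the highest one already seen. A commercial test tool may define these counters differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeFrame:
    """Hypothetical test frame as seen by the receiver."""
    flow_id: int
    sequence: int
    vlan_id: int
    payload: bytes


@dataclass(frozen=True)
class FlowPlan:
    """What the transmitter intended for this flow."""
    dest_port: int
    vlan_id: int
    payload: bytes


@dataclass
class FlowStats:
    received: int = 0
    bit_errors: int = 0
    misdirected: int = 0
    vlan_leaks: int = 0
    misordered: int = 0
    highest_sequence: int = -1

    @property
    def missing(self) -> int:
        # Assumes sequence numbers start at 0 and no frame is duplicated.
        return self.highest_sequence + 1 - self.received


def count_bit_errors(expected: bytes, received: bytes) -> int:
    """Count differing bits; assumes both payloads have the same length."""
    return sum(bin(a ^ b).count("1") for a, b in zip(expected, received))


def check_frame(plan: FlowPlan, stats: FlowStats, frame: ProbeFrame, rx_port: int) -> None:
    """Update the per-flow counters for one received frame."""
    stats.received += 1
    stats.bit_errors += count_bit_errors(plan.payload, frame.payload)
    if rx_port != plan.dest_port:
        stats.misdirected += 1        # arrived on the wrong port
    if frame.vlan_id != plan.vlan_id:
        stats.vlan_leaks += 1         # arrived on the wrong VLAN
    if frame.sequence < stats.highest_sequence:
        stats.misordered += 1         # overtaken by a later frame
    else:
        stats.highest_sequence = frame.sequence
```

Keeping one `FlowStats` object per flow makes it possible to sort the results and see at a glance which flows are affected by each kind of error.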
Testing Network Resiliency
The Spanning Tree Protocol (STP) was developed as a method of loop prevention on LANs, as defined in IEEE 802.1D. Through the exchange of bridge protocol data units (BPDUs) sent between the switches, STP builds a loop-free network when redundant paths are present.
The STP algorithm removes switching loops by turning off or blocking redundant links that are not part of the STP tree. When a primary link is broken causing a network segment to become unreachable, STP reconverges the network to a stable topology by activating a standby link over which traffic can be forwarded.
The IEEE subsequently developed the Rapid Spanning Tree Protocol (RSTP), released in 2001 as IEEE 802.1w, as a loop-prevention method for LANs with faster network convergence. RSTP retains all the benefits of STP while removing the limitation of a significant convergence time (Figure 2).
Both STP and RSTP should be tested, not only to ensure correct operation of the protocol, but also to determine the service disruption time from the point of failure until the network reconverges onto an operational link. The number of lost packets also must be measured to determine the effect on the traffic quality. The service disruption time should be minimal to prevent degradation in quality.
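One common way to estimate the service disruption time is to derive it from the number of frames lost while traffic is offered at a known constant rate. The sketch below assumes that the rate is constant and that all loss during the trial is caused by the failover; the function name is illustrative.

```python
def service_disruption_seconds(frames_sent: int,
                               frames_received: int,
                               offered_rate_fps: float) -> float:
    """Estimate the outage duration from frame loss at a constant offered rate."""
    if offered_rate_fps <= 0:
        raise ValueError("offered rate must be positive")
    frames_lost = frames_sent - frames_received
    return frames_lost / offered_rate_fps


if __name__ == "__main__":
    # Hypothetical trial: 1,000,000 frames/s offered, 50,000 frames lost.
    outage = service_disruption_seconds(10_000_000, 9_950_000, 1_000_000)
    print(f"service disruption is about {outage * 1000:.0f} ms")
```

A higher offered rate gives finer time resolution, because each lost frame then represents a shorter interval.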
What Happens When Things Go Wrong?
It is inevitable that problems will be identified during the course of testing. Successful testing requires both creating the correct network conditions and determining the cause of problems.
Capturing all the frames, including any errored frames, can be useful. However, at 10GbE speeds, the amount of captured data quickly becomes overwhelming. A more useful approach to troubleshooting only captures the data of interest.
Triggers and filters can be used to start the capture. Triggers can fire when packets contain a corrupt payload, are misdirected to the wrong destination port, or arrive out of sequence. The packet that causes the problem should be centered in the capture buffer so that packets leading up to the event can be examined to help determine the cause of the problem. Further analysis can be performed by decoding the packets to examine specific conditions.
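A triggered capture that keeps the offending packet in the middle of the buffer can be sketched with a rolling window of recent packets. The class below is a simplified software illustration, not a description of how any test instrument implements capture; the class name, the buffer depths, and the trigger predicate are assumptions.

```python
from collections import deque


class TriggeredCapture:
    """Keep packets before and after the first packet that fires the trigger."""

    def __init__(self, trigger, pre_trigger=4, post_trigger=4):
        self._trigger = trigger                      # predicate: packet -> bool
        self._history = deque(maxlen=pre_trigger)    # rolling window of recent packets
        self._post_trigger = post_trigger
        self._remaining = None                       # None until the trigger fires
        self.buffer = []

    @property
    def complete(self) -> bool:
        return self._remaining == 0

    def feed(self, packet) -> None:
        """Offer one received packet to the capture."""
        if self.complete:
            return                                   # buffer is full; ignore the rest
        if self._remaining is None:
            if self._trigger(packet):
                # Freeze the pre-trigger history and start counting post-trigger packets.
                self.buffer = list(self._history) + [packet]
                self._remaining = self._post_trigger
            else:
                self._history.append(packet)
        else:
            self.buffer.append(packet)
            self._remaining -= 1


if __name__ == "__main__":
    capture = TriggeredCapture(trigger=lambda seq: seq == 10, pre_trigger=3, post_trigger=3)
    for seq in range(20):
        capture.feed(seq)
    print(capture.buffer)
```

With equal pre-trigger and post-trigger depths, the packet that fired the trigger sits at the center of the buffer, so the packets leading up to the event are available for analysis.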
Summary
10GbE networks provide many advantages over existing Ethernet networks. However, there also are associated challenges that need to be addressed through testing before deployment in a live network occurs.
A test instrument is required that can emulate a broad range of protocols while simultaneously transmitting full line-rate traffic. This traffic needs to look realistic, with varying packet sizes and a multitude of protocol signatures. Finally, there must be a fast way to gain insight into the root cause of problems that might occur.
About the Author
Rick Ruta is a product manager in the Data Networks Division of Agilent Technologies. He has worked for Agilent and HP for the past 10 years. Mr. Ruta graduated from the University of Alberta with a BSc in electrical engineering. Agilent Technologies, Data Networks Division, Vancouver, Canada
November 2005