The searching and classification tables are shared among all ports in a round-robin fashion. Any header replacement is accomplished while the frame is still in the receiver FIFO. From the FIFO, the frames are sent through a serial-to-parallel converter to a 1-Mbyte shared-buffer memory for storage and switching.
The memory is wide enough to switch up to 33 million frames/s. The biggest problem with getting data in and out of memory at these speeds is that a wide memory bus is required. Because a bus that wide isn't cost effective with external memory, the buffer must be on-chip. "The available technology is either SRAM or DRAM. Unfortunately, the DRAM process has a low yield, especially for high-gate-count devices. For its part, SRAM can make use of standard-logic processes, but it comes as a four- or six-transistor cell. Four-transistor SRAM is smaller, but consumes more power, and vice-versa for six-transistor SRAM," Wong explains.
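The 33-million-frames/s figure is consistent with all 13 ports carrying minimum-size Ethernet frames at line rate. The sketch below checks that arithmetic. It assumes the standard 64-byte minimum frame plus 8 bytes of preamble and a 12-byte interframe gap; the function and variable names are illustrative and do not come from Allayer.

```python
# Worst case for a switch: every port carries minimum-size frames back to back.
MIN_FRAME_BYTES = 64        # minimum Ethernet frame, including FCS
PREAMBLE_BYTES = 8          # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # mandatory idle time between frames

# Bits occupied on the wire by one minimum-size frame: (64 + 8 + 12) * 8 = 672
BITS_PER_SLOT = (MIN_FRAME_BYTES + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8


def max_frames_per_second(line_rate_bps):
    """Highest frame rate one port can carry at the given line rate."""
    return line_rate_bps / BITS_PER_SLOT


def aggregate_frame_rate(port_rates_bps):
    """Sum of the worst-case frame rates over all ports."""
    return sum(max_frames_per_second(rate) for rate in port_rates_bps)


# Twelve 1-Gbit ports plus one 10-Gbit uplink, as described in the article.
AL1032_PORT_RATES = [1e9] * 12 + [10e9]
total = aggregate_frame_rate(AL1032_PORT_RATES)
print(f"{total / 1e6:.1f} million frames/s")
```

Under these assumptions the total comes to roughly 32.7 million frames/s, in line with the 33 million quoted above.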
Taking these issues into consideration, Allayer chose to partner with Mosys, which came back with its one-transistor SRAM that Allayer embedded on-chip. One-transistor SRAM has the advantages of both DRAM and SRAM. With only one transistor, it's both small and cost effective, while it also has relatively low power consumption. Plus, it has been proven to work in such high-density designs as electronic gaming, namely Nintendo. The 1-Mbyte buffer is split into two halves: one half is given over to the 10-Gbit port, and the other half is shared among the 12 1-Gbit ports.
To support quality of service (QoS), each output port has four priority queues. Their assignments are based on L2 to L7 classification, the TOS/DiffServ DS field, or the 802.1p priority field. Each output port retrieves frames from the shared buffer according to queue priority and sends them to the transmit FIFO.
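The four-queue arrangement can be pictured with a short sketch. The article specifies neither the mapping from 802.1p values to queues nor the scheduling discipline, so the code below makes two assumptions: adjacent pairs of 802.1p values share a queue, and the port drains its queues in strict priority order. The class and function names are hypothetical.

```python
from collections import deque

NUM_QUEUES = 4  # four priority queues per output port, per the article


def queue_for_8021p(pcp):
    """Map a 3-bit 802.1p priority (0-7) onto one of four queues.

    Assumption: adjacent pairs of priority values share a queue.
    """
    if not 0 <= pcp <= 7:
        raise ValueError("802.1p priority must be in the range 0-7")
    return pcp // 2


class OutputPort:
    """Hypothetical model of one output port and its priority queues."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_QUEUES)]

    def enqueue(self, frame, pcp):
        self.queues[queue_for_8021p(pcp)].append(frame)

    def dequeue(self):
        # Strict priority (an assumption): serve the highest non-empty queue.
        for queue in reversed(self.queues):
            if queue:
                return queue.popleft()
        return None  # nothing waiting to be transmitted


port = OutputPort()
port.enqueue("bulk", pcp=0)
port.enqueue("voice", pcp=7)
port.enqueue("video", pcp=5)
order = [port.dequeue() for _ in range(3)]
print(order)  # ['voice', 'video', 'bulk']
```

Strict priority is the simplest choice to illustrate; a weighted scheme would fit the same structure by changing only `dequeue`.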
For L2 to L7 classification, there are 128 freely programmable filters that can work on any field in the packet. Comprising up to 512 rules, the filters make it a rule-based classification scheme. Possible actions include drop, change destination, reassign priority or VLAN tag, and statistics gathering. Not one of these is easy at 10 Gbits/s. Furthermore, the classifier includes a CPU trap for certain protocol packets. This trap is for functions like address resolution or link aggregation (supported by the AL1032) that need to be handled by the CPU.
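A rule-based classifier of this kind can be sketched in a few lines. The rule format below (byte offset, mask, value, action) and the first-match policy are assumptions made for illustration; the article does not describe how the AL1032 encodes its filters. The example traps ARP frames to the CPU by matching the EtherType 0x0806 at byte offset 12 of an untagged frame.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical filter rule: match masked bytes at an offset, then act."""
    offset: int   # byte offset of the field within the frame
    mask: bytes   # bits of the field that take part in the comparison
    value: bytes  # expected field contents under the mask
    action: str   # e.g. "drop", "set_priority", "trap_to_cpu"
    hits: int = 0  # per-rule counter, standing in for statistics gathering

    def matches(self, frame):
        field = frame[self.offset:self.offset + len(self.value)]
        if len(field) < len(self.value):
            return False  # frame too short to contain the field
        return all((f & m) == (v & m)
                   for f, m, v in zip(field, self.mask, self.value))


def classify(frame, rules):
    """Return the action of the first matching rule, else plain forwarding."""
    for rule in rules:
        if rule.matches(frame):
            rule.hits += 1
            return rule.action
    return "forward"


arp_trap = Rule(offset=12, mask=b"\xff\xff", value=b"\x08\x06",
                action="trap_to_cpu")
rules = [arp_trap]

arp_frame = bytes(12) + b"\x08\x06" + bytes(46)   # dummy addresses, ARP type
ipv4_frame = bytes(12) + b"\x08\x00" + bytes(46)  # dummy addresses, IPv4 type

print(classify(arp_frame, rules))   # trap_to_cpu
print(classify(ipv4_frame, rules))  # forward
```

Because each rule names an arbitrary offset and mask, the same structure can match on any field in the packet, which is the property the article highlights.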
A key feature of the AL1032 is its ability to disable local switching independently for the receive and transmit channels. Possible permutations include local switching for the Gigabit Ethernet ports only; no local switching, with all data funneled into the 10-Gbit uplink; and local L2 switching for the 10-Gbit port.
By disabling switching altogether, the AL1032 can function solely as a high-end multiplexer in a backplane application. Security, too, can be enabled or disabled on every port. Each port's address can be preprogrammed or frozen so that only those addresses on the allowed list can access the network. An alternative is to disable a port upon detection of intruders.
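The two security options described above, an allowed-address list and port shutdown on intrusion, can be modeled as follows. This is a minimal sketch under stated assumptions: addresses are checked against the source MAC of each incoming frame, and the class name and the `disable_on_intruder` flag are hypothetical rather than AL1032 register names.

```python
class SecurePort:
    """Hypothetical model of per-port address security."""

    def __init__(self, allowed_macs, disable_on_intruder=False):
        self.allowed = set(allowed_macs)  # preprogrammed or frozen address list
        self.disable_on_intruder = disable_on_intruder
        self.enabled = True

    def admit(self, src_mac):
        """Return True if a frame from src_mac may enter the network."""
        if not self.enabled:
            return False
        if src_mac in self.allowed:
            return True
        # Unknown source: reject it, and optionally shut the port down.
        if self.disable_on_intruder:
            self.enabled = False
        return False


port = SecurePort({"00:11:22:33:44:55"}, disable_on_intruder=True)
print(port.admit("00:11:22:33:44:55"))  # True
print(port.admit("de:ad:be:ef:00:01"))  # False, and the port is now disabled
print(port.admit("00:11:22:33:44:55"))  # False
```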
For overall network management, the AL1032 collects all the management information base (MIB) statistics that are required for simple network management protocol (SNMP). Supported MIBs include Ethertype, Bridging, RMON and RMON II, as well as SMON.
As a whole, the device is initialized and configured by an off-chip CPU, which also is responsible for search and table updates, plus management functions. The CPU has a separate, 32-bit/66-MHz PCI port with its own transmit/receive FIFOs. Those also can be employed as a fourteenth port. Alternatively, the AL1032 has 4-kbyte EEPROM support for CPU-less operation in low-cost, standalone applications.
The XGMII interface used by the AL1032's uplink is still being defined by the IEEE's 802.3ae HSSG, although it has pretty much been decided upon. Of more concern are the flow-control methods required to support OC-192 (which runs at 9.6 Gbits/s). At present, there are two proposals: open loop and busy idle. To ensure compatibility, the AL1032 supports both throttling schemes. The 12-port side's compatibility ranges from the 10/100/1000 MII/GMII to the ten-bit interface (TBI).
When it comes to implementing the AL1032, a key feature is the 802.3ad port-aggregation support mentioned earlier. This allows ports to be grouped into logical fat pipes, with up to six trunks, each supporting up to 12 ports.
Up to 16 remote ports can be supported within an aggregation group. This provides plenty of options in terms of combining AL1032s for an optimum balance of performance versus flexibility. The device's combination of flexibility and performance makes it a key enabler in the drive to get data off servers and networks and into the Internet backbone (Fig. 2).
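Port aggregation relies on spreading traffic across the member ports of a trunk while keeping the frames of any one conversation on a single link, so that they are not reordered. The article does not say how the AL1032 distributes frames, so the sketch below uses a common approach as an assumption: XOR the source and destination MAC addresses and take the result modulo the number of member ports. The function name is hypothetical.

```python
def select_trunk_port(src_mac, dst_mac, member_ports):
    """Pick one member port of a trunk for a given address pair.

    Assumption: distribution by an XOR hash of the two MAC addresses.
    The same pair always maps to the same port, which preserves frame order.
    """
    if not member_ports:
        raise ValueError("trunk has no member ports")
    key = 0
    for s, d in zip(src_mac, dst_mac):
        key ^= s ^ d
    return member_ports[key % len(member_ports)]


trunk = [0, 1, 2, 3]  # four ports grouped into one logical pipe
src = bytes.fromhex("001122334455")
dst = bytes.fromhex("66778899aabc")

first = select_trunk_port(src, dst, trunk)
again = select_trunk_port(src, dst, trunk)
print(first == again)  # True: one conversation stays on one link
```

An XOR hash is also symmetric, so traffic in both directions between two stations lands on the same member port.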
The device uses a 0.18-µm CMOS process, runs off 3.3/1.8 V, and comes packaged in a 721-pin TBGA. With a power consumption in the 5- to 6-W range, it's dwarfed by the expected consumption of the physical layer.
Price & Availability
The AL1032 is sampling now, and production quantities will be available in November. Pricing is $250 each in 1000-unit quantities.
Allayer Communications Inc., 107 Bonaventura Dr., San Jose, CA 95134; Contact Claus Stetter at (408) 570-0888; fax (408) 570-0880; e-mail: cstetter@allayer.com; Internet: www.allayer.com.