Cavium is well known for high performance multicore network processing chips (see A 40 Gbit/s Bump-in-the-Wire). Its latest chip, the 28nm OCTEON III (Fig. 1), is built around 48 MIPS64 cores and includes major hardware accelerators to handle everything from RAID to packet processing. The individual 2.5 GHz MIPS64 cores have some impressive specs, including a new superscalar, out-of-order execution system.

OCTEON III MIPS64 core features

  • 2.5 GHz superscalar core with out-of-order execution
  • Data cache: 32 Kbytes, 32-way
  • Instruction cache: 64 Kbytes, 32-way
  • Branch history: 16 K entries
  • Jump prediction table: 512 entries
  • Load-to-use latency: 3 cycles max
  • Instruction enhancements: branch prediction, prefetching, pipeline scheduling
  • Built-in packet processing
  • Built-in crypto security
  • Shared: L2 cache
  • Shared: four hyper access DDR3/DDR4 memory controllers, up to 256 Gbytes/chip

A crossbar switch links high speed SERDES to peripheral interfaces that include most Ethernet flavors up to 40G, 6G SATA, PCI Express Gen 3 and Interlaken-LA. The system supports up to four PCI Express ports and four SATA ports. The chip has a bandwidth greater than 500 Gbits/s.

Fig. 1: The OCTEON III's 48 cores have access to an array of hardware accelerators that address storage, networking and security.

One area the OCTEON III excels in is deep packet inspection (DPI). It includes a NEURON search processor with IPv4 and IPv6 support along with up to 64 patent-pending Hyper-Finite-Automata (HFA) engines. The OCTEON III's DPI performance is 2.5 times that of the OCTEON II, delivering up to 100 Gbit/s DPI for full wire speed analysis.

The system supports a rich syntax for the DPI rules, including PCRE and POSIX syntax. It can support very complex rules that incorporate back references and capture groups. Best of all, there are no system bottlenecks, so flows between engines are not locked. This type of DPI support is very handy for Intrusion Prevention Systems (IPS) and antivirus filtering.
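To give a feel for what a rule with a capture group and a back reference looks like, here is a minimal sketch that uses Python's standard re module in software. It is not Cavium's rule format or API, and it says nothing about how the HFA engines compile or execute rules. The pattern and the find_duplicate_header helper are hypothetical, made up for illustration.

```python
import re

# Hypothetical rule for illustration only; this is not Cavium's rule format.
# Capture group 1 grabs a header name at the start of a line. The back
# reference \1 then requires that same name to start a later line.
DUPLICATE_HEADER = re.compile(
    rb"^([A-Za-z-]+):[^\r\n]*\r\n(?:[^\r\n]*\r\n)*?\1:",
    re.MULTILINE,
)


def find_duplicate_header(payload: bytes):
    """Return the first repeated header name, or None if none repeats."""
    match = DUPLICATE_HEADER.search(payload)
    return match.group(1) if match else None


if __name__ == "__main__":
    suspicious = b"Host: a\r\nContent-Length: 5\r\nContent-Length: 9\r\n\r\n"
    clean = b"Host: a\r\nAccept: b\r\n\r\n"
    print(find_duplicate_header(suspicious))
    print(find_duplicate_header(clean))
```

A rule like this cannot be expressed as a plain regular language because the back reference depends on text matched earlier, which is the kind of complexity the article says the rule syntax can handle.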

The processors have their own security acceleration, providing line rate crypto support up to 100 Gbit/s for the top end chip. Compression support runs at speeds up to 50 Gbit/s. Software-based solutions provide flexibility, especially when it comes to handling new protocols. Software support addresses all of the latest security standards and crypto algorithms.

Up to eight OCTEON IIIs can be linked together using high speed OCI (OCTEON Coherent Interconnect) links (Fig. 2). From a programming perspective, the array of chips appears to be a single entity. All the cores have access to all memory, up to 2 Tbytes, and to all the hardware accelerators regardless of where they reside.
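The 2 Tbyte figure follows from the per-chip numbers quoted earlier. The short sketch below just does that arithmetic using the article's figures (48 cores and up to 256 Gbytes per chip, up to eight chips). The array_totals function is a hypothetical helper for illustration, not part of any Cavium SDK.

```python
# Figures from the article: 48 cores and up to 256 Gbytes of memory per chip,
# with up to eight chips linked over OCI.
CORES_PER_CHIP = 48
MAX_MEMORY_GBYTES_PER_CHIP = 256
MAX_CHIPS = 8


def array_totals(chips: int = MAX_CHIPS):
    """Return (total cores, total memory in Gbytes) for an OCI-linked array."""
    if not 1 <= chips <= MAX_CHIPS:
        raise ValueError(f"chips must be between 1 and {MAX_CHIPS}")
    return chips * CORES_PER_CHIP, chips * MAX_MEMORY_GBYTES_PER_CHIP


if __name__ == "__main__":
    cores, memory_gbytes = array_totals()
    print(f"{cores} cores, {memory_gbytes} Gbytes ({memory_gbytes // 1024} Tbytes)")
    # prints "384 cores, 2048 Gbytes (2 Tbytes)"
```

A fully populated array therefore presents 384 cores and 2 Tbytes of memory to the programmer as if it were one chip.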

Fig. 2: Up to eight OCTEON IIIs can be linked together using high speed OCI links. The array appears to be one chip from a programming perspective.

The OCTEON III employs a range of power management features. Power gating shuts down unused hardware, eliminating static and leakage current to those areas. Ultra fine-grain dynamic clock gating reduces active power for unused functional areas while the system is processing. Core speeds and power requirements are controlled individually. Overall, the family delivers an impressive 2 GHz/W.

The OCTEON III definitely packs a lot of power into a single chip. It has garnered significant support from hardware vendors such as Emerson Network Power, Advantech, and Kontron as well as software vendors such as MontaVista, 6WIND, TeamF1, Vineyard Networks, and tool vendors including Lauterbach and Macraigor.

Cavium provides a common SDK that also handles the OCTEON II and OCTEON PLUS platforms, along with APIs for all the integrated hardware accelerators. MontaVista is part of Cavium, and it provides a carrier-class Linux.

The Eclipse-based IDE includes the GCC 4.x toolset along with a multi-core OCTEON hardware debugger. The open source gdb debugger has been enhanced to handle the multicore hardware.

Cavium's architecture provides significant scaling capabilities while retaining a simple (relatively speaking) programming model. The scalability allows the chips to fit into lower-end applications like security appliances and gateways while handling heavy-duty chores in enterprise switches. The RAID support comes into play with data center and cloud storage applications.