Parallel Processing Tools Arrive For Automata Processor
Tools for developing applications for Micron's Automata Processor (see "Automata Processor Piques Parallel Processing") are now available, and there is hardware to match. The Automata Processor, or AP, is a parallel processing engine with characteristics in common with content-addressable memory, FPGAs, and parallel processors, but it is actually quite different from all three.
In the simplest case, the AP can process multiple regular expressions simultaneously. Applications commonly use regular expressions for pattern matching, for example in deep packet inspection.
The AP consists of a set of state transition elements (STEs), counters, and other control logic. The current incarnation has 49,152 STEs in a 15.4 mm by 12 mm FPBGA package. It has a DDR3-style interface, but it is not a memory device. Every active STE examines each 8-bit input symbol in parallel, and the chip processes 128 Msymbols/s.
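The quoted symbol rate translates directly into raw input bandwidth. The sketch below works through that arithmetic, assuming one byte per symbol cycle, the full rate applied to a single input stream, and no host-interface overhead; the constant and function names are illustrative only.

```python
# Back-of-the-envelope throughput for one AP, using the figures quoted above.
SYMBOL_RATE = 128_000_000  # symbols per second (128 Msymbols/s)
BITS_PER_SYMBOL = 8        # each input symbol is one byte

def bits_per_second(symbol_rate: int = SYMBOL_RATE, bits: int = BITS_PER_SYMBOL) -> int:
    """Raw input bandwidth in bits per second."""
    return symbol_rate * bits

def seconds_to_scan(num_bytes: int, symbol_rate: int = SYMBOL_RATE) -> float:
    """Time to stream num_bytes through the chip at one byte per symbol cycle."""
    return num_bytes / symbol_rate

if __name__ == "__main__":
    print(bits_per_second())         # 1024000000, roughly 1 Gbit/s
    print(seconds_to_scan(1 << 30))  # 8.388608 seconds for 1 GiB
```

In other words, a single stream at full rate moves about 1 Gbit/s, regardless of how many STEs are active, because every active STE checks the same symbol in the same cycle.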
Of course, not all STEs are active at once; only those that could match the next symbol of a pattern are active at any given moment. A match normally deactivates the STE and activates one or more downstream STEs. The STEs are linked by a routing fabric similar to an FPGA's. The new graphical development tool (Fig. 1) shows the logical linkage.
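The activation behavior described above can be modeled in software as a homogeneous nondeterministic automaton. The following is a minimal sketch under simplifying assumptions: the `STE` class, the `run` and `literal` helpers, and the "armed on every symbol" start flag are hypothetical, counters and other control logic are omitted, and none of this is Micron's API. It does show how two patterns are matched simultaneously in a single pass over the input.

```python
from dataclasses import dataclass, field

@dataclass
class STE:
    symbols: frozenset      # byte values this STE matches
    start: bool = False     # hypothetical: armed on every input symbol
    report: bool = False    # a match here is reported to the host
    successors: list = field(default_factory=list)  # STEs activated on a match

def run(network: dict, data: bytes) -> list:
    """Return (offset, ste_name) for every reporting match in data."""
    reports = []
    enabled = set()  # STEs activated by matches on the previous symbol
    for offset, sym in enumerate(data):
        # Active set = STEs enabled last cycle plus always-armed start STEs.
        active = enabled | {name for name, ste in network.items() if ste.start}
        next_enabled = set()
        for name in active:
            ste = network[name]
            if sym in ste.symbols:  # every active STE sees the same symbol
                next_enabled.update(ste.successors)
                if ste.report:
                    reports.append((offset, name))
        enabled = next_enabled  # matched STEs drop out unless re-enabled
    return reports

def literal(prefix: str, text: str) -> dict:
    """Hypothetical helper: chain of STEs matching text anywhere in the stream."""
    net = {}
    for i, ch in enumerate(text):
        nxt = [f"{prefix}{i + 1}"] if i + 1 < len(text) else []
        net[f"{prefix}{i}"] = STE(frozenset([ord(ch)]), start=(i == 0),
                                  report=(i == len(text) - 1), successors=nxt)
    return net

network = {**literal("cat", "cat"), **literal("dog", "dog")}
print(run(network, b"hotdog and a cat"))  # [(5, 'dog2'), (15, 'cat2')]
```

Each pattern contributes only its first STE plus whichever downstream STEs the previous symbol enabled, which is why most of the chip's STEs sit idle on any given cycle.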
As with FPGAs, the AP tools (Fig. 2) take a logical program and convert it into hardware linkages. A place-and-route step determines the interconnections between STEs and other logic elements such as counters.
The AP Workbench tools use an XML-based programming language called ANML (Automata Network Markup Language, pronounced "animal"). Developers can write ANML directly, but many will likely use the IDE's graphical interface. Regular expressions can also be used, but ANML provides full access to the AP's functionality, which is a superset of what regular expressions can describe.
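To give a feel for what an XML-based automata description looks like, the sketch below emits a chain of STEs for a literal string using Python's standard `xml.etree.ElementTree`. The element and attribute names (`automata-network`, `state-transition-element`, `symbol-set`, `start="all-input"`, `activate-on-match`, `report-on-match`) are modeled on publicly circulated ANML examples and should be treated as assumptions to verify against Micron's ANML reference; `literal_to_anml` is a hypothetical helper, not part of the AP tools.

```python
import xml.etree.ElementTree as ET

def literal_to_anml(network_id: str, text: str) -> str:
    """Emit an ANML-style chain of STEs matching text anywhere in the stream.

    Assumes text is alphanumeric so each character can sit in a bracketed
    symbol set without escaping. Element names are assumed, not verified.
    """
    root = ET.Element("anml", version="1.0")
    net = ET.SubElement(root, "automata-network", id=network_id)
    for i, ch in enumerate(text):
        attrs = {"id": f"{network_id}_{i}", "symbol-set": f"[{ch}]"}
        if i == 0:
            attrs["start"] = "all-input"  # first STE is armed on every symbol
        ste = ET.SubElement(net, "state-transition-element", attrs)
        if i + 1 < len(text):
            # A match enables the STE for the next character.
            ET.SubElement(ste, "activate-on-match", element=f"{network_id}_{i + 1}")
        else:
            # Last character: report the match to the host.
            ET.SubElement(ste, "report-on-match")
    return ET.tostring(root, encoding="unicode")

print(literal_to_anml("cat", "cat"))
```

Even this tiny case shows why a graphical IDE is attractive: the XML grows with every state and transition, while the underlying structure is just a linked chain.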
The AP requires a host processor to set up the system, feed the data stream, and process the results; the AP does not really perform other types of computation. It can, however, handle 512 different data streams. A key advantage of the architecture is that processing more streams simply means adding more APs. Each AP can use the same configuration, and each chip draws less than 4 W.
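The scale-out arithmetic is simple enough to sketch. The example below assumes the 512-stream figure applies per chip and uses 4 W as a worst-case per-chip budget; `plan` and `assign` are hypothetical host-side helpers, and the round-robin assignment is just one plausible way a host might distribute streams across identically configured APs.

```python
import math

STREAMS_PER_AP = 512    # assumption: the 512-stream figure applies per chip
MAX_WATTS_PER_AP = 4.0  # quoted upper bound per chip

def plan(num_streams: int) -> tuple[int, float]:
    """Return (APs needed, worst-case power in watts) for num_streams streams."""
    chips = math.ceil(num_streams / STREAMS_PER_AP)
    return chips, chips * MAX_WATTS_PER_AP

def assign(num_streams: int) -> dict[int, list[int]]:
    """Spread stream IDs round-robin across identically configured APs."""
    chips, _ = plan(num_streams)
    layout = {chip: [] for chip in range(chips)}
    for stream in range(num_streams):
        layout[stream % chips].append(stream)  # never exceeds STREAMS_PER_AP
    return layout

print(plan(2000))                                 # (4, 16.0)
print([len(s) for s in assign(2000).values()])    # [500, 500, 500, 500]
```

Because every chip carries the same configuration, any stream can go to any AP, so the host only has to balance load rather than route patterns to specific chips.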
The tools, including a simulator and debugger, are available now. The AP ANML compiler performs optimizations such as dead-code removal, and it can also spread designs across multiple APs.
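On an automata network, one natural form of dead-code removal is to drop STEs that can never become active (unreachable from any start STE) or that can never lead to a report. The sketch below shows that technique with two breadth-first searches; it is an illustration of the general idea, not a description of what Micron's compiler actually does, and the function names and graph encoding are hypothetical.

```python
from collections import deque

def reachable(seeds: set[str], edges: dict[str, list[str]]) -> set[str]:
    """Breadth-first search: every node reachable from seeds along edges."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def remove_dead_stes(successors: dict[str, list[str]],
                     starts: set[str], reports: set[str]) -> dict[str, list[str]]:
    """Keep only STEs reachable from a start STE that can also reach a report."""
    forward = reachable(starts, successors)
    # Reverse the edges so we can walk backwards from the reporting STEs.
    predecessors: dict[str, list[str]] = {}
    for src, dsts in successors.items():
        for dst in dsts:
            predecessors.setdefault(dst, []).append(src)
    backward = reachable(reports, predecessors)
    live = forward & backward
    return {n: [d for d in successors.get(n, []) if d in live] for n in live}

# "d" is a dead end that never reports; "x" can never be activated.
graph = {"a": ["b", "d"], "b": ["c"], "c": [], "d": [], "x": ["c"]}
print(remove_dead_stes(graph, starts={"a"}, reports={"c"}))
```

Pruning this way frees STEs and routing resources, which matters on a chip with a fixed STE count and a limited interconnect fabric.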
The PCI Express-based development board will be available in 2015. It holds up to 32 APs mounted on DIMMs, with an FPGA linking the APs.