Fig. 1. Tabula’s SpaceTime architecture
Fig. 2. Chips are divided into regions
Fig. 3. Signal propagation advantages
Tabula’s ABAX chips take FPGA design to the next level. They implement a virtual 3D architecture by dynamically changing the underlying FPGA definition on each clock cycle. Accomplishing this feat while maintaining compatibility with existing design tools and methodologies has meant overcoming a number of challenges, including changing the underlying structure of the system at a rate of 1.6 GHz.
From a designer’s point of view, the 40-nm ABAX looks like an FPGA. The A1EC06 version has 630,000 lookup tables (LUTs), 5.5 Mbytes of RAM, and 1280 DSP blocks. The chip also has 48 high-speed, 6.5-Gbit/s serializers/deserializers (SERDES) and 920 I/O ports. The various components are arranged in regular blocks like a typical FPGA, including the interconnects, which are configured by the programmer, along with the LUTs.
Tabula’s SpaceTime architecture (Fig. 1) increases the number of LUTs and interconnects by a factor of eight. The company calls the configuration of the underlying FPGA structure a “fold.” The current incarnation of ABAX chips can handle up to eight folds. The trick comes in the form of a time via.
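The fold arithmetic can be illustrated with a short Python sketch. It assumes that one user-visible clock cycle steps through every fold exactly once, so the user clock is the fold rate divided by the fold count. The function names and the 1,000-LUT figure are hypothetical; only the 1.6-GHz fold rate and the eight-fold limit come from the text.

```python
def user_clock_hz(fold_rate_hz: float, folds: int) -> float:
    """User-visible clock, assuming each user cycle visits every fold once."""
    if folds < 1:
        raise ValueError("folds must be >= 1")
    return fold_rate_hz / folds


def effective_resources(physical: int, folds: int) -> int:
    """Each physical LUT or interconnect is reconfigured once per fold."""
    if folds < 1:
        raise ValueError("folds must be >= 1")
    return physical * folds


# 1.6-GHz fold rate with eight folds (figures from the article)
print(user_clock_hz(1.6e9, 8) / 1e6)   # 200.0 (MHz)
# 1,000 physical LUTs is an illustrative number, not a device spec
print(effective_resources(1000, 8))    # 8000
```

Under this assumption, a region that uses fewer folds would present a faster user clock but less effective logic.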
The time via is a transparent latch found on every interconnect on the chip. It lets information pass through to logic configured within a fold. It also propagates the data to the next fold in the sequence. Information in registers and memory is maintained between fold transitions too. The difference is that the time vias are implicit in the system and not specified by the designer, whereas the registers and memory are explicitly allocated.
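A toy model may help make the time via concrete. The sketch below is only a behavioral illustration, not Tabula’s implementation: one physical two-input LUT is given a different function in each fold, and a single variable (`via`, a hypothetical name) stands in for the latch that carries a result from one fold to the next.

```python
from typing import Callable, List

Lut = Callable[[int, int], int]  # a two-input, one-bit logic function


def run_folds(fold_configs: List[Lut], first_input: int,
              side_inputs: List[int]) -> int:
    """Reuse one physical LUT across folds; `via` models the time via."""
    via = first_input
    for lut, side in zip(fold_configs, side_inputs):
        # The LUT's result is latched and becomes visible to the next fold.
        via = lut(via, side)
    return via


def AND(a: int, b: int) -> int:
    return a & b


def XOR(a: int, b: int) -> int:
    return a ^ b


# Fold 0 computes 1 AND 1; fold 1 XORs that result with 1.
print(run_folds([AND, XOR], 1, [1, 1]))  # 0
```

The designer would describe only the combined function; in this model the per-fold handoff is implicit, as the article says the time vias are.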
The approach resembles Achronix’s Speedster FPGA, which places picoPIPE elements on its interconnects much as SpaceTime places time vias (see “1.5-GHz FPGA Takes Clock Gating To The Max”). The Achronix FPGA definition is static, though, and the latches simply provide a way for data to flow through the system. Also, the picoPIPE elements are more like single-bit FIFOs, allowing asynchronous operation. The SpaceTime time vias, on the other hand, operate in a synchronous fashion.
Tabula’s chips are divided into regions (Fig. 2) that can have different fold definitions and operate at different clock frequencies. Regions can be grouped together, providing larger fold groups. All areas within a common region have the same number of folds and operate in lockstep with respect to fold transitions. Operation at lower frequencies is more power-efficient since the definitions within the region don’t change as often. Different regions can be synchronized when they operate at the same frequency or multiples of each other. This enables synchronous data exchange between regions.
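The synchronization rule for regions can be written as a one-line check. This Python sketch assumes frequencies are given as whole numbers of megahertz; the function name is hypothetical.

```python
def can_synchronize(freq_a_mhz: int, freq_b_mhz: int) -> bool:
    """True if the two region clocks are equal or one is an integer
    multiple of the other, the condition the article gives for
    synchronous data exchange."""
    if freq_a_mhz <= 0 or freq_b_mhz <= 0:
        raise ValueError("frequencies must be positive")
    fast, slow = max(freq_a_mhz, freq_b_mhz), min(freq_a_mhz, freq_b_mhz)
    return fast % slow == 0


print(can_synchronize(200, 400))  # True
print(can_synchronize(200, 300))  # False
```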
The SpaceTime architecture has advantages when it comes to signal propagation (Fig. 3) as well. A signal moves no faster within a fold than it normally would, but it can propagate significantly farther within the cycle. Each fold allows a signal to move farther from its source. This is the same type of approach used in conventional FPGA design, except that registers must be explicitly utilized on the clock transitions.
Tabula’s design offers an added advantage because the logic changes with each new fold so the original source LUTs can be used for computation based on the data from the prior fold. A conventional FPGA has to move the signal to a point where the subsequent logic is located.
The eight-fold ceiling in the current chip may seem restrictive, but it isn’t. Additional folds provide more logic within a given space and more reach for a given signal. In practice, the first fold in the series follows the last fold. In theory, data from the last fold can be used within the first fold as if it were the next fold in the sequence, working on this information while the other logic within the first fold works on new information. The last fold cannot reuse the logic in the first fold, but it can use other logic defined within the first fold.
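The wraparound from the last fold back to the first amounts to modular counting. A minimal Python sketch, with folds numbered from 0 (a numbering chosen here for illustration):

```python
from typing import List


def fold_sequence(folds: int, steps: int) -> List[int]:
    """Fold index active at each fast clock tick; wraps after the last fold."""
    if folds < 1:
        raise ValueError("folds must be >= 1")
    return [step % folds for step in range(steps)]


# Ten ticks on an eight-fold region: fold 0 follows fold 7.
print(fold_sequence(8, 10))  # [0, 1, 2, 3, 4, 5, 6, 7, 0, 1]
```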
SUPERIOR SOFT CORES
Soft-core processors are used in a significant number of new projects. FPGA designers have a challenge when it comes to optimizing a soft-core design for a particular FPGA platform. In addition, soft-core designs are often behind their ASIC counterparts because of the overhead and design restrictions of the FPGA fabric. One of these is multiport register file support. FPGAs typically provide dual-port register files to address these design requirements.
Tabula only provides single-port register files. This might panic soft-core designers until they consider the impact of the 3D architecture because a single-port register file can deliver one piece of information for each fold. This meshes nicely with pipeline architectures for two reasons.
First, an eight-fold region essentially has an eight-port register file within each of its cycles, since the single port can be accessed once per fold. Second, each new fold has a new set of logic next to the register file, so the processing pipeline can start next to or near the register file and propagate outward toward more logic. The same approach works with memory interfaces as well.
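The following Python sketch models how a single-port register file can serve several accesses in one user cycle by serving one per fold. The class and function names are hypothetical, and the model ignores timing; it only shows the time-multiplexing idea.

```python
from typing import List, Optional, Tuple


class SinglePortRegFile:
    """Register file with one port: at most one access per fold."""

    def __init__(self, size: int) -> None:
        self.regs = [0] * size
        self._used = False  # has the port been used in the current fold?

    def start_fold(self) -> None:
        self._used = False

    def access(self, index: int, write_value: Optional[int] = None) -> int:
        if self._used:
            raise RuntimeError("single port already used in this fold")
        self._used = True
        if write_value is not None:
            self.regs[index] = write_value
        return self.regs[index]


def run_user_cycle(rf: SinglePortRegFile,
                   requests: List[Tuple[int, Optional[int]]],
                   folds: int = 8) -> List[int]:
    """Serve one request per fold; a user cycle has `folds` folds."""
    if len(requests) > folds:
        raise ValueError("more accesses than folds in one user cycle")
    results = []
    for index, write_value in requests:
        rf.start_fold()  # advancing to a new fold frees the port
        results.append(rf.access(index, write_value))
    return results


rf = SinglePortRegFile(size=4)
# Two writes followed by two reads, all within one user cycle.
print(run_user_cycle(rf, [(0, 5), (1, 7), (0, None), (1, None)]))  # [5, 7, 5, 7]
```

From the user clock’s point of view, the four accesses above look simultaneous, which is the multiport behavior a soft-core pipeline needs.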
Soft cores targeting existing FPGA platforms will be the first ones used on the ABAX. It will be interesting to see how designers take advantage of the virtual 3D architecture when trying to improve designs. Likewise, it will be interesting to see how much of the underlying system Tabula will expose to designers, because the tools essentially hide the underlying complexity of the system. In the future, it might be possible for designers to select the sophistication of the soft-core design the same way that developers select features like cache and cache size.
MANAGING COMPLEXITY
Like Achronix, Tabula is providing a more powerful FPGA platform by hiding the new architecture behind familiar tools. Essentially, the chips are presented as a conventional FPGA with the layout tools churning out multifold definitions. In fact, the platform even partitions regions.
The layout tools account for timing details, putting logic at the far end of a chain in folds farther from the start. Deep logic benefits from more folds. The layout tools provide details about the number of folds. Users have some control over regions by providing clocking details about the logic. This approach will work well because details like time vias are transparent to designers.
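The idea of placing deeper logic in later folds can be sketched as a simple depth-based assignment. This is not Tabula’s placement algorithm, which the article does not describe; it is a hypothetical Python illustration that assumes an acyclic netlist and a fixed number of logic levels per fold (`levels_per_fold` is an invented parameter).

```python
from functools import lru_cache
from typing import Dict, List


def assign_folds(deps: Dict[str, List[str]], levels_per_fold: int,
                 folds: int) -> Dict[str, int]:
    """Map each node to a fold based on its logic depth.

    deps maps a node name to the nodes that drive it (must be acyclic).
    """
    if levels_per_fold < 1 or folds < 1:
        raise ValueError("levels_per_fold and folds must be >= 1")

    @lru_cache(maxsize=None)
    def depth(node: str) -> int:
        preds = deps.get(node, [])
        return 0 if not preds else 1 + max(depth(p) for p in preds)

    placement = {}
    for node in deps:
        fold = depth(node) // levels_per_fold
        if fold >= folds:
            raise ValueError(f"{node} needs fold {fold}, only {folds} available")
        placement[node] = fold
    return placement


# A four-stage chain with two logic levels per fold.
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
print(assign_folds(chain, levels_per_fold=2, folds=8))
# {'a': 0, 'b': 0, 'c': 1, 'd': 1}
```

Logic at the far end of the chain lands in a later fold, and a chain too deep for the available folds is rejected, which loosely mirrors why deep logic benefits from more folds.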
On the other hand, these types of features could provide interesting design options in the future. Achronix has had this same issue with its picoPIPE elements. If designers can specify that a latch is employed at a particular point, as in a soft-core processor pipeline, they may be able to take better advantage of the underlying architecture.
Power management also comes into play with the layout tools. The chip’s static power is lower than that of a conventional FPGA because the virtual 3D architecture needs fewer physical LUTs. Dynamic power requirements can vary depending on the application. The layout tools can handle some automatic power-down details. For now, all the fold details are available to Tabula’s experts, with a limited amount provided to developers.
The 3D architecture offers some interesting possibilities when it comes to debugging. Consider a design that uses fewer than eight folds. The unused folds could be used for additional debugging logic. There are timing considerations, but it is an option that could lead to some interesting designs.
Four versions of the ABAX 3D programmable logic device chips are available. Pricing ranges from $105 to $200. They compete with FPGA chips that cost two to four times as much. The chips are equipped with flexible SERDES that can handle a range of chores from interfaces like PCI Express and Gigabit Ethernet to storage interfaces like SATA.
The ABAX represents a major shift in FPGA capabilities that essentially places it in its own category. Still, its compatibility with FPGA tool chains makes it a much more flexible FPGA platform. Its ability to support features such as multiport RAM within soft-core processor designs will radically change designers’ views of FPGA platforms.