Fig. 1: Virtex-7 2000T 2.5D stack
Fig. 2: Multislice partitioning map
Fig. 3: 28 Gbit/s SERDES slices
Xilinx delivers big, fast FPGAs that push 28nm technology (see Xilinx Unifies FPGA Line). So how do you improve on the cutting edge? How about stacking a bunch of FPGA slices together? That is what Xilinx has done with its new Virtex-7 2000T (Fig. 1).
The Virtex-7 2000T is actually a multichip solution that does not use the conventional, costly multichip packaging approach. Instead, Xilinx has utilized a passive interposer layer that sits between the chips and the BGA package. The approach allows Xilinx to pack in 6.8 billion transistors using this 2.5D Stacked Silicon Interconnect technology.
Unlike the 3D transistor approach (see Moore's Law Continues With 22nm 3D Transistors), Xilinx's approach places conventional FPGA chips edge-to-edge on top of an interposer layer. The current generation of chips has four 28nm FPGA slices (Fig. 2).
The slices are not connected directly to each other. Instead, they connect to the interposer layer through a set of microbumps on the bottom of each chip. The signal path drops down from one chip, runs across the interposer layer, and comes up into the next chip. The connections are short, allowing on the order of 10,000 connections to be passed between slices. This meshes well with the FPGA interconnect fabric. More important, no conventional I/O interfaces are needed to accomplish this feat.
I/O connections are passed through the interposer layer to the chip carrier. This allows I/O connections from any slice, not just one or two.
Conventional connections between multiple FPGA chips require links that pass through the chips' I/O connections. FPGAs have a lot of pins, but these number in the hundreds, and high speed SERDES are even more limited. Putting FPGAs on a multichip carrier simply makes the solution smaller but does not address the I/O issue. The Virtex-7 2000T does address it, and it brings along a number of other features that these alternatives cannot provide.
Initially, the Xilinx chips have four identical slices that can include up to 72 x 13 Gbit/s SERDES. An alternate layout has three slices and two sets of eight 28 Gbit/s SERDES, one at each end of the array (Fig. 3). In this case, the designer gets both the 13 Gbit/s and 28 Gbit/s SERDES. On monolithic designs, the developer must often choose one or the other.
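To put the SERDES counts in perspective, here is a rough back-of-the-envelope sketch in Python. It only multiplies the lane counts and line rates quoted above. The function name is made up for illustration, and the result is a raw one-direction line rate that ignores encoding and protocol overhead. The article does not say how many 13 Gbit/s lanes the alternate layout retains, so that figure is left out.

```python
def aggregate_gbps(lanes, rate_gbps):
    """Raw aggregate line rate in Gbit/s for one direction (no overhead)."""
    return lanes * rate_gbps

# Four-slice layout: up to 72 lanes at 13 Gbit/s (figures from the article).
four_slice_total = aggregate_gbps(72, 13)

# Alternate layout: two sets of eight lanes at 28 Gbit/s.
fast_lanes = 2 * 8
fast_total = aggregate_gbps(fast_lanes, 28)

print(f"72 x 13 Gbit/s = {four_slice_total} Gbit/s")              # 936 Gbit/s
print(f"{fast_lanes} x 28 Gbit/s = {fast_total} Gbit/s")          # 448 Gbit/s
```

The 28 Gbit/s lanes add less raw bandwidth in total than the full set of 13 Gbit/s lanes, so their appeal is the per-lane rate rather than the aggregate.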
The ability to include different slices on the interposer layer leads to some very interesting configurations. For example, slower but lower power slices could be mixed with these high speed slices. Xilinx has already unified its FPGA line (see Xilinx Unifies FPGA Line) across its Artix-7, Kintex-7, and Virtex-7 families. Many applications require high speed computation but not throughout the design. Mixing slices would allow coarse grain partitioning to optimize cost and power usage. It could also lead to other configurations that might include hard core processors like the Cortex-A9s contained in Xilinx's Zynq line (see FPGA Packs In Dual Cortex-A9 Micro).
Lower power is one of the key advantages of splitting the FPGA into slices. The chips use tens of watts, which simplifies cooling requirements. Combined with the higher transistor count, this opens up the possibility of replacing ASICs and ASSPs with FPGAs. One slice might even contain the special dedicated circuitry found on an ASIC or ASSP, with the other slices being FPGA slices.
This advanced mix of slices is speculation at this point. The logical design is easy, but the actual implementation is more complex because power, cooling, and interconnects remain a challenge. On the other hand, the advantages of this approach are significant because customization and verification are major costs for ASICs and ASSPs. Limiting these to a single slice while retaining the flexibility of an FPGA brings major benefits to development and deployment. Likewise, many ASIC and ASSP solutions may be economical simply using the FPGA slices already in the mix.
The target market for the initial crop of Virtex-7 2000T chips will be high performance application areas such as military and communications, where large FPGAs are already common. Mixing slices enlarges the target audience to include mobile applications that may be power limited. It is possible to employ the technology for lower end solutions, but it remains to be seen if this is practical. For now, Xilinx is focused on the performance and power efficiency of applications that require large FPGAs.
In a sense, the partitioning of the FPGA into slices is akin to the move to multicore processors. The individual pieces can be made more power efficient while providing a large computation fabric.
Developers will use Xilinx’s ISE Design Suite, which is already used for current FPGA designs. ISE has been updated to handle the Stacked Silicon Interconnect technology. This was easier for Xilinx than it originally anticipated. ISE has several new design rule checks (DRCs) and routing rules that are handled transparently, although advanced designs can tune the layout. PlanAhead and FPGA Editor provide a graphical representation of the FPGA layout. The interactive floorplan design tools also include analysis and debug capabilities. Advanced designs that employ a mix of slices will require more work but will also give designers more control over partitioning.
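To give a feel for the kind of partitioning check involved, here is a minimal Python sketch that counts how many nets cross each boundary between neighboring slices. It is not Xilinx's DRC and does not use any ISE interface. The module names, the placement, and the function names are hypothetical. It also assumes that slices sit in a row, that a net spanning non-adjacent slices uses every boundary in between, and that the "order of 10,000" figure applies to each boundary.

```python
from collections import Counter

# Assumption: the article's "order of 10,000" applied per adjacent slice pair.
BUDGET_PER_BOUNDARY = 10_000

def boundary_usage(nets, placement):
    """Count nets crossing each boundary between adjacent slices.

    nets: iterable of lists of module names joined by one net
    placement: dict mapping module name -> slice index (slices in a row)
    """
    usage = Counter()
    for net in nets:
        slices = [placement[module] for module in net]
        lo, hi = min(slices), max(slices)
        # A net spanning slices lo..hi crosses every boundary in between.
        for b in range(lo, hi):
            usage[(b, b + 1)] += 1
    return usage

def over_budget(usage, budget=BUDGET_PER_BOUNDARY):
    """Return only the boundaries whose net count exceeds the budget."""
    return {b: n for b, n in usage.items() if n > budget}

# Hypothetical design: four modules placed on a four-slice device.
placement = {"mac": 0, "dsp": 1, "cpu_if": 1, "ddr_ctrl": 3}
nets = [["mac", "dsp"], ["dsp", "ddr_ctrl"], ["mac", "cpu_if"], ["dsp", "cpu_if"]]

usage = boundary_usage(nets, placement)
print(dict(usage))
print(over_budget(usage))
```

A real flow would work on the routed netlist rather than on module-level nets, but the same idea applies: moving a module to a different slice changes how many of the scarce inter-slice connections a design consumes.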
10,000 interconnects is a lot. Slices can also provide I/O links that are brought out through the bottom of the interposer layer. This provides a lot of flexibility that has not been available before. It will be instructive to see how designs take advantage of this technology.
Check out Engineering TV for a video interview (watch Combining Multiple FPGA Slices) I did with Liam Madden of Xilinx.