High Performance Reconfigurable Computing
High-performance reconfigurable computing (HPRC) combines the computing power of conventional microprocessors with the flexibility of the configurable fabric of field-programmable gate arrays (FPGAs). Most HPRC systems are built from microprocessor-based nodes of the kind used in conventional CPU clusters, closely coupled to user-programmable FPGAs for hybrid computing. In this configuration, the FPGAs typically act as co-processors for specific functions.
Most HPRC systems are programmed in either a hardware description language (HDL) or a C-based language. Another approach is a C-to-gates compiler, which lets the same C code run on a processor or on an FPGA, depending on the resources available.
A key factor in HPRC efficiency is host coupling. The reconfigurable array is often used as a processing accelerator attached to a host processor, and the level of coupling determines the type of data transfers, latency, power, throughput and overheads involved in using the reconfigurable logic.
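The effect of coupling on latency and throughput can be sketched with a simple cost model. This is only an illustration: the `Link` class, the function names and every number used with them are assumptions made for this example, not measurements of any real system.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical description of a host-to-FPGA connection."""
    latency_s: float               # fixed cost per transfer, in seconds
    bandwidth_bytes_per_s: float   # sustained bandwidth, in bytes per second


def offload_time(link, bytes_in, bytes_out, fpga_compute_s):
    """Time to send inputs, compute on the FPGA and return the outputs."""
    # One transfer in each direction, so the fixed latency is paid twice.
    transfer_s = (2 * link.latency_s
                  + (bytes_in + bytes_out) / link.bandwidth_bytes_per_s)
    return transfer_s + fpga_compute_s


def speedup(link, bytes_in, bytes_out, cpu_compute_s, fpga_compute_s):
    """Ratio of CPU-only time to offloaded time (values above 1 favor the FPGA)."""
    return cpu_compute_s / offload_time(link, bytes_in, bytes_out, fpga_compute_s)


# Illustrative, made-up links: a tightly coupled one and a loosely coupled one.
tight = Link(latency_s=1e-6, bandwidth_bytes_per_s=8e9)
loose = Link(latency_s=5e-5, bandwidth_bytes_per_s=2.5e8)

for name, link in (("tight", tight), ("loose", loose)):
    # 1 MB in, 1 MB out, 10 ms on the CPU versus 1 ms on the FPGA.
    print(name, round(speedup(link, 1_000_000, 1_000_000, 1e-2, 1e-3), 2))
```

With these assumed figures the same kernel gains roughly 8x over the tight link but only about 1.1x over the loose one, because the transfer time dominates the FPGA compute time.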
Combining High-Performance CPUs with Embedded Hardware
Most software and hardware development is performed independently by separate design teams. Arches Computing Systems facilitates this division of labor by providing the technology necessary for bridging the CPU and embedded hardware domains. The Arches design flow is a methodology for developing a tightly coupled environment consisting of a combination of CPUs and FPGAs sharing a common memory bus.
Arches-MPI is used to isolate Computing Elements (processes, FPGA-based hardware engines, embedded processors, etc.) and to provide a well-known communications paradigm.
The steps involved in creating an HPRC system are as follows:
Step 1 At the beginning of the development cycle, your application is written targeting a standard CPU. In this phase, software developers ensure the correctness of the application and perform thorough runtime profiling.

Step 2 Software developers use information gathered in Step 1 to split the application into smaller independent processes. Strategies used to divide the application include identifying sections of code which can be run in parallel as well as those which encapsulate computational hotspots. Individual processes are isolated and run together in a standard MPI environment such as MPICH, LAM/MPI or Open MPI. This can be done on a single computer or on a standard cluster or supercomputer.

Step 3 This step is optional. The processes identified in Step 2 that will ultimately reside in an FPGA, either on an embedded processor or as a dedicated hardware engine, are executed on an FPGA. These processes can be recompiled for embedded processors such as the MicroBlaze or PowerPC. All processes are compiled and linked with the Arches-MPI library for easy CPU-to-FPGA communication.

Step 4 This is the final stage in the development cycle. Those processes which were identified as computationally intensive are excised and replaced with dedicated digital hardware. Hardware developers are given requirements for these processes (inputs, algorithm, outputs), and they generate drop-in replacement hardware blocks. An Arches Message Passing Engine (Arches-MPE) is attached in order to easily interface this hardware block with other hardware blocks and CPU-based processes.
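Steps 1 and 2 above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Arches flow itself: the application, the `sum_of_squares` hotspot and all function names are hypothetical, and Python threads with queues merely stand in for MPI ranks and MPI send/receive calls. A real system would run separate processes under MPICH, LAM/MPI or Open MPI.

```python
import cProfile
import pstats
import queue
import threading


def sum_of_squares(values):
    """Hypothetical computational hotspot of the application."""
    total = 0
    for v in values:
        total += v * v
    return total


def application(n):
    """Step 1: the whole application, written for a single CPU."""
    return sum_of_squares(range(n))


def hottest_function(func, *args):
    """Step 1: profile one run and return the function with the most self time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers)
    key = max(stats.stats, key=lambda k: stats.stats[k][2])
    return key[2]


def worker(rank, inbox, outbox):
    """Step 2: one independent process wrapping the hotspot."""
    chunk = inbox.get()                         # stand-in for an MPI receive
    outbox.put((rank, sum_of_squares(chunk)))   # stand-in for an MPI send


def run_split(n, num_workers=4):
    """Step 2: scatter the work to the workers and reduce their partial results."""
    inboxes = [queue.Queue() for _ in range(num_workers)]
    outbox = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], outbox))
        for rank in range(num_workers)
    ]
    for t in threads:
        t.start()
    for rank in range(num_workers):
        # Each worker gets a strided, non-overlapping share of the input.
        inboxes[rank].put(range(rank, n, num_workers))
    for t in threads:
        t.join()
    partials = [outbox.get() for _ in range(num_workers)]
    return sum(partial for _rank, partial in partials)


if __name__ == "__main__":
    print("hotspot:", hottest_function(application, 20000))
    print("results match:", run_split(20000) == application(20000))
```

The threads here only mirror the structure of the split; they are not expected to run faster than the single-CPU version. What matters is that each worker communicates solely through messages, which is the property that lets it later be moved to an embedded processor or replaced by a hardware engine.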

Like the application itself, phases of this cycle can be done in parallel. Once the computationally intensive processes have been isolated, hardware developers can begin to develop corresponding hardware engines. With Arches-MPI and the Arches-MPE blocks, algorithm kernels are isolated from communication details. Changes in network topology, process numbers and data formats will not result in setbacks for the hardware development team.
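The isolation of algorithm kernels from communication details can be illustrated with a small sketch. Everything here is hypothetical: `MessageEndpoint` is an invented stand-in for a message-passing wrapper and does not reproduce the Arches-MPE interface, and the "hardware" kernel is only a software model that assumes a 32-bit accumulator.

```python
from typing import Callable, Dict, List

Kernel = Callable[[List[int]], int]


def software_kernel(samples: List[int]) -> int:
    """Reference implementation kept from the CPU-only version."""
    return sum(s * s for s in samples)


def hardware_kernel_model(samples: List[int]) -> int:
    """Model of a hypothetical hardware engine: a 32-bit multiply-accumulate."""
    acc = 0
    for s in samples:
        acc = (acc + s * s) & 0xFFFFFFFF   # wrap like a 32-bit register
    return acc


class MessageEndpoint:
    """Hypothetical wrapper that handles messaging on behalf of a kernel."""

    def __init__(self, kernel: Kernel) -> None:
        self._kernel = kernel

    def handle(self, message: Dict) -> Dict:
        # The kernel only ever sees the payload; tags and routing stay out here.
        return {"tag": message["tag"], "result": self._kernel(message["payload"])}


if __name__ == "__main__":
    request = {"tag": 7, "payload": [1, 2, 3]}
    for kernel in (software_kernel, hardware_kernel_model):
        print(kernel.__name__, MessageEndpoint(kernel).handle(request))
```

Because the caller sees the same message format in both cases, swapping the software kernel for the hardware model requires no change on the sending side. The two kernels agree only while the running sum fits in the assumed 32 bits.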