FGPU

Category: VHDL/FPGA/Verilog
Development tool: VHDL
File size: 389179KB
Downloads: 0
Upload date: 2022-05-31 10:35:56
Uploader: sh-1993
Description: FGPU is a soft GPU-like architecture for FPGAs. It is described in VHDL, fully customizable, and can be programmed using OpenCL.

File list:
FrameworkOverview.png (85544, 2022-05-31)
HW (0, 2022-05-31)
HW\outputs (0, 2022-05-31)
HW\outputs\new_2CUs.bit (13321517, 2022-05-31)
HW\outputs\new_2CUs.vhd (25146, 2022-05-31)
HW\outputs\new_2CUs_utilization.rpt (10553, 2022-05-31)
LICENSE (35141, 2022-05-31)
RTL (0, 2022-05-31)
RTL\constr (0, 2022-05-31)
RTL\constr\fgpu.xdc (2771, 2022-05-31)
RTL\db (0, 2022-05-31)
RTL\db\FGPU_v2_1.vhd.old (18914, 2022-05-31)
RTL\db\alu.vhd (16878, 2022-05-31)
RTL\db\axi_controllers.vhd (32541, 2022-05-31)
RTL\db\cache.vhd (13215, 2022-05-31)
RTL\db\cu.vhd (10083, 2022-05-31)
RTL\db\cu_instruction_dispatcher.vhd (9573, 2022-05-31)
RTL\db\cu_mem_cntrl.vhd (50541, 2022-05-31)
RTL\db\cu_scheduler.vhd (55353, 2022-05-31)
RTL\db\cu_vector.vhd (35796, 2022-05-31)
RTL\db\dsp.vhd (1902, 2022-05-31)
RTL\db\fgpu_components_pkg.vhd (41530, 2022-05-31)
RTL\db\fgpu_definitions_pkg.vhd (24796, 2022-05-31)
RTL\db\fgpu_top.vhd (42450, 2022-05-31)
RTL\db\fgpu_wrap.vhd (13194, 2022-05-31)
RTL\db\float_units.vhd (9912, 2022-05-31)
RTL\db\gmem_atomics.vhd (25514, 2022-05-31)
RTL\db\gmem_cntrl.vhd (56324, 2022-05-31)
RTL\db\gmem_cntrl_tag.vhd (60161, 2022-05-31)
RTL\db\init_alu_en_ram.vhd (5321, 2022-05-31)
RTL\db\lmem.vhd (3852, 2022-05-31)
RTL\db\loc_indcs_generator.vhd (15613, 2022-05-31)
RTL\db\mult_add_sub.vhd (8151, 2022-05-31)
RTL\db\rd_cache_fifo.vhd (5819, 2022-05-31)
RTL\db\regFile.vhd (4844, 2022-05-31)
RTL\db\rtm.vhd (4075, 2022-05-31)
RTL\db\wg_dispatcher.vhd (28173, 2022-05-31)
... ...

FGPU is a soft GPU-like architecture for FPGAs. It is described in VHDL, fully customizable, and can be programmed using OpenCL.

FGPU is currently being developed and maintained by the [Chair of Computer Engineering at the Brandenburg University of Technology Cottbus-Senftenberg], in Germany. It was [originally developed](https://github.com/malkadi/FGPU) by Muhammed Al Kadi from the [Ruhr University Bochum], in Germany.

# Contents and Structure of the FGPU Repository

This repository contains the following resources:

- **An RTL description of the FGPU architecture**, in VHDL, which can be used for behavioral simulation and FPGA-targeted implementation -- see the [RTL](rtl/) folder.
- **An LLVM-based FGPU compiler** -- see the [compiler](compiler/) folder.
- **Files for running behavioral simulation in Mentor ModelSim** -- see the [project_modelsim](project_modelsim/) folder.
- **Files for setting up simulation and implementation projects in Xilinx Vivado**.
- **Pre-generated bitstreams** to quickly test FGPU applications on the [Zynq-7000 ZC706], skipping the HW generation step -- see the [bitstreams](bitstreams/) folder.
- **Examples of OpenCL kernels** for execution on the FGPU -- see the [kernels](kernels/) folder.
- **Examples of complete benchmarks** for execution on an ARM+FGPU processing system configured using the Vivado SDK -- see the [benchmark](benchmark/) folder.

# FGPU Quick Start

The following figure illustrates the HW and SW flow for setting up the FGPU hardware and compiling applications.

![Overview of the FGPU Framework.](FrameworkOverview.png)

## Setting up the FGPU LLVM-based compiler

The LLVM-based compiler generates, from an OpenCL kernel description, the binaries containing the FGPU instructions that implement the kernel. To ensure portability, the FGPU compiler is built inside a Docker container. See the instructions in the [compiler README](compiler/README.md).

## Setting up the Vivado SW/HW Development Environment

The Xilinx Vivado platform is used to generate the hardware implementation from the set of RTL files, as well as the libraries needed to communicate with the main processing system (an ARM core, in the case of the [Zynq-7000 ZC706] board). These libraries must be linked with the binaries generated in the compilation stage so that the processing system can send commands to the FGPU and receive results from it. Detailed instructions for this process can be found in the [README for the Vivado flow](project_vivado/README.md).

# Limitations

## Supported Platforms / Tools

- The current version of the repository has been tested on both Windows and Linux.
- The implementation was tested in Xilinx Vivado v2017.2, v2019.2, and v2020.2.
- The simulation was tested in Mentor ModelSim 2020.1.
- The VHDL code uses some VHDL-2008 constructs, which may be unsupported in some tools.
- The design is easily portable to Zynq devices. For the UltraScale+ architecture, check the `ultrascale` branch.

## Known Bugs

### Hardware

- Branching inside a conditionally executed function call does not work. In other words, if a branch is taken and a function is then called whose code branches as well, a problem occurs.
    - Reason: A branch overwrites the current top entry of the divergence stack (in the CU scheduler) and creates a new entry on top. The overwritten entry, however, is needed when the function returns.
    - Solution: When a function is about to be called, the wavefront active record (which refers to the top of the divergence stack) needs to be incremented. Have a look at the defined but commented-out `PC_stack_dummy_entry` in `CU_scheduler.vhd`.
    - How to reproduce: Use the LUdecompose kernel and perform a soft floating-point operation inside an `if` block. The input data must not contain NaN or inf.
- When the width of the read data bus from the cache to the CUs is smaller than #CUs * 32 bit, problems may arise: the BRAM that stores the data read from the cache in each CU may overflow and be overwritten.
- Changing the address of an atomic operation within the same kernel has never been tested. An application that exercises this case is needed.
- Setting the number of cache banks less than or equal to the number of AXI interface banks (`CACHE_N_BANKS_W <= GMEM_N_BANKS_W`) may lead to a bottleneck.

### Software / Compiler

- Using sections other than `.text` is not yet supported.
    - How to reproduce: Define the coefficients of a 5x5 image filter as a 2D constant array. This data will be stored in the `.rodata` section.

[Chair of Computer Engineering at the Brandenburg University of Technology Cottbus-Senftenberg]: https://www.b-tu.de/en/computer-engineering-group
[Ruhr University Bochum]: https://www.ei.ruhr-uni-bochum.de/fakultaet/
[Zynq-7000 ZC706]: https://www.xilinx.com/products/boards-and-kits/ek-z7-zc706-g.html
[ZedBoard]: http://zedboard.org/product/zedboard
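Two of the hardware items under Known Bugs are conditions on configuration values, so they can be checked before starting a long implementation run. The following is a minimal host-side sketch in C, not part of the FGPU repository. The struct `fgpu_cfg`, its field names, and the three functions are hypothetical; only `CACHE_N_BANKS_W` and `GMEM_N_BANKS_W` are names taken from the list above, and the values would have to be copied by hand from wherever the RTL defines them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical container, for illustration only. The real parameters are
 * set in the VHDL sources; this struct just mirrors the values by hand. */
typedef struct {
    unsigned cache_n_banks_w; /* value of CACHE_N_BANKS_W */
    unsigned gmem_n_banks_w;  /* value of GMEM_N_BANKS_W */
    unsigned n_cus;           /* number of compute units (hypothetical field) */
    unsigned rd_bus_bits;     /* cache-to-CU read data bus width in bits (hypothetical field) */
} fgpu_cfg;

/* True when CACHE_N_BANKS_W <= GMEM_N_BANKS_W, the condition listed as a
 * possible bottleneck. */
bool cache_banks_may_bottleneck(const fgpu_cfg *c)
{
    assert(c != NULL);
    return c->cache_n_banks_w <= c->gmem_n_banks_w;
}

/* True when the read data bus is narrower than #CUs * 32 bit, the condition
 * under which the per-CU read BRAM may be overwritten. */
bool rd_bus_may_overflow(const fgpu_cfg *c)
{
    assert(c != NULL);
    return c->rd_bus_bits < c->n_cus * 32u;
}

/* Prints one warning per matching condition and returns the warning count. */
int fgpu_cfg_report(const fgpu_cfg *c)
{
    int warnings = 0;
    if (cache_banks_may_bottleneck(c)) {
        printf("warning: CACHE_N_BANKS_W (%u) <= GMEM_N_BANKS_W (%u)\n",
               c->cache_n_banks_w, c->gmem_n_banks_w);
        warnings++;
    }
    if (rd_bus_may_overflow(c)) {
        printf("warning: read bus (%u bit) < %u CUs * 32 bit\n",
               c->rd_bus_bits, c->n_cus);
        warnings++;
    }
    return warnings;
}
```

Calling `fgpu_cfg_report` on a filled-in `fgpu_cfg` returns 0 when neither condition applies. The sketch only restates the inequalities from the list; it does not model the hardware behaviour behind them.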
