RISu064
risc-v 

所属分类:处理器开发
开发工具:Verilog
文件大小:6360KB
下载次数:0
上传日期:2023-01-18 00:29:49
上 传 者sh-1993
说明:  双发RV64IM处理器,娱乐和学习
(Dual-issue RV64IM processor for fun & learning)

文件列表:
LICENSE (1073, 2023-07-05)
asic (0, 2023-07-05)
asic\analog_area (0, 2023-07-05)
asic\analog_area\analog_area.bb.v (1510, 2023-07-05)
asic\build.py (9462, 2023-07-05)
asic\caravel_defines.v (2236, 2023-07-05)
asic\floorplan.py (14832, 2023-07-05)
asic\flows (0, 2023-07-05)
asic\flows\mpwflow.py (1140, 2023-07-05)
asic\importpins.py (4925, 2023-07-05)
asic\libs (0, 2023-07-05)
asic\libs\analog_area.py (600, 2023-07-05)
asic\libs\sky130sram_32_256.py (684, 2023-07-05)
asic\libs\sky130sram_32_512.py (684, 2023-07-05)
asic\libs\sky130sram_8_1024.py (684, 2023-07-05)
asic\options.vh (2505, 2023-07-05)
asic\sarlogic.v (4072, 2023-07-05)
asic\sky130 (0, 2023-07-05)
asic\sky130\ram (0, 2023-07-05)
asic\sky130\ram\sky130_sram_1kbyte_1rw1r_32x256_8.bb.v (1083, 2023-07-05)
asic\sky130\ram\sky130_sram_1kbyte_1rw1r_32x256_8.gds (9871766, 2023-07-05)
asic\sky130\ram\sky130_sram_1kbyte_1rw1r_32x256_8.lef (25654, 2023-07-05)
asic\sky130\ram\sky130_sram_1kbyte_1rw1r_32x256_8.v (3161, 2023-07-05)
asic\sky130\ram\sky130_sram_1kbyte_1rw1r_32x256_8_TT_1p8V_25C.lib (15919, 2023-07-05)
asic\sky130\ram\sky130_sram_1kbyte_1rw1r_8x1024_8.bb.v (1083, 2023-07-05)
asic\sky130\ram\sky130_sram_1kbyte_1rw1r_8x1024_8.gds (9277682, 2023-07-05)
asic\sky130\ram\sky130_sram_1kbyte_1rw1r_8x1024_8.lef (13285, 2023-07-05)
asic\sky130\ram\sky130_sram_1kbyte_1rw1r_8x1024_8.v (2909, 2023-07-05)
asic\sky130\ram\sky130_sram_1kbyte_1rw1r_8x1024_8_TT_1p8V_25C.lib (15916, 2023-07-05)
asic\sky130\ram\sky130_sram_2kbyte_1rw1r_32x512_8.bb.v (1081, 2023-07-05)
asic\sky130\ram\sky130_sram_2kbyte_1rw1r_32x512_8.gds (15213048, 2023-07-05)
asic\sky130\ram\sky130_sram_2kbyte_1rw1r_32x512_8.lef (26311, 2023-07-05)
asic\sky130\ram\sky130_sram_2kbyte_1rw1r_32x512_8.v (3161, 2023-07-05)
asic\sky130\ram\sky130_sram_2kbyte_1rw1r_32x512_8_TT_1p8V_25C.lib (15916, 2023-07-05)
asic\tapcell_custom.tcl (120, 2023-07-05)
asic\therm_out.v (1582, 2023-07-05)
... ...

# RISu0*** illustration RISu*** (Reduced Instruction Set Processor *** / Squirrel ***) is a series of my toy ***-bit RISC-V compatible processors. RISu0*** (this repo) is the first in the series. Illustration by [Andy Lithia](https://github.com/andylithia). ## Features ![pipeline_diagram](doc/pipeline.svg) - RV***IMZicsr_Zifencei instruction set - 7-stage pipeline: PCGen(F1), IMem(F2), Decode(ID), Issue(IX), Execute(EX), DMem(MEM), Writeback(WB). - In-order issue and out-of-order writeback - Dual-issue - BTB + Bimodal/Gselect/Gshare/Tournament + RAS branch predictors - 2x Integer (arithmetic, barrel shifter, branch) - 1x Load store unit (aligned access only, unaligned access generate precise exception) - 1x Multiply/ divide unit (non-pipelined, 3/6-cycle 32/***bit multiply, 34/66-cycle ***bit divide) - Multiply/ divide is optional - Optional L1 instruction and data cache (2-way set associative blocking cache) - Machine mode with exception and interrupt support - Optional experimental hardware refilled MMU + supervisor and user mode support - Written in portable synthesizable Verilog ## Performance The performance varies based on configurations: - Single-issue + 512-entry Bimodal + 32-entry BTB + TCM: 3.01 Coremark/MHz - Single-issue + 4K-entry Tournament + 32-entry BTB + TCM: 3.06 Coremark/MHz - Single-issue + 4K-entry Tournament + 32-entry BTB + 16KB L1$: 3.01 Coremark/MHz - Dual-issue + 4K-entry Tournament + 32-entry BTB + TCM: 4.31 Coremark/MHz Note: 1. Compiled with GCC 9.2.0, with the following options: ```-MD -O3 -mabi=lp*** -march=rv***im -mcmodel=medany -ffreestanding -nostdlib -fomit-frame-pointer -funroll-all-loops -finline-limit=1000 -ftree-dominator-opts -fno-if-conversion2 -fselective-scheduling -fno-code-hoisting -freorder-blocks-and-partition``` 2. Single-issue is no longer supported in the latest branch, testing was carried out using commit ```efd0d3``` 3. L1-cache is organized as 2-way set associative, 16KB each, with simulated unlimited L2 memory and 15-cycle latency 4. Each BPU entry is 2-bit, internally it expects 8-bit wide memory interface. 4K-entry = 1K x 8bit SRAM ## Area The area is quite big right now (rather poor PPA). FPGA: Currently the multiplier is not optimized for FPGA yet. With Aritx-7 XC7A100T-3CSG324C: - Multiplier disabled, no cache: ~120 MHz fmax, 19.6K LUT, 6.9K FF The critical path is at write-back stage. ASIC: The project has been submitted to Google + efabless MPW-7 shuttle for tapeout, with a 5GHz narrow-band RF transceiver. ![asic](doc/MPW71.jpg) The total area allocated to this project is about 8.5mm^2. The core is configured to be: - 4K depth Gshare predictor - 8KB 2-way I-cache + 8KB 2-way D-cache - Hardware multiplier and divider enabled - MMU disabled, machine mode only Total area allocated to core minus SRAM cell is about 3.4mm^2, with around 39% utilization. Assuming 85% target placement density, this translate to a 1.56mm^2 die area at SKY130 process with SKY130HD cell library. Regarding maximum frequency, without SRAM/ cache, Fmax is around 100MHz with CLA+KSA hybrid adder, or 80MHz with inferred adder. With cache, tag comparsion logic becomes the critical path and Fmax drops to about 50MHz. ## Status This project is mostly a proof-of-concept and is regarded as done. There might be bug fixes in the future, but don't expect major changes. ## Running Simulation In sim folder, run make. It should build the simulator. To run coremark, build the coremark by running ```make``` in tests/coremark, then in the sim folder do ```./simulator --ram ../tests/coremark/coremark.bin```. Note: Verilator required for building the simulator. RV*** gcc (riscv***-unknown-elf-gcc) required for building the coremark. ## Debugging RTL The core implementation probably contains bugs. Due to its OoO WB without reordering design, the core's architectural state would often diverge from ISA model, making lock-step co-simulation or trace comparsion with ISA simulation hard. A trace comparison tool is provided to allow comparing between RTL simulator generated trace and Spike generated trace. Example usage: ``` spike -m0x20000000:4096,0x80000000:1048576 -l --log-commits tests/coremark/coremark.elf 2> spike.log sim/simulator --ram tests/coremark/coremark.bin --cycles 10000 > sim.log tests/trace_comparater.py --risu sim.log --spike spike.log ``` Differences (if any) will be reported. ## Acknowledgements During the design of this processor, I have used the following projects as reference: - [lowRISC's muntjac](https://github.com/lowRISC/muntjac), Apache 2.0 license - [UltraEmbedded's biriscv](https://github.com/ultraembedded/biriscv), Apache 2.0 license The following third-party code have been used: - [Gary Guo's round robin arbiter](https://garyguo.net/), BSD license ## License MIT

近期下载者

相关文件


收藏者