Commit 4b5f511

Add example projects. Add a README.
1 parent 03925c6 commit 4b5f511

11 files changed

+1567
-2
lines changed

README.md

Lines changed: 64 additions & 2 deletions
@@ -1,2 +1,64 @@
-# core_ddr3_controller
-A DDR3 memory controller in Verilog for various FPGAs
### Lightweight AXI-4 DDR3 Controller

Github: [https://github.com/ultraembedded/core_ddr3_controller](https://github.com/ultraembedded/core_ddr3_controller)

This IP is a compact DDR3 memory controller in Verilog aimed at FPGA projects where the bandwidth required from the memory is lower than DDR3 DRAMs can provide, and where simplicity and LUT usage are more important than maximising the DDR performance.
It currently supports Xilinx 7 series (Artix, Kintex) and Lattice ECP5 FPGAs, but other FPGA-specific DFI-compatible PHYs might be added later.

The idea with this project is to run DDR3 at a much slower clock frequency than the maximum supported by the DDR part, reducing the complexity required in the DDR3 controller by giving the bus interface much more margin and tolerance.
This can make sense for some FPGA projects where the fabric speed is the limiting factor in the design, rather than the external DDR memory interface speed, and where typically an SDR DRAM could have been used but wasn't (for reasons of availability, capacity, cost per bit).

DDR3 has a very high signalling rate, and in order for this to work reliably, it has added complexity such as:
* Read/write levelling
* ZQ calibration

In normal operating mode (DLL-on mode), DDR3 has a minimum clock frequency (300MHz+). However, it is possible to turn the DDR3 DLL off (in most DRAM parts) and run at frequencies <= 125MHz.

DLL-off mode (which this memory controller utilises) is listed as an optional feature for DDR3 parts to implement, however it seems that the popular DDR3 parts do implement it (and testing proves that it works well)!

##### Design Targets
* Run at a reduced DDR clock speed (<= 125MHz) to decrease the complexity of the DDR3 PHY, ease timing closure, and reduce design LUT usage.
* Support multiple FPGA vendors/toolchains.
* Achieve high sequential read/write performance (for the clock speed).
* Support an AXI-4 target port with burst capabilities.
* Be substantially smaller (using fewer FPGA LUTs) than commercial DDR3 cores (such as Xilinx MIG).
* Be open-source, free to use, free to modify.

##### Features
* 32-bit AXI-4 target port supporting INCR bursts.
* Support up to 8 open rows, allowing back-to-back read/write bursts within an open row.
* Standardized DFI interface between memory controller core and PHY.
* PHYs: Xilinx 7 series, Lattice ECP5
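
To illustrate the "up to 8 open rows" feature, here is a minimal sketch of per-bank open-row tracking, assuming one tracked row for each of the 8 DDR3 banks. The module name, port names and the `ROW_W` parameter are hypothetical and are not taken from this core; the real controller may organise this logic differently.

```verilog
// Illustrative sketch only: per-bank open-row tracking (hypothetical names,
// not the core's actual implementation).
module open_row_tracker
#(
    parameter ROW_W = 15              // Row address width (assumed)
)
(
     input              clk_i
    ,input              rst_i
    ,input  [2:0]       bank_i          // Bank targeted by the current request
    ,input  [ROW_W-1:0] row_i           // Row targeted by the current request
    ,input              activate_i      // ACTIVATE issued to bank_i / row_i
    ,input              precharge_i     // PRECHARGE issued to bank_i
    ,input              precharge_all_i // PRECHARGE ALL (e.g. before refresh)
    ,output             row_hit_o       // Requested row is already open
    ,output             row_miss_o      // A different row is open in this bank
);

reg [7:0]       row_open_q;           // One "row open" flag per bank
reg [ROW_W-1:0] active_row_q [0:7];   // Currently open row per bank

always @ (posedge clk_i)
if (rst_i)
    row_open_q <= 8'b0;
else if (precharge_all_i)
    row_open_q <= 8'b0;
else if (precharge_i)
    row_open_q[bank_i] <= 1'b0;
else if (activate_i)
    row_open_q[bank_i] <= 1'b1;

always @ (posedge clk_i)
if (activate_i)
    active_row_q[bank_i] <= row_i;

// Hit: READ/WRITE can be issued directly.
// Miss: PRECHARGE then ACTIVATE is needed first.
assign row_hit_o  = row_open_q[bank_i] && (active_row_q[bank_i] == row_i);
assign row_miss_o = row_open_q[bank_i] && (active_row_q[bank_i] != row_i);

endmodule
```

With a tracker like this, a run of sequential accesses that stays within one row keeps `row_hit_o` asserted, which is what allows back-to-back bursts without intervening PRECHARGE/ACTIVATE commands.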

##### Performance / Area
Performance for sequential burst accesses is good, as a burst of the same type (read or write) will be pipelined to an already open row.
Currently, there is no capability for read/write re-ordering/coalescing, so random read/write performance will not be optimal (this might be addressed in future releases).

On the Digilent Arty A7 running at 50MHz (max 200MBytes/s of bandwidth available), performing sequential reads / writes:
![Performance](docs/artya7.png)
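
The 200MBytes/s ceiling quoted above is consistent with simple arithmetic under either of two assumptions: that the 32-bit AXI-4 port moves one 4-byte word per 50MHz clock, or that a 16-bit wide DDR3 device (the MT41K128M16 is a x16 part) transfers 2 bytes on both edges of a 50MHz clock. Which of the two the figure was originally derived from is not stated.

$$
4\ \text{bytes} \times 50 \times 10^{6}\ \text{s}^{-1} = 200\ \text{MBytes/s},
\qquad
2\ \text{bytes} \times 2 \times 50 \times 10^{6}\ \text{s}^{-1} = 200\ \text{MBytes/s}
$$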

As for area, on the Xilinx Artix 7 (XC7A35T), the area used by the core (plus a small UART to AXI-4 bridge):
![Area](docs/artya7_area.png)

It should be noted that the same project using the Xilinx MIG DDR3 controller takes 33% of the FPGA LUTs (vs 9% with this core).

##### Testing
Verified under simulation, then exercised on the following FPGA boards:
* Digilent Arty A7 (Xilinx Artix + MT41K128M16JT-125)
* LambdaConcept ECPIX-5 (Lattice ECP5 + MT41K256M16RE-125)

The performance and error checking was done using this [RAM Tester](https://github.com/ultraembedded/core_ram_tester).
These boards have also booted Linux reliably with this DDR core, while being stressed by video frame buffer accesses to DDR.

##### Future Work
Weaknesses/areas for improvement:
* ECP5 PHY is sub-optimal: it relies on aligning the read data capture to the internal clock instead of capturing on the DQS input. This works reliably on the board/clock speeds tested, but really could do with fixing!
* Support for AXI-4 WRAP bursts - these are often used with cache controllers which do critical word first fetches.
* Add optional DDR scheduler logic on the frontend of the core to improve read/write thrashing performance (re-order and coalesce).
* The PHY modules have tuneable delays; it would be good if these were automatically tuned at startup to simplify integration efforts (although this would increase LUT usage).
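
For context on the WRAP burst item, the sketch below shows how the next beat address of an AXI-4 WRAP burst could be computed for a 32-bit (4 bytes per beat) port. The module and port names are hypothetical; this is only an illustration of the AXI-4 wrapping rule and is not code from this core.

```verilog
// Illustrative sketch only (hypothetical module, not part of the core).
// Computes the next beat address of an AXI-4 WRAP burst with 4-byte beats.
module axi4_wrap_next_addr
(
     input  [31:0] addr_i   // Current beat address (4-byte aligned)
    ,input  [7:0]  len_i    // AxLEN: beats - 1 (1, 3, 7 or 15 for WRAP)
    ,output [31:0] next_o   // Address of the next beat
);

// Burst size in bytes minus one: ((len + 1) * 4) - 1
wire [31:0] wrap_mask_w = {22'b0, len_i, 2'b11};

// Plain incrementing address
wire [31:0] incr_w      = addr_i + 32'd4;

// Keep the bits above the wrap boundary, wrap the bits below it
assign next_o = (addr_i & ~wrap_mask_w) | (incr_w & wrap_mask_w);

endmodule
```

For a 4-beat burst (`len_i = 3`) starting at `0x18`, this produces the address sequence `0x18`, `0x1C`, `0x10`, `0x14`, which is the critical-word-first order a cache line fill would use.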

##### Integration
If you would like help with getting this core integrated into your FPGA project, contact me:
```

```

docs/artya7.png

68.4 KB

docs/artya7_area.png

7.76 KB

examples/arty_a7/artix7_pll.v

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
module artix7_pll
(
    // Inputs
     input  clkref_i

    // Outputs
    ,output clkout0_o
    ,output clkout1_o
    ,output clkout2_o
);

wire clkref_buffered_w;
wire clkfbout_w;
wire clkfbout_buffered_w;
wire pll_clkout0_w;
wire pll_clkout0_buffered_w;
wire pll_clkout1_w;
wire pll_clkout1_buffered_w;
wire pll_clkout2_w;
wire pll_clkout2_buffered_w;

// Input buffering
IBUF IBUF_IN
(
    .I (clkref_i),
    .O (clkref_buffered_w)
);

// Clocking primitive
PLLE2_BASE
#(
    .BANDWIDTH("OPTIMIZED"), // OPTIMIZED, HIGH, LOW
    .CLKFBOUT_PHASE(0.0),    // Phase offset in degrees of CLKFB, (-360-360)
    .CLKIN1_PERIOD(10.0),    // Input clock period in ns resolution
    .CLKFBOUT_MULT(12),      // VCO=1200MHz

    // CLKOUTx_DIVIDE: Divide amount for each CLKOUT(1-128)
    .CLKOUT0_DIVIDE(24), // CLK0=50MHz
    .CLKOUT1_DIVIDE(6),  // CLK1=200MHz
    .CLKOUT2_DIVIDE(6),  // CLK2=200MHz

    // CLKOUTx_DUTY_CYCLE: Duty cycle for each CLKOUT
    .CLKOUT0_DUTY_CYCLE(0.5),
    .CLKOUT1_DUTY_CYCLE(0.5),
    .CLKOUT2_DUTY_CYCLE(0.5),
    .CLKOUT3_DUTY_CYCLE(0.5),

    // CLKOUTx_PHASE: Phase offset for each CLKOUT
    .CLKOUT0_PHASE(0.0),
    .CLKOUT1_PHASE(0.0),
    .CLKOUT2_PHASE(0.0),
    .CLKOUT3_PHASE(0.0),

    .DIVCLK_DIVIDE(1),    // Master division value (1-56)
    .REF_JITTER1(0.0),    // Ref. input jitter in UI (0.000-0.999)
    .STARTUP_WAIT("TRUE") // Delay DONE until PLL Locks ("TRUE"/"FALSE")
)
u_pll
(
    .CLKFBOUT(clkfbout_w),
    .CLKOUT0(pll_clkout0_w),
    .CLKOUT1(pll_clkout1_w),
    .CLKOUT2(pll_clkout2_w),
    .CLKOUT3(),
    .CLKOUT4(),
    .CLKOUT5(),
    .LOCKED(),
    .PWRDWN(1'b0),
    .RST(1'b0),
    .CLKIN1(clkref_buffered_w),
    .CLKFBIN(clkfbout_buffered_w)
);

BUFH u_clkfb_buf
(
    .I(clkfbout_w),
    .O(clkfbout_buffered_w)
);

//-----------------------------------------------------------------
// CLK_OUT0
//-----------------------------------------------------------------
assign pll_clkout0_buffered_w = pll_clkout0_w;

assign clkout0_o = pll_clkout0_buffered_w;

//-----------------------------------------------------------------
// CLK_OUT1
//-----------------------------------------------------------------
assign pll_clkout1_buffered_w = pll_clkout1_w;

assign clkout1_o = pll_clkout1_buffered_w;

//-----------------------------------------------------------------
// CLK_OUT2
//-----------------------------------------------------------------
assign pll_clkout2_buffered_w = pll_clkout2_w;

assign clkout2_o = pll_clkout2_buffered_w;

endmodule
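
The output frequencies in the comments follow from the PLL parameters: a 10.0ns `CLKIN1_PERIOD` corresponds to a 100MHz reference, so the VCO runs at 100MHz x 12 / 1 = 1200MHz, giving 1200 / 24 = 50MHz on `clkout0_o` and 1200 / 6 = 200MHz on `clkout1_o` and `clkout2_o`. A minimal instantiation might look like the sketch below; the wrapper module and its port names are hypothetical and are not taken from the example project.

```verilog
// Hypothetical wrapper showing how artix7_pll could be instantiated.
// Module and signal names here are illustrative assumptions.
module pll_example_top
(
     input  clk100_i     // Assumed 100MHz reference (matches CLKIN1_PERIOD = 10.0ns)
    ,output clk_50_o     // 50MHz  (VCO / 24)
    ,output clk_200a_o   // 200MHz (VCO / 6)
    ,output clk_200b_o   // 200MHz (VCO / 6)
);

artix7_pll
u_pll
(
     .clkref_i(clk100_i)
    ,.clkout0_o(clk_50_o)
    ,.clkout1_o(clk_200a_o)
    ,.clkout2_o(clk_200b_o)
);

endmodule
```

Note that `artix7_pll` leaves the `LOCKED` output of the `PLLE2_BASE` unconnected, so a design that wants to hold logic in reset until the PLL has locked would need to expose that signal itself.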
