So you have a 100 MHz on-board clock on your FPGA board, but you need only 50 MHz in your design. Okay, let us design a simple clock divider, a divide-by-2 built from flip-flops/counters, and derive 50 MHz from 100 MHz. Simple, but a Big NO!
This is something everyone, including me, has done at some point in their RTL design journey. Experience has taught me that this is one of those bad clocking practices on FPGAs that you should say goodbye to. So what is the problem, and what can we do about it? We will skim through some of my thoughts, observations, alternatives, and work-arounds for this design conundrum.
Motivation
So a couple of weeks ago, I was playing around with my VGA Project on FPGA and I needed a 25 MHz clock in my design to clock pixels. My Basys-3 board has a 100 MHz on-board clock, which I thought I could divide by 4 and feed in as the core clock for the design. Everything looked good in simulation, so I went ahead with on-board testing.
BIG-TIME FAIL!!
I got no video output on my monitor. I instantly knew what went wrong. So I replaced my clock divider logic with a PLL, and yes, I got my beautiful color patterns displayed on the monitor thru VGA.
What went wrong
The clock went wrong obviously. To understand why, we should take a look at the characteristics of a good clock:
- Less latency and balanced skew: We don’t want the clock to reach the synchronous elements with a huge network latency, and the clock skew should ideally be zero. We have to balance the skew so that clock latencies are uniformly distributed to different elements/nodes. There are techniques like H-tree design dedicated to achieving this at the clock-tree synthesis stage.
- Minimal uncertainty/jitter: The clock's uncertainty or jitter should be predictable, and it should be minimal.
- Necessary slew and drive capability: Signal integrity of clocks is crucial. Clock lines are typically high-load (high fan-out) lines, and the high load capacitance degrades the clock waveform into something ‘triangular’ with reduced amplitude instead of ‘square’ (imagine a triangular wave from 0 to 2V when we expect a square wave from 0 to 3.3V, that’s bad!). Clock lines therefore need high drive capability and fast transitions (a high slew rate). Clock buffers are used to achieve this, and they also have to be placed in such a way that skews are balanced.
So, keeping the above in mind, if we design a clock divider in RTL and use it to drive a high-density module on an FPGA, the design is likely to fail (at least at higher frequencies, of the order of 25 MHz and above), because the divided clock is synthesised just like any other logic signal, with nothing special done for it as a clock. In an ASIC flow, such a logic-generated clock can be given dedicated care during clock-tree synthesis and physical design. On FPGAs we don’t have that flexibility with logic-generated clocks. The clock signal gets routed through the general FPGA fabric (LUTs and regular interconnect) and drives the synchronous blocks with poor skew, latency, jitter and slew. This is a bad clock.
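For reference, the kind of flip-flop divider described above might look like the minimal sketch below. It is shown only as the pattern to avoid; the module and signal names are hypothetical and not taken from the VGA project.

```verilog
// Anti-pattern sketch: divide-by-4 "clock" built from ordinary flip-flops.
// Module and signal names are illustrative assumptions.
module clk_div4_bad (
    input  wire clk_100mhz,   // on-board 100 MHz clock
    input  wire rst,          // asynchronous, active-high reset
    output wire clk_25mhz     // divided clock taken from a flop output
);
    reg [1:0] div_rg;

    // Free-running 2-bit counter
    always @(posedge clk_100mhz or posedge rst) begin
        if (rst)
            div_rg <= 2'd0;
        else
            div_rg <= div_rg + 2'd1;
    end

    // The MSB toggles every two input cycles, so its period is four
    // input cycles. It is an ordinary logic net, not a dedicated clock.
    assign clk_25mhz = div_rg[1];
endmodule
```

This simulates correctly, which is why the problem only shows up on the board.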
Simple Solution
Make use of the PLLs/MMCMs on FPGAs to derive divided/multiplied clocks. (On Xilinx FPGAs, a BUFG only buffers a clock onto the global clock network and does not divide it; dividing clock buffers such as BUFR on 7-series devices or BUFGCE_DIV on UltraScale devices can generate simple integer-divided clocks.) Clocks generated this way are routed on the dedicated clock routing resources of the FPGA for the best skew, slew and jitter, so you get a good clock at all your synchronous blocks.
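As a rough sketch of the PLL route on a Xilinx 7-series part such as the one on the Basys-3, a 25 MHz clock could be derived with the PLLE2_BASE and BUFG primitives as below. The wrapper module name, signal names and the multiply/divide values are my assumptions; check the VCO range against the datasheet for your device and speed grade. In practice the vendor's clocking wizard generates an equivalent wrapper for you.

```verilog
// Sketch only: 100 MHz -> 25 MHz using a 7-series PLL (assumed settings).
module clk_gen_25mhz (
    input  wire clk_100mhz,   // on-board 100 MHz clock
    input  wire rst,          // active-high PLL reset
    output wire clk_25mhz,    // 25 MHz clock on global clock routing
    output wire locked        // high once the PLL output is stable
);
    wire clk_fb;              // internal feedback path
    wire clk_25mhz_unbuf;     // PLL output before the global buffer

    PLLE2_BASE #(
        .CLKIN1_PERIOD  (10.0),  // input period in ns (100 MHz)
        .DIVCLK_DIVIDE  (1),
        .CLKFBOUT_MULT  (10),    // VCO = 100 MHz * 10 = 1000 MHz (assumed legal)
        .CLKOUT0_DIVIDE (40)     // 1000 MHz / 40 = 25 MHz
    ) pll_inst (
        .CLKIN1   (clk_100mhz),
        .CLKFBIN  (clk_fb),
        .CLKFBOUT (clk_fb),
        .CLKOUT0  (clk_25mhz_unbuf),
        .LOCKED   (locked),
        .PWRDWN   (1'b0),
        .RST      (rst)
    );

    // Put the generated clock onto the global clock network
    BUFG bufg_inst (
        .I (clk_25mhz_unbuf),
        .O (clk_25mhz)
    );
endmodule
```

It is usually a good idea to hold the downstream logic in reset until `locked` goes high.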
Alternate Solution
The above solution looks cool and simple, but it has a downside: the RTL code is no longer portable across platforms. It also creates another clock domain (a flip-flop based clock divider creates a new clock domain too, even though the two domains are synchronous to each other). For simple designs with a medium footprint/logic complexity, there is an alternate solution in RTL which you can try out (I have used this technique in my UART Controller project):
Instead of generating a divided clock, generate/derive a clock-enable at the necessary intervals. Suppose we want 25 MHz from a 100 MHz core clock. Instead of generating a 25 MHz clock signal, generate clock-enable pulses at a 25 MHz rate, i.e. a pulse one core-clock cycle wide on every fourth cycle of the 100 MHz clock.
Then, use this clock-enable as a strobe to drive the registers inside the synchronous logic, which is still clocked by the 100 MHz core clock.
Verilog code to implement this technique will look like:
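/* Declarations (widths are assumptions; 2 bits are enough for divide-by-4) */
reg       clk_en_rg ;
reg [1:0] count_rg ;
reg [7:0] counter_rg ;   // example width

/* Synchronous logic to generate Clock-Enable pulse @ 25 MHz from 100 MHz clk */
always @ (posedge clk, posedge rst) begin
    // Reset
    if (rst) begin
        clk_en_rg <= 1'b0 ;
        count_rg  <= 2'd0 ;
    end
    // Clocked
    else begin
        clk_en_rg <= 1'b0 ;               // default: enable is low
        count_rg  <= count_rg + 2'd1 ;
        if (count_rg == 2'd3) begin
            clk_en_rg <= 1'b1 ;           // one-cycle pulse every 4th clk cycle
            count_rg  <= 2'd0 ;           // explicit wrap, independent of counter width
        end
    end
end

/* Synchronous logic to be clocked at 25 MHz */
always @ (posedge clk, posedge rst) begin
    // Reset
    if (rst) begin
        counter_rg <= 0 ;
    end
    // Clocked
    else begin
        if (clk_en_rg) begin              // update only when the enable strobe is high
            counter_rg <= counter_rg + 1 ;
        end
    end
end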
Some advantages of this technique are:
- Entire design will be in a single clock domain, making timing constraints simpler.
- Divided-by-any-integer clock-enable can be generated easily without the need of negative edge triggered flops.
One disadvantage of this technique is that it may reduce timing performance in larger designs: flops whose enable pin is actively switching are usually slower, and as the clock-enable fan-out grows, routing delays on the enable net grow too, tightening timing.
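The divide-by-any-integer point can be sketched as a small parameterised module with a self-checking-style testbench. This is a minimal sketch under my own assumptions: the module name `clk_en_gen`, the parameter `DIV` and the testbench are hypothetical, and `DIV` is assumed to be at least 2.

```verilog
`timescale 1ns / 1ps

// Sketch: one-cycle clock-enable pulse once every DIV cycles of clk.
module clk_en_gen #(
    parameter integer DIV = 4          // assumed DIV >= 2
) (
    input  wire clk,                   // core clock
    input  wire rst,                   // asynchronous, active-high reset
    output reg  clk_en                 // high for one clk cycle per DIV cycles
);
    localparam integer W = $clog2(DIV);
    reg [W-1:0] count_rg;

    always @(posedge clk or posedge rst) begin
        if (rst) begin
            count_rg <= {W{1'b0}};
            clk_en   <= 1'b0;
        end
        else if (count_rg == DIV - 1) begin
            count_rg <= {W{1'b0}};     // explicit wrap works for any DIV
            clk_en   <= 1'b1;
        end
        else begin
            count_rg <= count_rg + 1'b1;
            clk_en   <= 1'b0;
        end
    end
endmodule

// Simple testbench: counts cycles and enable pulses for DIV = 5.
module clk_en_gen_tb;
    reg     clk = 1'b0;
    reg     rst = 1'b1;
    wire    clk_en;
    integer cycles = 0;
    integer pulses = 0;

    clk_en_gen #(.DIV(5)) dut (.clk(clk), .rst(rst), .clk_en(clk_en));

    always #5 clk = ~clk;              // 10 ns period, i.e. 100 MHz

    always @(posedge clk) begin
        if (!rst) begin
            cycles = cycles + 1;
            if (clk_en) pulses = pulses + 1;
        end
    end

    initial begin
        #22 rst = 1'b0;                // release reset between clock edges
        repeat (100) @(posedge clk);
        $display("cycles=%0d pulses=%0d", cycles, pulses);
        $finish;
    end
endmodule
```

The pulse count should come out at roughly one fifth of the cycle count. An odd ratio like 5 needs no negative-edge flops here, because the enable is only a strobe and has no duty-cycle requirement.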
Takeaways
So the major takeaway from this blog is that:
Avoid logic-generated clock dividers on FPGAs; use PLLs/MMCMs instead. Or make use of a clock-enable in designs with a medium footprint. Let’s step up the quality of our multi-clocked designs!