Designing Skid Buffers for Pipelines

In this blog, we look at ‘smart’ buffers called Skid Buffers, which you may find useful in your pipelined designs. We approach Skid Buffers from a design perspective, considering the nuances of classic Valid-Ready handshaking in pipelines. By the end of this blog, you should be able to design one in Verilog/VHDL.

Valid-Ready Handshaking

Valid-Ready handshaking is a simple and popular protocol for transferring data between two modules (Sender and Receiver) in a design. In this scheme, Sender’s valid and data are connected to Receiver, and Receiver’s ready is connected back to Sender. A data transfer takes place in a clock cycle where both valid and ready are high.


Valid-Ready Handshaking

Think of Sender and Receiver as part of a pipeline. The ready signal path is crucial here, because it is responsible for stalling data transfers in the pipeline. As soon as Receiver stalls by de-asserting ready, Sender has to hold its data. In fact, the entire pipeline has to stall at the very next clock edge.

Valid-Ready Pipeline

The following diagrams show how data moves through a simple 3-stage pipeline with no buffers in between, one stage per clock cycle.

Pipeline operation

To achieve this, the ready path between every Sender and Receiver in the pipeline is combinatorial. As a result, the ready signal is often the bottleneck (critical path) for the timing performance of a pipeline. Left unbroken, this combinatorial path extends to every module upstream of Sender, producing a high fan-out ready path with poor timing. Sometimes you need to break the path between Receiver’s ready and Sender’s ready to improve timing, for example by registering Receiver’s ready before it reaches Sender.

However, simply registering ready (and valid) won’t work on its own. Sender sees a de-asserted ready one cycle late, so it sends one more data word that Receiver is not ready to accept, and that last word is missed or overwritten. Let us see how to solve this problem while keeping the classic Valid-Ready handshake.
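To see why a plain register on ready is not enough, here is a minimal cycle-level Python model. It is not RTL and not the author’s code. It assumes a Sender that always has valid data and moves to its next word whenever the *registered* ready it sees is high. Receiver samples the bus whenever its own ready is high. The function name and the ready pattern are illustrative.

```python
# Illustrative cycle-level model (hypothetical names), not RTL.
def naive_registered_ready(rx_ready_pattern):
    """Sender -> Receiver with ready registered and no buffer in between.

    rx_ready_pattern[c] is Receiver's ready in cycle c (True/False).
    Returns the list of words Receiver actually sampled.
    """
    next_word = 0      # word Sender is presenting (valid is always high)
    ready_q = False    # flop holding Receiver's ready; reset value 0
    received = []
    for rx_ready in rx_ready_pattern:
        # Receiver's view: a transfer happens when valid and its ready are high.
        if rx_ready:
            received.append(next_word)
        # Sender's view: a transfer happens when valid and ready_q are high.
        if ready_q:
            next_word += 1
        # Clock edge: ready is registered on its way back to Sender.
        ready_q = rx_ready
    return received


print(naive_registered_ready([True, True, False, True, True]))
# prints [0, 0, 2, 2]
```

Word 1 is dropped because Sender moved on while Receiver was stalled. Words 0 and 2 are each sampled twice because Receiver was ready while Sender still saw the old, low ready. The two sides disagree for one cycle after every change of ready. A buffer between them has to absorb exactly that gap.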

Using a depth-1 FIFO

Introducing a FIFO of depth one between Sender and Receiver solves the above problem by decoupling the ready path. It has a placeholder (entry) for one data word in case Receiver stalls the transfer. However, it has two drawbacks that may not be acceptable:

  • It always adds one clock cycle of latency, regardless of whether Receiver is ready to accept data or not.
  • It halves the throughput. Once the FIFO is full with its single entry, its registered ready stalls Sender in the next clock cycle while Receiver dequeues the entry, so at most one data word gets through every two cycles.
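Both drawbacks can be reproduced with a small cycle-level Python model. It is a sketch, not RTL. It assumes the FIFO’s input ready is simply “not full”, taken from its occupancy flop, and that Sender and Receiver never stall on their own. The function name depth1_fifo is hypothetical.

```python
# Illustrative cycle-level model (hypothetical names), not RTL.
def depth1_fifo(n_words, n_cycles):
    """Depth-1 FIFO between an always-valid Sender and an always-ready
    Receiver. Returns the (cycle, word) pairs seen by Receiver."""
    full = False          # occupancy flop
    entry = None          # the single storage entry
    next_word = 0         # next word Sender wants to send
    delivered = []
    for cycle in range(n_cycles):
        in_ready = not full               # registered: depends only on the flop
        in_valid = next_word < n_words
        out_valid = full                  # output comes only from the entry
        out_ready = True                  # Receiver never stalls in this model
        dequeue = out_valid and out_ready
        enqueue = in_valid and in_ready
        if dequeue:
            delivered.append((cycle, entry))
        if enqueue:
            entry = next_word
            next_word += 1
        # Enqueue and dequeue never coincide, because in_ready = not full.
        full = enqueue or (full and not dequeue)
    return delivered


print(depth1_fifo(4, 8))
# prints [(1, 0), (3, 1), (5, 2), (7, 3)]
```

Word k enters in cycle 2k and leaves in cycle 2k+1. That is one cycle of latency and one word every two cycles, even though Receiver never stalls.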

So a depth-1 FIFO is usually not a good choice. We need something more dynamic that overcomes both shortcomings.

Skid Buffer saves the day!

A Skid Buffer is a small buffer built from a mux and a register. The mux simply forwards (bypasses) input data to the output as long as Receiver is ready. If Receiver is not ready while Sender sends valid data, the Skid Buffer lets the data ‘skid’ to a stop by storing it in the register. This way, Sender does not have to stop immediately, only in the next clock cycle.

The mux is then switched to forward the buffered data. When Receiver is ready again, it samples the buffered data, and the mux switches back to forward input data in subsequent cycles. Because the buffer can hold the one extra data word that Sender sends after Receiver stalls, the ready signal from Receiver can now be registered before it reaches Sender. Thus, a Skid Buffer overcomes the shortcomings of a depth-1 FIFO by offering:

  • No latency.
  • Full throughput.

Skid Buffer
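The mux-plus-register behaviour described above can be sketched as a cycle-level Python model. This is a minimal illustration under stated assumptions, not the author’s RTL: the ready to Sender is taken directly from the register’s “occupied” flop, and Sender always has valid data. The names SkidBuffer and run_skid are hypothetical.

```python
# Illustrative cycle-level model (hypothetical names), not RTL.
class SkidBuffer:
    """One register plus a bypass mux. in_ready comes only from the
    buffer's own flop, so Receiver's ready never reaches Sender
    combinatorially."""

    def __init__(self):
        self.buf_valid = False   # flop: the skid register holds a word
        self.buf_data = None

    def in_ready(self):
        return not self.buf_valid

    def output(self, in_valid, in_data):
        """Mux: the buffered word if present, otherwise bypass the input."""
        if self.buf_valid:
            return True, self.buf_data
        return in_valid, in_data

    def clock(self, in_valid, in_data, out_ready):
        """Advance one clock edge; return True if the input word was accepted."""
        accepted = in_valid and self.in_ready()
        if self.buf_valid:
            if out_ready:                 # Receiver took the buffered word
                self.buf_valid = False
        elif accepted and not out_ready:  # Receiver refused: let the word skid
            self.buf_valid = True
            self.buf_data = in_data
        return accepted


def run_skid(rx_ready_pattern, n_words):
    """Drive an always-valid Sender through a SkidBuffer; return (cycle, word)."""
    sb = SkidBuffer()
    next_word, delivered = 0, []
    for cycle, out_ready in enumerate(rx_ready_pattern):
        in_valid = next_word < n_words
        out_valid, out_data = sb.output(in_valid, next_word)
        if out_valid and out_ready:
            delivered.append((cycle, out_data))
        if sb.clock(in_valid, next_word, out_ready):
            next_word += 1
    return delivered


print(run_skid([True, True, False, True, True, True], 5))
# prints [(0, 0), (1, 1), (3, 2), (4, 3), (5, 4)]
```

Word 0 reaches Receiver in the same cycle Sender presents it, so there is zero latency. Word 2 skids into the register during the stall in cycle 2 and is delivered in cycle 3. Sender sees ready low only in cycle 3 and resumes in cycle 4, with no word lost or duplicated.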

The following diagrams show how data moves through a simple 3-stage pipeline with skid buffers, clock cycle by clock cycle, and what happens when an external stall is asserted at stage s3. Assume that data is valid in every clock cycle of this pipeline flow.

  1. The pipeline gets stalled…
  2. The stall leads to “skidding” across the buffers…
  3. The stall is de-asserted, the buffered data propagates, and the pipeline resumes operation…

Stall signal to previous stage is delayed by one cycle due to registering

As we can see, with a skid buffer between two stages, we can delay the ready signal (the inverse of stall) by one clock cycle and still keep the pipeline operating seamlessly. Breaking the combinatorial path by registering the ready signal to the previous stage improves timing. For example, suppose a pipeline has 3 stages and each stage has 100 registers enabled by ready. Without buffers between the stages, the external ready signal (to stage 3) has a fan-out of 300. With a skid buffer between each pair of stages, the fan-out of each ready signal drops to 100, which improves its timing. The cost of this implementation is just two skid buffers.
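The walk-through above can be reproduced with a cycle-level Python model of a chain of skid buffers. It is a sketch under simplifying assumptions, not the author’s RTL:

  • Sender always has data.
  • Each buffer follows the bypass-mux-plus-register behaviour of a Skid Buffer.
  • The external stall is applied at the output of the last stage.

The function skid_chain and its parameters are hypothetical.

```python
# Illustrative cycle-level model (hypothetical names), not RTL.
def skid_chain(stall, n_stages=3, n_words=8):
    """Sender -> n_stages skid buffers -> Receiver.

    stall[c] is the external stall at the last stage in cycle c.
    Returns (delivered, sender_ready): the (cycle, word) pairs seen by
    Receiver and the registered ready Sender saw in each cycle.
    """
    buf = [None] * n_stages            # None: skid register empty (bypass)
    next_word, delivered, sender_ready = 0, [], []
    for cycle, st in enumerate(stall):
        # Every in_ready comes straight from a flop (registered ready).
        in_ready = [b is None for b in buf]
        # Each stage's output ready is the next stage's registered in_ready.
        out_ready = in_ready[1:] + [not st]
        # Combinatorial forward path through the muxes (None = not valid).
        ins = [next_word if next_word < n_words else None]
        for i in range(n_stages):
            ins.append(buf[i] if buf[i] is not None else ins[i])
        outs = ins[1:]                 # outs[i] is stage i's mux output
        if outs[-1] is not None and not st:
            delivered.append((cycle, outs[-1]))
        sender_ready.append(in_ready[0])
        # Clock edge: update every skid register.
        for i in range(n_stages):
            if buf[i] is not None:
                if out_ready[i]:
                    buf[i] = None      # buffered word moved downstream
            elif ins[i] is not None and not out_ready[i]:
                buf[i] = ins[i]        # downstream stalled: word skids here
        if ins[0] is not None and in_ready[0]:
            next_word += 1
    return delivered, sender_ready


stall = [False, False] + [True] * 5 + [False] * 7
delivered, sender_ready = skid_chain(stall)
print([w for _, w in delivered])   # prints [0, 1, 2, 3, 4, 5, 6, 7]
print(sender_ready.index(False))   # prints 5
```

The stall is asserted from cycle 2, but Sender sees ready go low only in cycle 5. Each registered ready adds one cycle, and each buffer absorbs the one word that arrives in the meantime. Receiver still gets words 0 to 7 in order with nothing dropped. After the stall is released, the buffered words 2, 3 and 4 drain in consecutive cycles before Sender resumes with word 5.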

A Skid Buffer has a small footprint and provides only one entry of buffering. Skid Buffers can be cascaded to increase buffering capacity from one entry to several. However, this hurts timing, because each added buffer deepens the combinatorial logic on the data and valid paths. Hence, if you need to buffer more than one data word without a penalty on timing, a Skid Buffer may not be the right option. That is why, for pipelines, we turn our attention to a more appealing “upgraded” Skid Buffer: the Pipeline Skid Buffer.

Pipeline Skid Buffer

A Pipeline Skid Buffer functions similarly to a Skid Buffer, but it fully decouples Sender from Receiver. It breaks the combinatorial paths of ready as well as data and valid by registering all of them, which makes the interface fully pipelined with better timing. The cost of this design tweak is that you now need two buffers instead of the single buffer in a classic Skid Buffer.

The data buffer holds the data driven to Receiver. The spare buffer holds the extra data word when Receiver becomes not ready and a ‘skid’ happens. The data in the spare buffer is later copied to the data buffer when Receiver is ready again. The properties of a Pipeline Skid Buffer can be summarized as:

  • Latency of one clock cycle.
  • Offers full throughput.
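A data buffer plus a spare buffer can be sketched as a cycle-level Python model. It is illustrative only, not the author’s RTL. Receiver sees only the data buffer, and Sender’s ready is taken only from the spare buffer’s flop. In this model the spare buffer fills only when a valid word actually arrives; the author’s RTL may handle cycles with valid low differently. The names PipelineSkidBuffer and run_pipe_skid are hypothetical.

```python
# Illustrative cycle-level model (hypothetical names), not RTL.
class PipelineSkidBuffer:
    """Data buffer (registered data/valid to Receiver) plus spare buffer
    (registered ready to Sender)."""

    def __init__(self):
        self.out_valid, self.out_data = False, None      # data buffer
        self.spare_valid, self.spare_data = False, None  # spare buffer

    def in_ready(self):
        return not self.spare_valid

    def clock(self, in_valid, in_data, out_ready):
        """Advance one clock edge; return True if the input word was accepted."""
        accepted = in_valid and self.in_ready()
        if out_ready or not self.out_valid:
            # Data buffer is free (or being read this cycle): refill it.
            if self.spare_valid:
                # Copy the spare word into the data buffer, resume the pipeline.
                self.out_valid, self.out_data = True, self.spare_data
                self.spare_valid = False
            else:
                self.out_valid = accepted
                self.out_data = in_data if accepted else None
        elif accepted:
            # Receiver is stalled and the data buffer is full: skid into spare.
            self.spare_valid, self.spare_data = True, in_data
        return accepted


def run_pipe_skid(rx_ready_pattern, n_words):
    """Drive an always-valid Sender through a PipelineSkidBuffer."""
    psb = PipelineSkidBuffer()
    next_word, delivered = 0, []
    for cycle, out_ready in enumerate(rx_ready_pattern):
        if psb.out_valid and out_ready:          # outputs come from flops only
            delivered.append((cycle, psb.out_data))
        if psb.clock(next_word < n_words, next_word, out_ready):
            next_word += 1
    return delivered


print(run_pipe_skid([True] * 6, 4))
# prints [(1, 0), (2, 1), (3, 2), (4, 3)]
print(run_pipe_skid([True, True, False, False, True, True, True, True], 5))
# prints [(1, 0), (4, 1), (5, 2), (6, 3), (7, 4)]
```

With Receiver always ready, each word appears one cycle after Sender presents it, at one word per cycle. When Receiver stalls in cycles 2 and 3, word 2 skids into the spare buffer and Sender sees ready low in cycle 3. Once Receiver is ready again, words 1 to 4 leave in consecutive cycles.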

A Pipeline Skid Buffer is similar to a depth-2 FIFO. Pipeline Skid Buffers can also be cascaded, but this defeats their purpose, since each buffer added to the chain increases the latency by one cycle. Hence, FIFOs are recommended for such applications.

Source Code

RTL source code for the Skid Buffer and the Pipeline Skid Buffer is available for free on my GitHub (https://github.com/iammituraj/skid_buffer).

Takeaways

Skid Buffers provide the smallest footprint for Valid-Ready handshaking between Sender and Receiver in an elastic pipeline, with a single buffer and zero latency. Pipeline Skid Buffers offer better timing and a fully decoupled interface at the cost of an extra buffer and one cycle of latency. To buffer more than two entries with a minimal latency of one cycle, use FIFOs between pipeline stages.

Support

Leave a comment or visit support for any queries/feedback regarding the content of this blog.

10 COMMENTS

sanjay

Very good read. Thanks for sharing.
One question: can “ready” be replaced with “i_ready” in the piece of code below?

https://github.com/iammituraj/skid_buffer/blob/main/pipe_skid_buffer.sv#L108

// Copy data from spare buffer to data buffer, resume pipeline
if (ready) begin

    chipmunk

    Looks like you are right to optimize that way. It should work, as valid_rg is 1 in that state. After verifying, I will update the GitHub. Thanks for your input!

Naga

Hi,

I have a question about the pipe_skid_buffer. Instead of moving to the SKID state immediately, what if we only transition to SKID when i_valid is 1’b1, and otherwise remain in the PIPE state? This approach could potentially reduce latency when data starts flowing again. What are your thoughts on this?

    chipmunk

    Hi Naga, there is no latency lost in a skid buffer. That’s the main reason for using a skid buffer. No added latency, least area foot print.

    chipmunk

    Sorry, I see, you are referring to the pipeline skid buffer, not the skid buffer. Looks like we can optimize the latency that way. We can do so to avoid bubbles (valid=0) stalling the pipeline by filling the spare buffer. Instead, we can push the pipeline from behind and “burst” the bubble, allowing the next valid packet to enter the spare buffer if required. Nice catch.

chipmunk

I have pushed these comments and this use case into the latest version on GitHub. Thanks for your input. I guess for most use cases, the reset domain would be the same in the upstream and downstream pipeline of the skid buffer, so I have removed the ready_rg 🙂

AM

Ques1: In your Skid Buffer code, why is ready_rg not initialized to 1’b1? If it were initialized to 1’b1, line 71 would not be required, and in line 101 you wouldn’t have to qualify i_valid with ready_rg, would you?

Ques2: If you agree with the first question, why do we need a separate flop for ready_rg? ready_rg could be driven directly from bypass_rg using a combinational assign statement.

    chipmunk

    If by initialisation you mean an on-reset value of 1: I deliberately initialised it to 0. Otherwise, the skid buffer could still be in reset while the sender, seeing ready at 1, assumes the skid buffer can accept data and sends it, eventually leading to data loss. But if you think this scenario is not possible in your use case (say, all modules are in the same reset domain), you can modify the code accordingly by removing ready_rg. This code is a generic implementation that takes care of all possible corner cases.

helen

Wonderful article. Used your code in my video pipeline project. Hope you don’t mind…. 🙂

    chipmunk

    You can. No probs!


Chipmunk Logic™ © 2024 Contact me