Designing Skid Buffers for Pipelines

In this blog, we look at ‘smart’ buffers called Skid Buffers, which you may find useful in your pipelined designs. We approach Skid Buffers from a design perspective, considering the nuances of classic Valid-Ready handshaking in pipelines. By the end of this blog, you should be able to design one in Verilog / VHDL.

Valid-Ready Handshaking

Valid-Ready handshaking is a simple and popular protocol for transferring data between two modules (Sender and Receiver) in a design. In this scheme, Sender’s valid and ready are simply connected to Receiver’s valid and ready. A data transfer happens in any clock cycle in which both valid and ready are asserted.

Figure: Valid-Ready handshaking

Think of Sender and Receiver as part of a pipeline. The ready signal path is crucial here, as it is responsible for stalling data transfer in the pipeline. As soon as Receiver stalls by de-asserting ready, Sender has to hold its data. In fact, the entire pipeline upstream has to stall at the very next clock edge.
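The handshake rule can be sketched as a small cycle-by-cycle model. This is only an illustrative Python sketch, not RTL; the names transfer_happens and simulate_link are hypothetical, and it assumes Sender keeps valid asserted until all of its words are sent.

```python
def transfer_happens(valid: bool, ready: bool) -> bool:
    """A word moves from Sender to Receiver only when both signals are high."""
    return valid and ready


def simulate_link(data, ready_pattern):
    """Cycle-by-cycle model of one Sender/Receiver link (hypothetical helper).

    data          -- words Sender wants to send, in order
    ready_pattern -- Receiver's ready value in each clock cycle
    Returns the words Receiver accepted, in order.
    """
    received = []
    index = 0  # position of the word Sender is currently presenting
    for ready in ready_pattern:
        valid = index < len(data)  # assumption: valid stays high until done
        if transfer_happens(valid, ready):
            received.append(data[index])
            index += 1  # Sender advances only after a completed handshake
        # otherwise Sender holds the same word: this is the stall
    return received
```

For instance, with ready low in the second and third cycles, Sender simply holds its second word until ready returns, and nothing is lost or duplicated.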

Figure: Valid-Ready pipeline

The following diagram shows how data is piped every clock cycle, through a simple 3-stage pipeline with no buffers in between.

Figure: Pipeline operation

To achieve this, the ready path between any Sender and Receiver in the pipeline is designed as combinatorial. Usually, the ready signal is the bottleneck (critical path) for the timing performance of a pipeline. Sometimes, you may need to break the combinatorial path between Receiver’s ready and Sender’s ready to improve the timing on this path. Otherwise, this combinatorial path may propagate to more upstream modules in the pipeline (modules before Sender), resulting in a high fan-out ready path with poor timing.

Receiver’s ready can be registered before it is sent to Sender, which breaks the combinatorial path and improves timing on the ready path. However, simply registering ready and valid won’t work: Sender sees the de-asserted ready one cycle late, so the last data it sent in that cycle may be missed or overwritten. Let us see how to solve this problem while maintaining the classic Valid-Ready handshaking.
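To make the data-loss problem concrete, here is a minimal Python sketch under simplifying assumptions: only ready is registered (valid and data are passed straight through), the register resets to low, and Sender always has data to send. The function name simulate_registered_ready is hypothetical.

```python
def simulate_registered_ready(data, ready_pattern):
    """Model a link where Sender sees Receiver's ready one cycle late.

    Returns (received, lost): words Receiver actually captured, and words
    Sender believed were transferred while Receiver was in fact not ready.
    """
    received, lost = [], []
    index = 0
    ready_reg = False  # registered copy of ready (assumed reset value: low)
    for ready in ready_pattern:
        valid = index < len(data)
        if valid and ready_reg:       # Sender thinks the handshake completed
            if ready:                 # Receiver really was ready
                received.append(data[index])
            else:                     # Receiver had already stalled
                lost.append(data[index])
            index += 1
        ready_reg = ready             # the register updates at the clock edge
    return received, lost
```

In this model, the word that Sender pushes in the cycle right after Receiver de-asserts ready ends up in the lost list, which is exactly the word a skid buffer is meant to catch.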

Using depth-1 FIFO

Introducing a FIFO of depth one between Sender and Receiver solves the above problem by decoupling the ready path. It has a placeholder (entry) for one data word in case Receiver stalls the data transfer. However, it has two drawbacks that may not be desirable:

1. It always adds a latency of one clock cycle, regardless of whether Receiver is ready to accept data or not.
2. It halves the throughput: once the FIFO is full after enqueuing one entry, it stalls Sender in the next clock cycle while Receiver dequeues the data from the FIFO.
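Both drawbacks show up in a small behavioral model. The sketch below is an assumption-laden Python illustration, not RTL: the FIFO’s ready to Sender is taken to be simply "not full" (a registered value), Sender always has data, and Receiver is always ready. The name simulate_depth1_fifo is hypothetical.

```python
def simulate_depth1_fifo(data, cycles):
    """Depth-1 FIFO between an always-valid Sender and an always-ready Receiver.

    Returns a list of (cycle, word) pairs as seen by Receiver.
    """
    full = False
    slot = None
    index = 0
    seen = []
    for cycle in range(cycles):
        in_valid = index < len(data)
        in_ready = not full   # depends only on stored state (registered)
        out_valid = full
        dequeue = out_valid               # Receiver is always ready here
        enqueue = in_valid and in_ready   # uses the state from the cycle start
        if dequeue:
            seen.append((cycle, slot))
            full = False
        if enqueue:
            slot = data[index]
            index += 1
            full = True
    return seen
```

In this model each word reaches Receiver one cycle after it was enqueued, and words arrive only every other cycle, even though Receiver never stalls.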

So using a depth-1 FIFO may not be a good idea. We need something more robust and dynamic that overcomes the above shortcomings.

Skid Buffer saves the day!

Skid Buffer is a specially designed buffer built from a mux and a register. The mux simply forwards (bypasses) input data to the output as long as Receiver is ready. If Receiver is not ready while Sender is sending valid data, Skid Buffer lets the data ‘skid’ to a stop by storing it in the register. In this way, Sender does not have to stall immediately, but only in the next clock cycle. The mux then switches to forward the buffered data. When Receiver is ready again, it samples the data in the buffer; the mux switches back, and input data is forwarded to Receiver in subsequent cycles. This is what allows the ready signal from Receiver to be registered: the one extra word that Sender sends before it sees the stall is held inside the buffer. Thus, Skid Buffer overcomes the shortcomings of the depth-1 FIFO by offering:

No latency.
Full throughput.
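The mux-and-register behavior described above can be sketched as a behavioral Python model. This is an illustration under assumptions, not the RTL from the repository: the class SkidBuffer, the helper run_skid, and all signal names are hypothetical, and the ready seen by Sender is modeled as "buffer not full".

```python
class SkidBuffer:
    """Behavioral model of a bypass skid buffer (hypothetical names)."""

    def __init__(self):
        self.full = False  # True while a skidded word is held
        self.buf = None

    def step(self, in_valid, in_data, out_ready):
        """Advance one clock cycle. Returns (accepted, out_valid, out_data)."""
        in_ready = not self.full                 # registered ready to Sender
        out_valid = self.full or in_valid
        out_data = self.buf if self.full else in_data  # the mux
        accepted = in_valid and in_ready
        if self.full:
            if out_ready:
                self.full = False                # buffered word is consumed
        elif accepted and not out_ready:
            self.buf = in_data                   # the word 'skids' into the buffer
            self.full = True
        return accepted, out_valid, out_data


def run_skid(data, ready_pattern):
    """Drive the model; return (cycle, word) pairs accepted by Receiver."""
    skid, index, received = SkidBuffer(), 0, []
    for cycle, out_ready in enumerate(ready_pattern):
        in_valid = index < len(data)
        in_data = data[index] if in_valid else None
        accepted, out_valid, out_data = skid.step(in_valid, in_data, out_ready)
        if out_valid and out_ready:
            received.append((cycle, out_data))
        if accepted:
            index += 1
    return received
```

When Receiver never stalls, each word reaches it in the same cycle it is presented. When Receiver stalls for two cycles, the skidded word is delivered first once ready returns, and the following words arrive back to back.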

Figure: Skid Buffer

The following diagram shows how data is piped every clock cycle, through a simple 3-stage pipeline with skid buffers, and what happens when an external stall is asserted at stage s3. Let’s assume that the data is valid in every clock cycle in this pipeline flow.

1. Pipeline gets stalled…

2. The stall leads to “skidding” across the buffers…

3. Stall is de-asserted, the buffered data gets propagated, and the pipeline resumes operation…

Figure: The stall signal to the previous stage is delayed by one cycle because it is registered

As we can see, with a skid buffer between two stages, we have the flexibility to delay the ready signal (the inverse of stall) by one clock cycle and still have seamless pipeline operation. Breaking the combinatorial path and registering the ready signal to the previous stage in the pipeline improves the timing performance. For example, if a pipeline has 3 stages and each stage has 100 registers, the fan-out of the external ready signal (to stage 3) is 300 when there are no buffers between the stages. With a skid buffer between the stages, the fan-out of each ready signal is brought down to 100, improving the timing of this signal. The cost of this implementation is just two skid buffers.

Skid Buffers have a small footprint and provide only one buffer entry. They can be cascaded to increase the buffering capacity from one entry to multiple entries. However, this may hamper the timing performance, because it deepens the combinatorial logic on the data and valid paths. Hence, if you need to buffer more than one data word without a penalty on timing performance, Skid Buffer may not be the right option for you. That is why, for pipelines, we turn our attention to a more appealing “upgraded” Skid Buffer: the Pipeline Skid Buffer.

Pipeline Skid Buffer

Pipeline Skid Buffer functions similarly to Skid Buffer, but it provides complete decoupling / demarcation from Receiver by registering ready as well as data and valid, which breaks all three combinatorial paths. This makes the interface fully pipelined, with better timing performance. The implication of this design tweak is that you now need two buffers (instead of one, as in the classic Skid Buffer). The data buffer stores the data, while the spare buffer stores the extra data when Receiver becomes not ready and the ‘skid’ happens. The data in the spare buffer is later copied to the data buffer when Receiver is ready again. The properties of Pipeline Skid Buffer can be summarized as follows:

Latency of one clock cycle.
Offers full throughput.
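The two-buffer scheme can be sketched in the same behavioral style. Again, this is a Python illustration under assumptions rather than the actual RTL: the class PipelineSkidBuffer, the helper run_pipeline_skid, and the field names main and spare (standing in for the data buffer and the spare buffer) are hypothetical.

```python
class PipelineSkidBuffer:
    """Behavioral model with registered data, valid and ready (hypothetical)."""

    def __init__(self):
        self.main, self.main_full = None, False    # data buffer (drives output)
        self.spare, self.spare_full = None, False  # spare buffer (catches skid)

    def step(self, in_valid, in_data, out_ready):
        """Advance one clock cycle. Returns (accepted, sent, out_data)."""
        in_ready = not self.spare_full     # registered ready to Sender
        out_valid = self.main_full         # registered valid to Receiver
        out_data = self.main
        accepted = in_valid and in_ready
        sent = out_valid and out_ready
        if self.spare_full:
            if sent:                       # refill data buffer from spare
                self.main = self.spare
                self.spare_full = False
        elif self.main_full and not sent:
            if accepted:                   # Receiver stalled: word skids
                self.spare = in_data
                self.spare_full = True
        else:                              # data buffer is empty or being drained
            if accepted:
                self.main = in_data
            self.main_full = accepted
        return accepted, sent, out_data


def run_pipeline_skid(data, ready_pattern):
    """Drive the model; return (cycle, word) pairs accepted by Receiver."""
    psb, index, received = PipelineSkidBuffer(), 0, []
    for cycle, out_ready in enumerate(ready_pattern):
        in_valid = index < len(data)
        in_data = data[index] if in_valid else None
        accepted, sent, out_data = psb.step(in_valid, in_data, out_ready)
        if sent:
            received.append((cycle, out_data))
        if accepted:
            index += 1
    return received
```

With Receiver always ready, every word in this model appears one cycle after it was presented, one word per cycle. After a stall is released, the buffered words drain without any bubble.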

Pipeline Skid Buffer is similar to a depth-2 FIFO. Pipeline Skid Buffers can also be cascaded. However, cascading them defeats the purpose, since each stage added to the chain increases the latency by one cycle. Hence, FIFOs are recommended in such applications.

Source Code

Source code for Skid Buffer and Pipeline Skid Buffer in RTL is available for free on my GitHub.

Takeaways

Skid Buffers provide the smallest footprint for Valid-Ready handshaking between Sender and Receiver in an elastic pipeline, using a single buffer and adding zero latency. Pipeline Skid Buffers offer better timing and a fully decoupled interface at the cost of an extra buffer and one cycle of latency. To buffer more than two entries with a minimal latency of one cycle, FIFOs are recommended between pipeline stages.

Support

Leave a comment or visit support for any queries/feedback regarding the content of this blog.