DMI – Graduate Course in Computer Science
Copyleft
2020 Giuseppe Scollo
this tutorial deals with:
hardware mapping of datapath expressions: same basic rules as with always blocks, however:
Schaumont, Figure 5.8 - Implementation of the up-down counter FSMD
function specification: computation of the median in a list of five numbers
problem: design a stream-processing filter that produces a new output upon each new input
for a fast algorithm, with no list sorting:
dp median(in a1, a2, a3, a4, a5 : ns(32); out q1 : ns(32)) {
sig z1, z2, z3, z4, z5, z6, z7, z8, z9, z10 : ns(3);
sig s1, s2, s3, s4, s5 : ns(1);
always {
z1 = (a1 < a2);
z2 = (a1 < a3);
z3 = (a1 < a4);
z4 = (a1 < a5);
z5 = (a2 < a3);
z6 = (a2 < a4);
z7 = (a2 < a5);
z8 = (a3 < a4);
z9 = (a3 < a5);
z10 = (a4 < a5);
s1 =
(( z1 + z2 +
z3 + z4) == 2);
s2 =
(((1-z1) + z5 +
z6 + z7) == 2);
s3 =
(((1-z2) + (1-z5) +
z8 + z9) == 2);
s4 =
(((1-z3) + (1-z6) + (1-z8) + z10) == 2);
q1 =
s1 ? a1 : s2 ? a2 : s3 ? a3 : s4 ? a4 : a5;
}
}
Schaumont, Listing 5.19 - GEZEL Datapath of a median calculation of five numbers
the algorithm presented by this datapath also works when some elements are equal
the presented algorithm requires 192 I/O lines and 10 comparators
to this end the hardware stores the four data preceding the last one from the input stream and at every iteration with a new input element reuses the stored results of six comparisons from the previous three iterations
Schaumont, Figure 5.9 - Median-calculation datapath for a stream of values
dp median(in a1 : ns(32); out q1 : ns(32)) {
reg a2, a3, a4, a5 : ns(32);
sig z1, z2, z3, z4;
reg z5, z6, z7, z8, z9, z10 : ns(3);
sig s1, s2, s3, s4, s5 : ns(1);
always {
a2 = a1;
a3 = a2;
a4 = a3;
a5 = a4;
z1 = (a1 < a2);
z2 = (a1 < a3);
z3 = (a1 < a4);
z4 = (a1 < a5);
z5 = z1;
z6 = z2;
z7 = z3;
z8 = z5;
z9 = z6;
z10 = z8;
s1 =
(( z1 + z2 +
z3 + z4) == 2);
s2 =
(((1-z1) + z5 +
z6 + z7) == 2);
s3 =
(((1-z2) + (1-z5) +
z8 + z9) == 2);
s4 =
(((1-z3) + (1-z6) + (1-z8) + z10) == 2);
q1 =
s1 ? a1 : s2 ? a2 : s3 ? a3 : s4 ? a4 : a5;
}
}
the filter in fig 5.9 accepts a new input and produces a new output at every clock cycle
the figure shows the schedule, the FSMD which implements this idea is in Schaumont Sect. 5.5.4, which also presents the testbench FSMD here reproduced aside the figure
Schaumont, Figure 5.10 - Sequential schedule median-calculation datapath for a stream of values
dp t_median {
sig istr, ostr : ns(1);
sig a1_in, q1 : ns(32);
use median(istr, a1_in, ostr, q1);
reg r : ns(1);
reg c : ns(16);
always { r = ostr; }
sfg init { c = 0x1234; }
sfg sendin { a1_in = c;
c = (c[0] ˆ c[2] ˆ c[3] ˆ c[5]) # c[15:1];
istr = 1; }
sfg noin { a1_in = 0;
istr = 0; }
}
fsm ctl_t_median(t_median) {
initial s0;
state s1, s2;
@s0 (init, noin) -> s1;
@s1 (sendin) -> s2;
@s2 if (r) then (noin) -> s1;
else (noin) -> s2;
}
a simple example: memory controller
State transition diagram of a memory controller
general FSM hardware structure
FSM hardware structure
in the VHDL description, states form an enumerated type
(see enclosed code):
a binary encoding of the state enumerated type is needed to synthesize an HDL description of an FSM
frequently used encodings:
explicit state encoding may be described in VHDL by defining the state type as a subtype of STD_LOGIC_VECTOR and by declaring the state encodings as constants of that type; for example, here are sequential and one-hot encodings of the states from the seen example:
subtype state is std_logic_vector(1 downto 0);
constant idle : state := "00";
constant choice : state := "01";
constant read : state := "10";
constant write : state := "11";
subtype state is std_logic_vector(3 downto 0);
constant idle : state := "0001";
constant choice : state := "0010";
constant read : state := "0100";
constant write : state := "1000";
if the encoding is not surjective, e.g. a one-hot encoding, the construct case stato is when ... must have the final clause when others ...
Current-state dependent outputs (Moore FSM)
the block diagram corresponds to the two-process and one-process descriptions in the enclosed code
Next-state dependent outputs
this may be reduced to the flip-flop clock-to-q delay by computing outputs based on the next state and by placing registers before outputs
N.B. since processes are sequential, this optimization is also applicable to two-process and one-process architectures as well, by deferring outputs' updates until after state updates
Reduction of clock-output delay
another optimization technique consists in encoding states so as to let outputs coincide with some of the (encoded) state bits
e.g. in the seen example:
this experience has a twofold target
the I/O for the first target model consist of:
after the initial state where the generator is initialized, it is suggested to let the control FSM go through a sequence of four states:
for the second target, use can be made of the 7-segment display decoder from the two latest lab experiences, but this is to be fed with BCD representations of the 16-bit output decimal digits: five displays are needed to this purpose and package bcd, is available, that provides the BCD_encode function
recommended readings:
readings for further consultation:
useful materials for the proposed lab experience
sources of examples in this presentation: