DMI – Graduate Course in Computer Science
Copyleft
2020 Giuseppe Scollo
outline:
a hardware interface connects a custom hardware module to a coprocessor bus or an on-chip bus
hardware interface design should match the flexibility of custom hardware design to the realities of the hardware/software interface
typical functions of the hardware interface:
Schaumont, Figure 12.1 - The hardware interface maps a custom-hardware module to a hardware-software interface
Schaumont, Figure 12.2 - Layout of a coprocessor hardware interface
components commonly found in a hardware interface:
from the perspective of the custom hardware module, it is common to partition the collection of ports into data input/output ports and control/status ports
the separation of control and data is an important design aspect because, in a coprocessor design, the designer chooses the granularity of interaction between data and control
features of a coprocessor data port: wordlength, direction and update rate
to make a good mapping of actual hardware ports to custom interface ports, it is convenient to start from the features of the actual hardware ports
when this module is implemented as a memory-mapped coprocessor, the ports of the hardware interface will be implemented as memory-mapped registers
however, it may not always be possible to allocate an arbitrary number of memory-mapped ports in the hardware interface; in that case, one needs to multiplex the custom-hardware module ports over the hardware interface ports
multiplexing can be implemented in different ways: the first is time-multiplexing of the hardware module ports; the second is to use an index register in the hardware interface
Schaumont, Figure 12.3 - Time-multiplexing of two hardware-module ports over a single control-shell port
Schaumont, Figure 12.4 - Index-register to select one of eight output ports
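a minimal C sketch of the index-register scheme, written as a software model rather than as actual hardware: the type hw_interface_t and the functions write_index, read_data and read_port are hypothetical names introduced here for illustration, and the 32-bit port width is an assumption

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PORTS 8u  /* eight module output ports, as in Figure 12.4 */

/* Hypothetical software model of the interface: one index register and
   one data register shared by all module output ports. */
typedef struct {
    uint32_t module_out[NUM_PORTS]; /* values driven by the custom hardware */
    uint32_t index_reg;             /* written by software to select a port */
} hw_interface_t;

/* Software writes the index register to choose which port is visible. */
static void write_index(hw_interface_t *hw, uint32_t idx) {
    assert(idx < NUM_PORTS);
    hw->index_reg = idx;
}

/* A read of the single data register returns the selected port. */
static uint32_t read_data(const hw_interface_t *hw) {
    return hw->module_out[hw->index_reg];
}

/* Reading port p costs two bus accesses: one write, then one read. */
static uint32_t read_port(hw_interface_t *hw, uint32_t p) {
    write_index(hw, p);
    return read_data(hw);
}
```

the model makes the cost of this choice visible: only two interface addresses are needed instead of eight, but every port read now takes two bus transfers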
multiplexing is also useful to handle long operands piecewise, whereby the operand can be provided one piece at a time by means of time-multiplexing
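piecewise transfer of a long operand can be sketched in C as follows; this is a software model under stated assumptions (a 128-bit operand, a 32-bit interface port, least significant word first), and the names long_port_t, port_write, operand_ready and send_operand are hypothetical

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 4u /* assumed: a 128-bit operand sent over a 32-bit port */

/* Hypothetical model of the module side of the shared port. */
typedef struct {
    uint32_t operand[WORDS]; /* operand register inside the module */
    uint32_t count;          /* number of pieces received so far */
} long_port_t;

/* Each write to the shared port stores the next piece of the operand. */
static void port_write(long_port_t *p, uint32_t word) {
    assert(p->count < WORDS);
    p->operand[p->count++] = word;
}

/* The operand is complete once all pieces have arrived. */
static int operand_ready(const long_port_t *p) {
    return p->count == WORDS;
}

/* Software side: send the operand one word at a time. */
static void send_operand(long_port_t *p, const uint32_t words[WORDS]) {
    p->count = 0;
    for (uint32_t i = 0; i < WORDS; i++)
        port_write(p, words[i]);
}
```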
masking is a technique to work with very short operands, e.g. to group several single-bit ports of the hardware module in a hardware interface port: a mask register is used to this purpose, to bit-mask the module ports involved in an update, e.g.: new_hw_port = (old_hw_port & ~mask) | (upd_value & mask)
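the masked update above can be tried out in plain C; in this sketch masked_update is a direct transcription of the expression, while update_bit is a hypothetical helper added to show the single-bit case, and the 32-bit port width is an assumption

```c
#include <assert.h>
#include <stdint.h>

/* Only the bits selected by mask are taken from upd_value;
   all other bits keep their old value. */
static uint32_t masked_update(uint32_t old_hw_port, uint32_t upd_value,
                              uint32_t mask) {
    return (old_hw_port & ~mask) | (upd_value & mask);
}

/* Set or clear one single-bit module port packed at position `bit`. */
static uint32_t update_bit(uint32_t old_hw_port, unsigned bit, unsigned value) {
    assert(bit < 32);
    uint32_t mask = (uint32_t)1 << bit;
    uint32_t upd = value ? mask : 0u;
    return masked_update(old_hw_port, upd, mask);
}
```

for example, masked_update(0xF0F0F0F0, 0x0000000F, 0x000000FF) replaces only the low byte and yields 0xF0F0F00F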
control design in a coprocessor is the collection of activities to generate control signals and to capture status signals
figure 12.5 shows a generic architecture to control a custom hardware module
Schaumont, Figure 12.5 - Command design of a hardware interface
figure 12.6 shows the architecture of a coprocessor that can achieve communication/computation overlap, as illustrated in figure 12.7
Schaumont, Figure 12.6 - Hierarchical control in a coprocessor
Schaumont, Figure 12.7 - Execution overlap using hierarchical control
the command interpreter analyzes each command from software and splits it up into a combination of commands for the lower-level FSMs
to achieve execution overlap effectively, the actions of the FSMs must be pipelined, and the command interpreter should adapt to the individual schedules of the lower-level FSMs
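a rough C sketch of how a command interpreter might schedule overlapping work, assuming three lower-level FSMs (input, compute, output) that each take one pipeline step per data block; both the three-stage split and the names CMD_IN, CMD_COMPUTE, CMD_OUT and interpret are illustrative assumptions, not taken from the figures

```c
#include <assert.h>

/* Hypothetical sub-commands, one bit per lower-level FSM. */
enum { CMD_IN = 1u, CMD_COMPUTE = 2u, CMD_OUT = 4u };

/* For pipeline step t over n data blocks, return the set of lower-level
   FSMs to start: block t is loaded while block t-1 is computed and
   block t-2 is sent out. */
static unsigned interpret(int t, int n) {
    assert(n > 0 && t >= 0);
    unsigned cmds = 0;
    if (t < n)               cmds |= CMD_IN;
    if (t >= 1 && t - 1 < n) cmds |= CMD_COMPUTE;
    if (t >= 2 && t - 2 < n) cmds |= CMD_OUT;
    return cmds;
}
```

with n = 3 blocks, all three FSMs are active together only at step 2; the steps before and after it fill and drain the pipeline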
programmer’s model = control design + data design
the address map reflects the organization of software-readable and software-writable storage elements of the hardware module; its design should consider the viewpoint of the software designer rather than the hardware designer, thus:
the design of a good instruction set is a hard problem that requires the codesigner to make a proper trade-off between flexibility and efficiency
here are a few generic design guidelines:
a recent lab tutorial presented a software implementation of the delay computation of a Collatz trajectory with given start point
hardware implementations of the same function were the subject of previous lab experiences
the performance measurements carried out on the software implementation show that the delay computation consumes almost all of the program execution time
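for reference, a minimal C sketch of such a software function, assuming that the delay is the number of Collatz steps needed to reach 1 from the start point; the name collatz_delay is hypothetical and the lab tutorial's actual code may differ

```c
#include <assert.h>

/* Number of Collatz steps needed to reach 1 from x0 (x0 >= 1).
   Note: 3 * x0 + 1 can overflow unsigned int for large start points;
   this sketch does not guard against that. */
static unsigned int collatz_delay(unsigned int x0) {
    assert(x0 >= 1);
    unsigned int steps = 0;
    while (x0 != 1) {
        x0 = (x0 % 2 == 0) ? x0 / 2 : 3 * x0 + 1;
        steps++;
    }
    return steps;
}
```

the loop body is a handful of cheap integer operations repeated many times, which is the kind of kernel that is a natural candidate for hardware acceleration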
a first alternative to evaluate: should the hardware function be integrated as a custom instruction or as a memory-mapped coprocessor?
other design decisions depend on this first decision, as follows
the VHDL description of the circuit which computes the function is to be embedded into a component equipped with Avalon interfaces for the Clock, Reset, and Avalon MM Slave signals, so as to receive the initial data by a write operation and to return the result by a reply to a read operation
addressing of the coprocessor: since the (initial data) write and (final result) read operations take place at different times and have the same data size, a single address suffices
software driver: two macros and a function may be defined for the bus access software interface: DC_RESET(d), DC_START(d,x0), unsigned int delay(d), where d is the address assigned to the coprocessor
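one possible shape of this driver, as a hedged C sketch: the names DC_RESET, DC_START and delay come from the text above, but their bodies are assumptions; in particular, the sketch assumes that writing 0 resets the coprocessor, that writing the start point launches the computation, and that the slave stalls the read until the result is ready; on a Nios II system the HAL I/O macros would normally replace the plain pointer accesses used here

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: writing 0 resets the coprocessor (the text does not say). */
#define DC_RESET(d)     (*(volatile uint32_t *)(d) = 0u)

/* Assumption: writing the start point x0 launches the computation. */
#define DC_START(d, x0) (*(volatile uint32_t *)(d) = (uint32_t)(x0))

/* Assumption: the read is held by the slave until the result is ready,
   so a single read at the same address returns the delay. */
static unsigned int delay(volatile uint32_t *d) {
    assert(d != 0);
    return (unsigned int)*d;
}
```

a typical call sequence would be DC_RESET(d); DC_START(d, x0); followed by result = delay(d); with d set to the base address assigned to the component in the system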
recommended readings:
for further consultation: