FPGA implementation of a memory-mapped coprocessor

Tutorial 11 on Dedicated systems

Teacher: Giuseppe Scollo

University of Catania
Department of Mathematics and Computer Science
Graduate Course in Computer Science, 2017-18

Table of Contents

  1. FPGA implementation of a memory-mapped coprocessor
  2. tutorial outline
  3. project workflow
  4. coprocessor hardware interface
  5. coprocessor as a Qsys component (1)
  6. coprocessor as a Qsys component (2)
  7. coprocessor as a Qsys component (3)
  8. Nios II system with coprocessor and Performance Counter
  9. mapping to FPGA and compilation
  10. software driver
  11. test and performance measurement programs (1)
  12. test and performance measurement programs (2)
  13. test with blocking acceleration
  14. test with nonblocking acceleration
  15. references

tutorial outline

this tutorial deals with:

project workflow

development main phases:

coprocessor hardware interface

two VHDL sources implement the memory-mapped coprocessor:

both files are available in the vhdl folder of the attached archive, as well as in the VHDL/code/e11 folder of the reserved lab area

inspecting the delay_collatz_interface.vhd source shows how the I/O signals of the computational component relate to the Avalon interface signals

coprocessor as a Qsys component (1)

the codesign folder in the attached archive is set up to host the project development

once the project delay_collatz_codesign has been created, with a top-level entity of the same name, the custom component delay_collatz_interface can be built

the new component type definition is shown in the figure

definition of a new component type delay_collatz_avalon_interface

coprocessor as a Qsys component (2)

the next step is to assign the VHDL files that describe the component and to analyze them, as shown in the figure

definition and analysis of files for synthesis of the component

coprocessor as a Qsys component (3)

finally, the component definition is completed by defining its Avalon interfaces and placing its signals under the appropriate interfaces, as shown in the figure

definition of Avalon signals and interfaces of the component

Nios II system with coprocessor and Performance Counter

structure of the hardware system built with Qsys

address map following the Qsys assignments to system components

mapping to FPGA and compilation

for the construction of the Nios II system shown in the previous figures it may be useful to consult the Qsys introduction tutorial

the final steps to map the system to the FPGA are as follows:

in Qsys:

exit Qsys, then in Quartus:

software driver

folder script in the attached archive contains two TCL scripts for the generation of the software driver in the BSP for the project

these two scripts are to be copied into the folder codesign/ip/delay_collatz_avalon_interface

the TCL scripts were written by analogy with the TCL script for the software driver of the Performance Counter, available in the Quartus Prime Lite 16.1 distribution under path
$SOPC_KIT_NIOS2/../ip/altera/sopc_builder_ip/altera_avalon_performance_counter

the motivation for this, perhaps unorthodox, way of producing the software driver lies in the twofold fact that

together with a somewhat reasonable level of operational analogy between the two components

test and performance measurement programs (1)

the src folder in the attached archive contains the subject programs, which are to be copied into the provided folders to create the test and performance measurement projects under the Monitor Program, as follows:

project creation parameters are summarized in the attached file MonitorNotes.txt

main differences between the source of lab tutorial 09 and the present sequential version:

test and performance measurement programs (2)

the pipelined version of the program differs much more substantially from the program of lab tutorial 09:

the synchronization mechanism is very simple, thanks to properties of the custom component and of the waitrequest signal of the Avalon MM protocol:
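The access pattern behind this kind of synchronization can be sketched in C on a host machine. This is only an illustration under stated assumptions, not the actual driver: the single input register, the helper names copro_write and copro_read, and the reading of "delay" as the number of Collatz steps needed to reach 1 are all hypothetical. The sketch assumes that the component keeps waitrequest asserted on a read until its result is ready, so that the read itself acts as the synchronization point.

```c
#include <stdint.h>

/* Host-side stand-in for the coprocessor, for illustration only.
   The single input register and the read-returns-result behaviour are
   assumptions, not the actual register map of the component. */
static uint32_t copro_input;   /* hypothetical "start value" register */

/* Reference model: number of Collatz steps needed to reach 1
   (no overflow check in this sketch). */
static uint32_t collatz_delay(uint32_t n)
{
    uint32_t steps = 0;
    while (n > 1) {
        n = (n & 1u) ? 3u * n + 1u : n >> 1;
        steps++;
    }
    return steps;
}

/* On the target this would be a volatile store to the component. */
static void copro_write(uint32_t x0) { copro_input = x0; }

/* On the target this would be a volatile load. Assumption: the component
   holds waitrequest until the result is ready, so the load stalls the
   processor and no polling loop is needed. */
static uint32_t copro_read(void) { return collatz_delay(copro_input); }

/* Blocking use: start a computation and wait for its result right away. */
uint32_t sum_delays_blocking(uint32_t first, uint32_t last)
{
    uint32_t sum = 0;
    for (uint32_t n = first; n <= last; n++) {
        copro_write(n);
        sum += copro_read();
    }
    return sum;
}

/* Nonblocking use: while the hardware works on n, the software consumes
   the result obtained for n - 1. */
uint32_t sum_delays_pipelined(uint32_t first, uint32_t last)
{
    uint32_t sum = 0, pending = 0;
    for (uint32_t n = first; n <= last; n++) {
        copro_write(n);          /* hardware starts working on n */
        sum += pending;          /* software work overlapped with hardware */
        pending = copro_read();  /* stalls only if the result is not ready */
    }
    return sum + pending;        /* account for the last result */
}
```

Both functions return the same sum for the same range, e.g. sum_delays_blocking(1, 10) and sum_delays_pipelined(1, 10); on real hardware only the second would let software work proceed while the coprocessor computes.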

test with blocking acceleration

compilation, loading on the FPGA and execution of program delay_collatz_sequential_timing.c, in the two projects codesign/amp_s and codesign/amp_s_o3, produces the Performance Counter Reports in the figure

Performance Report for the sequential version, optimization O1

Performance Report for the sequential version, optimization O3

comparing these data with the performance data for the software computation in lab tutorial 09, at the same optimization levels, shows a speed-up of about an order of magnitude

Performance Report for the software version, optimization O1

Performance Report for the software version, optimization O3

test with nonblocking acceleration

it is reasonable to expect a further performance gain from the nonblocking execution of the computation by the custom hardware

comparing the following Performance Counter Reports with the corresponding data for the all-software implementation yields a 21x speed-up with the default optimization O1 and a 16x speed-up with optimization O3; the corresponding speed-ups with blocking acceleration are 15x with O1 and 13x with O3

Performance Report for the pipelined version, optimization O1

Performance Report for the pipelined version, optimization O3
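
As a worked reading of the speed-up figures quoted above: both the nonblocking and the blocking speed-ups are taken against the same software baseline, so their ratio estimates the gain of nonblocking over blocking acceleration. The symbols S_nb and S_b are introduced here only for illustration, and the results are approximate because the quoted speed-ups are rounded.

```latex
\left.\frac{S_{\mathrm{nb}}}{S_{\mathrm{b}}}\right|_{\mathrm{O1}} = \frac{21}{15} = 1.4,
\qquad
\left.\frac{S_{\mathrm{nb}}}{S_{\mathrm{b}}}\right|_{\mathrm{O3}} = \frac{16}{13} \approx 1.23
```

That is, nonblocking execution is roughly 40% faster than blocking execution at O1 and roughly 23% faster at O3.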

references

useful materials for the proposed lab experience: