Technological factors tip the balance in favour of more hardware:
Computational performance
work done per unit of time, or per clock cycle: hardware parallelism and dedicated hardware accelerators both increase computational performance
Energy efficiency
work done per unit of energy; it may vary over several orders of magnitude between implementations of the same function, for example
(Schaumont, p. 14):
Energy efficiency of AES encryption implementations:
platform              Gb/J
Java KVM Sparc        10^-6
C Sparc               10^-3
Asm Pentium-III       10^-2
Virtex-II FPGA        10^0
0.18 μm CMOS ASIC     10^1
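To make the spread concrete, here is a minimal Python sketch of the arithmetic. It assumes that each platform in Schaumont's AES comparison corresponds to one power of ten, from 10^-6 Gb/J (Java KVM on Sparc) up to 10^1 Gb/J (0.18 μm ASIC). The dictionary and function names are hypothetical, chosen only for illustration.

```python
import math

# Energy efficiency of AES encryption in Gb/J (assumed reading: one power of
# ten per platform, as in Schaumont's comparison).
EFFICIENCY_GB_PER_J = {
    "Java KVM Sparc": 1e-6,
    "C Sparc": 1e-3,
    "Asm Pentium-III": 1e-2,
    "Virtex-II FPGA": 1e0,
    "0.18um CMOS ASIC": 1e1,
}


def joules_to_encrypt(gigabits: float, platform: str) -> float:
    """Energy (J) needed to encrypt `gigabits` of data on `platform`."""
    return gigabits / EFFICIENCY_GB_PER_J[platform]


def orders_of_magnitude_gap(better: str, worse: str) -> int:
    """Decimal orders of magnitude separating two platforms' efficiencies."""
    ratio = EFFICIENCY_GB_PER_J[better] / EFFICIENCY_GB_PER_J[worse]
    return round(math.log10(ratio))


print(orders_of_magnitude_gap("0.18um CMOS ASIC", "Java KVM Sparc"))  # 7
```

Under these values, encrypting 1 Gb costs about 1 J on the FPGA but about 10^6 J in Java on the Sparc. That is the seven-orders-of-magnitude gap that motivates dedicated hardware.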
Power density
improving computational performance by raising the clock frequency is limited by the proportional rise in power dissipation,
and hence by the cost-effectiveness of cooling technology
→ parallel architectures
Best match for HW/SW codesign: parallel computing platforms
Economical factors tip the balance in favour of more software:
Design cost
chip design is very expensive, with high NRE (non-recurring engineering) cost;
reprogrammable chips can be reused through reprogramming, which spreads the chip design cost over multiple products or
product versions;
reprogrammability may take many different forms, though
Development time
not only the design cost but also the development time of a new chip
is high;
a short time to market, on the other hand, enables timely entry
into the market window;
this yields higher revenues, which is especially significant for
innovative products
Design complexity
fixed hardware means fixed design decisions;
the flexibility of software enables designers:
– to develop the application at a higher abstraction level, and
– to maintain the application through the changes needed
to resolve bugs or to cope with evolving requirements
Abstraction levels, definable by the time granularity
of elementary (atomic) actions
Starting at the lowest abstraction level:
continuous signals
models are systems of differential equations;
useful for hybrid systems with analog components
not used in practice to describe typical HW/SW systems
discrete events
signal level changes at irregularly spaced points in time; the lowest
abstraction level for digital hardware
clock cycles
discrete events observed at regularly spaced time intervals
register-transfer level (RTL) models, useful for single-clock
synchronous hardware
machine instructions
useful for simulation of complex software systems, where cycle-accurate
simulation would be too expensive; instruction-accurate simulation
may not reveal the actual timing performance, though
transactions
models expressed in terms of interactions between components of the
system; useful when even instruction-accurate simulation would be
too expensive, as well as in the early phases of a system design
Hardware description languages feature constructs for the specification of (static)
structure as well as (dynamic)
behaviour
The three most prominent ones, all with discrete event semantics:
VHDL
IEEE 1076 standard (with revisions) dates: 1987, 1993, 1999 (VHDL-AMS, IEEE 1076.1), 2006-2008
HW components are "entities", whose architectures comprise
"processes"; these react to events on signals, such as those at the entity's input ports
a "synthesizable" subset of VHDL can be automatically
compiled into an FPGA netlist
Verilog
IEEE 1364 standard (version) dates: 1995, 2001, 2005;
merged in 2009 into SystemVerilog (IEEE 1800)
similar to VHDL, but with built-in support for 4-valued logic (0, 1, X, Z),
features for transistor-level description, etc.
SystemC
a C++ class library providing required functions for HW modeling
structured into: core language, data types, elementary channels,
higher-level channels
A more concise language, for RTL description of synchronous hardware:
GEZEL
cycle-based: no explicit modeling of clock events
FSMD (Finite State Machine with Datapath) models
+ library of processor instruction-set simulators
automated translation of "proper" FSMD models to
synthesizable VHDL
Collatz trajectories: for each positive integer
x0,
the (infinite) sequence of values obtained by iterating,
starting at
x0,
the function over the positive integers defined by:
f(x) = 3x+1 if x is odd, f(x) = x/2 if x is even
since 3x+1 is even when x is odd, consider a slightly
compressed form of the trajectories, defined by iterating the
function:
t(x) = (3x+1)/2 if x is odd, t(x) = x/2 if x is even
Conjecture: for every positive integer
x0,
the trajectory eventually falls into the small loop through 1 (1 → 2 → 1 under t)
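Both iterations can be written directly as Python generators, which makes the compression visible. For odd x, t(x) equals f(f(x)), so each odd step of f is merged with the halving that always follows it. Function names are illustrative. The bounded check below only gathers evidence for small x0 and does not prove the conjecture.

```python
from itertools import islice
from typing import Callable, Iterator


def f(x: int) -> int:
    """Collatz function: 3x+1 if x is odd, x/2 if x is even."""
    return 3 * x + 1 if x % 2 else x // 2


def t(x: int) -> int:
    """Compressed form: (3x+1)/2 if x is odd, x/2 if x is even."""
    return (3 * x + 1) // 2 if x % 2 else x // 2


def trajectory(step: Callable[[int], int], x0: int) -> Iterator[int]:
    """Infinite trajectory x0, step(x0), step(step(x0)), ..."""
    x = x0
    while True:
        yield x
        x = step(x)


def reaches_one(step: Callable[[int], int], x0: int, max_steps: int = 10_000) -> bool:
    """True if the trajectory of x0 hits 1 within max_steps (bounded check, not a proof)."""
    return any(x == 1 for x in islice(trajectory(step, x0), max_steps + 1))
```

For example, list(islice(trajectory(f, 6), 9)) gives [6, 3, 10, 5, 16, 8, 4, 2, 1], while list(islice(trajectory(t, 6), 9)) gives [6, 3, 5, 8, 4, 2, 1, 2, 1]. The t trajectory skips the even values 10 and 16 that directly follow odd values under f.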
a hardware datapath that produces the t
trajectory (for 16-bit x0) can be
described in GEZEL as follows
N.B. for odd x:
(3x+1)/2 = x + ⌊x/2⌋ + 1, so the datapath needs only a right shift and additions, no multiplier
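The identity follows by writing odd x as 2k+1. Then (3x+1)/2 = (6k+4)/2 = 3k+2, and x + ⌊x/2⌋ + 1 = (2k+1) + k + 1 = 3k+2. The Python sketch below checks it exhaustively for every odd 16-bit value. It uses x >> 1 for ⌊x/2⌋, just as the datapath does. The function names are illustrative only.

```python
def t_shift_add(x: int) -> int:
    """t(x) computed with a right shift and additions only (no multiplier)."""
    return x + (x >> 1) + 1 if x & 1 else x >> 1


def identity_holds_for_odd(bits: int = 16) -> bool:
    """Check (3x+1)/2 == x + floor(x/2) + 1 for every odd x below 2**bits."""
    return all((3 * x + 1) // 2 == x + (x >> 1) + 1 for x in range(1, 1 << bits, 2))
```

For instance, t_shift_add(7) computes 7 + 3 + 1 = 11, which equals (3·7+1)/2 = 22/2.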
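dp collatz (
  in  start : ns(1);     // 1: load x0 instead of the register value
  in  x0    : ns(16);    // starting value of the trajectory
  out t     : ns(32)) {  // current trajectory value
  reg r : ns(32);        // register holding the current trajectory value
  sig x : ns(32);
  always {
    t = r;
    x = start ? x0 : r;
    // odd x: (3x+1)/2 = x + floor(x/2) + 1; even x: x/2
    r = x[0] ? x + (x >> 1) + 1 : x >> 1;
  }
}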
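A cycle-based software model helps clarify what the datapath computes in each clock cycle. In the Python sketch below, the output t shows the current register value. The input x0 is used only while start is 1, and the register takes its new value at the end of the cycle. The testbench behaviour (start high in cycle 0 only) and the initial register value of 0 are assumptions. The class and function names are hypothetical.

```python
from typing import List

MASK16 = (1 << 16) - 1  # in x0 : ns(16)
MASK32 = (1 << 32) - 1  # reg r : ns(32), sig x : ns(32)


class CollatzDatapath:
    """Cycle-based model of the GEZEL collatz datapath (illustrative sketch)."""

    def __init__(self) -> None:
        self.r = 0  # assumption: register r holds 0 before the first cycle

    def cycle(self, start: int, x0: int) -> int:
        """Evaluate one clock cycle; return the output t seen during that cycle."""
        t = self.r                                    # t = r
        x = (x0 & MASK16) if start else self.r        # x = start ? x0 : r
        nxt = x + (x >> 1) + 1 if x & 1 else x >> 1   # r = x[0] ? ... : x >> 1
        self.r = nxt & MASK32                         # register updates at cycle end
        return t


def simulate(x0: int, cycles: int) -> List[int]:
    """Hypothetical testbench: start = 1 in cycle 0 only; record t every cycle."""
    dp = CollatzDatapath()
    return [dp.cycle(start=int(n == 0), x0=x0) for n in range(cycles)]
```

simulate(6, 8) returns [0, 3, 5, 8, 4, 2, 1, 2]. Because t reads the register before it is updated, the start cycle shows the old register contents. The t trajectory then appears one cycle later, beginning with t(x0) rather than x0 itself.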
time granularity: clock cycle, bus cycle, transaction
data exchange: abstract, scalar, composite
control: semaphores, handshake protocols, blocking vs. nonblocking, etc.
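As one example of the handshake protocols mentioned above, the Python sketch below models a four-phase (return-to-zero) req/ack handshake. It assumes a synchronous, cycle-based view in which both sender and receiver sample req, ack and data at the start of each cycle. The transfer is blocking: the sender cannot start the next word until ack has returned low. The function name and the cycle counting are illustrative assumptions, not a specific bus standard.

```python
from typing import List, Sequence, Tuple


def four_phase_transfer(words: Sequence[int]) -> Tuple[List[int], int]:
    """Send `words` with a four-phase req/ack handshake; return (received, cycles)."""
    req = ack = 0
    data = None
    pending = list(words)
    received: List[int] = []
    cycles = 0
    while pending or req or ack:
        # Both sides sample the wires as they were at the start of the cycle.
        cur_req, cur_ack, cur_data = req, ack, data
        # Sender side
        if not cur_req and not cur_ack and pending:
            data = pending.pop(0)      # phase 1: drive data, raise req
            req = 1
        elif cur_req and cur_ack:
            req = 0                    # phase 3: ack seen, lower req
        # Receiver side
        if cur_req and not cur_ack:
            received.append(cur_data)  # phase 2: latch data, raise ack
            ack = 1
        elif not cur_req and cur_ack:
            ack = 0                    # phase 4: req low, lower ack
        cycles += 1
    return received, cycles
```

In this model every word takes 4 cycles, so four_phase_transfer([7, 8, 9]) returns ([7, 8, 9], 12). A cycles-per-transfer figure of this kind is what enters a bottleneck analysis.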
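Computational performance
bottleneck analysis, e.g. for:
– a channel at v bits/transfer, B cycles/transfer
– a coprocessor at w bits/execution, H cycles/execution
the channel delivers v/B bits per cycle and the coprocessor processes w/H bits per cycle, so the system is:
communication-constrained if v/B < w/H
computation-constrained if v/B > w/H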
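The bottleneck test compares two throughputs in bits per cycle. The Python sketch below does the comparison with exact fractions, so equal rates are not misclassified through rounding. The throughput helper assumes that channel transfers and coprocessor executions overlap, so the slower stage sets the pace. The function names and the numbers in the usage note are hypothetical.

```python
from fractions import Fraction


def bottleneck(v: int, B: int, w: int, H: int) -> str:
    """Classify a channel + coprocessor pair by comparing bits/cycle."""
    channel = Fraction(v, B)      # v bits every B cycles over the channel
    coprocessor = Fraction(w, H)  # w bits every H cycles through the coprocessor
    if channel < coprocessor:
        return "communication-constrained"
    if channel > coprocessor:
        return "computation-constrained"
    return "balanced"


def system_throughput(v: int, B: int, w: int, H: int) -> Fraction:
    """Sustained bits/cycle, assuming overlapped stages: the slower one limits."""
    return min(Fraction(v, B), Fraction(w, H))
```

Consider a hypothetical 32-bit channel needing 2 cycles per transfer (16 bits/cycle) feeding a coprocessor that handles 128 bits every 4 cycles (32 bits/cycle). This system is communication-constrained, and system_throughput(32, 2, 128, 4) gives 16 bits/cycle. Speeding up the coprocessor would not help; widening or speeding up the channel would.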
Collections of HW and SW tools for codesign development and testing
FPGA development boards are the basic hardware tools for this purpose;
they come equipped with sophisticated software systems for high-level
codesign and cosimulation
for example, the DE1-SoC development board by Intel hosts a Cyclone V
FPGA chip with an ARM Cortex-A9 processor on the same chip, can include
two Nios II softcore processors on the FPGA, and is supported by the
freely available Quartus Prime Lite software
GEZEL is distributed as a collection of Debian packages
for installation on Ubuntu (updated for every new LTS release up to 16.04)
N.B. packages from the GEZEL repository must be installed by following
the manual installation instructions given in the
installation manual, adapted to the xenial distribution (Ubuntu 16.04),
package version 2.5.15, and the amd64 architecture if the machine
is 64-bit