\chapter{Hardware Trace Measurement}
\label{section:trace_measurement}

Computer systems can be analyzed with measurement tools that detect events,
i.e.\ changes in the state of a system \cite[p. 28]{ferrari1978computer}. The
same event can be interpreted on different levels, as shown in
\autoref{fig:trace_event_levels}. A hardware trace tool can detect a voltage
change in memory, e.g.\ one triggered by the processor, which is a hardware
event. Accordingly, the variable that maps to the changed memory register
changes too, which is a software event. If this variable is related to the
state of a task, a change of the variable also means a change of the task
state, which is then called a system event.

In many cases, the event of interest cannot be measured directly. One or more
transformation steps are required to obtain the desired result. If a
transformation process is executed, the measurement is said to be indirect
\cite[p. 28]{ferrari1978computer}. In the previous example, a task
termination event cannot be measured directly. However, a variable that
contains the current task state can be measured. If the task corresponding
to the variable and the mapping from value to task state are known, a change of
the variable can be transformed into a higher-level event: the termination of a
task. After the transformation process, the measurement results can be
displayed to the user as shown in \autoref{fig:concept_measurement}.

\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/trace/concept_measurement.pdf}
\caption[Measurement process]{The conceptual parts of a measurement process
according to Ferrari \cite{ferrari1978computer}. A sensor measures data. One
or more transformation steps are required if the data is not yet in the
desired format. Finally, the result can be presented to the user.}
\label{fig:concept_measurement}
\end{figure}

During the transformation step, the collected data may be manipulated, which is
called prereduction. Prereduction may, for example, be used when the individual
events are not required, but rather the number of events of a certain type that
occurred. In this case, the transformer would increment a counter whenever an
event of that type is collected. If no prereduction is executed, the
measurement process is called tracing. Tracing is the process of recording a
sequence of events in chronological order of occurrence
\cite[p. 30]{ferrari1978computer}. The result of this process is called a trace.

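
The difference between tracing and prereduction can be made concrete with a
small formalization. The notation is an assumption introduced here for
illustration and is not taken from Ferrari. Let a trace of $n$ events be the
chronologically ordered sequence
\[
  E = \bigl( (t_1, e_1), (t_2, e_2), \dots, (t_n, e_n) \bigr),
  \qquad t_1 \le t_2 \le \dots \le t_n,
\]
where $t_k$ is the time at which the $k$-th event occurred and $e_k$ is its
type. A counting prereduction for an event type $c$ keeps only the number
\[
  N_c = \bigl| \{\, k : e_k = c \,\} \bigr| .
\]
For the hypothetical trace $E = \bigl( (1, a), (4, b), (6, a) \bigr)$, the
counter for type $a$ yields $N_a = 2$, while the order and the times of the
individual events are no longer available.
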
\section{Trace Tools}

Ferrari \cite[p. 31ff]{ferrari1978computer} distinguishes three kinds of trace
measurement tools: software, hybrid, and hardware tools. All tools are meant
to examine the behavior of a system. However, there are differences in
interference, resolution, frequency, and cost as summarized in
\autoref{tab:trace_tool_overview}.

If a measurement tool uses resources of the target system, it causes
interference by using computational power and memory that could otherwise be
utilized by the application. A tool that causes interference is said to be
intrusive and may cause degradation, i.e.\ a reduction in performance of the
target system \cite[p. 29]{ferrari1978computer}. Consequently, intrusive trace
tools change the real-time behavior of an application.

An event can be represented on different levels. A voltage level change in
memory can map to a variable, which can map to the state of a task as
visualized in \autoref{fig:trace_event_levels}. Those levels are called
hardware level, software level, and system level. To clarify the level of a
trace, it can be mentioned explicitly. For instance, a trace consisting of
hardware-level events is a hardware-level trace \cite[p. 29f]{felixproject2}.
Tools that can detect hardware events occurring at a microscopic level are
said to have a higher resolution than tools that can detect software events
only.

\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/trace/trace_event_levels.pdf}
\caption[Measurement levels]{A measurement event can be interpreted on
different levels. A voltage change in memory can be detected by a hardware
trace tool capable of supervising the memory bus that triggers the voltage
change. The memory section can relate to a variable that changes as a
consequence of the voltage change, which is a software event. If the variable
is related to the state of a task, a change of the variable also means a
change of the task state, which is then called a system event.}
\label{fig:trace_event_levels}
\end{figure}

Different trace techniques can detect and record events with different
frequencies. The maximum frequency is usually not limited by the speed with
which events can be detected, but by the available bandwidth to process and
record the detected events.

The cost of a trace tool depends on several factors: the price of hardware and
software licenses, the price of installing and maintaining the tool,
educational costs such as training for the users of the tool, and the costs
of operating the tool.

\textbf{Software tools} add instructions to a hardware-software system in order
to detect and record events of interest. The added instructions are called
instrumentation. The simplest kind of instrumentation is a classical write to
the standard output interface, e.g.\ a \lstinline{printf} statement in the C
programming language. Instructions may be added to the application code
directly, via the compiler, or after compilation via dynamic binary
instrumentation \cite{trumper2012maintenance}\cite{felixarc2015}. If no
standard output interface is available, events are recorded into memory on the
target. From there, they can be read out via a debugger or a serial interface.
Instrumentation always interferes with the application. There are two
components of interference: a space component and a time component
\cite[p. 44]{ferrari1978computer}. Executing instrumentation code takes time,
and storing detected events uses memory space. Software tools have a low
resolution because they cannot detect events on the hardware level. The event
detection frequency is limited by the available computational resources. On
the upside, they are usually cheap and easy to implement and use.

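
The two components of interference can be estimated with a simple
back-of-the-envelope model. The symbols and numbers are assumptions chosen for
illustration and are not measurements. If an instrumented application records
$n$ events per second, each instrumentation point costs an execution time of
$t_{\mathrm{instr}}$, and each stored event occupies $s$ bytes, then the time
overhead $o_t$ (as a fraction of CPU time) and the memory consumption rate
$o_s$ are
\[
  o_t = n \cdot t_{\mathrm{instr}}, \qquad o_s = n \cdot s .
\]
With the hypothetical values $n = 10^{4}\,\mathrm{s}^{-1}$,
$t_{\mathrm{instr}} = 2\,\mu\mathrm{s}$, and $s = 8\,\mathrm{B}$, this gives
\[
  o_t = 10^{4}\,\mathrm{s}^{-1} \cdot 2 \times 10^{-6}\,\mathrm{s} = 0.02 = 2\,\%,
  \qquad
  o_s = 10^{4}\,\mathrm{s}^{-1} \cdot 8\,\mathrm{B} = 80\,\mathrm{kB/s} .
\]
In this example, an on-target buffer of $64\,\mathrm{kB}$ (taken as
$64 \times 10^{3}$ bytes) would be full after $64 / 80 = 0.8$ seconds.
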
\textbf{Hardware tools} do not rely on instrumentation, which means that they
are non-intrusive and do not interfere with the application
\cite{felixarc2014}. Hardware tracing works via a dedicated trace device
that is located on the silicon of the CPU\@. Trace devices provide a very high
resolution since they are capable of detecting events at hardware level
\cite{mink1989performance}. Additionally, the event detection frequency can be
as high as the actual system frequency; thus, it is possible to record a
complete hardware-software system in real time. Hardware tools are more
expensive than software solutions. Installation and maintenance are
more complex and require properly qualified users.

\textbf{Hybrid tools} rely on instrumentation and a dedicated hardware
interface to record events. The boundary between software, hybrid, and
hardware tools can be fuzzy in certain cases. Software tools need some kind of
hardware interface to send recorded traces off-chip. In this sense, all
software tools are hybrid tools. However, industry hybrid solutions often
require proprietary target interfaces, which justifies placing these tools in
a separate category \cite{richterganzheitliche}. Compared to pure software
tools, hybrid tools interfere with the system to a lesser extent
\cite{nacht1989hardware}. A dedicated hardware interface allows events to be
sent off-chip in real time. Consequently, more memory remains available on the
target.

As shown in \autoref{tab:trace_tool_overview}, hardware trace tools have many
advantages over hybrid and software-based solutions. Hardware tracing does not
interfere with the system, which is especially important for real-time systems.
Hardware trace tools are capable of detecting events with a higher resolution
and frequency. Additionally, the trace duration of software and hybrid traces
is limited by the available memory on the target and by the trace interface
bandwidth. When the same quantity can be measured by a hardware and a software
tool, the values obtained by the hardware tool are usually to be considered
more accurate because of the lower interference
\cite[p. 45]{ferrari1978computer}.

\begin{table}[]
\centering
\begin{tabular}{r|c c c}
& Software & Hybrid & Hardware \\
\hline
Interference & high & low & no \\
Resolution & low & low & high \\
Cost & low & low & high \\
Frequency & low & low & high \\
\end{tabular}
\caption[Trace techniques]{Properties of different trace
measurement tools \cite[p. 6]{felixproject1}. Hardware tools are superior
to software and hybrid tools but come with higher expenses.}
\label{tab:trace_tool_overview}
\end{table}

\section{Hardware Tracing}
\label{subsection:hardware_tracing}

Hardware tracing is capable of recording events on the hardware level. A
dedicated on-chip trace device and a trace interface are required to record
hardware events and send them off-chip \cite{mink1990multiprocessor}. Target
access hardware is connected to the trace interface to read out the trace
measurement results. From there, the events are forwarded to a host computer
for further processing. Software that runs on the host computer in order to
analyze the recorded trace data is provided by the target access hardware
vendor \cite{winidea}. The term host software is used to refer to such
applications.

The on-chip trace device is designed to record hardware events that occur on
the microcontroller. It occupies a separate section on the silicon. Usually, a
controller is delivered in two versions, one with and one without a trace
device. In production, the ability to execute trace measurements is not
required \cite{felixarc2014}. There, the trace device would only increase chip
costs without providing any benefit.

\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/trace/tc27_emulation_device.png}
\caption[Infineon TC27x trace device]{A microcontroller with hardware trace
support consists of two sections: a regular product chip part and the trace
device part. The trace device part can be omitted in the production version
of a chip to save costs \cite{tc27block}.}
\label{fig:tc27_emulation_device}
\end{figure}

\autoref{fig:tc27_emulation_device} shows the trace device of the Infineon
TC27x microcontroller family \cite{tc27x}. The upper part belongs to the
product chip, while the lower part displays the trace device. The trace device
can gather data from the product part via two interfaces. \glspl{pob}
(\glsdesc{pob}) record processor events, while \glspl{bob} record bus events.
All events are collected, augmented with a timestamp, and buffered in the
on-chip trace memory. From there, they are sent off-chip via the dedicated
trace interface.

\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/trace/timestamp_generation_event.pdf}
\caption[Timestamp per event]{Each trace event is assigned a timestamp
relative to the previous event. By summing up the relative timestamps,
absolute values can be generated.}
\label{fig:timestamp_generation_event}
\end{figure}

\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/trace/timestamp_generation_dedicated.pdf}
\caption[Dedicated timestamp generation]{Via dedicated timestamp events, the
timestamps of the other events can be interpolated. In this example, two
events are recorded between the previous and the next timestamp event. Both
events therefore receive the same timestamp, derived from the two timestamp
events. The value is calculated via \autoref{eq:timestamp_interpolation} as
$t_i = 5 + \frac{(15-5)}{2}=10$.}
\label{fig:timestamp_generation_dedicated}
\end{figure}

\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/trace/timestamp_generation_io.pdf}
\caption[Timestamp via \gls{io}]{Dedicated \gls{io} pins can be used to output
a timestamp value whenever a measurement event is sent off-chip.}
\label{fig:timestamp_generation_io}
\end{figure}

Different techniques exist to add timestamp information to a trace event.
The most straightforward one is shown in
\autoref{fig:timestamp_generation_event}: a timestamp is added to each trace
event that is sent off-chip. To save bandwidth, timestamps are given relative
to the previous event. The absolute time of an event is computed by summing
all relative timestamps up to and including that of the event.

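
As a small worked illustration of this summation (the symbols and the numbers
are assumptions made for this example and are not taken from a particular trace
device), let $\Delta t_j$ denote the relative timestamp carried by the $j$-th
event and $t_0$ the absolute time at which the recording started. The absolute
time of the $k$-th event is then
\[
  t_k = t_0 + \sum_{j=1}^{k} \Delta t_j .
\]
For $t_0 = 0$ and the relative timestamps $\Delta t_1 = 5$, $\Delta t_2 = 3$,
and $\Delta t_3 = 7$ (in clock cycles), the absolute times are $t_1 = 5$,
$t_2 = 8$, and $t_3 = 15$. One consequence of this scheme is that a lost or
corrupted relative timestamp shifts all subsequent absolute values.
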
Another way is to send dedicated timestamp messages, as shown in
\autoref{fig:timestamp_generation_dedicated}. The timestamps for the actual
trace events are then interpolated, e.g., via the equation
\begin{equation}
\label{eq:timestamp_interpolation}
t_{i} = t_p + \frac{(t_n - t_p)}{2},
\end{equation}
where $t_p$ is the previous timestamp (the latest timestamp before the event),
$t_n$ the next timestamp (the earliest timestamp after the event), and $t_i$
the timestamp interpolated from the dedicated timestamp events.

Finally, timestamps can also be created via dedicated \gls{io} pins as
specified by the Nexus standard \cite{turley2004nexus}. This means that
whenever a trace event is sent off-chip via the trace interface, the current
timestamp is provided via the \gls{io} pins, as shown in
\autoref{fig:timestamp_generation_io}.

Cycle-accurate timestamps are feasible with all timestamp generation
techniques. However, timestamp accuracy and resolution are only partly
dependent on the generation technique. More important factors are the CPU and
trace device clock frequencies, as well as the design of the CPU and the trace
device. For cycle-accurate timestamps, the trace device frequency must be
greater than or equal to the CPU frequency. Even if this is the case,
cycle-accurate timestamps cannot necessarily be guaranteed.

For example, superscalar processors like the Infineon TC277 \cite{tc27x} are
capable of executing more than one instruction per cycle. However, only one
event can be processed per cycle by the trace device, as shown in
\autoref{fig:timestamp_cycle}. The processor observation block filters the
instructions according to user-specified filter rules and forwards them for
further processing. If two instructions executed during the same processor
cycle match the filter and are thus forwarded to the trace device, one of
those instructions is delayed by one cycle (in this example Instruction 2.1).
For a processor running at \unit[100]{MHz}, this would set the timestamp off by
\unit[10]{ns} for this particular event.

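
The \unit[10]{ns} figure follows directly from the clock period. As a short
check (the symbols $f$, $T$, and $d$ are introduced here only for
illustration), a clock frequency $f$ corresponds to a cycle time $T = 1/f$, so
an event that is delayed by $d$ cycles carries a timestamp error of
\[
  \Delta t = d \cdot T = \frac{d}{f} .
\]
With $d = 1$ and $f = 100\,\mathrm{MHz}$, this gives
\[
  \Delta t = \frac{1}{100 \times 10^{6}\,\mathrm{s}^{-1}}
           = 10^{-8}\,\mathrm{s} = 10\,\mathrm{ns} .
\]
By the same formula, a one-cycle delay at $200\,\mathrm{MHz}$ would amount to
$5\,\mathrm{ns}$.
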
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/trace/timestamp_cycle.pdf}
\caption[Timestamp generation accuracy]{Even if the trace device runs at CPU
clock frequency, cycle-accurate timestamps cannot be guaranteed.}
\label{fig:timestamp_cycle}
\end{figure}

The design of trace devices differs depending on the processor family and the
processor vendor. However, the general concept and the provided functionality
are the same for all devices. Various standards for the implementation of
trace devices are specified and used by chip vendors. Three common standards
are Nexus, used by PowerPC processors \cite{turley2004nexus}, \gls{etm}
(\glsdesc{etm}), used by ARM processors \cite[p. 476]{yiu2013definitive}, and
the \glsdesc{imds} \cite{stollon2011infineon}, discussed here and shown in
\autoref{fig:tc27_emulation_device}.

According to \autoref{fig:concept_measurement}, a measurement process starts
with the detection of an event by a sensor. In the case of the trace process,
the sensors are the \glspl{pob} and \glspl{bob}. Each \gls{pob} monitors the
instructions executed by one processor core. This means the complete program
flow executed by a processor core can be recorded. \glspl{bob} are connected
to the data buses of the microcontroller and can detect memory access events.
A memory access event may be, for example, writing to a variable or reading
from a special function register. A typical data trace event contains, in
addition to the timestamp, details like address, data value, transfer size, and
whether a read or write access occurred \cite{hopkins2006debug}.

Filters can be specified by the user to reduce the number of recorded trace
events. They can be set for an address or for an address range. Different
actions can be taken if an address filter matches: the corresponding event
can be recorded or discarded, or another action can be triggered. For example,
it is possible to start or stop the trace process if a specific function is
accessed or a variable is written. Filter configuration is done via the host
software.

Corresponding to the two main hardware event types, instruction events and data
access events, two hardware trace techniques can be distinguished: program flow
trace and data trace \cite{felixarc2014}. The two trace techniques can be
executed in parallel or individually, as configured by the user.

A \textbf{program flow trace} (also called function trace) shows the complete
execution path of an application for the duration of the trace recording. This
means it is possible to detect when a certain function is called or which
branch of an if statement is executed. The number of instructions executed by
a modern CPU, and hence the bandwidth of the resulting data stream, is too
large to be transmitted via the trace interface. To solve this problem, trace
devices use trace compression. The most commonly used program flow trace
compression technique works by detecting and recording only those instructions
that cause a change in program flow, such as conditional jumps and traps
\cite{hopkins2006debug}. Using the application binary, the host software is
able to reconstruct the complete program flow.

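
The effect of this compression can be estimated roughly. The following is only
an illustrative model with assumed symbols and numbers; real trace protocols
use variable-length messages, so actual savings differ. If an application
executes $N$ instructions, of which a fraction $p$ changes the program flow,
the trace device has to record only
\[
  N_{\mathrm{rec}} = p \cdot N, \qquad \frac{N}{N_{\mathrm{rec}}} = \frac{1}{p}
\]
events instead of $N$. For a hypothetical $p = 0.2$ (one flow-changing
instruction in five), $N = 10^{6}$ executed instructions lead to
$N_{\mathrm{rec}} = 2 \times 10^{5}$ recorded events, a reduction by a factor
of $5$. The remaining $8 \times 10^{5}$ instructions are recovered by the host
software from the application binary.
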
A \textbf{data trace} is a sequence of data access events. Data tracing makes
it possible to monitor and debug the state of variables in memory. Data
tracing of all active units is becoming increasingly important because not all
data interactions involve a processor \cite{mayer2003debug}. Thus, trace
devices must also be able to detect memory accesses via \gls{dma}
(\glsdesc{dma}) and accesses to the memory of special on-chip modules like
FlexRay or Ethernet. Which units are supported depends on the trace device,
but all trace devices support tracing the main memory of a controller.
Compression is also applied to data traces. However, those techniques are
usually not sufficient to record a complete data trace of significant length,
since the amount of generated data is too large. The best way to solve this
problem is to apply filters to avoid detecting and recording data events in
memory sections that are not of interest \cite{hopkins2006debug}.

A recorded hardware trace event is buffered in an on-chip trace memory. From
there, the events can be read via the trace interface. On-chip trace memories
can be operated in different modes \cite{felixarc2014}. In \emph{continuous
mode}, the trace data is streamed off-chip in real time. This technique is
limited by the bandwidth of the trace interface. If the bandwidth is high
enough, the trace duration depends only on the available memory on the host
computer, and traces of arbitrary length can be recorded. If the bandwidth is
too low to transmit the recorded trace stream, \emph{buffer mode} must be used.
This means the recorded trace is written into trace memory and read out by the
target access hardware after tracing. Buffer mode can be used in pre- and
post-trigger mode. In pre-trigger mode, the trace buffer is filled like a
circular buffer: the oldest events are discarded to make room for new events.
The trace process can be stopped at an arbitrary point in time, and the latest
trace events become available. In post-trigger mode, the trace process is
stopped as soon as the buffer has been filled for the first time.

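
The pre-trigger behavior can be written down compactly. The indexing below is
an assumed model of a circular buffer and is not taken from a specific trace
device. Let the trace memory hold $N$ events in slots $0, \dots, N-1$, and let
the events be numbered $k = 0, 1, 2, \dots$ in order of occurrence. Event $k$
is written to slot
\[
  s(k) = k \bmod N,
\]
so it overwrites event $k - N$ once $k \ge N$. If the trace is stopped after
$K$ events have been recorded, the buffer contains the events
\[
  \max(0, K - N), \; \dots, \; K - 1 .
\]
For example, with the hypothetical values $N = 4$ and $K = 10$, the buffer
holds the events $6, 7, 8, 9$ in the slots $2, 3, 0, 1$. In post-trigger mode,
by contrast, recording stops once $K = N$, so the buffer holds the events
$0, \dots, N - 1$.
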
A trace device operated in buffer mode is limited by the available trace
memory. The trace memory size of an Infineon TC275 microcontroller
(\autoref{fig:workbench}~a) is \unit[2]{MB}, which allows for approximately
\unit[33]{ms} of unfiltered function and data trace of a single processor core
running at \unit[200]{MHz} \cite{felixarc2014}. Depending on the measurement
use case, this may or may not be sufficient. If the trace duration must be
increased, tracing in continuous mode is mandatory. Continuous tracing requires
a high-bandwidth interface such as \gls{agbt} (\glsdesc{agbt}).

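
The figures quoted above imply an average trace data rate, which can be
estimated as follows. This is a rough calculation that assumes
$1\,\mathrm{MB} = 10^{6}$ bytes and a constant event rate; the symbols $M$,
$T$, $f$, and $r$ are introduced only for this estimate. With the trace memory
size $M$, the trace duration $T$, and the core frequency $f$,
\[
  r = \frac{M}{T}
    = \frac{2 \times 10^{6}\,\mathrm{B}}{33 \times 10^{-3}\,\mathrm{s}}
    \approx 6.1 \times 10^{7}\,\mathrm{B/s}
    \approx 61\,\mathrm{MB/s},
\]
\[
  \frac{M}{T \cdot f}
    = \frac{2 \times 10^{6}\,\mathrm{B}}
           {33 \times 10^{-3}\,\mathrm{s} \cdot 200 \times 10^{6}\,\mathrm{s}^{-1}}
    \approx 0.30\,\mathrm{B}\ \text{per cycle}.
\]
Under these assumptions, an unfiltered trace of one core produces roughly
$61\,\mathrm{MB/s}$, i.e.\ about $0.3$ bytes per clock cycle on average.
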
\section{Hardware Trace Toolchain}

Multiple steps are required from recording a hardware trace on the target to
presenting it to the user on a personal computer, as shown in
\autoref{fig:toolchain}. Many different solutions exist for each of those
steps. Nevertheless, the basic functionality provided by all solutions is
comparable.

\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/trace/toolchain.pdf}
\caption[Trace toolchain]{Recording a hardware trace and making it
available to the user requires multiple steps. Hardware events must be
measured on the target via a trace device. Using a trace interface, the
recorded data can be read out by the target access hardware and transmitted to
a host computer. Target access hardware vendors provide special software to
analyze and visualize the recorded trace.}
\label{fig:toolchain}
\end{figure}

The basic prerequisite for executing a hardware trace is the availability of an
on-chip trace device. All major chip vendors provide trace devices for their
microcontrollers that support program flow and data trace.
\autoref{tab:trace_devices} gives an overview of the state-of-the-art trace
solutions.

\begin{table}[]
\centering
\begin{tabular}{r|c c c}
Standard & Architecture & Function Trace & Data Trace\\
\hline
Nexus &
PowerPC &
\begin{tabular}[x]{@{}c@{}} Branch Trace \\ Messaging \end{tabular} &
\begin{tabular}[x]{@{}c@{}} Data Trace \\ Messaging \end{tabular} \\
\hline
\gls{etm} &
ARM &
\begin{tabular}[x]{@{}c@{}}Program Trace \\ Macrocell \end{tabular} &
\begin{tabular}[x]{@{}c@{}}Embedded Trace \\ Macrocell \end{tabular} \\
\hline
\gls{imds} &
TriCore &
\begin{tabular}[x]{@{}c@{}}Processor \\ Observation Block \end{tabular} &
\begin{tabular}[x]{@{}c@{}}Bus \\ Observation Block \end{tabular} \\
\end{tabular}
\caption[Trace devices for different architectures]{Trace devices exist for
different CPU architectures. All solutions provide methods for recording
program flow and data traces.}
\label{tab:trace_devices}
\end{table}

Events that have been recorded by the trace device are sent off-chip via a
dedicated trace interface. If the bandwidth provided by an interface is lower
than the rate at which trace data is generated, continuous tracing is not
possible. However, continuous tracing is often required. There are two ways to
solve this problem: the amount of generated trace data can be reduced using
filters, or the available bandwidth can be increased. If an entire application
must be analyzed as a whole, the first way is not an option.

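
This condition can be stated as a simple inequality. The model below is a
sketch under assumptions (constant rates and a trace memory that is drained
while it is being filled), and the symbols and numbers are hypothetical. Let
$r$ be the rate at which the trace device generates data, $B$ the bandwidth of
the trace interface, and $M$ the size of the on-chip trace memory. Continuous
tracing is possible if
\[
  r \le B .
\]
Otherwise, the trace memory fills at the net rate $r - B$ and overflows after
\[
  T_{\mathrm{fill}} = \frac{M}{r - B} .
\]
For the illustrative values $M = 2\,\mathrm{MB}$, $r = 60\,\mathrm{MB/s}$, and
$B = 10\,\mathrm{MB/s}$, this gives
$T_{\mathrm{fill}} = 2 / (60 - 10)\,\mathrm{s} = 40\,\mathrm{ms}$. In this
model, a filter that reduces $r$ to $10\,\mathrm{MB/s}$ or less would allow
continuous tracing over the same interface.
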
\begin{table}[]
\centering
\begin{tabular}{r|l c}
Interface & Pros/Cons & DAQ rate {\small [MB/s]}\\
\hline
JTAG &
\begin{tabular}[x]{@{}l@{}}
$+$ Reuse of existing interface \\
$+$ Small chip area \\
$-$ Low bandwidth \\
\vspace{1mm}
\end{tabular} &
1.2 \\
DAP2/SWD &
\begin{tabular}[x]{@{}l@{}}
$+$ High bandwidth with few pins \\
$+$ Small silicon area \\
$-$ Proprietary \\
\vspace{1mm}
\end{tabular} &
10 \\
\gls{agbt} &
\begin{tabular}[x]{@{}l@{}}
$+$ Very high bandwidth with few pins \\
$-$ Large silicon area \\
$-$ High cost \\
\vspace{1mm}
\end{tabular} &
30 \\
CAN &
\begin{tabular}[x]{@{}l@{}}
$+$ Robust and well-known standard \\
$+$ Low cost \\
$-$ Very low bandwidth \\
\end{tabular} &
0.05 \\
\end{tabular}
\caption[Trace interfaces]{Commonly used trace interfaces and their \gls{daq}
(\glsdesc{daq}) rates. \gls{agbt} (\glsdesc{agbt}) is the only interface
capable of recording continuous hardware traces of a complete system.}
\label{tab:interfaces}
\end{table}

Mayer et al.\ \cite{interfaces} give an overview of trace interfaces used in
the automotive industry, as shown in \autoref{tab:interfaces}. \gls{jtag}
(\glsdesc{jtag}) is a common debug standard \cite{ieee5001}, suitable for
regular debugging. It can be used to read out a buffered trace after tracing,
but it is not sufficient for continuous tracing due to its low bandwidth of
\unit[1.2]{MB/s}. Because of that, DAP and DAP2 were developed by Infineon and
SWD by ARM\@. Both protocols are based on \gls{jtag} but use a higher
frequency and improved communication protocols to provide more bandwidth.

\gls{agbt} is currently the fastest trace interface. It was specified by
XILINX and adopted by the Nexus standard. \gls{agbt} is the only interface
that is theoretically capable of recording a continuous trace of a complete
application running on a processor with a frequency of \unit[200]{MHz}. CAN is
used by some hybrid trace tools but is only mentioned for completeness, since
its bandwidth is too low to be considered for hardware tracing.

\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/trace/workbench.png}
\caption[Trace workbench]{A complete trace workbench. An Infineon TriCore
evaluation board (a) can be traced by the iSYSTEM iC6000 (b) or the Lauterbach
PowerTrace-II (e) via the high-speed \gls{agbt} interface. Host software is
used to control the hardware and to analyze the recorded trace, for example
WinIDEA (c) by iSYSTEM and TRACE32 (d) by Lauterbach \cite{maxmaster}.}
\label{fig:workbench}
\end{figure}

Target access hardware is connected to the hardware interface to read out
recorded trace events. From the target access hardware, the data is transmitted
to a host computer for further analysis via USB 3.0 or Ethernet. Examples of
target access hardware are the iC6000 by iSYSTEM \cite{ic6000}
(\autoref{fig:workbench}~b) and the PowerTrace-II by Lauterbach
\cite{powertrace2} (\autoref{fig:workbench}~e). Both devices support
different architectures and trace interfaces by using architecture-specific
debug cables. Besides reading hardware traces, those devices also support all
functionalities provided by a regular debugger, such as stepwise debugging,
reading of memory content, and manipulation of CPU configuration registers.

Dedicated software on the host computer is used to configure and control the
target access hardware and the trace device itself. After recording, this
software transforms the recorded hardware trace into a software trace (see
\autoref{fig:trace_event_levels}). For this process, the host software must
have access to the \gls{elf} file of the application. This is required to map
the addresses of hardware trace events to the corresponding software entities.
Based on the software trace, different analysis techniques such as metric
evaluation, performance analysis, and code coverage are supported. Gantt
charts are provided to examine the trace visually. Via export functions, a
software-level program flow and data trace can be made available to external
tools. \autoref{fig:workbench} shows the toolchain described in this section.