MT/content/testbench.tex

\section{Evaluation Test Bench}
\label{section:evaluation_test_bench}

To make the results of the validation process comprehensible and reproducible
for others it is important to document the hardware and software setup, the
configuration of all tools in use, as well as the ways in which the traces are
compared.

\subsection{Software Setup}

\textbf{Simulation} is used to validate the \gls{btf} traces obtained from
hardware via tracing and transformation.  It allows analyzing of embedded
real-time systems by generating an event trace.  A simulation is easy to
configure and executable without hardware.  This is an advantage in
the early design stages of an application when the final target platform is
not yet defined.

Advanced simulation tools allow it to take platform dependent timing behavior
into account.  It is possible to select the \gls{os} and processor platform in
use.  Therefore, more accurate simulation results can be achieved.  For
example, memory access times \cite{christianmaster} and timing overheads caused
by \gls{os} service routines \cite{maxmaster} can be taken into consideration.

\glsdesc{ta} provides the simulation software used for validation
\cite{tasimulator}.  The \gls{ta} Simulator is based on a discrete-event system
simulation approach \cite{cassandras2008introduction, banks2000dm}.  It has
already been used successfully in research projects to evaluate scheduling
algorithms in multi-core systems \cite{deubzer2011robust}, to examine
synchronization protocols \cite{alfranseder2013modified}, and to validate
optimization algorithms for embedded applications \cite{stefanmaster}.  In this
thesis version {15.02.1} of the \gls{ta} Simulator is in use.

\begin{figure}[]
 \centering
 \begin{tikzpicture}[]
   \tikzstyle{level 1}=[sibling distance=30mm]
   \tikzstyle{level 2}=[sibling distance=40mm]
   \node{RTE Model}
       child {node {Hardware Model}}
       child {node {OS Model}}
       child {node {Software Model}}
   ;
 \end{tikzpicture}
 \caption[RTE model parts]{A \gls{rte} model consists of a hardware, an
 \gls{os} and a software part.}
 \label{fig:rte_model}
\end{figure}

\begin{figure}[]
 \centering
 \begin{tikzpicture}[]
   \tikzstyle{level 1}=[sibling distance=20mm]
   \tikzstyle{level 2}=[sibling distance=20mm]
   \node{Software Model}
       child {node {Processes}}
       child {node {Runnables}}
       child {node {Signals}}
       child {node {OS Events}}
       child {node {Stimuli}}
   ;
 \end{tikzpicture}
 \caption[Software model parts]{The software model represents the entities of
 an application that are executed by the \gls{os} and the hardware.}
 \label{fig:software_model}
\end{figure}

\textbf{Timing Models} describe the architecture and timing of an embedded
system.  Model based development is a software development paradigm where the
design of an application is created in form of a timing model.  This can be
done before the actual application software is implemented.  Based on the
timing model requirements and constraints can be specified and validated via
simulation.

Timing models can provide different levels of granularity depending on the use
case.  \gls{ta} uses the \glsdesc{rte} (\gls{rte}) model format which consists
of three parts as shown in \autoref{fig:rte_model}.

The hardware model includes the processor with all cores, quartzes, and memory
modules.  Quartzes are used as a clock source for cores and memory modules.
Memory modules can be connected with each other and to the processor cores to
represent the architecture of the real chip.  Vendor specific hardware models
are available for certain processor families for example, the Infineon Aurix
and the Freescale Matterhorn.

The \gls{os} model defines the scheduling policy for an application as well as
\gls{os} related timing overheads.  Implementation of service routines varies
depending on the \gls{os} vendor.  Consequently, the timing overhead resulting
from this routines is also different which makes it necessary to take their
runtime into account in order to get more accurate simulation results.  Vendor
specific \gls{os} models are available for certain \glspl{os} for example,
Elektrobit Autocore OS \cite{autocore}.

\begin{figure}[]
 \centering
 \begin{tikzpicture}[]
   \tikzstyle{level 1}=[sibling distance=20mm]
   \tikzstyle{level 2}=[sibling distance=20mm]
   \tikzstyle{level 3}=[sibling distance=30mm]

   \node{Application}
     child {node {Task}}
     child {node {Task}
       child {node {Runnable}}
       child {node {Runnable}
         child {node {Instructionblock}}
         child {node {Signal Read}}
         child {node {Instructionblock}}
         child {node {Signal Write}}
       }
       child {node {\lstinline{ActivateTask}}}
       child {node {\lstinline{TerminateTask}}}
     }
     child {node {Task}}
   ;
 \end{tikzpicture}
 \caption[Software model hierarchy]{The software model allows it to represent
 the runtime behavior of an application.  All relevant software entities are
 part of the system model and stand in relation to each other.  For example, a
 task can call a runnable which itself writes a signal value and runs for a
 certain amount of processor instructions which is represented by an
 instruction block.}
 \label{fig:software_hierarchy}
\end{figure}

The software model represents how hardware and \gls{os} are used by an
application.  Hardware and \gls{os} model remain the same for all tests and
only the software part is changed depending on the different test scenarios.
\autoref{fig:software_model} depicts the system entities that are part of the
software model.

Processes and runnables are ordered in a hierarchical structure as shown in
\autoref{fig:software_hierarchy}.  Processes can call system routines and
runnables, while runnables can access signals, request, and release semaphores
and execute instruction blocks.  The latter represents a certain number of clock
cycles required to execute a code section.  It is required to mimic the runtime
behavior of a real application.  The number of instructions taken by an
instruction block can be configured to be static or to vary depending on a
specific distribution, e.g., Weibull distribution.

Stimuli are used to activate process entities.  Similar to alarms they can
activate processes periodically or only once.  Additionally, it is possible to
trigger stimuli to represent more complex activation patterns for example,
arrival curves.  Since runtime and activation patterns based on random
distributions are tough to represent in C code instruction blocks and stimuli
with constant values are used for the test models.

\textbf{Code Generation} is used to create C code based on the timing model of
an application.  A template based model export was specified and implemented in
the context of this thesis.  The solution is already in production and allows
it to create C code and the corresponding \gls{oil} files automatically.

The idea is to iterate over all software entities and create the appropriate
code dependent on the entity type.  Transformation of most model entities is
straightforward.  Runnable calls map to function calls in C.  A signal read
access occurs if one signal is assigned to another variable. Accordingly, a
write access is represented by assigning a value to a signal.  Task, event, and
semaphore actions are created based on the respective \gls{osek} service
routines discussed in \autoref{section:osekvdxos}.

An instruction block is the only software model entity that cannot be mapped to
C code straightforwardly.  As discussed before, an instruction block represents
a certain amount of clock cycles required to execute a code section.  Normally,
this value is set based on measurement results or empirical values from other
applications.  For code generation it is necessary to create code whose
execution takes the same amount of clock cycles as specified in the model.

\begin{code}
\begin{lstlisting}[caption={[Instructionblock]
The function takes the specified amount of clock cycles to be executed.
This code is dependent on hardware and compiler in use and must
therefore be adapted to other platforms.},
label={listing:instructionblock}]
void executeInstructionsConst(int clockCycles) {
    int i;
    clockCycles /= 2;
    for (i = 0; i < clockCycles; i++) {
        __asm("nop");
        __asm("nop");
    }
}
\end{lstlisting}
\end{code}

The obvious way to do so is a for loop however, the exact code is dependent on
compiler and hardware.  \autoref{listing:instructionblock} shows the code
necessary to get the desired behavior for the hardware used in this thesis.  It
works because the Infineon Aurix processor family features zero overhead loops.
This means a for loop with one \lstinline{nop} instruction takes exactly one
clock cycle because loop condition check, loop incrementation, and loop content
are executed in parallel.

It is important to add multiple \lstinline{nop} instructions per loop cycle.
The Aurix trace device implements a compressed program flow trace. This means
trace messages are only created for certain function events.  Since the
\lstinline{loop} assembly instructions is one of the commands that cause the
creation of a trace message, a loop with a single \lstinline{nop} would cause
the trace buffer to overflow if the value of \lstinline{clockCycles} exceeds a
certain value.  By adding additional \lstinline{nop} commands less trace
messages are created per time unit and the function events can be transmitted
off-chip without overflowing.

\textbf{\glsdesc{ee}} is an \gls{osek} compliant real-time operating system.
It is free of charge and open-source which makes it an excellent choice for
this thesis.  Without access to the \gls{os} internal code creation of many
\gls{btf} events would not have been feasible.  The \gls{ee} software packet
contains the complete \gls{os} source code as well as RT-Druid, the code
generation tool to create \gls{os} specific source code from the \gls{oil}
file.  In this thesis the \glsdesc{ee} and RT-Druid 2.4.0 release is used.

\begin{code}
\begin{lstlisting}[caption={[\gls{ee} \gls{oil} config] Subset of the \gls{ee}
\gls{oil} \gls{os} attributes used for validation.  Attributes that are not
mentioned are set to the default value described in the \gls{ee} RT-Druid
reference manual.},
label={listing:oilconfig}]
EE_OPT = "EE_EXECUTE_FROM_RAM";
EE_OPT = "EE_ICACHE_ENABLED";
EE_OPT = "EE_DCACHE_ENABLED";
REMOTENOTIFICATION = USE_RPC;
CFLAGS = "-O2";
STATUS = EXTENDED;
ORTI_SECTIONS = ALL;
KERNEL_TYPE = ECC2;
COMPILER_TYPE = GNU;
\end{lstlisting}
\end{code}

\autoref{listing:oilconfig} shows the \gls{oil} attributes set for validation.
All attributes that are not mentioned take their default value as documented by
the RT-Druid reference manual \cite{rtdruidref}.  The test applications are
executed from RAM, instruction and data caching is enabled, and the
\lstinline{O2} optimization level is configured.  Inter-core communication is
implemented via remote procedure calls.  All \gls{orti} attributes and extended
error codes are logged by the \gls{os}.  The configuration is created in a way
that allows maximum traceability combined with decent performance.
Consequently, a similar configuration could also be used in a production
system.

The \textbf{Hightec Compiler} \cite{hightec} is used to compile the C code
generated by code generation and RT-Druid.  It is based on GCC and \gls{ee}
generates appropriate makefiles automatically if \lstinline{GNU} is set as
compiler.  For the tests Hightec Compiler v4.6.5.0 is used.

\textbf{TRACE32} \cite{trace32} is used as the hardware trace host software.
Configuration of this part of the test setup is the most complex.  Different
vendor specific properties, like the number of processor observation blocks,
must be taken into consideration to create a setup that produces optimal
results.  The used hardware and the corresponding configuration is discussed in
the next section.

\subsection{Hardware Setup}
\label{subsection:hardware_setup}

An \textbf{Infineon TriBoard TC298} evaluation board assembled with the
Infineon \textbf{SAK-TC298TF-128} microcontroller is used for evaluation.  This
board provides an \glsdesc{imds} together with an \glsdesc{agbt}.  According to
\autoref{tab:trace_tool_overview} and \autoref{tab:interfaces} this setup
allows for optimal trace performance.

\begin{code}
\begin{lstlisting}[caption={[\gls{ee} ECU config] \gls{ee} ECU config to
support the Infineon TC27x microcontroller family and the TC2X5 evaluation
board.  Source code changes are necessary to support the hardware used in this
thesis.},
label={listing:ecu_config}]
MCU_DATA = TRICORE {
  MODEL = TC27x;
};
BOARD_DATA = TRIBOARD_TC2X5;
\end{lstlisting}
\end{code}

\gls{ee} provides support for the Infineon TC27x processor family which can be
activated in the \gls{oil} file as shown in \autoref{listing:ecu_config}.
TC27x and TC29x are quite similar. Nevertheless, it is important to adapt the
configuration to the TC298TF processor.  This is done by changing the includes
in \lstinline{./cpu/tricore/inc/ee_tc_cpu.h} from
\lstinline{<tc27xa/Ifx_reg.h>} to \lstinline{<tc29xa/Ifx_reg.h>}.  The layout
of the evaluation board is the same.

Based on \lstinline{MCU_DATA} \gls{ee} configures the controller in the correct
way during system initialization.  The \gls{oil} \lstinline{CPU_CLOCK}
attribute can be used to set the desired CPU frequency.  The configuration done
by \gls{ee} is sufficient to put the controller into a usable state.  However,
there are problems regarding the frequency of the Multi-Core Debug System
($f_{mcds}$).  The TC298TF can run at a frequency up to \unit[300]{MHz}.
\gls{ee} does not configure the MCDS clock divisor at all and consequently
$f_{mcds}$ is equal to the system frequency.  However, the TC29xA user manual
states that the maximum allowed value for $f_{mcds}$ is \unit[160]{MHz}
\cite{tc29xa}.

\begin{figure}[]
 \centering
 \includegraphics[width=\textwidth]{./media/eval/clocks.png}
 \caption[Evaluation clock configuration]{Correct clock settings are essential
 to record valid hardware traces for the Infineon TC298TF microcontroller. The
 multi-core debug system frequency must be lower or equal to \unit[160]{MHz}
 and the ratio between CPU and MCDS frequency must be $1:1$.}
 \label{fig:clocks}
\end{figure}


Incorrect clock configuration may result in data and function events being
dropped randomly.  According to the manual it is necessary to set the
$f_{system}$ to $f_{mcds}$ ratio to $2:1$ to avoid this problem.  Despite using
the proclaimed configuration event dropping still occurred during the
validation.  After consultation with the hardware experts from Lauterbach GmbH
it turned out that a ratio of $1:1$ between system and MCDS clock is the only
way to guarantee the reception of all trace events.  Thus, the \gls{ee} clock
configuration must not be changed, but the system frequency must be smaller or
equal to \unit[160]{MHz}.  \autoref{fig:clocks} shows a configuration with a
system frequency of \unit[100]{MHz} as used in this thesis.

The \textbf{PowerTrace II} by Lauterbach is used for trace recording.  \gls{ee}
creates so called Lauterbach PRACTICE Scripts \cite{cmmref} also called cmm
scripts during the compilation process.  These scripts can be used to operate
the TRACE32 software automatically.  The generated scripts by \gls{ee} are
inadequate for the requirements in this thesis.  Thus, it is necessary to
improve the scripts in a way that allows continues data and function trace as
shown in \autoref{listing:trace32_config}.

\begin{code}
\begin{lstlisting}[caption={[TRACE32 config]
Script to configure TRACE32 and the on-chip trace device.  The setup allows for
continues function and data trace.},
label={listing:trace32_config}]
SYStem.CPU TC298TF
trace.method.analyzer
trace.mode.stream

mcds.source.cpumux0.program on
mcds.source.cpumux0.readaddr on
mcds.source.cpumux0.writeaddr on
mcds.source.cpumux0.writedata on

break.set symbol.begin(foo)--symbol.end(bar) /r /w /tracedata

Go
wait 1.s
break

printer.filetype csv
printer.file data.csv
winprint.trace.findall , cycle readwrite /list %timefixed \
                       ti.zero varsymbol cycle data

trace.export.csvfunc func.csv
\end{lstlisting}
\end{code}

Firstly, it is necessary to select the correct CPU (line 1).  This is
important because otherwise the TRACE32 trace decoder is not able to interpret
the hardware trace events in the correct way.  The trace method
\lstinline{analyzer} is required for real-time tracing and trace mode
\lstinline{stream} means that the trace data is sent to the host computer in a
continuous way (lines 2 and 3).

Next, the processor and bus observation blocks are configured to detect all
function and data events (lines 5-9).  This is done via the multi-core debug
system.  Setting the \lstinline{program} attribute to \lstinline{on} activates
the function trace.  The other three attributes are necessary to record all
data events.

A complete data trace may still overexert the bandwidth of the setup.  Via
\lstinline{break.set} filters as described in
\autoref{section:trace_measurement} can be created (line 10).  The trace device
is configured to record data read and write events for all variables in the
memory range defined by \lstinline{symbol.begin(foo)--symbol.end(bar)}.  Here
\lstinline{foo} is a variable that has a lower address than the variable
\lstinline{bar}.  Using the configuration described in this section, the
compiler allocates the array \lstinline{EE_as_rpc_services_table} at the
beginning of the \gls{os} memory section and \lstinline{EE_th_status} at the
end.  So those two variables provide a convenient boundary to detect all
\gls{os} data events of interest.

Trace recording is started via the \lstinline{Go} command (lines 12-14).  The
\lstinline{wait} command waits for an eligible amount of time and recording is
stopped by the \lstinline{break} command.

Now the data and function traces can be exported (lines 16-21).  For the data
export it is first necessary to configure the desired output file type
(\lstinline{csv}) and output filename (\lstinline{data.cvs}).  Via the
\lstinline{winprint} command the data export process is started and
\lstinline{trace.export.csvfunc} exports the function trace.

TRACE32 creates multiple graphical user interfaces one for each core of the
target platform.  Accordingly, the export commands must be executed for each
core or in other words for each GUI\@.  The resulting files
\lstinline{data.csv} and \lstinline{func.csv} contain one event per line.  The
following listing shows a data event.

\begin{lstlisting}
-0083448136,0.0004372600,"EE_ORTI_servicetrace","wr-data",43
\end{lstlisting}

A Lauterbach data event consists of five comma separated fields.  In
\autoref{eq:data_event} the elements of a data event are defined.  The second
field is the timestamp $t_i$ in seconds, the third field is the name of the
accessed variable $\pi_i$, the fourth field specifies in which way $a_i$ the
variable is accessed (a data write in this case), and the fifth field contains
the value of the data access event $v_i$.  Since one trace data trace file is
exported per core, the core name $c_i$ is the same for all events from one
file.  Accordingly, the next listing shows a Lauterbach function event
consisting of three fields.

\begin{lstlisting}
+437050; EE_as_StartCore; fentry
\end{lstlisting}

In \autoref{eq:function_event} the elements of a function event are defined.
Analogous to data events, the core name $c_j$ is the same for all events within
a file.  The first field maps to the timestamp $t_j$, the second field is the
name of the function $\pi_j$ that is affected by the event, and the third field
indicates the way $\theta_j$ in which the function is affected.


\subsection{Validation Techniques}
\label{subsection:validation_techniques}

Traces can differ in two ways.  A temporal difference exits for two traces
$B^1$ and $B^2$ with the same length $n$ if there is at least one event pair
with the index $i \in (1,2,\dots,n)$ for which $t^1_i \neq t^2_i$.  As
discussed before, the \gls{ta} Simulator is capable of taking hardware and
\gls{os} specific behavior into account.  Nevertheless, simulating a trace for
which all timestamps are equal to the corresponding hardware trace is not
feasible by definition \cite{balci1995principles}.

This problem is bypassed in two steps.  At first the general accuracy of the
trace setup is validated by tracing events whose timing characteristics are
precisely known in advance.  Secondly, for the actual test models, a
plausibility test based on certain metrics such as task activate-to-active and
task response time is conducted.

The second way in which two traces can differ is called semantic difference.
It exists for two traces $B^1$ and $B^2$ with the same length $n$ if there is
an event pair with the index $i \in (1,2,\dots,n)$ for that at least one of the
following cases is true: source or target entity differ ($\Psi^1_i \neq
\Psi^2_i \vee T^1_i \neq T^2_i$), source or target instance differ ($\psi^1_i
\neq \psi^2_i \vee \tau^1_i \neq \tau^2_i$), target type differs ($\iota^1_i
\neq \iota^2_i$), event action differs ($\alpha^1_i \neq \alpha^2_i$), or note
differs ($\nu^1_i \neq \nu^2_i$).

If two traces $B^1$ and $B^2$ have a different length $|B^1| \neq |B^2|$ they
also differ semantically.  Assuming the trace and simulation setup is correct
a difference in length can have two reasons:  either the trace times differ or
one trace includes entities that do not occur in the other trace.  In the
former case, the disparity can be fixed by removing the events at the end of
the longer trace until both traces have the same length.  In the latter case,
events for entities that are not contained in both traces may be removed in
order to achieve semantic equality.