\section{Evaluation Test Bench} \label{section:evaluation_test_bench} To make the results of the validation process comprehensible and reproducible for others, it is important to document the hardware and software setup, the configuration of all tools in use, as well as the ways in which the traces are compared. \subsection{Software Setup} \textbf{Simulation} is used to validate the \gls{btf} traces obtained from hardware via tracing and transformation. It allows the analysis of embedded real-time systems by generating an event trace. A simulation is easy to configure and can be executed without hardware. This is an advantage in the early design stages of an application when the final target platform is not yet defined. Advanced simulation tools can take platform-dependent timing behavior into account. It is possible to select the \gls{os} and processor platform in use, and therefore more accurate simulation results can be achieved. For example, memory access times \cite{christianmaster} and timing overheads caused by \gls{os} service routines \cite{maxmaster} can be taken into consideration. \glsdesc{ta} provides the simulation software used for validation \cite{tasimulator}. The \gls{ta} Simulator is based on a discrete-event system simulation approach \cite{cassandras2008introduction, banks2000dm}. It has already been used successfully in research projects to evaluate scheduling algorithms in multi-core systems \cite{deubzer2011robust}, to examine synchronization protocols \cite{alfranseder2013modified}, and to validate optimization algorithms for embedded applications \cite{stefanmaster}. In this thesis, version {15.02.1} of the \gls{ta} Simulator is used.
\begin{figure}[] \centering \begin{tikzpicture}[] \tikzstyle{level 1}=[sibling distance=30mm] \tikzstyle{level 2}=[sibling distance=40mm] \node{RTE Model} child {node {Hardware Model}} child {node {OS Model}} child {node {Software Model}} ; \end{tikzpicture} \caption[RTE model parts]{A \gls{rte} model consists of a hardware, an \gls{os}, and a software part.} \label{fig:rte_model} \end{figure} \begin{figure}[] \centering \begin{tikzpicture}[] \tikzstyle{level 1}=[sibling distance=20mm] \tikzstyle{level 2}=[sibling distance=20mm] \node{Software Model} child {node {Processes}} child {node {Runnables}} child {node {Signals}} child {node {OS Events}} child {node {Stimuli}} ; \end{tikzpicture} \caption[Software model parts]{The software model represents the entities of an application that are executed by the \gls{os} and the hardware.} \label{fig:software_model} \end{figure} \textbf{Timing Models} describe the architecture and timing of an embedded system. Model-based development is a software development paradigm in which the design of an application is created in the form of a timing model. This can be done before the actual application software is implemented. Based on the timing model, requirements and constraints can be specified and validated via simulation. Timing models can provide different levels of granularity depending on the use case. \gls{ta} uses the \glsdesc{rte} (\gls{rte}) model format, which consists of three parts as shown in \autoref{fig:rte_model}. The hardware model includes the processor with all cores, quartzes, and memory modules. Quartzes are used as a clock source for cores and memory modules. Memory modules can be connected with each other and to the processor cores to represent the architecture of the real chip. Vendor specific hardware models are available for certain processor families, for example the Infineon Aurix and the Freescale Matterhorn.
The \gls{os} model defines the scheduling policy for an application as well as \gls{os} related timing overheads. The implementation of service routines varies depending on the \gls{os} vendor. Consequently, the timing overhead resulting from these routines also differs, which makes it necessary to take their runtime into account in order to get more accurate simulation results. Vendor specific \gls{os} models are available for certain \glspl{os}, for example Elektrobit Autocore OS \cite{autocore}. \begin{figure}[] \centering \begin{tikzpicture}[] \tikzstyle{level 1}=[sibling distance=20mm] \tikzstyle{level 2}=[sibling distance=20mm] \tikzstyle{level 3}=[sibling distance=30mm] \node{Application} child {node {Task}} child {node {Task} child {node {Runnable}} child {node {Runnable} child {node {Instructionblock}} child {node {Signal Read}} child {node {Instructionblock}} child {node {Signal Write}} } child {node {\lstinline{ActivateTask}}} child {node {\lstinline{TerminateTask}}} } child {node {Task}} ; \end{tikzpicture} \caption[Software model hierarchy]{The software model makes it possible to represent the runtime behavior of an application. All relevant software entities are part of the system model and stand in relation to each other. For example, a task can call a runnable which itself writes a signal value and runs for a certain number of processor instructions, which is represented by an instruction block.} \label{fig:software_hierarchy} \end{figure} The software model represents how hardware and \gls{os} are used by an application. The hardware and \gls{os} models remain the same for all tests and only the software part is changed depending on the different test scenarios. \autoref{fig:software_model} depicts the system entities that are part of the software model. Processes and runnables are ordered in a hierarchical structure as shown in \autoref{fig:software_hierarchy}.
Processes can call system routines and runnables, while runnables can access signals, request and release semaphores, and execute instruction blocks. The latter represent a certain number of clock cycles required to execute a code section and are needed to mimic the runtime behavior of a real application. The number of instructions taken by an instruction block can be configured to be static or to vary according to a specific distribution, e.g., a Weibull distribution. Stimuli are used to activate process entities. Similar to alarms, they can activate processes periodically or only once. Additionally, it is possible to trigger stimuli to represent more complex activation patterns, for example arrival curves. Since runtimes and activation patterns based on random distributions are difficult to represent in C code, instruction blocks and stimuli with constant values are used for the test models. \textbf{Code Generation} is used to create C code based on the timing model of an application. A template-based model export was specified and implemented in the context of this thesis. The solution is already in production and makes it possible to create C code and the corresponding \gls{oil} files automatically. The idea is to iterate over all software entities and create the appropriate code depending on the entity type. Transformation of most model entities is straightforward. Runnable calls map to function calls in C. A signal read access occurs if a signal is assigned to another variable. Accordingly, a write access is represented by assigning a value to a signal. Task, event, and semaphore actions are created based on the respective \gls{osek} service routines discussed in \autoref{section:osekvdxos}. An instruction block is the only software model entity that cannot be mapped to C code straightforwardly. As discussed before, an instruction block represents a certain number of clock cycles required to execute a code section.
Normally, this value is set based on measurement results or empirical values from other applications. For code generation it is necessary to create code whose execution takes the same number of clock cycles as specified in the model. \begin{code} \begin{lstlisting}[caption={[Instructionblock] The function takes the specified number of clock cycles to execute. This code depends on the hardware and compiler in use and must therefore be adapted for other platforms.}, label={listing:instructionblock}]
void executeInstructionsConst(int clockCycles)
{
    int i;
    clockCycles /= 2;
    for (i = 0; i < clockCycles; i++) {
        __asm("nop");
        __asm("nop");
    }
}
\end{lstlisting} \end{code} The obvious way to do so is a for loop; however, the exact code depends on the compiler and hardware. \autoref{listing:instructionblock} shows the code necessary to get the desired behavior for the hardware used in this thesis. It works because the Infineon Aurix processor family features zero overhead loops. This means a for loop with one \lstinline{nop} instruction takes exactly one clock cycle because the loop condition check, the loop increment, and the loop content are executed in parallel. It is important to add multiple \lstinline{nop} instructions per loop iteration. The Aurix trace device implements a compressed program flow trace, which means trace messages are only created for certain function events. Since the \lstinline{loop} assembly instruction is one of the commands that cause the creation of a trace message, a loop with a single \lstinline{nop} would cause the trace buffer to overflow if the value of \lstinline{clockCycles} exceeds a certain threshold. By adding additional \lstinline{nop} commands, fewer trace messages are created per time unit and the function events can be transmitted off-chip without overflowing the buffer. \textbf{\glsdesc{ee}} is an \gls{osek} compliant real-time operating system. It is free of charge and open source, which makes it an excellent choice for this thesis.
Without access to the \gls{os} internal code, the creation of many \gls{btf} events would not have been feasible. The \gls{ee} software package contains the complete \gls{os} source code as well as RT-Druid, the code generation tool that creates \gls{os} specific source code from the \gls{oil} file. In this thesis, the \glsdesc{ee} and RT-Druid 2.4.0 release is used. \begin{code} \begin{lstlisting}[caption={[\gls{ee} \gls{oil} config] Subset of the \gls{ee} \gls{oil} \gls{os} attributes used for validation. Attributes that are not mentioned are set to the default value described in the \gls{ee} RT-Druid reference manual.}, label={listing:oilconfig}]
EE_OPT = "EE_EXECUTE_FROM_RAM";
EE_OPT = "EE_ICACHE_ENABLED";
EE_OPT = "EE_DCACHE_ENABLED";
REMOTENOTIFICATION = USE_RPC;
CFLAGS = "-O2";
STATUS = EXTENDED;
ORTI_SECTIONS = ALL;
KERNEL_TYPE = ECC2;
COMPILER_TYPE = GNU;
\end{lstlisting} \end{code} \autoref{listing:oilconfig} shows the \gls{oil} attributes set for validation. All attributes that are not mentioned take their default value as documented in the RT-Druid reference manual \cite{rtdruidref}. The test applications are executed from RAM, instruction and data caching are enabled, and the \lstinline{O2} optimization level is configured. Inter-core communication is implemented via remote procedure calls. All \gls{orti} attributes and extended error codes are logged by the \gls{os}. The configuration is chosen to allow maximum traceability combined with decent performance. Consequently, a similar configuration could also be used in a production system. The \textbf{Hightec Compiler} \cite{hightec} is used to compile the C code produced by the code generator and RT-Druid. It is based on GCC, and \gls{ee} generates appropriate makefiles automatically if \lstinline{GNU} is set as the compiler. For the tests, Hightec Compiler v4.6.5.0 is used. \textbf{TRACE32} \cite{trace32} is used as the hardware trace host software.
Configuration of this part of the test setup is the most complex. Different vendor specific properties, like the number of processor observation blocks, must be taken into consideration to create a setup that produces optimal results. The hardware used and the corresponding configuration are discussed in the next section. \subsection{Hardware Setup} \label{subsection:hardware_setup} An \textbf{Infineon TriBoard TC298} evaluation board assembled with the Infineon \textbf{SAK-TC298TF-128} microcontroller is used for evaluation. This board provides an \glsdesc{imds} together with an \glsdesc{agbt}. According to \autoref{tab:trace_tool_overview} and \autoref{tab:interfaces}, this setup allows for optimal trace performance. \begin{code} \begin{lstlisting}[caption={[\gls{ee} ECU config] \gls{ee} ECU configuration to support the Infineon TC27x microcontroller family and the TC2X5 evaluation board. Source code changes are necessary to support the hardware used in this thesis.}, label={listing:ecu_config}]
MCU_DATA = TRICORE {
  MODEL = TC27x;
};
BOARD_DATA = TRIBOARD_TC2X5;
\end{lstlisting} \end{code} \gls{ee} provides support for the Infineon TC27x processor family, which can be activated in the \gls{oil} file as shown in \autoref{listing:ecu_config}. TC27x and TC29x are quite similar. Nevertheless, it is important to adapt the configuration to the TC298TF processor. This is done by changing the includes in \lstinline{./cpu/tricore/inc/ee_tc_cpu.h} from \lstinline{} to \lstinline{}. The layout of the evaluation board is the same. Based on \lstinline{MCU_DATA}, \gls{ee} configures the controller correctly during system initialization. The \gls{oil} \lstinline{CPU_CLOCK} attribute can be used to set the desired CPU frequency. The configuration done by \gls{ee} is sufficient to put the controller into a usable state. However, there are problems regarding the frequency of the Multi-Core Debug System ($f_{mcds}$). The TC298TF can run at frequencies of up to \unit[300]{MHz}.
\gls{ee} does not configure the MCDS clock divisor at all; consequently, $f_{mcds}$ is equal to the system frequency. However, the TC29xA user manual states that the maximum allowed value for $f_{mcds}$ is \unit[160]{MHz} \cite{tc29xa}. \begin{figure}[] \centering \includegraphics[width=\textwidth]{./media/eval/clocks.png} \caption[Evaluation clock configuration]{Correct clock settings are essential to record valid hardware traces for the Infineon TC298TF microcontroller. The multi-core debug system frequency must be less than or equal to \unit[160]{MHz} and the ratio between CPU and MCDS frequency must be $1:1$.} \label{fig:clocks} \end{figure} An incorrect clock configuration may result in data and function events being dropped randomly. According to the manual, it is necessary to set the $f_{system}$ to $f_{mcds}$ ratio to $2:1$ to avoid this problem. Despite using the recommended configuration, event dropping still occurred during validation. After consultation with the hardware experts from Lauterbach GmbH, it turned out that a ratio of $1:1$ between system and MCDS clock is the only way to guarantee the reception of all trace events. Thus, the \gls{ee} clock configuration must not be changed, but the system frequency must be less than or equal to \unit[160]{MHz}. \autoref{fig:clocks} shows a configuration with a system frequency of \unit[100]{MHz} as used in this thesis. The \textbf{PowerTrace II} by Lauterbach is used for trace recording. \gls{ee} creates so-called Lauterbach PRACTICE scripts \cite{cmmref}, also known as cmm scripts, during the compilation process. These scripts can be used to operate the TRACE32 software automatically. The scripts generated by \gls{ee} are inadequate for the requirements of this thesis. Thus, it is necessary to improve the scripts in a way that allows continuous data and function tracing as shown in \autoref{listing:trace32_config}.
\begin{code} \begin{lstlisting}[caption={[TRACE32 config] Script to configure TRACE32 and the on-chip trace device. The setup allows for continuous function and data tracing.}, label={listing:trace32_config}]
SYStem.CPU TC298TF
trace.method.analyzer
trace.mode.stream

mcds.source.cpumux0.program on
mcds.source.cpumux0.readaddr on
mcds.source.cpumux0.writeaddr on
mcds.source.cpumux0.writedata on

break.set symbol.begin(foo)--symbol.end(bar) /r /w /tracedata

Go
wait 1.s
break

printer.filetype csv
printer.file data.csv
winprint.trace.findall , cycle readwrite /list %timefixed \
  ti.zero varsymbol cycle data

trace.export.csvfunc func.csv
\end{lstlisting} \end{code} Firstly, it is necessary to select the correct CPU (line 1). This is important because otherwise the TRACE32 trace decoder cannot interpret the hardware trace events correctly. The trace method \lstinline{analyzer} is required for real-time tracing and the trace mode \lstinline{stream} means that the trace data is sent to the host computer continuously (lines 2 and 3). Next, the processor and bus observation blocks are configured to detect all function and data events (lines 5-9). This is done via the multi-core debug system. Setting the \lstinline{program} attribute to \lstinline{on} activates the function trace. The other three attributes are necessary to record all data events. A complete data trace may still exceed the bandwidth of the setup. Via \lstinline{break.set}, filters as described in \autoref{section:trace_measurement} can be created (line 10). The trace device is configured to record data read and write events for all variables in the memory range defined by \lstinline{symbol.begin(foo)--symbol.end(bar)}. Here, \lstinline{foo} is a variable that has a lower address than the variable \lstinline{bar}.
Using the configuration described in this section, the compiler allocates the array \lstinline{EE_as_rpc_services_table} at the beginning of the \gls{os} memory section and \lstinline{EE_th_status} at the end. So these two variables provide a convenient boundary to detect all \gls{os} data events of interest. Trace recording is started via the \lstinline{Go} command (lines 12-14). The \lstinline{wait} command waits for a suitable amount of time and recording is stopped by the \lstinline{break} command. Now the data and function traces can be exported (lines 16-21). For the data export it is first necessary to configure the desired output file type (\lstinline{csv}) and output filename (\lstinline{data.csv}). Via the \lstinline{winprint} command the data export process is started, and \lstinline{trace.export.csvfunc} exports the function trace. TRACE32 creates multiple graphical user interfaces, one for each core of the target platform. Accordingly, the export commands must be executed for each core, or in other words, for each GUI\@. The resulting files \lstinline{data.csv} and \lstinline{func.csv} contain one event per line. The following listing shows a data event. \begin{lstlisting}
-0083448136,0.0004372600,"EE_ORTI_servicetrace","wr-data",43
\end{lstlisting} A Lauterbach data event consists of five comma-separated fields. In \autoref{eq:data_event} the elements of a data event are defined. The second field is the timestamp $t_i$ in seconds, the third field is the name of the accessed variable $\pi_i$, the fourth field specifies in which way $a_i$ the variable is accessed (a data write in this case), and the fifth field contains the value of the data access event $v_i$. Since one data trace file is exported per core, the core name $c_i$ is the same for all events from one file. Accordingly, the next listing shows a Lauterbach function event consisting of three fields.
\begin{lstlisting}
+437050; EE_as_StartCore; fentry
\end{lstlisting} In \autoref{eq:function_event} the elements of a function event are defined. Analogous to data events, the core name $c_j$ is the same for all events within a file. The first field maps to the timestamp $t_j$, the second field is the name of the function $\pi_j$ that is affected by the event, and the third field indicates the way $\theta_j$ in which the function is affected. \subsection{Validation Techniques} \label{subsection:validation_techniques} Traces can differ in two ways. A temporal difference exists for two traces $B^1$ and $B^2$ with the same length $n$ if there is at least one event pair with the index $i \in (1,2,\dots,n)$ for which $t^1_i \neq t^2_i$. As discussed before, the \gls{ta} Simulator is capable of taking hardware and \gls{os} specific behavior into account. Nevertheless, simulating a trace for which all timestamps are equal to the corresponding hardware trace is not feasible by definition \cite{balci1995principles}. This problem is bypassed in two steps. First, the general accuracy of the trace setup is validated by tracing events whose timing characteristics are precisely known in advance. Second, for the actual test models, a plausibility test based on certain metrics such as task activate-to-active and task response time is conducted. The second way in which two traces can differ is called a semantic difference. It exists for two traces $B^1$ and $B^2$ with the same length $n$ if there is an event pair with the index $i \in (1,2,\dots,n)$ for which at least one of the following cases is true: source or target entity differ ($\Psi^1_i \neq \Psi^2_i \vee T^1_i \neq T^2_i$), source or target instance differ ($\psi^1_i \neq \psi^2_i \vee \tau^1_i \neq \tau^2_i$), target type differs ($\iota^1_i \neq \iota^2_i$), event action differs ($\alpha^1_i \neq \alpha^2_i$), or note differs ($\nu^1_i \neq \nu^2_i$).
If two traces $B^1$ and $B^2$ have different lengths $|B^1| \neq |B^2|$, they also differ semantically. Assuming the trace and simulation setup is correct, a difference in length can have two reasons: either the recording times differ or one trace includes entities that do not occur in the other trace. In the former case, the disparity can be fixed by removing events at the end of the longer trace until both traces have the same length. In the latter case, events for entities that are not contained in both traces may be removed in order to achieve semantic equality.