\section{Test Cases}
As discussed in the previous section, traces can differ in a temporal and in a
semantic way. To exclude temporal discrepancies caused by a faulty trace
setup, the timing accuracy is first tested with code whose event-to-event
durations are known. Next, the semantic correctness of the trace mapping is
validated with manually created test models. Finally, randomized models are
generated to detect semantic errors that the manually created models may miss
due to selection bias \cite{geddes1990cases}.

\subsection{Timing Precision}
In \autoref{listing:instructionblock}, code that executes a fixed number of
instructions was introduced. This code is now used to evaluate the timing
precision of the trace setup. According to
\autoref{subsection:hardware_tracing}, the setup should allow for
cycle-accurate trace measurement.

The Infineon AURIX processor family provides performance counters
\cite{tc29xa}. Once started, these counters are incremented with the CPU
core frequency. A frequency of \unit[100]{MHz} is used for the validation;
consequently, an increment occurs every \unit[10]{ns}. The counter can be
started at an arbitrary point in time, for example at program start. By
reading the counter value at the beginning and at the end of a critical
section, the number of clock cycles that elapsed between these two points can
be determined.

\begin{code}
\begin{lstlisting}[caption={[Trace setup accuracy validation]
Code to validate the timing precision of the trace setup.},
label={listing:accuracy_validation}]
EE_UINT32 i;
EE_UINT32 ccntStart;
EE_UINT32 ccntEnd;
EE_UINT32 n = N / 4;

__asm("nop");
ccntStart = EE_tc_get_CCNT();
__asm("nop");
for (i = 0; i < n; i++) {
    __asm("nop");
    __asm("nop");
    __asm("nop");
    __asm("nop");
}
__asm("nop");
ccntEnd = EE_tc_get_CCNT();
\end{lstlisting}
\end{code}

\autoref{listing:accuracy_validation} shows the code that is used to check the
timing precision. \gls{ee} provides the API function
\lstinline{EE_tc_get_CCNT} to read the performance counter register. As
described above, the counter is read before and after the critical section.

The critical section is guarded by two additional \lstinline{nop} assembly
instructions to avoid compiler optimization. Additionally, the generated
assembly code was examined manually to verify that no unwanted instructions
were added by the compiler. A for loop is used to execute a predefined number
of instructions. The number of repetitions depends on the define
\lstinline{N}, which should be a multiple of four.

The code is now executed for different values of \lstinline{N}. For each
value, the expected number of clock cycles $c_e$, the actual number of clock
cycles $c_a$, the expected time difference $t_e$, and the actual time
difference $t_a$ between the writes to \lstinline{ccntStart} and
\lstinline{ccntEnd} are listed in \autoref{tab:precision_validation}.

The expected number of clock cycles is calculated by $c_e = N + 2$; the value
two accounts for the two additional \lstinline{nop} instructions. The
expected time is calculated by $t_e = c_e \cdot \frac{1}{f}$, where $f$ is the
processor frequency.

The actual number of clock cycles is calculated by $c_a = ccntEnd -
ccntStart$. The actual time is calculated by $t_a = t_j - t_i$, where $j$ is
the index of the write event to \lstinline{ccntEnd} and $i$ is the index of
the write event to \lstinline{ccntStart}.
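
These two computations can be summarized in a small helper (an illustrative
sketch; the struct and function names below are not part of the \gls{ee} API):

```c
#include <stdint.h>

/* Illustrative helper type; not part of the EE API. */
typedef struct {
    uint32_t cycles;   /* clock cycles between the two counter reads */
    double   time_ns;  /* corresponding time in nanoseconds */
} measurement_t;

/* Expected values: N loop instructions plus the two guarding nops,
 * converted to time via the core frequency f (t_e = c_e / f). */
static measurement_t expected(uint32_t n, double freq_hz)
{
    measurement_t m;
    m.cycles  = n + 2u;
    m.time_ns = m.cycles * (1e9 / freq_hz);
    return m;
}

/* Actual values from the two counter snapshots (c_a = ccntEnd - ccntStart). */
static measurement_t actual(uint32_t ccntStart, uint32_t ccntEnd,
                            double freq_hz)
{
    measurement_t m;
    m.cycles  = ccntEnd - ccntStart;
    m.time_ns = m.cycles * (1e9 / freq_hz);
    return m;
}
```

For $N = 128$ at \unit[100]{MHz} this yields $c_e = 130$ and $t_e =
\unit[1.3]{us}$, matching the first column of
\autoref{tab:precision_validation}.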
Four different values of \lstinline{N}, namely $128$, $1024$, $4096$, and
$65536$, are chosen, and for each value $101$ measurement samples are taken.
The results for all samples with the same value of \lstinline{N} are
identical. It can be observed that for all values of \lstinline{N} the
execution of the critical section takes four ticks more than the expected
value $c_e$. This is because the additional instructions executed by the
second call to \lstinline{EE_tc_get_CCNT} are not taken into consideration.

Consequently, the expected and the actual execution time differ by
\unit[40]{ns}. Apart from this difference, the result is as expected, and it
can be concluded that the setup is indeed able to measure hardware events on a
cycle-accurate basis.

\begin{table}[]
\centering
\begin{tabular}{c|c c c c}
N & 128 & 1024 & 4096 & 65536 \\
\hline
$c_e\, [1]$ & 130 & 1026 & 4098 & 65538 \\
$c_a\, [1]$ & 134 & 1030 & 4102 & 65542 \\
$t_e\, [\mu s]$ & 1.300 & 10.260 & 40.980 & 655.380 \\
$t_a\, [\mu s]$ & 1.340 & 10.300 & 41.020 & 655.420 \\
samples & 101 & 101 & 101 & 101 \\
\end{tabular}
\caption[Trace setup measurement precision]{Experiment to validate the
accuracy of the trace setup. A code snippet that takes a known number of
instructions $c_e$ is executed. Based on the number of instructions, the
expected execution time $t_e$ can be calculated. If cycle-accurate
measurement is supported, the actual execution time $t_a$ should be equal to
$t_e$. The execution times differ by \unit[40]{ns} because the expected
number of instructions is off by four cycles. If this deviation is taken
into consideration, $t_e$ and $t_a$ coincide.}
\label{tab:precision_validation}
\end{table}

\subsection{Systematic Tests}
|
||
|
\label{subsection:systematic_tests}
|
||
|
|
||
|
In this section, test models are created systematically to validate the
complete software to \gls{btf} event mapping discussed in
\autoref{chapter:mapping}. For each test application a simulated and a
hardware-based \gls{btf} trace is generated as shown in
\autoref{fig:eval_idea}. The traces are then compared in three steps.

\begin{itemize}
\item A basic plausibility test based on the Gantt chart of the TA Tool Suite
is conducted.
\item The semantic equality is validated.
\item Different real-time metrics are compared and discussed.
\end{itemize}

Five test models, as shown in the following list, are required to cover all
\gls{btf} actions for which a mapping has been provided.

\begin{itemize}
\item task-runnable-signal
\item task-event
\item task-resource-release-parking
\item task-resource-poll-parking
\item task-MTA
\end{itemize}

Each model represents a periodic system in which a defined sequence of events
is executed every \unit[10]{ms}. UML sequence diagrams \cite{fowler2004uml}
are used to illustrate the behavior of the test applications during one
period.

\subsubsection{Task-Runnable-Signal Test}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_runnable_signal.pdf}}
\caption[Task-runnable-signal test sequence]{Test application to validate
basic task and signal read and write events.}
\label{fig:task_runnable_signal}
\end{figure}

The task-runnable-signal application is depicted in
\autoref{fig:task_runnable_signal}. Task \lstinline{T_1} is activated
periodically by the stimulus \lstinline{STI_T_1} every \unit[10]{ms}.
\lstinline{T_1} activates \lstinline{T_2} on another core via \gls{ipa} and
then executes runnable \lstinline{R_1}. \lstinline{T_2} executes a runnable
\lstinline{R_2_1}, which in turn executes another runnable \lstinline{R_2_2}.
Once execution of \lstinline{R_1} has finished, \lstinline{T_1} activates
another task \lstinline{T_3} on the second core, which has a higher priority
than \lstinline{T_2}. Consequently, \lstinline{T_2}, \lstinline{R_2_1}, and
\lstinline{R_2_2} are preempted, as indicated by the light green and light
blue colors. \lstinline{T_3} calls a runnable \lstinline{R_3}. The runnables
\lstinline{R_1} and \lstinline{R_3} both read and write the signal
\lstinline{SIG_1}. Once \lstinline{T_3} has terminated, \lstinline{T_2} and
the corresponding runnables resume execution. The purpose of this test
application is to cover the following \gls{btf} actions:

\begin{itemize}
\item Stimulus: trigger by alarm and \gls{ipa}
\item Task: activate, start, preempt, resume, terminate
\item ISR: activate, start, terminate
\item Runnable: start, resume, suspend, terminate
\item Signal: read, write
\end{itemize}

\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_runnable_signal.png}}
\caption[Task-runnable-signal test gantt chart]{Hardware and software trace
for the task-runnable-signal test model. Attention must be directed to the
signal read and write accesses to \lstinline{SIG_1}. Additionally, the nested
runnables must be suspended when the respective task \lstinline{T_2} is
preempted.}
\label{fig:task_runnable_signal_gantt}
\end{figure}

Based on the Gantt chart of the TA Tool Suite, the \gls{btf} traces can be
compared visually. The hardware trace is shown in the upper part and the
simulated trace in the lower part of each picture. Both traces use the same
time scale so that semantic and temporal comparison is feasible.

\autoref{fig:task_runnable_signal_gantt} shows one period of the
task-runnable-signal test application in the Gantt chart of the \gls{ta} Tool
Suite. The figure depicts that \lstinline{R_2_2} is called from the context of
\lstinline{R_2_1}. When \lstinline{T_2} is preempted, both runnables must be
suspended as well, indicated by the light blue color in contrast to the
stronger blue used when a runnable is running. For clarity, runnable entities
are not shown in the traces for the other test models. A running task is
colored in dark green, while preempted tasks are shown in light green.

A separate row in the Gantt chart is used to depict signal accesses from the
context of tasks. Whenever a horizontal line is drawn, the corresponding
signal is read or written. The former is indicated by an arrow pointing up at
the bottom of the row; the latter by an arrow pointing down at the top of the
row. It can be seen that the signal accesses are recorded on hardware as
expected.

The hardware trace shows two additional \glspl{isr} that are not part of the
simulation trace. \lstinline{EE_tc_system_timer_handler} is a timer interrupt
which is executed every \unit[1]{ms} and serves as clock source for the system
counter. \lstinline{EE_TC_iirq_handler} is used for remote procedure calls.

Two traces cannot be semantically identical if entities exist in one trace
that are not part of the other. There are two ways to solve this problem:
either the \glspl{isr} are added to the system model and therefore considered
during simulation, or all \gls{btf} events related to the \glspl{isr} are
removed from the hardware trace.

A script that checks the semantic equality of two traces based on the criteria
established in \autoref{subsection:validation_techniques} is used for the
second validation step. However, semantic equality could not be shown for the
test cases in this and the next section. The reason for this is discussed in
\autoref{subsection:randomized_tests}.

The TA Inspector is capable of calculating a variety of real-time metrics
based on \gls{btf} traces. Selected metrics are shown to discuss the
similarities and discrepancies between the hardware and the simulation trace.
Common metric types are activate-to-activate time (A2A), response time (RT),
net execution time (NET), and CPU core load. The upper part of each metric
table shows the hardware trace metrics, abbreviated \emph{HW}, and the lower
part shows the simulation trace metrics, abbreviated \emph{Sim}.

\begin{table}[]
\centering
\begin{tabular}{c c|c c c c}
& & A2A $[ms]$ & RT $[ms]$ & Load Core\_1 $[\%]$ & Load Core\_2 $[\%]$ \\
\hline
& T\_1 & 10.005998 & 3.025510 & 30.124423 & 0.000000 \\
HW & T\_2 & 10.005990 & 6.516440 & 0.000000 & 49.950032 \\
& T\_3 & 10.005987 & 1.506300 & 0.000000 & 15.000495 \\
\hline
& Sum & - & - & 30.12 & 64.95 \\
&&&&& \\
& T\_1 & 10.000000 & 3.000100 & 30.000000 & 0.000000 \\
Sim & T\_2 & 10.000000 & 6.500200 & 0.000000 & 50.000000 \\
& T\_3 & 10.000000 & 1.500100 & 0.000000 & 15.000000 \\
\hline
& Sum & - & - & 30.00 & 65.00 \\
\end{tabular}
\caption[Task-runnable-signal metrics table]{Metrics of the
task-runnable-signal test application. Activate-to-activate (A2A) and
response time (RT) are average values calculated over all instances of the
respective entity.}
\label{tab:task_runnable_signal}
\end{table}

\autoref{tab:task_runnable_signal} shows selected real-time metrics for the
task-runnable-signal application. At first glance all values seem identical,
so the basic configuration of the complete setup is likely to be correct.
Nevertheless, the activate-to-activate times of hardware and simulation differ
by almost \unit[6]{us}, which is non-negligible.

The reason for this deviation can be found by examining the
activate-to-activate times of the timer \gls{isr}
\lstinline{EE_tc_system_timer_handler}. The average A2A time for the \gls{isr}
is \unit[600]{ns} greater than expected. Since \lstinline{T_1} is activated
every \unit[10]{ms}, in other words for every tenth instance of the timer
\gls{isr}, the expected deviation can be calculated as $d_{A2A} = 10 \cdot
600\,ns = 6\,us$.

To understand why the A2A times of the timer \gls{isr} diverge, it is
necessary to examine the corresponding source code. Whenever the timer
\gls{isr} is executed, the time delta to the next instance is calculated based
on the current number of counter ticks in the timer register. There is a time
delta between the point where the last counter tick value is read and the
point where the newly calculated value is written. This delta causes the
delay of \unit[600]{ns}. Doubling the frequency reduces the delta to
\unit[300]{ns}; halving the frequency increases it to \unit[1200]{ns}, as
expected.
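
The effect can be illustrated with a minimal model of the reload pattern (an
illustrative sketch; the names below are not taken from the \gls{ee} sources):

```c
#include <stdint.h>

#define PERIOD_TICKS 100000u          /* 1 ms at 100 MHz */

static uint32_t compare_reg;          /* next compare-match value */

/* Drifting variant: the next deadline is derived from the counter value
 * read inside the handler.  Every tick that elapses between the compare
 * match and this read shifts the next instance later, so the delay
 * accumulates once per period. */
void timer_reload_drifting(uint32_t now)
{
    compare_reg = now + PERIOD_TICKS;
}

/* Drift-free variant: the next deadline is advanced from the previous
 * deadline, so handler latency does not accumulate. */
void timer_reload_stable(void)
{
    compare_reg += PERIOD_TICKS;
}
```

With a handler latency of 60 ticks (\unit[600]{ns} at \unit[100]{MHz}) the
drifting variant yields an activate-to-activate distance of $100060$ instead
of $100000$ ticks, which matches the observed deviation; doubling the
frequency halves the latency in time and therefore halves the deviation.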
\subsubsection{Task-Event Test}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_event.pdf}}
\caption[Task-event test sequence]{Test application to validate \gls{btf}
event actions.}
\label{fig:task_event}
\end{figure}

\autoref{fig:task_event} shows the task-event test case. \lstinline{T_1} is
activated in the same way as in the first test case. Again, it activates
\lstinline{T_2} on a second core via \gls{ipa}. \lstinline{T_2} executes a
runnable \lstinline{R_2}. After execution of the runnable, \lstinline{T_2}
waits for the event \lstinline{EVENT_1}. Since the event is not set, it
changes into the waiting state, indicated by the orange color. After
activating \lstinline{T_2}, \lstinline{T_1} executes a runnable
\lstinline{R_1} and sets the event \lstinline{EVENT_1}. \lstinline{T_2}
returns from the waiting state, calls \lstinline{R_2} again, and clears the
event \lstinline{EVENT_1}. The purpose of this test application is to cover
the following \gls{btf} actions:

\begin{itemize}
\item Process: wait, release
\item Event: wait\_event, set\_event, clear\_event
\end{itemize}

\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_event.png}}
\caption[Task-event test gantt chart]{Comparison of hardware (top) and
simulated (bottom) trace of the task-event test application.}
\label{fig:task_event_gantt}
\end{figure}

\autoref{fig:task_event_gantt} shows the Gantt chart for the task-event test
case. As before, \lstinline{T_1} is interrupted by the timer \gls{isr}
multiple times. A separate row in the Gantt chart indicates the current state
of the event entity. An upward pointing arrow indicates that a process starts
waiting for an event; the waiting period is colored in orange. A downward
pointing arrow indicates that a process sets an event. Finally, the event is
cleared, which is indicated by a downward pointing arrow in red.

\begin{table}[]
\centering
\begin{tabular}{c c|c c c}
& & A2A $[ms]$ & RT $[ms]$ & CPU Waiting Core\_2 $[\%]$ \\
\hline
HW & T\_1 & 10.006198 & 2.023460 & 0.000000 \\
& T\_2 & 10.006189 & 3.018570 & 10.046955 \\
\hline
Sim & T\_1 & 10.000000 & 2.000100 & 0.000000 \\
& T\_2 & 10.000000 & 3.000100 & 9.999000 \\
\end{tabular}
\caption[Task-event metrics table]{Metrics of the task-event test application.}
\label{tab:task_event}
\end{table}

\autoref{tab:task_event} shows the resulting metrics for the task-event test
case. The activate-to-activate times show the same behavior as in the
previous test application, as expected. The relative waiting time on hardware
is greater than in the simulated trace.

A possible reason is the longer runtime of the \lstinline{set_event} routine
on the target. The task on core \lstinline{Core_1} sets the event for the
task on the second core, so a \glsdesc{rpc} is necessary to set the event.
Since the \gls{rpc} via \lstinline{EE_TC_iirq_handler} is not taken into
consideration in the simulation, the time spent in the waiting state is longer
on hardware.

Response times are also significantly longer on real hardware than in the
simulated trace. The response time measures the period from activation to
termination of a task instance. The difference in response time results from
several factors.

Firstly, the initial ready time, i.e.\ the period from task activation to
start, is longer on hardware; it takes about \unit[2]{us}. Secondly,
\lstinline{T_1} is preempted by the timer \gls{isr} two times. Category two
\glspl{isr} require a context switch, which costs additional task execution
time. Finally, the \gls{ipa} and \lstinline{TaskTerminate} routines take
longer on real hardware. By measuring the execution times of the respective
system services, it could be shown that the response times are equal if the
measured overhead is taken into consideration. As mentioned before, these
effects could be accounted for in the simulation by adding the execution times
of the routines to the \gls{os} part of the timing model.

\subsubsection{Task-Resource Tests}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_resource_release_polling.pdf}}
\caption[Task-resource-poll-parking test sequence]{Test application to validate
semaphore events, especially the poll\_parking action.}
\label{fig:task_resource_poll_parking}
\end{figure}

\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_resource_release_parking.pdf}}
\caption[Task-resource-release-parking test sequence]{Test application to validate
semaphore events, especially the release\_parking action.}
\label{fig:task_resource_release_parking}
\end{figure}

The third and fourth test case are similar except for one difference, as shown
in \autoref{fig:task_resource_poll_parking} and
\autoref{fig:task_resource_release_parking}. As before, \lstinline{T_1} is
activated by a periodic stimulus and activates \lstinline{T_2} on another core
via \gls{ipa}. \lstinline{T_1} executes the runnable \lstinline{R_1_1}, which
requests the semaphore \lstinline{SEM_1}. \lstinline{T_2} tries to request the
same semaphore, which is now locked, and changes into the active polling state
indicated by the red color. As soon as \lstinline{R_1_1} finishes,
\lstinline{T_1} activates the task \lstinline{T_3}, which has a higher
priority than \lstinline{T_2}, on the second core. Consequently,
\lstinline{T_2} is deallocated and changes into the parking state.

At this point the two models differ. In the first model,
\emph{task-resource-poll-parking}, \lstinline{T_3} has a shorter execution
time than in the model \emph{task-resource-release-parking}. Consequently, in
the former model \lstinline{T_2} is resumed while \lstinline{SEM_1} is still
locked and a poll\_parking action takes place.

In the latter case, when \lstinline{T_3} has a longer execution time,
\lstinline{SEM_1} becomes free while \lstinline{T_2} is still preempted. This
results in a release\_parking action, and \lstinline{T_2} changes into the
ready state. Once \lstinline{T_3} has terminated, \lstinline{T_2} continues
running immediately. The purpose of these applications is to test the
following actions:

\begin{itemize}
\item Process: park, poll\_parking, release\_parking, poll, run
\item Semaphore: ready, lock, unlock, full, overfull
\item Process-Semaphore: requestsemaphore, assigned, waiting, released
\end{itemize}

\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_resource_release_polling.png}}
\caption[Task-resource-poll-parking test gantt chart]{Comparison of hardware
(top) and simulated (bottom) trace of the task-resource-poll-parking test
application.}
\label{fig:task_resource_poll_parking_gantt}
\end{figure}

\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_resource_release_parking.png}}
\caption[Task-resource-release-parking test gantt chart]{Comparison of hardware (top) and
simulated (bottom) trace of the task-resource-release-parking test application.}
\label{fig:task_resource_release_parking_gantt}
\end{figure}

\begin{table}[]
\centering
\begin{tabular}{c c|c c c}
& & RT $[ms]$ & Polling Time $[ms]$ & Parking Time $[ms]$ \\
\hline
& T\_1 & 2.524897 & 0.000000 & 0.000000 \\
HW & T\_2 & 3.269190 & 0.751730 & 0.508011 \\
& T\_3 & 0.506321 & 0.000000 & 0.000000 \\
\hline
& T\_1 & 2.500140 & 0.000000 & 0.000000 \\
Sim & T\_2 & 3.250040 & 0.749800 & 0.500100 \\
& T\_3 & 0.500100 & 0.000000 & 0.000000 \\
\end{tabular}
\caption[Task-resource-poll-parking metrics table]{Metrics of the
task-resource-poll-parking test application.}
\label{tab:task_resource_poll_parking}
\end{table}

\begin{table}[]
\centering
\begin{tabular}{c c|c c c}
& & A2A $[ms]$ & RT $[ms]$ & CPU Parking Core\_2 $[\%]$ \\
\hline
& T\_1 & 10.005997 & 2.026420 & 0.000000 \\
HW & T\_2 & 10.005989 & 2.772670 & 4.984965 \\
& T\_3 & 10.005984 & 0.756450 & 0.000000 \\
\hline
& T\_1 & 10.000000 & 2.000140 & 0.000000 \\
Sim & T\_2 & 10.000000 & 2.750240 & 4.949010 \\
& T\_3 & 10.000000 & 0.750100 & 0.000000 \\
\end{tabular}
\caption[Task-resource-release-parking metrics table]{Metrics of the
task-resource-release-parking test application.}
\label{tab:task_resource_release_parking}
\end{table}

\autoref{fig:task_resource_poll_parking_gantt} and
\autoref{fig:task_resource_release_parking_gantt} show the comparison of the
traces for the two resource test applications. In both test cases,
\lstinline{T_1} requests \lstinline{SEM_1}, as indicated by an upward pointing
arrow. The semaphore is now locked, and \lstinline{T_2} changes into the
polling mode when requesting it, which is indicated by the yellow color. Once
\lstinline{T_3} is activated, \lstinline{T_2} changes into the parking mode,
indicated by the orange color.

In \autoref{fig:task_resource_poll_parking_gantt}, \lstinline{T_3} has a
runtime of \unit[500]{us}, and \lstinline{T_2} resumes running before the
semaphore is released. Thus, it returns into the polling state until the
semaphore is released. The release event is depicted by a downward pointing
arrow. In \autoref{fig:task_resource_release_parking_gantt}, the execution
time of \lstinline{T_3} is longer and \lstinline{T_1} releases the semaphore
earlier. Consequently, \lstinline{SEM_1} becomes free while \lstinline{T_2}
is still deallocated from the core, and \lstinline{T_2} changes into the ready
state.

For both resource test applications, the \gls{btf} traces recorded from
hardware match the simulated traces, as shown in the previous figures. The
metrics in \autoref{tab:task_resource_poll_parking} and
\autoref{tab:task_resource_release_parking} show results similar to the
previous tables and are therefore not discussed again.

\subsubsection{Task-MTA Test}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_mta.pdf}}
\caption[Task-MTA test sequence]{Test application to validate mtalimitexceeded
events.}
\label{fig:task_mta}
\end{figure}

The purpose of the last specified test application is to validate the
correctness of \gls{mta} and mtalimitexceeded events. \autoref{fig:task_mta}
shows the sequence diagram of the respective test model. In this example,
\lstinline{T_2} is allowed to have two activations, meaning that two instances
of the task may be active in the system at the same point in time.

As in the previous tests, \lstinline{T_1} is activated periodically by
\lstinline{STI_T_1}. \lstinline{T_1} then activates \lstinline{T_2} three
consecutive times via inter-core \gls{ipa}. The runnable \lstinline{R_1} is
executed to consume some time between the activations. After the first
activation the task starts running as expected. The second activation is
stored by the \gls{os}: once the first instance of \lstinline{T_2} terminates,
the stored instance changes into the ready state and starts running. The
third activation is not allowed by the \gls{os}, as indicated by the red box.
An error message is created, and a mtalimitexceeded event must be added to the
\gls{btf} trace.
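
The bookkeeping behind this behavior can be sketched as follows (an
illustrative model; the field and function names are not taken from the kernel
sources, while \lstinline{E_OS_LIMIT} is the standard OSEK error code for an
exceeded activation count):

```c
#define E_OK       0
#define E_OS_LIMIT 4   /* OSEK: too many pending activations */

/* Illustrative per-task activation bookkeeping. */
typedef struct {
    int pending;           /* running + ready + stored instances */
    int max_activations;   /* configured MTA limit (2 for T_2) */
} task_state_t;

/* An activation beyond the configured limit is rejected; this is the
 * point at which a mtalimitexceeded BTF event would be emitted. */
int activate_task(task_state_t *t)
{
    if (t->pending >= t->max_activations)
        return E_OS_LIMIT;
    t->pending++;
    return E_OK;
}

/* On termination a stored activation, if any, becomes ready. */
void terminate_task(task_state_t *t)
{
    if (t->pending > 0)
        t->pending--;
}
```

In the test above, the first two activations of \lstinline{T_2} succeed, the
third returns the error code, and a further activation is accepted again only
after one instance has terminated.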
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_mta.png}}
\caption[Task-MTA test gantt chart]{Comparison of hardware (top) and
simulated (bottom) trace of the task-MTA test application.}
\label{fig:task_mta_gantt}
\end{figure}

\autoref{fig:task_mta_gantt} shows the comparison of the \gls{btf} traces
created by simulation and recorded from hardware for the task-MTA test model.
The hardware trace illustrates the procedure for an inter-core process
activation very well. At first, the activation is triggered on
\lstinline{Core_1}, as shown in the row \lstinline{IPA_T_1}. This results in
the execution of the inter-core communication \gls{isr}
\lstinline{EE_TC_iirq_handler}.

The \gls{isr} then activates \lstinline{T_2}, which changes into the ready
state, indicated by the gray color. During the second activation,
\lstinline{T_2} is already in the running state; consequently, the activation
is only illustrated by a downward pointing arrow. In the simulated trace the
task keeps running during the activation process, whereas in the hardware
trace the task is preempted by the inter-core \gls{isr} and the activation
takes place while the task is in the ready state.

During the third activation, two instances of \lstinline{T_2} are already
active in the system. Thus, no further activations are allowed, and a
mtalimitexceeded event is created, indicated by a downward pointing red arrow.
At around \unit[81925]{us}, the first instance of \lstinline{T_2} terminates
and the next instance becomes ready immediately. Shortly after that, it
starts running.

\subsection{Randomized Tests}
\label{subsection:randomized_tests}

Randomized tests are used to avoid insufficient test coverage due to selection
bias in the creation of the test applications. A tool for automatically
generating random models with respect to predefined constraints has been
developed in previous research projects \cite{sailer2014reconstruction}. It
allows the creation of an arbitrary number of test models based on
user-defined distributions, for example for the number of cores, tasks, and
runnables.

\begin{table}[]
\centering
\begin{tabular}{c|c c c c}
Entities & min & max & average & distribution \\
\hline
Cores $[1]$ & 2 & - & - & const \\
Tasks $[1]$ & 9 & 22 & 15 & weibull \\
Runnables/Task $[1]$ & 6 & 13 & - & uniform \\
Instructions/Runnable $[10^3]$ & 10 & 50 & 30 & weibull \\
Activation $[ms]$ & 1 & 20 & 1000 & weibull \\
Signals $[1]$ & 3 & 11 & 17 & weibull \\
Signals/Runnable $[1]$ & 3 & 7 & - & uniform \\
\end{tabular}
\caption[Randomized model configuration]{The configuration used for creating
test models randomly.}
\label{tab:rand_config}
\end{table}

\autoref{tab:rand_config} shows the distributions for the number of entities
to be created for each entity type. This configuration is used for each of
the ten models that are tested in this section. The distributions for
\emph{cores} and \emph{tasks} describe the number of entities of the
respective type in the system. The metric \emph{runnables per task}
determines how many runnables are called from the context of each task. Each
task is activated by a periodic stimulus with a period depending on the
\emph{activation} value. \emph{Signals} specifies the number of signal
entities in the system, and \emph{signals per runnable} the number of accesses
to these signals within the context of each runnable. Event and resource
entities cannot be generated by the random model generator and are therefore
not covered by the randomized tests.
Validating these models manually is not feasible. Therefore, only the
semantic equality is tested, because this can be done without user
interaction. In previous work, a closed-loop model-based development process
was created to conduct the procedure shown in \autoref{fig:eval_idea}
automatically \cite{felixproject2}. This process was extended to support the
model generator and the semantic comparison of two traces.

\begin{figure}[]
\centering
\centerline{\includegraphics[width=0.55\textwidth]{./media/eval/semantic_impossible.pdf}}
\caption[Semantic comparison problem]{Semantic comparison of multi-core
systems is not feasible if the execution time of service routines varies
between hardware and simulation.}
\label{fig:semantic_impossible}
\end{figure}

As mentioned before, semantic equality could not be shown for any of the test
applications. The reason for this is depicted in
\autoref{fig:semantic_impossible}. Assume that one task activates another
task on a different core and executes multiple other actions afterwards. The
position at which the start event of the second task appears in the trace
depends on the time that elapses between activation and start. This means two
traces may be semantically different even though they show the same behavior.
Consequently, the definition of semantic equality used in this thesis is not
sufficient for the comparison of multi-core systems. Nevertheless, by
manually comparing randomly selected traces, the correctness of the mappings
could be validated. However, this fallback solution is not sufficient for
validating a wide range of test cases.
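
The difference between the strict notion of equality used here and a per-core
relaxation can be sketched as follows (illustrative structures, not the actual
\gls{btf} record layout or the comparison script):

```c
#include <string.h>

/* One trace event: the core it occurred on and an action label. */
typedef struct { int core; const char *action; } event_t;

/* Strict comparison: both traces must list the same events in the same
 * global order.  This is the notion of semantic equality used above. */
static int globally_equal(const event_t *a, const event_t *b, int n)
{
    for (int i = 0; i < n; i++)
        if (a[i].core != b[i].core || strcmp(a[i].action, b[i].action) != 0)
            return 0;
    return 1;
}

/* Relaxed comparison: only the relative order of events on the given
 * core must match; cross-core interleaving may differ. */
static int per_core_equal(const event_t *a, const event_t *b, int n, int core)
{
    int i = 0, j = 0;
    for (;;) {
        while (i < n && a[i].core != core) i++;
        while (j < n && b[j].core != core) j++;
        if (i == n || j == n) return i == n && j == n;
        if (strcmp(a[i].action, b[j].action) != 0) return 0;
        i++; j++;
    }
}
```

If the start event of the activated task slides past an action on the first
core, the traces compare unequal globally although each core's own event order
is identical, which is exactly the situation depicted in the figure.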
|