\section{Test Cases}
As discussed in the previous section, traces can differ in a temporal and in a semantic way. To rule out temporal discrepancies caused by a faulty trace setup, the timing accuracy is first tested based on code with known event-to-event durations. Next, the semantic correctness of the trace mapping is validated based on manually created test models. Finally, randomized models are generated in order to detect semantic errors that the manually created models may miss due to selection bias \cite{geddes1990cases}.

\subsection{Timing Precision}
In \autoref{listing:instructionblock}, code that executes a fixed number of instructions was introduced. This code is now used to evaluate the timing precision of the trace setup. According to \autoref{subsection:hardware_tracing}, the setup should allow for cycle-accurate trace measurement. The Infineon Aurix processor family provides performance counters \cite{tc29xa}. Once started, these counters are incremented based on the CPU core frequency. A frequency of \unit[100]{MHz} is used for the validation; consequently, an increment occurs every \unit[10]{ns}. The counter can be started at an arbitrary point in time, for example at program start. By reading the counter value at the beginning and at the end of a critical section, the number of clock cycles that elapsed between these two points can be determined.

\begin{code}
\begin{lstlisting}[caption={[Trace setup accuracy validation] Code to validate the timing precision of the trace setup.}, label={listing:accuracy_validation}]
EE_UINT32 i;
EE_UINT32 ccntStart;
EE_UINT32 ccntEnd;
EE_UINT32 n = N / 4;

__asm("nop");
ccntStart = EE_tc_get_CCNT();
__asm("nop");
for (i = 0; i < n; i++) {
    __asm("nop");
    __asm("nop");
    __asm("nop");
    __asm("nop");
}
__asm("nop");
ccntEnd = EE_tc_get_CCNT();
\end{lstlisting}
\end{code}

\autoref{listing:accuracy_validation} shows the code that is used to check the timing precision. \gls{ee} provides the API function \lstinline{EE_tc_get_CCNT} to read out the performance counter register. As described above, the performance counter is read out before and after the critical section. The critical section is guarded with two additional \lstinline{nop} assembly instructions to avoid compiler optimization. Additionally, the generated assembly code was examined manually to verify that no unwanted instructions were added by the compiler. A for loop is used to execute a predefined number of instructions. The number of repetitions depends on the define \lstinline{N}, which must be a multiple of four.

The code is now executed for different values of \lstinline{N}. For each value, the expected number of clock cycles $c_e$, the actual number of clock cycles $c_a$, the expected time difference $t_e$, and the actual time difference $t_a$ between the writes to \lstinline{ccntStart} and \lstinline{ccntEnd} are listed in \autoref{tab:precision_validation}. The expected number of clock cycles is calculated as $c_e = N + 2$. The value two is added because of the additional \lstinline{nop} instructions. The expected time is calculated as $t_e = c_e \cdot \frac{1}{f}$, where $f$ is the processor frequency. The actual number of clock cycles is calculated as $c_a = ccntEnd - ccntStart$. The actual time is calculated as $t_a = t_j - t_i$, where $j$ is the index of the write event to \lstinline{ccntEnd} and $i$ is the index of the write event to \lstinline{ccntStart}.
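These formulas translate directly into code. The following host-side sketch (not part of the target code; the value of \lstinline{N} is chosen arbitrarily for illustration) computes the expected values used in \autoref{tab:precision_validation}:

\begin{lstlisting}
#include <stdio.h>

/* Host-side sketch: derives the expected cycle count and execution
 * time for a given N according to the formulas above. */
int main(void)
{
    const double f = 100e6;        /* core frequency in Hz (100 MHz)   */
    const unsigned long N = 1024;  /* loop instructions, multiple of 4 */

    unsigned long c_e = N + 2;     /* N nops plus the two guard nops   */
    double t_e = c_e * (1.0 / f);  /* expected execution time in s     */

    printf("c_e = %lu cycles, t_e = %.3f us\n", c_e, t_e * 1e6);
    return 0;
}
\end{lstlisting}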
Four different values of \lstinline{N}, namely $128$, $1024$, $4096$, and $65536$, are chosen, and for each value $101$ measurement samples are taken. All samples with the same value of \lstinline{N} yield identical results. It can be observed that for all values of \lstinline{N} the execution of the critical section takes four ticks more than the expected value $c_e$. This is because the additional instructions executed by the second call to \lstinline{EE_tc_get_CCNT} are not taken into consideration. Consequently, the expected and the actual execution time differ by \unit[40]{ns}. Apart from this difference, the result is as expected, and it can be concluded that the setup is in fact able to measure hardware events on a cycle-accurate basis.

\begin{table}[]
\centering
\begin{tabular}{c|c c c c}
N & 128 & 1024 & 4096 & 65536 \\
\hline
$c_e\, [1]$ & 130 & 1026 & 4098 & 65538 \\
$c_a\, [1]$ & 134 & 1030 & 4102 & 65542 \\
$t_e\, [\mu s]$ & 1.300 & 10.260 & 40.980 & 655.380 \\
$t_a\, [\mu s]$ & 1.340 & 10.300 & 41.020 & 655.420 \\
samples & 101 & 101 & 101 & 101 \\
\end{tabular}
\caption[Trace setup measurement precision]{Experiment to validate the accuracy of the trace setup. A code snippet that takes a known number of clock cycles $c_e$ is executed. Based on the number of cycles, the expected execution time $t_e$ can be calculated. If cycle-accurate measurement is supported, the actual execution time $t_a$ should be equal to $t_e$. The execution times differ by \unit[40]{ns} because the expected cycle count is off by four cycles. If this deviation is taken into consideration, $t_e$ and $t_a$ coincide.}
\label{tab:precision_validation}
\end{table}

\subsection{Systematic Tests}
\label{subsection:systematic_tests}
In this section, test models are created systematically to validate the complete software to \gls{btf} event mapping discussed in \autoref{chapter:mapping}. For each test application, a simulated and a hardware based \gls{btf} trace is generated as shown in \autoref{fig:eval_idea}. The traces are then compared in three steps.
\begin{itemize}
\item A basic plausibility test based on the Gantt chart of the TA Tool Suite is conducted.
\item The semantic equality is validated.
\item Different real-time metrics are compared and discussed.
\end{itemize}
Five test models, shown in the following list, are required to cover all \gls{btf} actions for which a mapping has been provided.
\begin{itemize}
\item task-runnable-signal
\item task-event
\item task-resource-release-parking
\item task-resource-poll-parking
\item task-MTA
\end{itemize}
Each model represents a periodic system where a defined sequence of events is executed every \unit[10]{ms}. UML sequence diagrams \cite{fowler2004uml} are used to illustrate the behavior of the test applications during one period.

\subsubsection{Task-Runnable-Signal Test}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_runnable_signal.pdf}}
\caption[Task-runnable-signal test sequence]{Test application to validate basic task and signal read and write events.}
\label{fig:task_runnable_signal}
\end{figure}
The task-runnable-signal application is depicted in \autoref{fig:task_runnable_signal}. Task \lstinline{T_1} is activated periodically by the stimulus \lstinline{STI_T_1} every \unit[10]{ms}. \lstinline{T_1} activates \lstinline{T_2} on another core via \gls{ipa} and then executes the runnable \lstinline{R_1}. \lstinline{T_2} executes a runnable \lstinline{R_2_1}, which in turn executes another runnable \lstinline{R_2_2}.
Once execution of \lstinline{R_1} has finished, \lstinline{T_1} activates another task \lstinline{T_3} on the second core, which has a higher priority than \lstinline{T_2}. Consequently, \lstinline{T_2}, \lstinline{R_2_1}, and \lstinline{R_2_2} are preempted, as indicated by the light green and light blue colors. \lstinline{T_3} calls a runnable \lstinline{R_3}. The runnables \lstinline{R_1} and \lstinline{R_3} both read and write the signal \lstinline{SIG_1}. Once \lstinline{T_3} has terminated, \lstinline{T_2} and the corresponding runnables resume execution. The purpose of this test application is to cover the following \gls{btf} actions:
\begin{itemize}
\item Stimulus: trigger by alarm and \gls{ipa}
\item Task: activate, start, preempt, resume, terminate
\item ISR: activate, start, terminate
\item Runnable: start, resume, suspend, terminate
\item Signal: read, write
\end{itemize}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_runnable_signal.png}}
\caption[Task-runnable-signal test gantt chart]{Hardware and software trace for the task-runnable-signal test model. Attention must be directed to the signal read and write accesses to \lstinline{SIG_1}. Additionally, the nested runnables must be suspended when the respective task \lstinline{T_2} is preempted.}
\label{fig:task_runnable_signal_gantt}
\end{figure}
Based on the Gantt chart of the TA Tool Suite, the \gls{btf} traces can be compared visually. The hardware trace is shown in the upper part and the simulated trace in the lower part of each picture. Both traces use the same time scale so that a semantic and temporal comparison is feasible. \autoref{fig:task_runnable_signal_gantt} shows one period of the task-runnable-signal test application in the Gantt chart of the \gls{ta} Tool Suite. The figure shows that \lstinline{R_2_2} is called from the context of \lstinline{R_2_1}. When \lstinline{T_2} is preempted, both runnables must be suspended as well, indicated by the light blue color in contrast to the stronger blue used while a runnable is running. For clarity, runnable entities are not shown in the traces of the other test models. A running task is colored in dark green, while preempted tasks are shown in light green. A separate row in the Gantt chart is used to depict signal accesses from the context of tasks. Whenever a horizontal line is drawn, the corresponding signal is read or written. The former is indicated by an arrow pointing up at the bottom of the row. The latter is indicated by an arrow pointing down at the top of the row. It can be seen that the signal accesses are recorded on hardware as expected.

The hardware trace shows two additional \glspl{isr} that are not part of the simulation trace. \lstinline{EE_tc_system_timer_handler} is a timer interrupt which is executed every \unit[1]{ms} and serves as clock source for the system counter. \lstinline{EE_TC_iirq_handler} is used for remote procedure calls. Two traces cannot be semantically identical if entities exist in one trace that are not part of the other trace. There are two ways to solve this problem. Either the \glspl{isr} are added to the system model and therefore considered during simulation, or all \gls{btf} events related to the \glspl{isr} are removed from the hardware trace.
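A minimal sketch of the second option is shown below. It assumes that the \gls{btf} trace is a plain text file in which the \gls{isr} names appear verbatim; a real implementation should parse the comma-separated \gls{btf} columns instead of matching substrings.

\begin{lstlisting}
#include <stdio.h>
#include <string.h>

/* Sketch: remove all BTF lines that reference one of the two ISRs
 * which only exist in the hardware trace. Reads the trace from stdin
 * and writes the filtered trace to stdout. */
int main(void)
{
    static const char *isrs[] = {
        "EE_tc_system_timer_handler",
        "EE_TC_iirq_handler"
    };
    char line[512];

    while (fgets(line, sizeof line, stdin)) {
        int drop = 0;
        for (size_t i = 0; i < sizeof isrs / sizeof isrs[0]; i++) {
            if (strstr(line, isrs[i]) != NULL) {
                drop = 1;
                break;
            }
        }
        if (!drop) {
            fputs(line, stdout);
        }
    }
    return 0;
}
\end{lstlisting}

Invoked as \lstinline{filter < hw_trace.btf > hw_trace_filtered.btf}, the resulting trace contains no \gls{isr}-related events.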
A script that checks the semantic equality of two traces based on the criteria established in \autoref{subsection:validation_techniques} is used for the second validation step. However, semantic equality could not be shown for the test cases in this and the next section. The reason for this is discussed in \autoref{subsection:randomized_tests}.

The TA Inspector is capable of calculating a variety of real-time metrics based on \gls{btf} traces. Selected metrics are shown to discuss the similarities and discrepancies between the hardware and the simulation trace. Common metric types are activate-to-activate (A2A), response time (RT), net execution time (NET), and CPU core load. The upper part of each metric table shows the hardware trace metrics, abbreviated by \emph{HW}, and the lower part shows the simulation trace metrics, abbreviated by \emph{Sim}.

\begin{table}[]
\centering
\begin{tabular}{c c|c c c c}
& & A2A $[ms]$ & RT $[ms]$ & Load Core\_1 $[\%]$ & Load Core\_2 $[\%]$ \\
\hline
& T\_1 & 10.005998 & 3.025510 & 30.124423 & 0.000000 \\
HW & T\_2 & 10.005990 & 6.516440 & 0.000000 & 49.950032 \\
& T\_3 & 10.005987 & 1.506300 & 0.000000 & 15.000495 \\
\hline
& Sum & - & - & 30.12 & 64.95 \\
&&&&& \\
& T\_1 & 10.000000 & 3.000100 & 30.000000 & 0.000000 \\
Sim & T\_2 & 10.000000 & 6.500200 & 0.000000 & 50.000000 \\
& T\_3 & 10.000000 & 1.500100 & 0.000000 & 15.000000 \\
\hline
& Sum & - & - & 30.00 & 65.00 \\
\end{tabular}
\caption[Task-runnable-signal metrics table]{Metrics of the task-runnable-signal test application. Activate-to-activate (A2A) and response time (RT) are average values calculated over all instances of the respective entity.}
\label{tab:task_runnable_signal}
\end{table}

\autoref{tab:task_runnable_signal} shows selected real-time metrics for the task-runnable-signal application. At first glance, all values appear nearly identical, so the basic configuration of the complete setup is likely to be correct. Nevertheless, the activate-to-activate times of hardware and simulation differ by almost \unit[6]{us}, which is not negligible. The reason for this deviation can be found by examining the activate-to-activate times of the timer \gls{isr} \lstinline{EE_tc_system_timer_handler}. The average A2A time of the \gls{isr} is \unit[600]{ns} greater than expected. Since \lstinline{T_1} is activated every \unit[10]{ms}, in other words on every tenth instance of the timer \gls{isr}, the expected deviation can be calculated as $d_{A2A} = 10 \cdot 600\,\mathrm{ns} = 6\,\mu\mathrm{s}$.

To understand why the A2A times of the timer \gls{isr} diverge, it is necessary to examine the corresponding source code. Whenever the timer \gls{isr} is executed, the time delta to the next instance is calculated based on the current number of counter ticks in the timer register. There is a time delta between the point where the last counter tick value is read and the point where the newly calculated value is written. This is the delta that causes the delay of \unit[600]{ns}. Doubling the processor frequency reduces the delta to \unit[300]{ns}; halving the frequency increases it to \unit[1200]{ns}, as expected.
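The reload pattern that causes this drift can be sketched as follows. The register names and the reload constant are purely illustrative stand-ins for the Aurix system timer registers; the actual handler in \gls{ee} differs in detail, but the principle is the same:

\begin{lstlisting}
#include <stdint.h>

/* Illustrative only: hypothetical register accessors standing in for
 * the system timer registers used by the real handler. */
extern volatile uint32_t STM_TIM0;  /* free-running system timer      */
extern volatile uint32_t STM_CMP0;  /* compare register for the ISR   */
#define TICKS_PER_MS 100000u        /* 1 ms at 100 MHz                */

void timer_isr_handler(void)
{
    uint32_t now = STM_TIM0;        /* read the current tick count    */
    /* ... handler work: some cycles pass between read and write ...  */
    STM_CMP0 = now + TICKS_PER_MS;  /* the next instance is scheduled
                                       relative to the value read
                                       above, so the cycles spent in
                                       between delay every period by
                                       the observed 600 ns            */
}
\end{lstlisting}

A drift-free variant would advance the previous compare value instead (\lstinline{STM_CMP0 += TICKS_PER_MS}); the behavior observed here corresponds to the relative reload shown above.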
\subsubsection{Task-Event Test}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_event.pdf}}
\caption[Task-event test sequence]{Test application to validate \gls{btf} event actions.}
\label{fig:task_event}
\end{figure}
\autoref{fig:task_event} shows the task-event test case. \lstinline{T_1} is activated in the same way as in the first test case. Again, it activates \lstinline{T_2} on a second core via \gls{ipa}. \lstinline{T_2} executes a runnable \lstinline{R_2}. After execution of the runnable, \lstinline{T_2} waits for the event \lstinline{EVENT_1}. Since the event is not set, \lstinline{T_2} changes into the waiting state, indicated by the orange color. After activating \lstinline{T_2}, \lstinline{T_1} executes a runnable \lstinline{R_1} and sets the event \lstinline{EVENT_1}. \lstinline{T_2} returns from the waiting state, calls \lstinline{R_2} again, and clears the event \lstinline{EVENT_1}. The purpose of this test application is to cover the following \gls{btf} actions:
\begin{itemize}
\item Process: wait, release
\item Event: wait\_event, set\_event, clear\_event
\end{itemize}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_event.png}}
\caption[Task-event test gantt chart]{Comparison of hardware (top) and simulated (bottom) trace of the task event test application.}
\label{fig:task_event_gantt}
\end{figure}
\autoref{fig:task_event_gantt} shows the Gantt chart for the task-event test case. As before, \lstinline{T_1} is interrupted by the timer \gls{isr} multiple times. A separate row in the Gantt chart is used to indicate the current state of the event entity. An upward pointing arrow indicates that a process starts waiting for an event. The waiting period is colored in orange. A downward pointing arrow indicates that a process sets an event. Finally, the event is cleared, which is indicated by a red downward pointing arrow.

\begin{table}[]
\centering
\begin{tabular}{c c|c c c}
& & A2A $[ms]$ & RT $[ms]$ & CPU Waiting Core\_2 $[\%]$ \\
\hline
HW & T\_1 & 10.006198 & 2.023460 & 0.000000 \\
& T\_2 & 10.006189 & 3.018570 & 10.046955 \\
\hline
Sim & T\_1 & 10.000000 & 2.000100 & 0.000000 \\
& T\_2 & 10.000000 & 3.000100 & 9.999000 \\
\end{tabular}
\caption[Task-event metrics table]{Metrics of the task-event test application.}
\label{tab:task_event}
\end{table}

\autoref{tab:task_event} shows the resulting metrics for the task-event test case. The activate-to-activate times show the same behavior as in the previous test application, as expected. The relative waiting time on hardware is greater than in the simulated trace. A possible reason is the longer runtime of the \lstinline{set_event} routine on the target. The task on core \lstinline{Core_1} sets the event for the task on the second core. Therefore, a \glsdesc{rpc} is necessary to set the event. Since the \gls{rpc} via \lstinline{EE_TC_iirq_handler} is not taken into consideration in the simulation, the time spent in the waiting state is longer on hardware.

Response times are also significantly longer on real hardware compared to the simulated trace. The response time measures the period from activation to termination of a task instance. The difference in response time results from several factors. Firstly, the initial ready time, i.e.\ the period from task activation to task start, is longer on hardware; it takes about \unit[2]{us}. Secondly, \lstinline{T_1} is preempted by the timer \gls{isr} two times. Category 2 \glspl{isr} require a context switch, which costs additional task execution time. Finally, the \gls{ipa} and \lstinline{TerminateTask} routines take longer on real hardware. By measuring the execution times of the respective system services, it could be shown that the response times are equal if the measured overhead is taken into consideration. As mentioned before, these effects could be accounted for in the simulation by adding the execution times of the routines to the \gls{os} part of the timing model.
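For illustration, the core of this test application can be expressed with the standard OSEK service calls as follows. This is a simplified sketch, not the generated test code; the runnable functions \lstinline{R_1} and \lstinline{R_2} are placeholders for the actual runnable invocations, and the task and event identifiers are assumed to come from the \gls{os} configuration:

\begin{lstlisting}
#include "ee.h"          /* Erika Enterprise OSEK/VDX API */

extern void R_1(void);   /* placeholder runnables */
extern void R_2(void);

TASK(T_1)
{
    ActivateTask(T_2);       /* inter-core activation via IPA        */
    R_1();
    SetEvent(T_2, EVENT_1);  /* triggers an RPC, since T_2 runs on
                                the other core                       */
    TerminateTask();
}

TASK(T_2)
{
    R_2();
    WaitEvent(EVENT_1);      /* event not yet set: waiting state     */
    R_2();                   /* released after T_1 has set the event */
    ClearEvent(EVENT_1);
    TerminateTask();
}
\end{lstlisting}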
\subsubsection{Task-Resource Tests}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_resource_release_polling.pdf}}
\caption[Task-resource-poll-parking test sequence]{Test application to validate semaphore events, especially the poll\_parking action.}
\label{fig:task_resource_poll_parking}
\end{figure}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_resource_release_parking.pdf}}
\caption[Task-resource-release-parking test sequence]{Test application to validate semaphore events, especially the release\_parking action.}
\label{fig:task_resource_release_parking}
\end{figure}
The third and fourth test cases are similar except for one difference, as shown in \autoref{fig:task_resource_poll_parking} and \autoref{fig:task_resource_release_parking}. As before, \lstinline{T_1} is activated by a periodic stimulus and activates \lstinline{T_2} on another core via \gls{ipa}. \lstinline{T_1} executes the runnable \lstinline{R_1_1}, which requests the semaphore \lstinline{SEM_1}. \lstinline{T_2} tries to request the same semaphore, which is now locked, and changes into the active polling state, indicated by the red color. As soon as \lstinline{R_1_1} finishes, \lstinline{T_1} activates the task \lstinline{T_3}, which has a higher priority than \lstinline{T_2}, on the second core. Consequently, \lstinline{T_2} is deallocated and changes into the parking state.

At this point the two models differ. In the first model, \emph{task-resource-poll-parking}, \lstinline{T_3} has a shorter execution time than in the model \emph{task-resource-release-parking}. Consequently, in the former model \lstinline{T_2} is resumed while \lstinline{SEM_1} is still locked and a poll\_parking action takes place. In the latter case, where \lstinline{T_3} has a longer execution time, \lstinline{SEM_1} becomes free while \lstinline{T_2} is still preempted. This results in a release\_parking action and \lstinline{T_2} changes into the ready state. Once \lstinline{T_3} has terminated, \lstinline{T_2} continues running immediately. The purpose of these applications is to test the following actions.
\begin{itemize}
\item Process: park, poll\_parking, release\_parking, poll, run
\item Semaphore: ready, lock, unlock, full, overfull
\item Process-Semaphore: requestsemaphore, assigned, waiting, released
\end{itemize}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_resource_release_polling.png}}
\caption[Task-resource-poll-parking test gantt chart]{Comparison of hardware (top) and simulated (bottom) trace of the task-resource-poll-parking test application.}
\label{fig:task_resource_poll_parking_gantt}
\end{figure}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_resource_release_parking.png}}
\caption[Task-resource-release-parking test gantt chart]{Comparison of hardware (top) and simulated (bottom) trace of the task-resource-release-parking test application.}
\label{fig:task_resource_release_parking_gantt}
\end{figure}
\begin{table}[]
\centering
\begin{tabular}{c c|c c c}
& & RT $[ms]$ & Polling Time $[ms]$ & Parking Time $[ms]$ \\
\hline
& T\_1 & 2.524897 & 0.000000 & 0.000000 \\
HW & T\_2 & 3.269190 & 0.751730 & 0.508011 \\
& T\_3 & 0.506321 & 0.000000 & 0.000000 \\
\hline
& T\_1 & 2.500140 & 0.000000 & 0.000000 \\
Sim & T\_2 & 3.250040 & 0.749800 & 0.500100 \\
& T\_3 & 0.500100 & 0.000000 & 0.000000 \\
\end{tabular}
\caption[Task-resource-poll-parking metrics table]{Metrics of the task-resource-poll-parking test application.}
\label{tab:task_resource_poll_parking}
\end{table}
\begin{table}[]
\centering
\begin{tabular}{c c|c c c}
& & A2A $[ms]$ & RT $[ms]$ & CPU Parking Core\_2 $[\%]$ \\
\hline
& T\_1 & 10.005997 & 2.026420 & 0.000000 \\
HW & T\_2 & 10.005989 & 2.772670 & 4.984965 \\
& T\_3 & 10.005984 & 0.756450 & 0.000000 \\
\hline
& T\_1 & 10.000000 & 2.000140 & 0.000000 \\
Sim & T\_2 & 10.000000 & 2.750240 & 4.949010 \\
& T\_3 & 10.000000 & 0.750100 & 0.000000 \\
\end{tabular}
\caption[Task-resource-release-parking metrics table]{Metrics of the task-resource-release-parking test application.}
\label{tab:task_resource_release_parking}
\end{table}
\autoref{fig:task_resource_poll_parking_gantt} and \autoref{fig:task_resource_release_parking_gantt} show the comparison of the traces for the two resource test applications. In both test cases, \lstinline{T_1} requests \lstinline{SEM_1}, as indicated by an upward pointing arrow. The semaphore is now locked and \lstinline{T_2} changes into the polling mode when requesting it. This is indicated by the yellow color. Once \lstinline{T_3} is activated, \lstinline{T_2} changes into the parking mode, indicated by the orange color. In \autoref{fig:task_resource_poll_parking_gantt}, \lstinline{T_3} has a runtime of \unit[500]{us}, and \lstinline{T_2} resumes running before the semaphore is released. Thus, it returns into the polling state until the semaphore is released. The release event is depicted by a downward pointing arrow. In \autoref{fig:task_resource_release_parking_gantt}, the execution time of \lstinline{T_3} is longer and \lstinline{T_1} releases the semaphore while \lstinline{T_2} is still deallocated from the core. Consequently, \lstinline{SEM_1} becomes free and \lstinline{T_2} changes into the ready state. For both resource test applications, the \gls{btf} traces recorded from hardware match the simulated traces, as shown in the previous figures. The metrics in \autoref{tab:task_resource_poll_parking} and \autoref{tab:task_resource_release_parking} show similar results compared to the previous tables and are therefore not discussed again.
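To make the contended request concrete, the following sketch shows how \lstinline{T_2} could poll for the semaphore, assuming an AUTOSAR-style spinlock API (\lstinline{TryToGetSpinlock}); the actual service used by the generated test code may differ. While \lstinline{T_2} spins in the loop, it is in the polling state; if it is preempted inside the loop, the \gls{os} reports the parking state instead.

\begin{lstlisting}
#include "ee.h"

extern void R_2_1(void);   /* placeholder for the critical section */

TASK(T_2)
{
    TryToGetSpinlockType success;

    /* Active polling: corresponds to the poll state in the trace.
     * If T_3 preempts T_2 here, T_2 is parked until it is either
     * resumed while the lock is still held (poll_parking) or the
     * lock is released in the meantime (release_parking). */
    do {
        TryToGetSpinlock(SEM_1, &success);
    } while (success != TRYTOGETSPINLOCK_SUCCESS);

    R_2_1();                /* critical section */
    ReleaseSpinlock(SEM_1);
    TerminateTask();
}
\end{lstlisting}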
\subsubsection{Task-MTA Test}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_mta.pdf}}
\caption[Task-MTA test sequence]{Test application to validate mtalimitexceeded events.}
\label{fig:task_mta}
\end{figure}
The purpose of the last test application is to validate the correctness of \gls{mta} and mtalimitexceeded events. \autoref{fig:task_mta} shows the sequence diagram of the respective test model. In this example, \lstinline{T_2} is allowed to have two activations. This means that two instances of the task may be active in the system at the same point in time. As in the previous tests, \lstinline{T_1} is activated by \lstinline{STI_T_1} periodically. \lstinline{T_1} then activates \lstinline{T_2} three consecutive times via inter-core \gls{ipa}. The runnable \lstinline{R_1} is executed to consume some time between the activations. After the first activation the task starts running as expected. The second activation is stored by the \gls{os}. Once \lstinline{T_2} terminates, the stored instance changes into the ready state and starts running. The third activation is not allowed by the \gls{os}, as indicated by the red box. An error message is created and an mtalimitexceeded event must be added to the \gls{btf} trace.

\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_mta.png}}
\caption[Task-MTA test gantt chart]{Comparison of hardware (top) and simulated (bottom) trace of the task-MTA test application.}
\label{fig:task_mta_gantt}
\end{figure}

\autoref{fig:task_mta_gantt} shows the comparison of the \gls{btf} traces created by simulation and from hardware for the task-MTA test model. The hardware trace illustrates the procedure of an inter-core process activation particularly well. At first, the activation is triggered on \lstinline{Core_1}, as shown in the row \lstinline{IPA_T_1}. This results in the execution of the inter-core communication \gls{isr} \lstinline{EE_TC_iirq_handler}. The \gls{isr} then activates \lstinline{T_2}, which changes into the ready state, indicated by the gray color. During the second activation, \lstinline{T_2} is already in the running state. Consequently, the activation is only illustrated by a downward pointing arrow. In the simulated trace, the task keeps running during the activation process. In the hardware trace, the task is preempted by the inter-core \gls{isr} and the activation takes place while the task is in the ready state. During the third activation, two instances of \lstinline{T_2} are already active in the system. Thus, no further activations are allowed and an mtalimitexceeded event is created. This is indicated by a downward pointing red arrow. At around \unit[81925]{us}, the first instance of \lstinline{T_2} terminates and the next instance becomes ready immediately. Shortly after that, the next instance starts running.
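A sketch of \lstinline{T_1} for this test, again assuming the standard OSEK API and placeholder runnables, illustrates where the limit is exceeded. In OSEK, a rejected activation is reported by the return value \lstinline{E_OS_LIMIT}; this is the point at which the mtalimitexceeded event must appear in the trace:

\begin{lstlisting}
#include "ee.h"

extern void R_1(void);   /* placeholder for the time-consuming runnable */

TASK(T_1)
{
    StatusType status;

    status = ActivateTask(T_2);  /* first instance starts running     */
    R_1();
    status = ActivateTask(T_2);  /* second activation is stored       */
    R_1();
    status = ActivateTask(T_2);  /* limit of two activations exceeded:
                                    status == E_OS_LIMIT, the OS
                                    rejects the activation            */
    (void)status;
    TerminateTask();
}
\end{lstlisting}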
\subsection{Randomized Tests}
\label{subsection:randomized_tests}
Randomized tests are used to avoid insufficient test coverage due to selection bias in the creation of the test applications. A tool for generating random models automatically with respect to predefined constraints has been developed in previous research projects \cite{sailer2014reconstruction}. It allows the creation of an arbitrary number of test models and works with user-defined distributions, for example for the number of cores, tasks, and runnables. Based on these values, models can be generated randomly.

\begin{table}[]
\centering
\begin{tabular}{c|c c c c}
Entities & min & max & average & distribution \\
\hline
Cores $[1]$ & 2 & - & - & const \\
Tasks $[1]$ & 9 & 22 & 15 & weibull \\
Runnables/Task $[1]$ & 6 & 13 & - & uniform \\
Instructions/Runnable $[10^3]$ & 10 & 50 & 30 & weibull \\
Activation $[ms]$ & 1 & 20 & 1000 & weibull \\
Signals $[1]$ & 3 & 11 & 17 & weibull \\
Signals/Runnable $[1]$ & 3 & 7 & - & uniform \\
\end{tabular}
\caption[Randomized model configuration]{The configuration used for creating test models randomly.}
\label{tab:rand_config}
\end{table}

\autoref{tab:rand_config} shows the distributions for the number of entities that are created for each entity type. This configuration is used for each of the ten models that are tested in this section. The distributions for \emph{cores} and \emph{tasks} represent the number of entities of the respective type in the system. The metric \emph{runnables per task} determines how many runnables are called from the context of each task. Each task is activated by a periodic stimulus with a period depending on the \emph{activation} value. \emph{Signals} specifies the number of signal entities in the system and \emph{signals per runnable} the number of accesses to these signals within the context of each runnable. Event and resource entities cannot be generated by the random model generator and are therefore not covered by the randomized tests.

Validating these models manually is not feasible. Therefore, only the semantic equality is tested, because this can be done without user interaction. In previous work, a closed-loop model-based development process was created to conduct the procedure shown in \autoref{fig:eval_idea} automatically \cite{felixproject2}. This process was extended to support the model generator and the semantic comparison of two traces.

\begin{figure}[]
\centering
\centerline{\includegraphics[width=0.55\textwidth]{./media/eval/semantic_impossible.pdf}}
\caption[Semantic comparison problem]{Semantic comparison of multi-core systems is not feasible if the execution time of service routines varies between hardware and simulation.}
\label{fig:semantic_impossible}
\end{figure}

As mentioned before, semantic equality could not be shown for any of the test applications. The reason for this is depicted in \autoref{fig:semantic_impossible}. Assume that one task activates another task on a different core and executes multiple other actions afterwards. The position at which the start event of the second task is inserted into the trace depends on the time that elapses between activation and start. This means that two traces may be semantically different even though they show the same behavior. Consequently, the definition of semantic equality used in this thesis is not sufficient for the comparison of multi-core systems. Nevertheless, the correctness of the mappings could be validated by manually comparing randomly selected traces. However, this fallback solution is not sufficient for validating a wide range of test cases.