Add final version of master thesis.

This commit is contained in:
2020-08-20 11:39:46 -04:00
parent 6ba2097e6b
commit 6ff06af4ff
94 changed files with 30356 additions and 1 deletions

content/abstract.tex

@@ -0,0 +1,28 @@
\chapter*{Abstract}
Embedded real-time multi-core systems must adhere to strict timing requirements
in order to guarantee correct execution. Timing requirements are specified to
document system execution paths that are safety critical with respect to the
timing behavior of an application.
Via tracing it is possible to validate the fulfillment of timing requirements
in the native environment of a microcontroller. However, trace tools produce a
trace on hardware or software level, whereas requirements are specified on
system level. A transformation of the former to the latter is required to
close this gap.
Additionally, not all trace techniques are capable of producing results
suitable for the real-time analysis of embedded applications. Most techniques
are not sufficient for one or several reasons: limited trace duration,
inadequate number of recordable objects, and limited timing accuracy.
Therefore, this thesis examines different trace techniques and shows why
hardware tracing is the technique best suited for real-time analysis. Next, the
coherence between hardware, software, and system level entities is examined.
Based on the results a mapping from software level to system level is
introduced and validated.
The thesis concludes that it is possible to record cycle accurate system traces
of arbitrary length via hardware tracing. However, this requires detailed
knowledge about hardware tracing and the operating system underlying an
application.

content/conclusion.tex

@@ -0,0 +1,83 @@
\chapter{Conclusion}
\label{chapter:conclusion}
\subsubsection{Cycle Accurate Tracing}
Hypothesis~\ref{hyp:1} asks whether there is a trace technique capable of
recording cycle accurate traces with a duration of at least one second. There
exist three general measurement techniques. Hybrid and software based trace
tools rely on instrumentation. Thus, they change the runtime behavior of
an application and do not allow cycle accurate trace recording.
Additionally, an on-chip memory to buffer the recorded trace events is
required. Hence, the trace duration is strongly limited by the available
memory. An application with 28 tasks can only be traced for \unit[350]{ms}
using the Gliwa T1 hybrid trace tool \cite{kastner2011integrated} providing
events solely on task level. Runnables were not considered at all.
Hardware tracing is the only trace technique that allows cycle accurate traces
with a duration of at least one second. Actually, durations of over ten
seconds are possible with the correct hardware configuration.
However, there are certain limitations for the hardware platform used in this
thesis. Depending on the clock configuration not all data events are recorded.
This can be avoided by using a CPU core clock frequency smaller than or equal to
\unit[160]{MHz}. Therefore, Hypothesis~\ref{hyp:1} is true.
\subsubsection{ORTI Based Software to System Mapping}
Hardware trace tools create traces on software level. This level is not
sufficient for the real-time analysis of embedded systems. A transformation
from software to system level is therefore required. \gls{orti} was designed
to give third party tools additional information for the trace recording of
applications that use an \gls{osek} compliant \gls{os}. Hypothesis~\ref{hyp:2}
asks if \gls{orti} is sufficient to create a complete mapping from software to
system level.
It has been shown that \gls{orti} can be used to cover only a subset of the
\gls{os} entity types specified in the \gls{btf} standard. Even for those
entities covered by \gls{orti} no complete mapping is feasible. For example,
information about task entities is included in the \gls{orti} file, but it is
not feasible to determine the source entity for a \emph{mtalimitexceeded}
event. Consequently, Hypothesis~\ref{hyp:2} does not hold.
However, it should be noted that \gls{orti} allows the specification of \gls{os}
vendor-specific attributes. This means that if a mapping is possible at all, as
claimed by Hypothesis~\ref{hyp:3}, it would be possible to include the
required information in the \gls{orti} file.
Nevertheless, to the best of my knowledge this thesis is the first work to show
that \gls{btf} \emph{trigger} actions and all process actions except
\emph{mtalimitexceeded} can be created based on the \gls{orti} sections
specified by \gls{osek}.
\subsubsection{Software to System Mapping}
No complete mapping from software to system entities is feasible by relying
solely on the information in the \gls{orti} file. Additional information is
required to achieve a complete mapping. On the one hand a detailed
understanding of the \gls{os} internals is required, on the other hand meta
information must be provided to the transformation algorithm.
The concept of runnables and signals is not specified by \gls{osek}.
Basically, runnables are functions and signals are variables. It is possible
to create runnable and signal events via function and data tracing. A list of
all entities is required to distinguish regular functions from runnables and
regular variables from signals.
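The classification described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the thesis; all entity names and the event representation are invented examples.

```python
# Illustrative sketch: distinguishing runnables and signals from regular
# functions and variables via entity lists. All names are invented examples.

RUNNABLES = {"Runnable_10ms", "Runnable_100ms"}  # known runnable functions
SIGNALS = {"speed_signal", "torque_signal"}      # known signal variables

def classify(event):
    """Map a raw function/data trace event to a system level event, if any."""
    kind, name = event
    if kind == "call" and name in RUNNABLES:
        return ("runnable_start", name)
    if kind == "write" and name in SIGNALS:
        return ("signal_write", name)
    return None  # regular function or variable, not a system entity

trace = [("call", "helper"), ("call", "Runnable_10ms"), ("write", "speed_signal")]
system_trace = [e for e in map(classify, trace) if e is not None]
```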
To create \gls{btf} events for the event entity type it is necessary to
understand the respective code of the \gls{os}. By parsing the statically
created C header files the event \glspl{id} can be retrieved and the correct
events can be created.
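Retrieving event \glspl{id} from a statically created header can be sketched as follows. The header content and the macro naming scheme are invented for illustration; a real \gls{os} configuration will differ.

```python
import re

# Sketch of retrieving event IDs by parsing a statically created C header.
# The header content and macro naming scheme are invented for illustration.
header = """
#define EVENT_TIMEOUT  0x01U
#define EVENT_WAKEUP   0x02U
"""

def parse_event_ids(source):
    """Return a mapping from event name to numeric event ID."""
    pattern = re.compile(r"#define\s+(EVENT_\w+)\s+0x([0-9A-Fa-f]+)U?")
    return {name: int(value, 16) for name, value in pattern.findall(source)}

event_ids = parse_event_ids(header)
```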
Semaphore events are the most complex entity types to reconstruct via
hardware tracing. \gls{btf} supports all possible types of semaphore like
synchronization mechanisms. Hence, a variety of different actions are
specified. A possible mapping for \gls{osek} resource entities is nevertheless
provided in this thesis.
To the best of my knowledge this is the first work to show that all \gls{btf}
signal, runnable, event, and semaphore actions can be recreated from an
\gls{osek} compliant \gls{os}. Therefore, Hypothesis~\ref{hyp:3} is true.

content/erkl.tex

@@ -0,0 +1,16 @@
\section*{Declaration of original authorship}
\addcontentsline{toc}{section}{\protect\numberline{\thesection}Declaration of original authorship}
\stepcounter{section}
\begin{itemize}
\item[] I am aware that this copy of the master's thesis, as an examination deliverable, becomes the property of the Free State of Bavaria.
\item[] I affirm that I wrote the present thesis independently and used no aids other than those stated.
\item[] Wherever individual passages have been taken, literally or in substance, from the works and internet sources listed in the bibliography, they are in every case marked by an indication of the source.
\item[] The affirmation of independent work also covers the drawings, map sketches, and pictorial representations contained in the thesis.
\item[] I affirm that my master's thesis has not yet been published anywhere else. I am aware that publication must not take place before the assessment has been completed.
\item[] I am aware that a violation of this leads to exclusion from the examination or renders the examination invalid.
\end{itemize}
\vspace{2cm}
Regensburg, 28 October 2015

content/fundamentals.tex

@@ -0,0 +1,10 @@
\chapter{Fundamentals}
\label{chapter:fundamentals}
This thesis discusses the transformation of hardware events to system events
for \gls{osek} compliant real-time \glspl{os}. Hence, the parts of \gls{osek}
that are relevant for this thesis are described in the following.
Additionally, a well-defined format is required to represent the resulting
traces consisting of entities on system level. The \gls{btf} format, which is
discussed in \autoref{chapter:btf}, is used within the context of this work.

content/future.tex

@@ -0,0 +1,85 @@
\chapter{Future Work}
\label{chapter:future_work}
\subsubsection{Improve Trace Interface Standard}
It has been shown that a complete software to system mapping is possible for an
\gls{osekos} and should accordingly also be possible for an \gls{autosaros}.
However, detailed knowledge of the \gls{os} is required to understand and
implement this mapping. \gls{osek} tries to minimize this effort via the
\gls{orti} trace interface. Unfortunately, this interface is only regulated
for a subset of all \gls{os} entity types.
Some entities like spinlocks, semaphores, and inter-process communication
techniques like \gls{autosar} sender-receiver-communication are not covered at
all. In theory \gls{osek} allows additional attributes to be added to the
\gls{orti} file, but this option is currently not comprehensively used by the
\gls{os} vendors. To solve this problem further efforts to reach a common
trace interface standard for all \gls{autosar} system entities should be made.
\subsubsection{Evaluate Different Hardware Platforms}
In this thesis the feasibility of recording cycle accurate hardware traces was
validated for the Infineon Aurix TriCore processor family using the Infineon
Multi-Core Debug System. As described in \autoref{subsection:hardware_tracing}
there exist different trace standards for other processor families.
In order to achieve a better understanding of the trace capabilities of various
hardware platforms, other processor families should be tested in the
future. It has been shown that cycle accurate recording of data events on the
Infineon TC298TF processor is only feasible for certain clock settings. It
would be interesting to know if similar constraints also exist for other
platforms.
\subsubsection{Evaluate Different Operating Systems}
\glsdesc{ee} is used as a representative for an \gls{osek} compliant \gls{os}
in this thesis. It is a sufficient choice because of the available source code
and the permissive license. For \gls{ee}, it could be shown that a mapping
from software to system entities is feasible.
However, \gls{osek} has been taken over by \gls{autosar}. Since
\gls{autosar} is a superset of \gls{osek}, the reasoning for most system
entities is legitimate for both \gls{os} standards. Nevertheless, \gls{autosar}
introduces new synchronization patterns (of which some have been adopted by
\gls{ee}) and it would be interesting to know if a mapping is possible for
those new techniques as well.
Additionally, a complete mapping could only be created because the source code
of \gls{ee} is freely available. It would be interesting to know if the same
approach is feasible for a commercial \gls{os} that does not make its source
code available. This is an important question to answer since the automotive
industry relies predominantly on commercial \glspl{os}.
\subsubsection{Validate Mapping With Real World Applications}
Finally, the feasibility of the software to system mapping has been shown and
validated for several test applications. One part of those applications was
created manually to cover specific test cases, the other part was created
randomly. However, all test applications have in common that they do not
execute real functionality. Instead, dummy instructions are used to simulate
runtime that would emerge on real hardware due to the computation of algorithms
and feedback loops.
It may be possible that the trace capability of the tested hardware is limited
for real applications. If this is the case, the mapping introduced in this
thesis may not be fully applicable in the real world, for example because
the bandwidth for recording \gls{os} data events is limited. To investigate
this question industrial case studies should be conducted based on the
approaches discussed in this thesis.
\subsubsection{Trace a Multi-ECU Setup}
In many environments microcontrollers operate in big networks. For example, in
modern cars up to 70 ECUs are installed and connected via at least five
different field bus systems \cite{maxmaster}. In such systems correct system
performance is not only dependent on the behavior of a single controller, but
also on the interaction of the system as a whole. The ability to trace
multiple ECUs in parallel would provide enormous benefits in the analysis and
validation of multi-ECU systems.
In order to get meaningful results from the analysis of a multi-ECU trace it is
mandatory that the timestamps from all ECUs are synchronous. Otherwise, the
delay between different processors would result in wrong evaluation metrics and
no valid conclusions could be drawn. Therefore, the feasibility of a multi-ECU
trace environment is an interesting and important topic for future work.


@@ -0,0 +1,515 @@
\chapter{Hardware Trace Measurement}
\label{section:trace_measurement}
Computer systems can be analyzed with measurement tools that detect events,
i.e.\ changes in the state of a system \cite[p. 28]{ferrari1978computer}. The
same event can be interpreted on different levels as shown in
\autoref{fig:trace_event_levels}. A hardware trace tool can detect a voltage
change in memory, e.g.\ triggered by the processor which is a hardware event.
Accordingly, the variable that maps to the changed memory register changes too
which is a software event. If this variable is related to the state of a task,
a change of the variable also means a change of the task state which is then
called a system event.
In many cases, the event of interest cannot be measured directly. One or more
transformation steps are required to retrieve the required result. If a
transformation process is executed the measurement is said to be indirect
\cite[p. 28]{ferrari1978computer}. Considering the previous example a task
termination event cannot be measured directly. However, a variable that
contains the current task state can be measured. If the task corresponding
to the variable and the mapping from value to task state is known, a change of
the variable can be transformed into a higher-level event: the termination of a
task. After the transformation process the measurement results can be
displayed to the user as shown in \autoref{fig:concept_measurement}.
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/trace/concept_measurement.pdf}
\caption[Measurement process]{The conceptual parts of a measurement process
according to Ferrari \cite{ferrari1978computer}. A sensor measures data. One
or more transformation steps are required if the data is not yet in the
desired format. Finally the result can be presented to the user.}
\label{fig:concept_measurement}
\end{figure}
During the transformation step the collected data may be manipulated, which is
called prereduction. Prereduction may for example be used when the actual
event is not required, but rather the number of events of a certain type that
occurred. For this case the transformer would increment a counter whenever a
certain event type is collected. If no prereduction is executed, the
measurement process is called tracing. Tracing is the process of recording a
sequence of events in chronological order of occurrence \cite[p.
30]{ferrari1978computer}. The result of this process is called a trace.
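The difference between tracing and prereduction can be sketched for a short sequence of invented task events: tracing preserves the chronological sequence, prereduction collapses it into counts.

```python
from collections import Counter

# Sketch of the difference between tracing and prereduction for a short
# sequence of (invented) task events.
events = ["task_activate", "task_start", "task_activate", "task_terminate"]

# Tracing: record every event in chronological order of occurrence.
trace = list(events)

# Prereduction: keep only the number of occurrences per event type.
counts = Counter(events)
```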
\section{Trace Tools}
Ferrari \cite[p. 31ff]{ferrari1978computer} distinguishes three trace
measurement tools: software, hybrid, and hardware tools. All tools are meant
to examine the behavior of a system. However, there are differences in
interference, resolution, and cost as summarized in
\autoref{tab:trace_tool_overview}.
If a measurement tool uses resources of the target system it causes
interference by using computational power and memory that could otherwise be
utilized by the application. A tool that causes interference is said to be
intrusive and may cause degradation, a reduction in performance of the target
system \cite[p. 29]{ferrari1978computer}. Consequently, intrusive trace tools
change the real-time behavior of an application.
An event can be represented on different levels. A voltage level change in
memory can map to a variable which can map to the state of a task as
visualized in \autoref{fig:trace_event_levels}. Those levels are called
hardware level, software level, and system level. To clarify the level of a
trace, it can be mentioned explicitly. For instance, a trace consisting of
hardware level events is a hardware level trace \cite[p. 29f]{felixproject2}.
Tools that can detect hardware events occurring at a microscopic level are
said to have a higher resolution than tools that can detect software events
only.
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/trace/trace_event_levels.pdf}
\caption[Measurement levels]{A measurement event can be interpreted on
different levels. A voltage change in memory can be detected by a hardware
trace tool capable of supervising the memory bus that triggers the voltage
change. The memory section can relate to a variable, that changes in
consequence of the voltage change, which is a software event. If the variable
is related to the state of a task, a change of the variable also means a
change of the task state which is then called a system event.}
\label{fig:trace_event_levels}
\end{figure}
Different trace techniques can detect and record events with different
frequencies. The maximum frequency is usually not limited by the speed with
which events can be detected, but by the available bandwidth to process and
record the detected events.
The cost of different trace tools depends on several factors, the price for
hardware and software licenses, the price for installing and maintaining the
tool, educational costs, like training for the users of a tool, and the costs
of operating the tool.
\textbf{Software tools} add instructions to a hardware-software system in order
to detect and record events of interest. Added instructions are called
instrumentation. The simplest kind of instrumentation is a classical write to
the standard output interface, e.g.\ a \lstinline{printf} statement in the C
programming language. Instructions may be added to the application code
directly, via the compiler or post compilation via dynamic binary
instrumentation \cite{trumper2012maintenance, felixarc2015}. If no
standard output interface is available, events are recorded into memory on
target. From there they can be read out via debugger or serial interface.
Instrumentation always interferes with the application. There are two
components of interference, a space and a time component \cite[p.
44]{ferrari1978computer}. Execution of instrumentation code takes time and
storing detected events uses memory space. Software tools have a low
resolution because they cannot detect events on a hardware level. Event
detection frequency is limited by the available computational resources. On
the upside they are usually cheap and easy to implement and use.
\textbf{Hardware tools} do not rely on instrumentation, which means that they
are non-intrusive and do not interfere with the application
\cite{felixarc2014}. Hardware tracing works via a dedicated trace device chip
that is located on the silicon of the CPU\@. Trace devices provide a very high
resolution since they are capable of detecting events at hardware level
\cite{mink1989performance}. Additionally, the event detection frequency can be
as high as the actual system frequency; thus it is possible to record a
complete hardware-software system in real-time. Hardware tools are more
expensive compared to software solutions. Installation and maintenance are
more complex and require properly qualified users.
\textbf{Hybrid tools} rely on instrumentation and a dedicated hardware
interface to record events. The boundary between software, hybrid, and
hardware tools can be fuzzy in certain cases. Software tools need some kind of
hardware interface to send recorded traces off-chip. In this sense, all
software tools are hybrid tools. However, industry hybrid solutions often
require proprietary target interfaces which justifies why these tools fit into
a separate category \cite{richterganzheitliche}. Compared to pure software
tools, hybrid tools interfere with the system to a lesser extent
\cite{nacht1989hardware}. A dedicated hardware interface makes it possible to
send events off-chip in real-time. Consequently, more memory becomes available
on target.
As shown in \autoref{tab:trace_tool_overview} hardware trace tools have many
advantages over hybrid and software based solutions. Hardware tracing does not
interfere with the system, which is especially important for real-time systems.
Hardware trace tools are capable of detecting events with a higher resolution
and frequency. Additionally, the trace duration of software and hybrid traces
is limited by the available memory on target and by the trace interface
bandwidth. When the same quantity can be measured by a hardware and a software
tool, the values obtained by the hardware tool are usually to be considered
more accurate because of the lower interference \cite[p.
45]{ferrari1978computer}.
\begin{table}[]
\centering
\begin{tabular}{r|c c c}
& Software & Hybrid & Hardware \\
\hline
Interference & high & low & no \\
Resolution & low & low & high \\
Cost & low & low & high \\
Frequency & low & low & high \\
\end{tabular}
\caption[Trace techniques]{Properties of different trace
measurement tools \cite[p. 6]{felixproject1}. Hardware tools are superior
to software and hybrid tools but come with higher expenses.}
\label{tab:trace_tool_overview}
\end{table}
\section{Hardware Tracing}
\label{subsection:hardware_tracing}
Hardware tracing is capable of recording events on hardware level. A dedicated
on-chip trace device and trace interface is required to record hardware events
and send them off-chip \cite{mink1990multiprocessor}. Target access hardware
is connected to the trace interface to read out the trace measurement results.
From there the events are forwarded to a host computer for further processing.
Software that runs on the host computer in order to analyze the recorded trace
data is provided by the target access hardware vendor \cite{winidea}. The term
host software is used to refer to such applications.
The on-chip trace device is designed to record hardware events executed by the
microcontroller. It occupies a separate section on the silicon. Usually a
controller is delivered in two versions, one with and one without trace device.
In production the ability to execute trace measurement is not required
\cite{felixarc2014}. Therefore, the trace device would only increase chip
costs without providing any benefits.
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/trace/tc27_emulation_device.png}
\caption[Infineon TC27x trace device]{A microcontroller with hardware trace
support consists of two sections. A regular product chip part and the trace
device part. The trace device part can be omitted in the production version
of a chip to save costs \cite{tc27block}.}
\label{fig:tc27_emulation_device}
\end{figure}
\autoref{fig:tc27_emulation_device} shows the trace device of the Infineon
TC27x microcontroller family \cite{tc27x}. The upper part belongs to the
product chip while the lower part displays the trace device. The trace device
can gather data from the product part via two interfaces. \glspl{pob}
(\glsdesc{pob}) record processor events while \glspl{bob} record bus events.
All events are collected, enhanced with a timestamp and buffered in the on-chip
trace memory. From there they are sent off-chip via the dedicated trace
interface.
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/trace/timestamp_generation_event.pdf}
\caption[Timestamp per event]{Each trace event is assigned a timestamp
relative to the previous event. By summing up the relative timestamps
absolute values can be generated.}
\label{fig:timestamp_generation_event}
\end{figure}
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/trace/timestamp_generation_dedicated.pdf}
\caption[Dedicated timestamp generation]{Via dedicated timestamp events, the
timestamps of the other events can be interpolated. In this example two
events are recorded between the previous and the next timestamp event, which
is why both events are assigned the same interpolated timestamp. The value
is calculated via \autoref{eq:timestamp_interpolation} as $t_i = 5 +
\frac{(15-5)}{2}=10$.}
\label{fig:timestamp_generation_dedicated}
\end{figure}
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/trace/timestamp_generation_io.pdf}
\caption[Timestamp via \gls{io}]{Dedicated \gls{io} pins can be used to output
a timestamp value whenever a measurement event is sent off-chip.}
\label{fig:timestamp_generation_io}
\end{figure}
There exist different techniques to add timestamp information to a trace event.
The obvious way is shown in \autoref{fig:timestamp_generation_event}. A
timestamp is added to each trace event that is sent off-chip. To save
bandwidth, timestamps are provided relative to the previous event. An
absolute value is computed by summing up all previous timestamps.
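Recovering absolute timestamps from relative ones is a cumulative sum, as this small sketch shows (the delta values are invented):

```python
from itertools import accumulate

# Sketch of recovering absolute timestamps from relative ones. The values
# are the kind of per-event deltas a trace device might emit (invented here).
relative = [5, 3, 7, 2]                # cycles since the previous event
absolute = list(accumulate(relative))  # running sum gives absolute times
```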
Another way is to send dedicated timestamp messages as shown in
\autoref{fig:timestamp_generation_dedicated}. The timestamps for the actual
trace events are then interpolated, e.g., via the equation
\begin{equation}
\label{eq:timestamp_interpolation}
t_{i} = t_p + \frac{(t_n - t_p)}{2},
\end{equation}
where $t_p$ is the previous timestamp (the latest timestamp before the event),
$t_n$ the next timestamp (the earliest timestamp after the event), and $t_i$ the
timestamp interpolated based on the dedicated timestamp events.
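The interpolation of \autoref{eq:timestamp_interpolation} can be written out directly, reproducing the worked example of the figure:

```python
def interpolate(t_prev, t_next):
    """Timestamp for events recorded between two dedicated timestamp events,
    following t_i = t_p + (t_n - t_p) / 2."""
    return t_prev + (t_next - t_prev) / 2

# The example from the figure: events between timestamps 5 and 15.
t_i = interpolate(5, 15)  # 10.0
```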
Finally, timestamps can also be created via dedicated \gls{io} pins as
specified by the Nexus \cite{turley2004nexus} standard. This means that
whenever a trace event is sent off-chip via the trace interface, the current
timestamp is provided via the \gls{io} pins as shown in
\autoref{fig:timestamp_generation_io}.
Cycle accurate timestamps are feasible with all timestamp generation
techniques. However, timestamp accuracy and resolution are only partly
dependent on the generation technique. More important factors are CPU and
trace device clock frequency, as well as the design of CPU and trace device.
For cycle accurate timestamps, the trace device frequency must be greater than
or equal to the CPU frequency. Even if this is the case, cycle accurate time\-stamps cannot
necessarily be guaranteed.
For example, super scalar processors like the Infineon TC277 \cite{tc27x} are
capable of executing more than one instruction per cycle. However, only one
event can be processed per cycle by the trace device as shown in
\autoref{fig:timestamp_cycle}. The processor observation block filters the
instructions according to user specified filter rules and forwards them for
further processing. If two instructions, executed during the same processor
cycle, match the filter and are thus forwarded to the trace device, one of
those instructions is delayed by one cycle (in this example Instruction 2.1).
For a processor running at \unit[100]{MHz} this would set the timestamp off by
\unit[10]{ns} for this particular event.
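The timestamp error caused by a one-cycle delay follows directly from the clock period:

```python
def cycle_time_ns(cpu_freq_mhz):
    """Duration of one CPU cycle in nanoseconds (1000 / f for f in MHz)."""
    return 1000 / cpu_freq_mhz

# A one-cycle delay at 100 MHz sets the timestamp off by one clock period.
offset_ns = cycle_time_ns(100)
```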
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/trace/timestamp_cycle.pdf}
\caption[Timestamp generation accuracy]{Even if the trace device runs at CPU
clock frequency, cycle accurate timestamps cannot be guaranteed.}
\label{fig:timestamp_cycle}
\end{figure}
The design of trace devices differs depending on the processor family and the
processor vendor. However, the general concept and provided functionality are
the same for all devices. Various standards for the implementation of
trace devices are specified and used by chip vendors. Three common standards
are Nexus used by PowerPC processors \cite{turley2004nexus}, \gls{etm}
(\glsdesc{etm}) used by ARM processors \cite[p. 476]{yiu2013definitive}, and
the \glsdesc{imds} \cite{stollon2011infineon} discussed here and shown in
\autoref{fig:tc27_emulation_device}.
According to \autoref{fig:concept_measurement}, a measurement process starts
with the detection of an event by a sensor. In case of the trace process the
sensors are the \glspl{pob} and \glspl{bob}. Each \gls{pob} monitors the
instructions executed by one processor core. This means the complete program
flow executed by a processor core can be recorded. \glspl{bob} are connected
to the data busses of the microcontroller and can detect memory access events.
A memory access event may be for example, writing to a variable or reading
from a special function register. A typical data trace event contains in
addition to the timestamp, details like address, data value, transfer size, and
whether a read or write access occurred \cite{hopkins2006debug}.
Filters can be specified by the user to reduce the amount of recorded trace
events. They can be set for an address or for an address range. Different
events can be executed if an address filter matches: the corresponding event
can be recorded, discarded or another event can be triggered. For example, it
is possible to start or stop the trace process if a specific function is
accessed or a variable is written. Filter configuration is done via the host
software.
Corresponding to the two main hardware event types, instruction, and data
access events, two hardware trace techniques can be distinguished, program flow
trace and data trace \cite{felixarc2014}. The two trace techniques can be
executed in parallel or individually as configured by the user.
A \textbf{program flow trace} (also called function trace) shows the complete
execution path of an application for the duration of the trace recording. This
means it is possible to detect when a certain function is called or which
branch of an if statement is executed. The number of instructions and the
resulting data stream bandwidth produced by a modern CPU is too big to be
transmitted via the trace interface. To solve this problem trace devices use
trace compression. The most commonly used program flow trace compression
technique works by detecting and recording only such instructions that cause a
change in program flow such as conditional jumps and traps
\cite{hopkins2006debug}. Using the application binary the host software is
able to reconstruct the complete program flow.
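The reconstruction principle can be sketched with an invented "binary": only taken branches appear in the trace, and the host walks sequentially between them.

```python
# Sketch of program flow reconstruction: only flow-changing instructions
# (taken branches) are recorded; the host replays the binary linearly between
# them. Addresses and the branch layout are invented for illustration.

taken_branches = {2: 5}  # recorded trace: the branch at address 2 jumped to 5

def reconstruct(start, end):
    """Walk the binary from start to end, following recorded branch targets."""
    path, pc = [], start
    while pc <= end:
        path.append(pc)
        pc = taken_branches.get(pc, pc + 1)  # sequential unless a branch fired
    return path

flow = reconstruct(0, 6)  # visits 0, 1, 2, then jumps to 5, 6
```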
A \textbf{data trace} is a sequence of data access events. Data tracing makes it
possible to supervise and debug the state of variables in memory. Data tracing
of all active units is becoming increasingly important because not all data
interactions involve a processor \cite{mayer2003debug}. Thus, trace devices
must also be able to detect memory accesses via \gls{dma} (\glsdesc{dma}) and
accesses to memory of special on-chip modules like FlexRay or Ethernet. The
units that are supported by a microcontroller depend on the trace device,
but all trace devices support tracing the main memory of a controller.
Compression is also applied to data traces. However, those techniques are
usually not sufficient to record a complete data trace of significant length
since the amount of generated data is too big. The best way to solve this
problem is to apply filters to avoid detecting and recording data events in
memory sections that are not of interest \cite{hopkins2006debug}.
A recorded hardware trace event is buffered into an on-chip trace memory. From
there the events can be read via the trace interface. On-chip trace memories
can be operated in different modes \cite{felixarc2014}. In continuous mode
the trace data is streamed off-chip in real-time. This technique is limited by
the bandwidth of the trace interface. If it is high enough, the trace duration
depends only on the available memory on the host computer and traces of
arbitrary length can be recorded. If the bandwidth is too small to process the
recorded trace stream \emph{buffer mode} must be used. This means the recorded
trace is written into trace memory and read out by the target access hardware
post tracing. Buffer mode can be used in pre- and post-trigger mode. In
pre-trigger mode the trace buffer is filled like a circular buffer. The oldest
events are discarded for new events. The trace process can be stopped at an
arbitrary point in time and the latest trace events become available. In
post-trigger mode the trace process is stopped as soon as the buffer has been
filled for the first time.
A trace device operated in buffer mode is limited by the available trace
memory. The trace memory size of an Infineon TC275 microcontroller
(\autoref{fig:workbench} a) is \unit[2]{MB}, which allows for approximately
\unit[33]{ms} of unfiltered function and data trace of a single processor core
running at \unit[200]{MHz} \cite{felixarc2014}. Depending on the measurement
use case this may or may not be sufficient. If the trace duration is to be
increased, tracing in continuous mode is mandatory. Continuous tracing
requires a high bandwidth interface such as \gls{agbt} (\glsdesc{agbt}).
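A back-of-the-envelope check makes the limit tangible. The event generation rate of roughly \unit[60]{MB/s} used below is not from the cited source; it is an assumption inferred from the \unit[2]{MB} per \unit[33]{ms} figures above.

```python
# Illustrative arithmetic: how long does a trace buffer last at a given
# event generation rate? (~60 MB/s is an assumption derived from 2 MB / 33 ms.)
def buffer_trace_duration_ms(buffer_bytes, event_rate_bytes_per_s):
    return buffer_bytes / event_rate_bytes_per_s * 1000.0

duration = buffer_trace_duration_ms(2 * 1024 * 1024, 60e6)  # roughly 35 ms
```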
\section{Hardware Trace Toolchain}
Multiple steps are required from recording a hardware trace on target to
presenting it to the user on a personal computer as shown in
\autoref{fig:toolchain}. Many different solutions exist for each of those
steps. Nevertheless, the basic functionalities provided by all solutions
are comparable.
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/trace/toolchain.pdf}
\caption[Trace toolchain]{Recording a hardware trace and making it
available to the user requires multiple steps. Hardware events must be
measured on target via a trace device. Using a trace interface the recorded
data can be read out by the target access hardware and transmitted to a host
computer. Target access hardware vendors provide special software to analyze
and visualize the recorded trace.}
\label{fig:toolchain}
\end{figure}
The basic prerequisite for executing a hardware trace is the availability of an
on-chip trace device. All major chip vendors provide trace devices for their
microcontrollers that support program flow and data trace.
\autoref{tab:trace_devices} gives an overview of the state-of-the-art trace
solutions.
\begin{table}[]
\centering
\begin{tabular}{r|c c c}
Standard & Architecture & Function Trace & Data Trace\\
\hline
Nexus &
PowerPC &
\begin{tabular}[x]{@{}c@{}} Branch Trace \\ Messaging \end{tabular} &
\begin{tabular}[x]{@{}c@{}} Data Trace \\ Messaging \end{tabular} \\
\hline
\gls{etm} &
ARM &
\begin{tabular}[x]{@{}c@{}}Program Trace \\ Macrocell \end{tabular} &
\begin{tabular}[x]{@{}c@{}}Embedded Trace \\ Macrocell \end{tabular} \\
\hline
\gls{imds} &
TriCore &
\begin{tabular}[x]{@{}c@{}}Processor \\ Observation Block \end{tabular} &
\begin{tabular}[x]{@{}c@{}}Bus \\ Observation Block \end{tabular} \\
\end{tabular}
\caption[Trace devices for different architectures]{Trace devices exist for
different CPU architectures. All solutions provide methods for recording
program flow and data traces.}
\label{tab:trace_devices}
\end{table}
Events that have been recorded by the trace device are sent off-chip via a
dedicated trace interface. If the bandwidth provided by an interface is lower
than the rate at which trace events are created, continuous tracing is not
possible. However, this use case is often required. There are two ways to
solve this problem: the amount of created trace data can be reduced using
filters, or the available bandwidth can be increased. If an entire application
must be analyzed as a whole, the first option is ruled out.
\begin{table}[]
\centering
\begin{tabular}{r|l c}
Interface & Pros/Cons & DAQ rate {\small [MB/s]}\\
\hline
JTAG &
\begin{tabular}[x]{@{}l@{}}
$+$ Reuse of existing interface \\
$+$ Small chip area \\
$-$ Low bandwidth \\
\vspace{1mm}
\end{tabular} &
1.2 \\
DAP2/SWD &
\begin{tabular}[x]{@{}l@{}}
$+$ High bandwidth with few pins \\
$+$ Small silicon area \\
$-$ Proprietary \\
\vspace{1mm}
\end{tabular} &
10 \\
\gls{agbt} &
\begin{tabular}[x]{@{}l@{}}
$+$ Very high bandwidth with few pins \\
$-$ Large silicon area \\
$-$ High cost \\
\vspace{1mm}
\end{tabular} &
30 \\
CAN &
\begin{tabular}[x]{@{}l@{}}
$+$ Robust and well known standard \\
$+$ Low cost \\
$-$ Very low bandwidth \\
\end{tabular} &
0.05 \\
\end{tabular}
\caption[Trace interfaces]{Commonly used trace interfaces and their \gls{daq}
(\glsdesc{daq}) rates. \gls{agbt} (\glsdesc{agbt}) is the only interface
capable of recording continuous hardware traces of a complete system.}
\label{tab:interfaces}
\end{table}
Mayer et al.\ \cite{interfaces} give an overview of trace interfaces used in
the automotive industry as shown in \autoref{tab:interfaces}. \gls{jtag}
(\glsdesc{jtag}) is a common debug standard \cite{ieee5001}, suitable for
regular debugging. It can be used to read out a buffered trace after tracing,
but it is not sufficient for continuous tracing due to its low bandwidth of
\unit[1.2]{MB/s}. Because of that, DAP and DAP2 were developed by Infineon and
SWD by ARM\@. Both protocols are based on \gls{jtag} but use higher
frequencies and improved communication protocols to provide more bandwidth.
\gls{agbt} is currently the fastest trace interface. It was specified by
XILINX and adopted by the Nexus standard. \gls{agbt} is the only interface
which is theoretically capable of recording a continuous trace of a complete
application running on a processor with a frequency of \unit[200]{MHz}. CAN is
used by some hybrid trace tools but is only mentioned for completeness since
its bandwidth is too low to be considered for hardware tracing.
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/trace/workbench.png}
\caption[Trace workbench]{A complete trace workbench. An Infineon TriCore
evaluation board (a) can be traced by the iSYSTEM iC6000 (b) or the Lauterbach
PowerTrace-II (e) via the high-speed \gls{agbt} interface. Host software is
used to control the hardware and to analyze the recorded trace, for example
WinIDEA (c) by iSYSTEM and TRACE32 (d) by Lauterbach \cite{maxmaster}.}
\label{fig:workbench}
\end{figure}
Target access hardware is connected to the trace interface to read out
recorded trace events. From the target access hardware the data is transmitted
to a host computer for further analysis via USB 3.0 or Ethernet. Examples for
target access hardware are the iC6000 by iSYSTEM \cite{ic6000}
(\autoref{fig:workbench} b) and the PowerTrace-II by Lauterbach
\cite{powertrace2} (\autoref{fig:workbench} e). Both devices support
different architectures and trace interfaces by using architecture specific
debug cables. Besides reading hardware traces, those devices also support all
functionalities provided by a regular debugger such as step-wise debugging,
reading of memory content, and manipulation of CPU configuration registers.
Dedicated software on the host computer is used to configure and control the
target access hardware and the trace device itself. After recording, this
software transforms the recorded hardware trace into a software trace (see
\autoref{fig:trace_event_levels}). For this process the host software must
have access to the \gls{elf} file of an application. This is required to map
the addresses of hardware trace events to the corresponding software entities.
Based on the software trace, different analysis techniques such as metric
evaluation, performance analysis, and code coverage are supported. Gantt
charts are provided to examine the trace visually. Via export functions a
software level program flow and data trace can be made available for external
tools. \autoref{fig:workbench} shows the toolchain described in this section.
\chapter{Introduction}
\label{chapter:Introduction}
Embedded applications are increasingly required to provide real-time
performance \cite{hopkins2006debug}. This means that the correct behavior of a
system is not only dependent on the logical results of a computation, but also
on the physical instant in which these are produced \cite{kopetz2011real}. For
hard real-time applications, violation of a deadline results in damage to
the system or its environment \cite{tokuda1990real}.
Due to the pervasive nature of embedded systems and their use for critical
applications, e.g., medical devices or advanced driver assistance systems,
measures to ensure the correctness of time dependent functionality must be
taken \cite{konrad2005real}. Therefore, debugging and validation are a
fundamental part of the development process of such applications
\cite{dixon2013advantages}.
Different techniques to debug embedded systems exist \cite{schneider2004ten}.
The simplest one is a classical \lstinline{printf} statement in C (or the
equivalent in another language). More sophisticated debug technologies require
on-chip debug logic in the embedded processor. On-chip debug generally
supports two different types of functionality: run-control debug and real-time
trace \cite{dixon2013advantages}.
The former allows engineers to stop and examine the state of a system at
points of interest, so-called breakpoints. This approach is intrusive, in
other words it changes the runtime behavior of an application. This is not
acceptable for
time critical applications, e.g., engine control units that require continuous
execution of the processor in order to control feedback loops and to maintain
mechanical stability \cite{dixon2013advantages}.
Real-time trace recording, or tracing, however allows a system to be analyzed
and debugged without stopping its execution. It works by recording processor
events such as function calls and data accesses. The captured events can be
used to reconstruct and analyze the runtime behavior of an application.
Since timing is an integral part in the development of safe and secure
real-time applications, timing dependencies should be included in the software
interface specifications \cite{lutz1993analyzing}. One way to specify these
dependencies are timing requirements, e.g., the maximum response time for a
certain task \cite{deubzer2011robust}. Via tracing system engineers are
capable of validating those requirements on target.
\glsdesc{ta} (\gls{ta}) provides the \gls{ta} Tool Suite, a collection of tools
for the system design, simulation, automated optimization, and target
verification of embedded real-time multi-core and many-core systems
\cite{tatoolsuite}. These features work on the basis of system models.
Consequently, requirements are defined for system entities such as tasks,
runnables, signals, and semaphores.
In contrast, trace recording produces events on software level. This means
a trace contains information about function entries and exits, and data read
and write accesses. As a consequence, the specified system requirements cannot
be evaluated.
However, by mapping software events to the corresponding system events it is
possible to transform a software to a system level trace. \glsdesc{btf}
(\gls{btf}) is a trace format on system level and is used in this thesis
because of its native support for multi-core environments. To the best of my
knowledge, the possibility of a software to system mapping has only been shown
for a small subset of all entities specified by \gls{btf}.
In this thesis the feasibility of mapping all event actions contained in the
\gls{btf} standard is discussed, evaluated and validated. Furthermore,
different real-time trace techniques are discussed with respect to their
suitability for the timing analysis of embedded multi-core real-time
applications.
\section{Motivation}
\label{section:motivation}
Transformation of software events to system events is required for the timing
analysis of embedded real-time systems as discussed in the previous section.
Moreover, system traces can also be used for several other use cases, which
are covered in the following.
\subsubsection{Simulation Validation}
A simulation can be executed for a timing model by the TA Simulator. The
resulting simulated trace can be evaluated to validate the compliance of an
application with the specified requirements.
A simulated and a hardware based system trace will never be equal by
definition because a model is an abstraction of reality. Nevertheless,
simulation supports engineers in validating system behavior in early design
stages. It can abstract complex problems and analyze non-deterministic system
behavior \cite{sifakis2003building}.
However, a simulation is still software, which is vulnerable to bugs and can
potentially produce wrong results. A deviation from reality due to the
abstraction cannot be classified as a wrong result; an implementation error,
on the other hand, can be.
Via tracing it is possible to validate the correctness of simulated traces.
This is especially useful if a new simulation feature is implemented. In this
case a system trace recorded from hardware can provide valuable insights into
the actual behavior.
\subsubsection{OS Overhead Measurement}
Another aspect that is relevant for the development of embedded applications is
the overhead caused by the operating system (\gls{os})
\cite{zeng2011mechanisms}. Overheads are execution periods where the processor
is not used by the actual application but by the \gls{os}, for example for
context switches and inter-core communication mechanisms.
Especially for applications with a high processor utilization the additional
overhead caused by the \gls{os} plays a critical role. Fulfillment of timing
requirements may be feasible or not depending on the overhead
\cite{maxmaster}. In order to take this into consideration, a good
understanding of the execution times required by \gls{os} routines is
necessary. System traces recorded on hardware make it easy to determine the
exact execution times of these overheads.
\subsubsection{Model Reconstruction}
The initial creation of a timing model for an existing application is a tedious
process if it must be done manually. Model reconstruction can simplify this
task by creating a timing model automatically. It works by analyzing a
system trace recorded from hardware. By detecting common timing patterns in
the trace a model of the application can be created
\cite{sailer2014reconstruction}.
\section{Related Work}
The two main topics discussed in this thesis are tracing and hardware to system
mapping. While the former has been an important topic in the literature over
the last three decades, the necessity for the latter has only become important
in recent years.
\subsubsection{Tracing}
Ferrari \cite{ferrari1978computer} gives a comprehensive overview of major
computer performance evaluation techniques and their application to various
types of performance problems. In his book \emph{Computer Systems Performance
Evaluation} he distinguishes between three trace measurement techniques:
software, hybrid, and hardware based trace measurement. It is important to
understand that these techniques do not directly relate to the trace
abstraction levels discussed in the previous sections. The concepts described
in his book, which was released in 1978, are still relevant today, even though
the implementations are outdated.
Mink et al.\ \cite{mink1989performance} discuss hardware based performance
measurement in more detail. They argue that hardware tracing is the only
sufficient trace technique for recording resource utilization information
because of the high signal speeds involved and the fact that not all signals
are visible to software measurement techniques. Resource utilization is
concerned with detailed information about the operation of the hardware such as
cache hit ratios and access delays. Moreover, they mention that software based
tracing is intrusive and thus changes the runtime characteristics of an
application.
Kraft et al.\ \cite{kraft2010trace} discuss trace measurement in the context of
five industrial projects. They argue that hardware trace solutions require
large, expensive equipment mainly intended for lab use. Additionally, they
claim that software based trace solutions can also remain active in
applications post-release. Based on these arguments they use a software based
trace measurement approach in their paper. They introduce a software
instrumentation approach with a very low overhead according to their
measurement results.
\subsubsection{Hardware to System Mapping}
Lauterbach \cite{lauterbach2015third} provides a possibility to export task and
runnable system events for traces recorded via hardware tracing. However,
their approach is limited to a subset of the existing task and runnable events.
For example, runnable preempt and resume, and task wait events are not covered
by the Lauterbach export even though this information is relevant for the
real-time analysis. Lauterbach uses the information from the \glsdesc{orti}
(\gls{orti}) files and relies solely on function trace events for the export.
Kraft et al.\ \cite{kraft2010trace} also discuss how task events on system
level can be recorded. They argue that it is difficult to detect which entity
blocks a task because the scheduling status of the \gls{os} only provides
information about the entity type blocking the task, not the entity itself.
They suggest code instrumentation as a pragmatic solution to work around this
problem, admitting that this approach is problematic because the
instrumentation points have to be maintained by the developer.
\section{Interrogation}
\label{section:interrogation}
Timing analysis of embedded systems requires a trace, i.e., a sequence of
events, with sufficient duration and timestamp accuracy. The minimum trace
duration is dependent on the application and requirements that should be
validated. Fundamentally, the longer the trace duration the more information
for the real-time analysis of the application are acquired. However, more data
requires longer processing times. Therefore, a trace duration of at least one
second is demanded in this thesis to provide a tradeoff between processing time
and sufficient length for the real-life use-cases discussed in
\autoref{section:motivation}.
Timestamp accuracy is important for the real-time analysis because if the
resolution is too low no meaningful analysis may be feasible. For example, if
events can only be recorded in the range of milliseconds, the analysis of
requirements in the microseconds range is not feasible.
Kraft et al. \cite{kraft2010trace} also state that a timestamp accuracy in the
milliseconds range is too coarse-grained for embedded systems timing analysis.
Especially for validation of simulation tools and model reconstruction cycle
accurate timestamps would provide enormous benefits. From these requirements
the first hypothesis that should be evaluated in this thesis can be derived.
\begin{hyp}
\label{hyp:1}
There exists a trace technique that allows recording of cycle accurate traces
for embedded multi-core real-time systems with a duration of at least one
second.
\end{hyp}
Trace techniques output a trace on software level, i.e., a sequence of software
events. These events provide information about the code segments executed by
an application and the memory regions accessed. This information allows deep
insights into the runtime behavior of an embedded system, but is not sufficient
for its real-time analysis.
Traces on system level or in other words, sequences of system events are
required for the real-time analysis of embedded multi-core applications. In
the context of this thesis system events are defined as all events that are
contained in the \gls{btf} specification and not explicitly excluded in
\autoref{subsection:btf_entity_types}. With an understanding of the underlying
\gls{os} mechanisms it may be possible to map software to system events.
\gls{osek} and \gls{autosar} are common standards for the development of
applications in the automotive industry. These standards are discussed in more
detail later. \gls{osek} compliant operating systems feature a so-called
\gls{orti} file.
The aim of \gls{orti} is to make \gls{os} internal data visible to external
tools \cite{osekortia}. This means it is possible via \gls{orti} to relate
software level entities to their respective interpretation on system level. It
must be examined if a mapping for all \gls{btf} entities is feasible.
\begin{hyp}
\label{hyp:2}
A complete mapping from software to system entities is feasible based on the
information included in the \gls{orti} file for an \gls{osek} compliant
\glsdesc{os}.
\end{hyp}
If Hypothesis~\ref{hyp:2} does not hold other ways to achieve a complete
software to system mapping must be found. An \gls{os} must keep track of the
states of all relevant system objects internally. Otherwise, it would not be
possible to execute appropriate actions if required. For example, if one task
activates another one the \gls{os} must determine whether the corresponding
task is allowed to be activated or if the maximum number of activations has
already been exceeded.
By analyzing the internal data structures of an \gls{os} it may be possible to
construct a mapping from software to system entities. Considering the previous
example, there might be an \gls{os} data structure that keeps track of the
remaining activations for each task entity. If the field for a task is
incremented, an entity of the corresponding task terminates. If it is
decremented a new task instance is activated.
\begin{hyp}
\label{hyp:3}
A complete mapping from software to system entities is feasible for an
\gls{osek} compliant \glsdesc{os}.
\end{hyp}
\section{Outline}
In order to transform a trace recorded from hardware to a trace on system level
an understanding of the underlying operating system mechanisms is required. An
\gls{os} standard commonly used in the automotive industry is \gls{osekos}. It
is discussed in \autoref{section:osekvdxos}.
The real-time behavior of an embedded multi-core application can be represented
by a system trace. Based on a system trace an application can be examined and
specified timing requirements can be validated. \gls{btf} is a system level
trace format and used in this thesis. It is discussed in
\autoref{chapter:btf}.
There exist different techniques to record traces of embedded applications. In
\autoref{section:trace_measurement} an overview of these techniques is
provided. It is then argued why hardware tracing is the only technique
sufficient for the validation of embedded real-time applications. Accordingly,
hardware tracing is then discussed in more detail.
On the basis of the information in \autoref{chapter:fundamentals} the mapping
between software entities and system entities is described in
\autoref{chapter:mapping}. Mapping is done for all \gls{btf} entities that are
relevant for the analysis of embedded multi-core applications as discussed in
\autoref{subsection:btf_entity_types}.
In \autoref{chapter:validation} the mapping is validated. For that reason
criteria to compare \gls{btf} traces are established in
\autoref{subsection:validation_techniques}. Based on these criteria simulated
traces and traces recorded from hardware are compared and evaluated. This is done
in two steps. Firstly, test applications are created manually to cover all
possible \gls{btf} actions in \autoref{subsection:systematic_tests}. Secondly,
applications are created randomly to avoid selection bias in the creation of
test cases in \autoref{subsection:randomized_tests}.
Finally, the results of this thesis are discussed in
\autoref{chapter:conclusion} and possible topics for future work are outlined
in \autoref{chapter:future_work}.
\chapter{Mapping}
\label{chapter:mapping}
% {{{ Mapping Intro
Systems are analyzable on different levels of abstraction as shown in
\autoref{fig:trace_event_levels}. Depending on the use case, one level or
another is better suited to perform the required analysis. For example, a
hardware designer does not care about task states while a system engineer is
usually not interested in voltage levels of transistors in memory.
For the timing analysis of an embedded system a trace on system level is
required because timing requirements are usually specified for system entities
such as tasks or signals. Hence, system level traces contain the information
necessary to validate an application with respect to its timing behavior.
A trace must be long enough that all relevant entities appear with sufficient
frequency for the timing analysis. For example, at least two task
instances must be activated in one trace to calculate the activate-to-activate
time. Additionally, it is important not to influence the timing of an
application by trace measurement. Consequently, the only sufficient trace
technique for the timing analysis of embedded systems is hardware tracing
according to \autoref{tab:trace_tool_overview}.
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/mapping/concept_measurement_btf.pdf}}
\caption[Hardware to \gls{btf} trace basic idea]{Hardware tracing records
events on hardware level. This is not sufficient for the timing analysis of
an embedded system. Thus, it is necessary to transform the hardware events to
system events. This requires two steps. In the first step hardware events
are transformed to software events. This step is done by the trace software
and requires the application binary. The next transformation step produces a
trace on system level, e.g.\ in the \gls{btf} format. An \gls{orti} file as
well as additional information that can for example, come from a timing
model file (\gls{rte}) are required for this step.}
\label{fig:mapping_concept}
\end{figure}
Hardware tracing records events on hardware level. As stated above this level
is not sufficient for the timing analysis of an embedded system. Thus, it is
necessary to transform hardware events to system events as shown in
\autoref{fig:mapping_concept}. Two steps are required for this transformation.
Hardware level events must be transformed into software level events which are
then further processed into system level events.
The first step is done by the trace software. It is capable of analyzing and
interpreting the hardware events that are recorded from the processor. Via the
application binary files it is possible to map the raw memory addresses
contained in the hardware events to the corresponding symbols of the real
application as depicted in \autoref{fig:hardware_software_idea}.
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/mapping/hardware_software_idea.pdf}}
\caption[Hardware event to software event idea]{The trace software is capable
of transforming a hardware level event to a software level event. This
involves, for example, replacing memory addresses with the actual symbol names
based on the application binary (\gls{elf} file). Further actions may be
required depending on the trace device. Note that the displayed hardware
event is just a generalization, the actual structure can be different
depending on the trace device vendor.}
\label{fig:hardware_software_idea}
\end{figure}
Depending on the trace device, further steps may be required. For example, some
trace devices produce timestamps relative to the previous event, which must
then be transformed into absolute timestamps. Another example are program flow
traces. Hardware level program flow events are usually only recorded for
instructions that change the flow of an application as described in
\autoref{subsection:hardware_tracing}. Only with the application binary is it
possible for the software to reconstruct a complete program flow trace.
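The relative-to-absolute timestamp conversion mentioned above is a running sum over the per-event deltas, sketched here as a minimal example:

```python
from itertools import accumulate

# Sketch: convert per-event timestamp deltas to absolute timestamps
# by accumulating them from a known start time.
def to_absolute(deltas, start=0):
    return list(accumulate(deltas, initial=start))[1:]
```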
Based on the software level trace, a system level trace can be generated in the
next step. A suitable system level trace format is \gls{btf} which is
described in \autoref{chapter:btf}. It is capable of representing the behavior
of an application in a way that is eligible for its timing analysis. Different
additional information, e.g.\ the \gls{orti} file is required to
execute the transformation from software to system level trace.
% }}}
% {{{ Mapping Proceeding
\section{Mapping Proceedings}
Transformation from hardware to software level is done by the trace software.
Corresponding to the composition of an on-chip trace device it creates two
types of traces on software level: a data trace and a function trace.
Let $i$ be an index in $\mathbb{N}_{0}$ denoting an individual event
occurrence. Then a data event can be defined as a quintuple
\begin{equation}
\label{eq:data_event}
d_{i} = (t_i, \pi_i, a_i, v_i, c_i)
\end{equation}
where $t_i \in \mathbb{N}_{0}$ is the timestamp in nanoseconds, $\pi_i$ is the
name of the accessed variable, $a_i \in \{R, W\}$ is the way in which the
variable is accessed, either $R$ for read or $W$ for write, $v_i \in
\mathbb{N}$ is the value that was read or written, and $c_i$ is the core name
on which the access has occurred.
Consequently, a data trace can be defined as a sequence of data events where
$n \in \mathbb{N}_{0}$ is the number of events in the trace.
\begin{equation}
\label{eq:data_trace}
D = (d_1, d_2, \dots, d_n)
\end{equation}
Let $j$ be an index in $\mathbb{N}_{0}$ denoting an individual event
occurrence. Then a function event can be defined as a quadruple
\begin{equation}
\label{eq:function_event}
f_j = (t_j, \pi_j, \theta_j, c_j)
\end{equation}
where $t_j \in \mathbb{N}_{0}$ is the timestamp in nanoseconds, $\pi_j$ is the
name of the accessed function, $\theta_j \in \{A, \Omega\}$ indicates
whether the function has started ($A$) or terminated ($\Omega$), and $c_j$
is the core name on which the function event has occurred.
Analogously, a function trace can be defined as a sequence of function events
where $m \in \mathbb{N}_{0}$ is the number of events in the trace.
\begin{equation}
\label{eq:function_trace}
F = (f_1, f_2, \dots, f_m)
\end{equation}
Based on \autoref{eq:btf_trace}, \autoref{eq:data_trace}, and
\autoref{eq:function_trace} the goal is to describe a function $g$ so that
\begin{equation}
g: (D,\, F) \rightarrow B,
\end{equation}
where the timestamps $t$ of the events in $D$, $F$, and $B$ are relative to the
same point in time. However, $D$ and $F$ alone are not sufficient for the
transformation from software to system level for three reasons.
Firstly, the events on software level do not provide enough information to
decide which variable maps to a certain entity on system level. For example,
the state of each task is stored in a certain variable. Whenever the state
changes, this variable changes too and a data event is generated. However,
the transformation function does not know that the variable maps to the state
of a task. Because of that the \gls{orti} file described in
\autoref{subsection:osek_oil_and_orti} is required. Via this file it is
possible to relate variables to the corresponding system objects.
Secondly, not all entity types specified by \gls{btf}, for example runnables
and signals, are included in the \gls{orti} file. The former are included in
the function trace, the latter in the data trace. But if the transformation
function is not able to distinguish regular functions from runnables and
regular variables from signals, this information cannot be used.
Thus, it is necessary to provide a list of those entities to the transformation
function.
Finally, it is necessary to keep track of the internal state of an application.
If the \gls{orti} file is available it can be detected that a certain task has
changed its state. Consequently, a \gls{btf} event must be generated. Without
the knowledge about the previous task state however, it is not possible to
decide which task action has occurred. If the task changes into the running
state, this could mean that the task has started for the first time, resumed
from the ready state, or continued to run after polling a resource.
For these reasons the function $g$ must be redefined as
\begin{equation}
g': (D,\, F,\, o,\, l,\, S) \rightarrow (B,\, S')
\end{equation}
where $o$ is the \gls{orti} file of the traced application, $l = (l_r,\, l_s)$
is a tuple that contains a list of runnables $l_r$ and a list of signal names
$l_s$, and $S$ and $S'$ are the system states before and after the
transformation. The information that must be part of the system state $S$ is
discussed in the next sections.
% }}}
% {{{ ORTI Mapping
\section{ORTI Mappings}
\textbf{Task} entities are capable of executing twelve actions according to
\autoref{fig:process_state_chart} plus the additional notification event if the
\gls{mta} limit is exceeded. The lifecycle of a task entity starts with its
activation.
An \textbf{activation} can be detected via the \gls{orti} \emph{task status}
attribute. If no other task instance of the same task entity is active in the
system, a task whose state changes to ready is activated. However, this does
not work if a task instance of the same task is already active in the system.
This can happen if multiple task activations are allowed by the
\glsdesc{osekcc}. In case of a \gls{mta} the corresponding \gls{osek}
\emph{task status} attribute already indicates an active state (any state that
is not suspended) and will not change to ready again.
Consequently, another way to detect task activations is required. Via the
\gls{orti} \emph{currentactivations} attribute, the number of open activations
for each task can be detected. Whenever this attribute is incremented, a new
task activation \gls{btf} event must be created. Therefore, it is necessary to
keep track of the number of activations for each task entity in the system.
Only if the previous number of activations for a task is known, it is possible
to decide whether the value is incremented or decremented when a new data write
event occurs. Thus, the number of current activations for each process is
relevant information and must be part of the system state $S$.
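This bookkeeping for the \emph{currentactivations} attribute can be sketched
in C as follows. The listing is an illustrative sketch, not actual tool code;
it assumes that every data write to the attribute is delivered to the
transformation function, and all type and function names are assumptions.

\begin{code}
\begin{lstlisting}[caption={[Activation tracking sketch] An illustrative
sketch of how the previous \emph{currentactivations} value stored in the
system state can be used to classify a new data write. All names are
assumptions.}, label={listing:activation_sketch}]
#include <assert.h>

typedef enum { ACT_NONE, ACT_ACTIVATE, ACT_CONSUMED } act_event_t;

typedef struct {
    unsigned currentactivations; /* last known value, part of S */
} task_state_t;

/* Called for every data write to a task's currentactivations variable. */
act_event_t on_activations_write(task_state_t *task, unsigned new_value)
{
    act_event_t ev = ACT_NONE;
    if (new_value > task->currentactivations)
        ev = ACT_ACTIVATE;      /* a BTF activate event must be created */
    else if (new_value < task->currentactivations)
        ev = ACT_CONSUMED;      /* an open activation was consumed */
    task->currentactivations = new_value; /* update the system state */
    return ev;
}
\end{lstlisting}
\end{code}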
Since tasks have a lifecycle it is necessary to keep track of the instances for
each task entity. Whenever a new task is activated the instance counter must
be incremented and the counter value is assigned to the task. The same
procedure is necessary for all other entities that have a lifecycle. The
latest instance counter value for each entity must be available in the system
state $S$ to create correct \gls{btf} events. Additionally, it is necessary to
add newly created tasks to a list of task instances active in the system. When
a task's lifecycle ends, i.e., the task terminates, it is removed from this
list.
A \textbf{stimulus} is required to activate a task. Stimuli can be
\textbf{triggered} by process and by simulation entities. A stimulus triggered
by another process represents an \glsdesc{ipa} (\gls{ipa}). An \gls{ipa} is
implemented via the \lstinline{ActivateTask} service routine. The \gls{orti}
\emph{servicetrace} attribute can be used to detect when this routine is
executed. Whenever the \lstinline{ActivateTask} routine is entered and a task
is running on the same core a stimulus event is created with the task as the
source entity.
Alarms are the second way to activate tasks. The \emph{alarmtime} attribute
indicates how many ticks are left until an alarm expires. The \gls{orti} file
also contains the action that is executed by an alarm. Thus, a stimulus can be
triggered whenever an alarm that activates a task reaches an \emph{alarmtime}
value of zero.
\begin{table}[]
\centering
\begin{tabular}{r|l l}
Action & ORTI attribute & System state \\
\hline
trigger (ipa) & servicetrace (ActivateTask) & running task \\
trigger (alarm) & alarmtime & - \\
\end{tabular}
\caption[Stimulus event mapping]{In \gls{btf}, a stimulus must be triggered
so that it can activate a task. On target a task can be triggered via an
\gls{ipa} or by an alarm. The first can be detected via the
\emph{servicetrace} attribute, while the latter is indicated if the
\emph{alarmtime} attribute reaches the value zero.}
\label{tab:stimulus_mapping}
\end{table}
A triggered stimulus must be added to the system state. Later, when the actual
task activation is executed by the \gls{os} the latest stimulus is removed
from the system state and used to create a correct \gls{btf} event.
\autoref{tab:stimulus_mapping} summarizes how stimulus events are detected.
A \textbf{task} \textbf{start} event occurs if a task which was previously
active changes to running. There are two cases for which preempt and resume
actions must be created. The first case is a normal state change that can be
detected via the \emph{task status} attribute. A task is \textbf{preempted} if
the state changes from running to ready and \textbf{resumed} if the state
changes from ready to running.
However, the task state is not updated by the \gls{os} when a task is preempted
by an \gls{isr}. Consequently, a task preempt event must also be created, if
the \emph{runningisr2} attribute indicates that a new \gls{isr} is running on
the core and a resume event must follow once the \gls{isr} terminates
execution.
A task \textbf{terminate} event occurs if a running task changes into the
suspended state. The previous state need not be known because a task can only
be terminated from the running state.
However, there is a special case for task terminate events. As mentioned in
\autoref{subsection:osek_architecture}, a task with pending activations
switches directly into the ready state, after the current instance terminates.
To work around this problem it is necessary to detect when a certain task
instance executes the \lstinline{TerminateTask} service routine via the
\emph{servicetrace} attribute. If this happens a flag in the system state must
be set to indicate that the respective task instance has been terminated.
Whenever a task changes from running to ready this flag must be checked
to decide whether the corresponding event is a preemption or a termination.
A \textbf{wait} event occurs if a running task waits for an event that is not
set. In this case the \gls{os} will change the task state to waiting and the
task is removed from the core. A \textbf{release} event occurs once the event is set
and the \gls{os} changes the task state to ready.
\begin{code}
\begin{lstlisting}[caption={[Resource polling] The \gls{btf} polling state
indicates that a process is actively waiting for a resource. This listing
shows how this might be implemented in C.},
label={listing:resource_polling}]
TASK(EngineManager) {
/* Wait actively until EngineResource becomes available. */
while(GetResource(EngineResource) != E_OK);
engineRPM = calculateEngineRPM();
ReleaseResource(EngineResource);
TerminateTask();
}
\end{lstlisting}
\end{code}
\textbf{Poll} actions are more difficult to detect, since they are not directly
related to a concept specified by \gls{osekos}. The idea of the \gls{btf}
polling state is to indicate that a task is actively waiting for a resource.
In code this can be implemented via a loop in which a resource is requested
repeatedly until it becomes available as shown in
\autoref{listing:resource_polling}.
Via \emph{servicetrace} and \emph{lasterror}
it can be detected that a process has requested a locked resource: the
\emph{servicetrace} attribute indicates when the \lstinline{GetResource}
service routine is called and \lstinline{E_OS_ACCESS} is written to the
\emph{lasterror} attribute in case the resource is locked.
However, a single request does not necessarily mean that a change into the
polling state is happening. Instead, a task might execute one code segment if
the resource is available and a different one if it is not.
Therefore, it is necessary to set a \emph{previous request} flag for a task
instance that has requested a locked resource once. If another request follows
in the same running interval a poll event is generated. Once there are no more
requests, the last request must have been successful and a run event is created
to indicate the state change from polling to running. Then the previous
request flag must be cleared.
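A minimal sketch of this flag handling could look as follows. It is driven by
the \emph{servicetrace} and \emph{lasterror} observations described above;
all names are assumptions, and the actual bookkeeping would be kept per task
instance in the system state.

\begin{code}
\begin{lstlisting}[caption={[Previous request flag sketch] An illustrative
sketch of the previous request heuristic: the first failed request only sets
a flag, a second failed request yields a poll event, and a successful request
after polling yields a run event.}, label={listing:poll_flag_sketch}]
#include <assert.h>
#include <stdbool.h>

typedef enum { P_NONE, P_POLL, P_RUN } poll_event_t;

typedef struct {
    bool previous_request; /* a locked resource was requested once */
    bool polling;          /* the task entered the BTF polling state */
} poll_state_t;

/* Called when servicetrace/lasterror indicate a failed resource request. */
poll_event_t on_failed_request(poll_state_t *s)
{
    if (s->previous_request) {
        s->polling = true;  /* repeated request: actively waiting */
        return P_POLL;
    }
    s->previous_request = true;
    return P_NONE;
}

/* Called when a request is not followed by a lasterror write. */
poll_event_t on_successful_request(poll_state_t *s)
{
    bool was_polling = s->polling;
    s->previous_request = false; /* clear the flag again */
    s->polling = false;
    return was_polling ? P_RUN : P_NONE; /* polling -> running */
}
\end{lstlisting}
\end{code}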
A \textbf{park} action must be created if a task that is in polling state is
changed into the ready state. Next, it is necessary to detect resource state
changes of the resource which the parking task has been polling. If the
respective resource changes into an unlocked state, a \textbf{release\_parking}
event is created. On the other hand, if the resource stays locked and the task
changes back into running state, a \textbf{poll\_parking} event is required.
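The state change based decisions described above can be summarized in a small
dispatch function. The following listing is a hedged sketch with illustrative
names; poll and run events are omitted because they are derived from the
\emph{servicetrace} attribute rather than from a task state change.

\begin{code}
\begin{lstlisting}[caption={[Task action dispatch sketch] An illustrative
sketch that combines the new \gls{osek} task state from the \gls{orti} task
status attribute with the previous \gls{btf} process state kept in the system
state. All identifiers are assumptions.},
label={listing:task_action_sketch}]
#include <assert.h>
#include <stdbool.h>

typedef enum { OS_SUSPENDED, OS_READY, OS_RUNNING, OS_WAITING } os_state_t;
typedef enum { B_ACTIVE, B_READY, B_RUNNING, B_WAITING,
               B_POLLING, B_PARKING } btf_state_t;
typedef enum { A_NONE, A_START, A_RESUME, A_PREEMPT, A_TERMINATE,
               A_WAIT, A_RELEASE, A_PARK, A_POLL_PARKING } btf_action_t;

btf_action_t task_action(os_state_t os_new, btf_state_t btf_prev,
                         bool terminated_flag)
{
    switch (os_new) {
    case OS_RUNNING:
        if (btf_prev == B_ACTIVE)  return A_START;
        if (btf_prev == B_READY)   return A_RESUME;
        if (btf_prev == B_PARKING) return A_POLL_PARKING;
        return A_NONE; /* polling -> running is found via servicetrace */
    case OS_READY:
        if (btf_prev == B_RUNNING) /* pending activations skip suspended */
            return terminated_flag ? A_TERMINATE : A_PREEMPT;
        if (btf_prev == B_WAITING) return A_RELEASE;
        if (btf_prev == B_POLLING) return A_PARK;
        return A_NONE;
    case OS_SUSPENDED:
        return A_TERMINATE;
    case OS_WAITING:
        return A_WAIT;
    }
    return A_NONE;
}
\end{lstlisting}
\end{code}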
The \textbf{mtalimitexceeded} notification event is the last task event that
must be detected. This event is created if a task activation is triggered,
but no actual task instance is added to the system. An \gls{osek} compliant
\gls{os} writes an \lstinline{E_OS_LIMIT} error into the \emph{lasterror}
attribute, if a task activation is triggered, but the maximal \gls{mta} value
is already reached. To create a valid \gls{btf} event it is necessary to know
for which task entity the error is created. Since \gls{orti} does not provide
this information the creation of \emph{mtalimitexceeded} events is not
feasible. \autoref{tab:task_mapping} gives an overview of the task mapping.
\begin{table}[]
\centering
\begin{tabular}{r|l l}
Action & ORTI attribute & System state \\
\hline
activate & currentactivations & currentactivations, last stimulus \\
start & state (running) & state (active) \\
resume & state (running) & state (ready) \\
resume & runningisr2 & running task \\
preempt & state (ready) & task not terminated \\
preempt & runningisr2 & running task \\
terminate & state (suspended) & active tasks \\
terminate & state (ready) & task terminated \\
wait & state (waiting) & - \\
release & state (ready) & state (waiting) \\
poll & lasterror & servicetrace, previous request \\
run & servicetrace & state (polling) \\
park & state (ready) & state (polling) \\
poll\_parking & state (running) & state (parking) \\
release\_parking & resource state & state (parking) \\
mtalimitexceeded & lasterror & entity cannot be detected \\
\end{tabular}
\caption[Task event mapping]{Different pieces of information are required to
detect all possible task actions. The states in the \gls{orti} attributes
column are \gls{osek} task states while the states in the system information
column are \gls{btf} process states. The previous state is necessary to
create correct events. For example, a task state change to running could
mean a \gls{btf} start, resume or run event.
For some actions, it is necessary to use multiple approaches to detect them.
For example, a task terminate event happens if the \gls{osek} state changes
to suspended. However, if another instance of the same task is already
activated, a change to suspended does not occur. To catch this case it is
necessary to set a \emph{task terminated} attribute for a task instance when
it calls the \lstinline{TerminateTask} service routine.}
\label{tab:task_mapping}
\end{table}
\begin{table}[]
\centering
\begin{tabular}{r|l l}
Action & ORTI attribute & System state \\
\hline
activate & - & - \\
start & runningisr2 & \gls{isr} stack \\
resume & runningisr2 & \gls{isr} stack \\
preempt & runningisr2 & \gls{isr} stack \\
terminate & runningisr2 & \gls{isr} stack \\
\end{tabular}
\caption[\gls{isr} event mapping]{The \emph{runningisr2} attribute is used to
detect basic \gls{isr} actions. Because \glspl{isr} are not allowed to wait
for events, waiting state related actions must not be created. All other
actions can be detected in the same way as for task instances as shown in
\autoref{tab:task_mapping}.}
\label{tab:isr_mapping}
\end{table}
\textbf{\glspl{isr}} and tasks share the same \gls{btf} state model. However,
\gls{osek} does not specify a detailed state model for \glspl{isr} as it does
for tasks. Consequently, the basic process actions activate, start, resume,
preempt, and terminate are detected differently compared to task actions as
shown in \autoref{tab:isr_mapping}. \glspl{isr} are not allowed to wait for
events. Therefore, waiting related process state transitions must not be
considered. The detection of semaphore polling events works the same way as
for task events and is therefore not discussed again.
An \glsdesc{isr} is triggered by a hardware interrupt. This means if the
hardware detects a certain condition, e.g., an \gls{io} pin state changes from
high to low, the program flow is interrupted and a certain code section that is
mapped to this interrupt is executed. Depending on the trace device, it may or
may not be feasible to detect the activation of an interrupt via the
corresponding \gls{isr} control register.
In the former case, it is possible to create a stimulus and the resulting
activate event by detecting when the interrupt activate bit is set in the
corresponding control register. Otherwise, the \textbf{activate} event must be
created when the \gls{isr} changes into the running state for the first time.
In this case trigger, activate, and start event are all created with the same
timestamp.
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/mapping/isr_stacking.pdf}}
\caption[Running \gls{isr} stacking]{A stack can be used to track the active
\glspl{isr} in a system. This is necessary to create appropriate \gls{btf}
events. For example, the event when \emph{isr\_foo} is set as the running
\gls{isr}, is different, depending on the current state of the stack. If the
\gls{isr} is already on the stack, a resume event must be created, otherwise a
start event.}
\label{fig:isr_stacking}
\end{figure}
The currently running category two \gls{isr} is indicated by the
\emph{runningisr2} \gls{orti} attribute. Each \gls{isr} has a unique
\gls{id} that is written into the variable if the respective entity is
running. Otherwise, \emph{runningisr2} is zero, which indicates that no
\gls{isr} is active. The mapping from \gls{id} to name is included in the
\gls{orti} file. If
\emph{runningisr2} changes to the \gls{id} of a certain \gls{isr}, it is not
possible to decide whether this instance runs for the first time or whether it
is resumed, after it has been preempted by an \gls{isr} with higher priority as
shown in \autoref{fig:isr_stacking}.
Therefore, it is necessary to keep track of the active \gls{isr} instances in
the system, e.g.\ via a stack. Whenever the value of \emph{runningisr2}
changes it is checked whether the corresponding \gls{id} is already on the
stack. If so, the \gls{isr} was already running and has been
\textbf{preempted}. Consequently, the \gls{isr} that caused the preemption has
terminated and must be popped off the stack. The \gls{isr} that has been
preempted must be \textbf{resumed}.
The other case is that the new \gls{isr} has not been running yet, i.e.\ is not
on the stack. This means that the \gls{isr} on top of the stack, if there is
one, gets \textbf{preempted} and the new \gls{isr} is \textbf{started} and
pushed on the stack. If \emph{runningisr2} becomes zero the last \gls{isr} is
popped off the stack and \textbf{terminated}.
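The stack handling for \emph{runningisr2} changes can be sketched as follows.
This is an illustrative sketch with assumed names; the preempt and terminate
events that accompany each transition are noted in the comments.

\begin{code}
\begin{lstlisting}[caption={[Running ISR stack sketch] An illustrative
sketch of the \gls{isr} stack from \autoref{fig:isr_stacking}. A change of
\emph{runningisr2} to an \gls{id} already on the stack yields a resume event,
otherwise a start event; zero terminates the topmost \gls{isr}.},
label={listing:isr_stack_sketch}]
#include <assert.h>

#define MAX_ISRS 16
typedef enum { I_NONE, I_START, I_RESUME } isr_event_t;

typedef struct {
    int stack[MAX_ISRS];
    int top; /* number of active ISRs */
} isr_stack_t;

/* Called whenever runningisr2 changes; id == 0 means no ISR running. */
isr_event_t on_runningisr2(isr_stack_t *s, int id)
{
    if (id == 0) {               /* topmost ISR terminated */
        if (s->top > 0) s->top--;
        return I_NONE;
    }
    for (int i = 0; i < s->top; i++) {
        if (s->stack[i] == id) { /* already on the stack: it resumes */
            s->top = i + 1;      /* ISRs above it have terminated */
            return I_RESUME;
        }
    }
    s->stack[s->top++] = id;     /* new ISR; old top (if any) preempted */
    return I_START;
}
\end{lstlisting}
\end{code}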
As the name indicates, \emph{runningisr2} is only written for category two
interrupt routines. Regular \glspl{isr} are not managed by the \gls{os} and
therefore not detectable via \gls{orti} attributes. Instead function trace
must be utilized to detect when a category one \gls{isr} is started or
terminated. To map the function names to actual \gls{isr} entities, a list of
category one \glspl{isr} is required. If such a list is available, the
procedure is the same as described above.
\begin{table}[]
\centering
\begin{tabular}{r|l l}
Action & ORTI attribute & System state \\
\hline
start & - & running process \\
terminate & - & running process \\
suspend & task state & running process, process runnables \\
resume & task state & running process, process runnables \\
\end{tabular}
\caption[Runnable event mapping]{Runnable start and stop events can be
detected via function tracing. The source entity for a runnable event is the
process in whose context the runnable is executed. A runnable is suspended
when the corresponding process is preempted. If the process resumes, the
runnable is resumed, too.
One runnable can be called in the context of another runnable. This means
multiple runnables can be running within the same process context at the same
point in time. If this is the case, all running runnables must be suspended
and resumed.}
\label{tab:runnable_mapping}
\end{table}
\textbf{Runnable} actions are detectable via function events. Start and
terminate events must be created for function entry and function exit events.
A program flow trace contains the information about all functions in the
system. A list of runnable entity names is thus required to check whether a
function is a runnable or not.
Suspend events must be created, if the process context in which a runnable is
running is preempted and a resume event is required if the corresponding
process resumes. This means that whenever a process is deallocated, a
potentially active runnable must be suspended. Once the process is
reallocated the runnable also resumes.
Additionally, runnables can be nested, i.e.\ one runnable can be executed by
another runnable. If this happens it is important to suspend and resume all
running runnables, if the corresponding process is preempted and resumed.
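Assuming a per-process stack of active runnables in the system state, this
nesting can be tracked with a few lines of C. The listing is a sketch with
illustrative names, not actual tool code.

\begin{code}
\begin{lstlisting}[caption={[Runnable nesting sketch] An illustrative
sketch of per-process runnable tracking. Function entry and exit events push
and pop runnable names; on a process preemption or resumption, every runnable
still on the stack must be suspended or resumed.},
label={listing:runnable_stack_sketch}]
#include <assert.h>

#define MAX_NESTED 8

typedef struct {
    const char *runnables[MAX_NESTED]; /* runnables active in this process */
    int depth;
} process_ctx_t;

/* Function entry of a known runnable: runnable start event. */
void on_runnable_entry(process_ctx_t *p, const char *name)
{
    p->runnables[p->depth++] = name;
}

/* Function exit of a known runnable: runnable terminate event. */
void on_runnable_exit(process_ctx_t *p)
{
    if (p->depth > 0) p->depth--;
}

/* Number of suspend (or resume) events to emit when the owning process
 * is preempted (or resumed). */
int runnables_to_notify(const process_ctx_t *p)
{
    return p->depth;
}
\end{lstlisting}
\end{code}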
\begin{table}[]
\centering
\begin{tabular}{r|l l}
Action & ORTI attribute & System state \\
\hline
write & - & running process \\
read & - & running process \\
\end{tabular}
\caption[Signal event mapping]{Signals can be read or written. To create
valid \gls{btf} signal events, it is necessary to know which process is
currently running on the core, i.e., which process executed the read or
write.}
\label{tab:signal_mapping}
\end{table}
\textbf{Signal} events are detectable via data events. To decide which data
event corresponds to a signal event a list of signal names must be available.
With this list it can be decided if a certain data event results in a signal
event or not. The source entity for signal read events is the currently
running process as shown in \autoref{tab:signal_mapping}. If no process is
running an entity of type simulation can be used to set the value of the
signal.
\textbf{Event} actions are easily detectable via the \emph{servicetrace}
attribute. Via this attribute it is possible to create set, wait, and clear
event actions. However, in order to create valid event actions, it is also
necessary to know the event entity that relates to the respective action.
However, \gls{orti} does not specify \gls{os} event related attributes.
Therefore, it is not possible to create valid actions for this entity type.
\begin{table}[]
\centering
\begin{tabular}{r|l l}
Action & ORTI attribute & System state \\
\hline
ready & resource object & - \\
lock & resource locker & - \\
unlock & resource locker & - \\
full & resource locked, servicetrace & - \\
overfull & resource locked, servicetrace & - \\
\end{tabular}
\caption[Resource event mapping]{\gls{osek} resources can only be locked or
unlocked which means they do not support all semaphore actions. Lock and
unlock actions can be detected via the \gls{orti} locker attribute.
Full and overfull events are created if an already locked resource is
requested again. This is detectable via the \emph{servicetrace} attribute.
The resource for which the \emph{resource locked} attribute was read the
last time is the resource for which the error has occurred.}
\label{tab:resource_mapping}
\end{table}
\begin{table}[]
\centering
\begin{tabular}{r l l}
Action & ORTI attribute & System state \\
\hline
requestsemaphore & resource locker & - \\
assigned & resource locker & - \\
waiting & resource locked & - \\
released & resource locker & previous locker \\
\end{tabular}
\caption[Semaphore process event mapping]{Via the resource locker attribute
it is possible to detect whether a process has successfully requested a
semaphore.
The \emph{resource locker} attribute changes to the no task \gls{id} if the
resource is no longer locked. For this case it is necessary to know the task
that has previously locked the resource in order to create the correct
release event.
Waiting actions can be created by detecting data read events to the
\emph{resource locked} attribute.}
\label{tab:semaphore_process_mapping}
\end{table}
\textbf{Resource} entities must be initialized via the ready action before they
can be used in a \gls{btf} trace. This can be done at the beginning of a trace
with the timestamp zero. The \gls{orti} file contains a list of all resource
objects that are part of the application.
Since resources can only be locked or unlocked, they cannot change into the
semaphore used state. Consequently, only the state transition actions shown in
\autoref{tab:resource_mapping} can occur for resource events. Additionally,
only a subset of the process semaphore actions are required to represent the
behavior of resources.
Via the \gls{orti} \emph{resource locker} attribute it is possible to detect by
which task entity a resource is locked. This means a lock event can be
generated whenever the \gls{id} of a certain task is written to this attribute.
On the other hand, an unlock event is created when \emph{resource locker}
indicates that the respective entity is currently not locked by any task.
Moreover, it is necessary to assign a process to the locked resource once it
is locked by the task and to release it when the resource is released as
shown in \autoref{tab:semaphore_process_mapping}.
Full and overfull actions are created when a locked resource is polled by a
process. The semaphore waiting action is used to indicate the identity of the
polling process. As shown above, it is possible to detect whether a process is
polling a resource via the \emph{servicetrace} and \emph{lasterror} \gls{orti}
attributes. \emph{Lasterror} is set to \lstinline{E_OS_ACCESS} in case a
resource is already locked. The resource for which the polling occurs is
detectable via the \emph{resource locked} attribute. Whenever a certain
resource is requested the \gls{os} will read this attribute to decide whether
a request is allowed or not.
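The lock and unlock detection via the \emph{resource locker} attribute can be
sketched as follows. The listing is illustrative: \lstinline{NO_TASK} stands
for the no task \gls{id} mentioned above, and keeping the previous locker in
the system state allows the correct release event to be generated.

\begin{code}
\begin{lstlisting}[caption={[Resource locker sketch] An illustrative
sketch that classifies a data write to a resource locker attribute as a lock
or unlock event. All names are assumptions.},
label={listing:resource_locker_sketch}]
#include <assert.h>

#define NO_TASK 0
typedef enum { R_NONE, R_LOCK, R_UNLOCK } res_event_t;

typedef struct {
    int locker; /* previous locker, part of the system state */
} resource_t;

/* Called for every data write to the resource locker attribute. */
res_event_t on_locker_write(resource_t *r, int task_id)
{
    res_event_t ev = R_NONE;
    if (r->locker == NO_TASK && task_id != NO_TASK)
        ev = R_LOCK;     /* task_id locks the resource */
    else if (r->locker != NO_TASK && task_id == NO_TASK)
        ev = R_UNLOCK;   /* the previous locker releases it */
    r->locker = task_id; /* remember the locker for the release event */
    return ev;
}
\end{lstlisting}
\end{code}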
% }}}
% {{{ OS Specific Mapping
\section{OS Specific Mappings}
It is not feasible to create all \gls{btf} events relying solely on the
\gls{orti} file. For example, it is necessary to have a list of runnable and
signal names in order to create valid events for those entity types. But even
for entities that are supported by the \gls{orti} interface not all events
can be generated. It is possible to detect that the activation limit of a
task is exceeded; however, it is not possible to determine for which task
entity this happens.
Nevertheless, even though not all events are detectable via \gls{orti} alone,
an \gls{osekos} stores the information of interest internally. During a task
activation the \gls{os} must decide whether the \gls{mta} limit is reached or
not. To do so it is necessary to compare the current amount of
pending activations to the value of maximal allowed activations. Consequently,
the \gls{os} has to read certain information from memory which results in data
trace events.
Based on this argument all other events can be reconstructed, if the
corresponding \gls{os} specific operations are known. On the downside, it is
no longer possible to rely on a standardized interface like \gls{orti}. This
means the algorithm that does the transformation must be customized depending
on the \gls{os}. In this section the adaptations required to create a
\gls{btf} trace for the \gls{osek} compliant Erika Enterprise (\gls{ee})
\glsdesc{os} \cite{erika} are shown. In
\autoref{section:evaluation_test_bench} the reasons for choosing \gls{ee} are
discussed.
\textbf{Task} \emph{mtalimitexceeded} events cannot be created based on
\gls{orti} alone because the task entity for which the event occurs is not
detectable. One way to get this information is to remember which task's
\emph{currentactivations} attribute was read the last time. The \gls{os} has to
decide whether a task instance can be created once an activation is triggered.
To do so it compares the maximum allowed activations with the current number
of activations of a task. In other words, the \gls{os} reads the
\emph{currentactivations} attribute for the task that should be activated. If
the \gls{mta} limit is exceeded an error code is written.
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/mapping/mtalimitexceeded.pdf}}
\caption[Call stack for inter-core process activation]{A
\emph{mtalimitexceeded} event must be created if the \lstinline{E_OS_LIMIT}
error is set via the \emph{lasterror} \gls{orti} attribute. However, this is not
correct for Erika Enterprise multi-core applications. For a failing
inter-core inter-process activation the error code is written two times, once
on the source and once on the target core. Therefore, special care must be
taken, so that the \gls{btf} event is created only once.}
\label{fig:mtalimitexceeded}
\end{figure}
\begin{code}
\begin{lstlisting}[caption={[Task activations limit exceeded] Erika Enterprise
keeps track of the remaining activations that are allowed for a task entity.
If the value is zero and another activation occurs an \lstinline{E_OS_LIMIT}
error is set.}, label={listing:mtalimitexceeded}]
if ( EE_th_rnact[TaskID] == 0U ) {
ev = E_OS_LIMIT;
} else {
/* Do activation. Code removed for clarity. */
ev = E_OK;
}
if (ev != E_OK ) {
EE_ORTI_set_lasterror(ev);
EE_oo_notify_error_ActivateTask(TaskID, ev);
}
\end{lstlisting}
\end{code}
As it turns out this approach is not sufficient for multi-core systems.
Activation of a task entity by a task on another core via
\lstinline{ActivateTask} is implemented by a \glsdesc{rpc} (\gls{rpc}) as shown
in \autoref{fig:mtalimitexceeded}. The \gls{rpc} triggers an \gls{isr} on the
other core which performs the required action. In case of an inter-process
activation the \lstinline{ActivateTask} routine is executed again, but this
time on the core the target task is allocated to. If the \gls{mta} limit of
the task is exceeded an \lstinline{E_OS_LIMIT} error event is written and a
\emph{mtalimitexceeded} event is created.
However, the initiating core is notified by the remote \gls{isr} once the
service routine has finished. The corresponding error code is also returned
back to the initial core and written to the \emph{lasterror} attribute. The
resulting problem is that the transformation algorithm would create another
\emph{mtalimitexceeded} event based on the last read from the pending
activations variable on the initial core which is not correct.
A way to work around this problem can be derived by looking at a part of the
source code of the \lstinline{ActivateTask} implementation shown in
\autoref{listing:mtalimitexceeded}. It shows that \gls{ee} keeps track of the
remaining activations of each task in an array called \lstinline{EE_th_rnact}.
If the field for a specific task becomes zero, an \lstinline{E_OS_LIMIT} error
is written. This means if a task should be activated on one core and this
activation fails due to too many pending activations this will become clear by
a data read event to \lstinline{EE_th_rnact} directly followed by a write event
to the \emph{lasterror} attribute. For a remote activation there are multiple
other data events between the error and the previous read to
\lstinline{EE_th_rnact}. Therefore, no incorrect \emph{mtalimitexceeded} event
is created.
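This adjacency check can be sketched as follows. The listing is an
illustrative sketch: apart from \lstinline{EE_th_rnact} and the
\emph{lasterror} attribute, all names and the event interface are
assumptions.

\begin{code}
\begin{lstlisting}[caption={[MTA adjacency check sketch] An illustrative
sketch of the adjacency heuristic: an \lstinline{E_OS_LIMIT} write to
\emph{lasterror} yields an \emph{mtalimitexceeded} event only if the
immediately preceding data event was a read of \lstinline{EE_th_rnact}.},
label={listing:mta_adjacency_sketch}]
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef struct {
    bool last_was_rnact_read;
    int  last_rnact_task; /* task whose activation count was read */
} limit_state_t;

/* Returns the task id for which mtalimitexceeded must be emitted,
 * or -1 if no event is created. */
int on_data_event(limit_state_t *s, const char *var, bool is_read,
                  int task_id, bool is_os_limit)
{
    if (is_read && strcmp(var, "EE_th_rnact") == 0) {
        s->last_was_rnact_read = true;
        s->last_rnact_task = task_id;
        return -1;
    }
    int result = -1;
    if (!is_read && strcmp(var, "lasterror") == 0 && is_os_limit
        && s->last_was_rnact_read)
        result = s->last_rnact_task; /* local, failed activation */
    s->last_was_rnact_read = false;  /* any other event breaks adjacency */
    return result;
}
\end{lstlisting}
\end{code}

For a remote activation, the intervening data events clear the adjacency
flag, so the returned error code on the initial core does not produce a
second event.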
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/mapping/deltaqueue.pdf}}
\caption[Alarm delta queue implementation]{\gls{ee} implements alarms
via a delta queue. There is one queue, containing the corresponding
alarms, for each counter. Each alarm has a delta value that indicates after
how many ticks in relation to the previous alarm it must be executed. Only
the delta of the first alarm in the queue must be decremented for each counter
tick. If an alarm expires it is removed from the queue, and inserted again in
case it is cyclic.
In this example Alarm 2 expires after three ticks. Since Alarm 5 has a
delta of zero it expires at the same counter cycle. Alarm 4 expires after
six cycles, i.e.\ the sum of its own and all previous deltas.
}
\label{fig:deltaqueue}
\end{figure}
\begin{table}[]
\centering
\begin{tabular}{r|l l}
Action & Variable & Additional Information \\
\hline
mtalimitexceeded & lasterror & previous data read event \\
trigger (alarm) & alarm action type & \gls{orti} \\
\end{tabular}
\caption[OS task and stimulus event mapping]{Via \gls{orti} it is not
possible to detect for which task an \lstinline{E_OS_LIMIT} event has been
created. However, the data read event before this error can be used to get
this information.
Additionally, alarm trigger events cannot be created via the \emph{alarmtime}
attribute in Erika Enterprise, because it is not implemented in an \gls{osek}
compliant way. Instead, read events to the \lstinline{ActionType} attribute
of an alarm can be used to detect when a stimulus event must be created.}
\label{tab:task_mapping_os}
\end{table}
\textbf{Stimulus} events must be created for inter-process and alarm
activations as shown in \autoref{tab:stimulus_mapping}. An alarm activation
stimulus is created if the \gls{orti} \emph{alarmtime} attribute becomes zero.
However, \gls{ee} \gls{os} does not update this attribute in compliance with
the \gls{osek} specification \cite{erikaaltick}. Hence, another technique is
required to detect alarm events.
\gls{ee} keeps track of all active alarms in a delta queue as shown in
\autoref{fig:deltaqueue}. There is one queue for each counter. Whenever a
counter is incremented the delta of the first element in the queue is
decremented. If the delta of the first alarm in the queue becomes zero this
alarm and all following alarms with a delta of zero expire and the
corresponding actions are executed.
For an expiring alarm the \gls{os} is required to execute the corresponding
action. As shown in \autoref{tab:task_mapping_os} each alarm has an
\lstinline{ActionType} attribute. Via this attribute the \gls{os} determines
the correct action for an alarm. In other words, if an alarm expires this
attribute must be read and a data read event is generated. Consequently, a
\gls{btf} stimulus event is created whenever the action type attribute of
an alarm is read. The exact action executed by an alarm, e.g.\ which task is
activated for a process activation is read from the \gls{orti} file.
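The delta queue behavior of \autoref{fig:deltaqueue} can be reproduced by
the following sketch. The data layout and names are illustrative and do not
match the \gls{ee} internals; only the decrement-the-head rule and the
expiry of all leading zero-delta alarms are taken from the description
above.

\begin{code}
\begin{lstlisting}[caption={[Delta queue sketch] An illustrative sketch of
a counter tick on an alarm delta queue. Only the head delta is decremented;
every leading alarm with a delta of zero expires and is removed.},
label={listing:delta_queue_sketch}]
#include <assert.h>

#define MAX_ALARMS 8

typedef struct {
    int deltas[MAX_ALARMS]; /* delta to the previous alarm in the queue */
    int count;
} delta_queue_t;

/* Advances the counter by one tick and returns the number of expiring
 * alarms; each expiry corresponds to a read of the alarm's ActionType
 * attribute and thus to a BTF stimulus event. */
int counter_tick(delta_queue_t *q)
{
    if (q->count == 0) return 0;
    if (q->deltas[0] > 0) q->deltas[0]--;
    int expired = 0;
    while (q->count > 0 && q->deltas[0] == 0) {
        expired++;
        q->count--;             /* remove the expired head alarm */
        for (int i = 0; i < q->count; i++)
            q->deltas[i] = q->deltas[i + 1];
    }
    return expired;
}
\end{lstlisting}
\end{code}

With the deltas from \autoref{fig:deltaqueue} (3, 0, 3), two alarms expire
on the third tick and the last one on the sixth.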
\textbf{Event} actions must include the information about the affected event.
For example, if a task sets an event it is necessary to know the target task
and event for this action. \gls{orti} makes it possible to detect when an
event related service routine is executed; however, no information about the
event itself is made available.
\begin{code}
\begin{lstlisting}[caption={[Set event] Erika Enterprise uses the
\lstinline{EE_th_event_active} array to keep track of the events set for each
task. If a new event is set the mask is updated by connecting the previous
events and the new event via bitwise or. It is not possible to set an event
for a suspended task.},
label={listing:set_event}]
if ( EE_th_status[TaskID] == SUSPENDED ) {
ev = E_OS_STATE;
} else {
/* Set the event mask only if the task is not suspended */
EE_th_event_active[TaskID] |= Mask;
/* Check if the TASK was waiting for an event we just set */
if ((EE_th_event_waitmask[TaskID] & Mask) != 0U)
{
/* Activate task here */
}
}
\end{lstlisting}
\end{code}
\begin{table}[]
\centering
\begin{tabular}{r|l l}
Action & Variable & Additional Information \\
\hline
wait\_event & \lstinline!EE_th_event_waitmask! & previous wait mask\\
clear\_event & \lstinline!EE_th_event_active! & previous active mask\\
set\_event & \lstinline!EE_th_event_active! & previous active mask\\
all actions & - & event bit from eecfg.h \\
\end{tabular}
\caption[OS specific event mapping]{Erika Enterprise uses two arrays to keep
track of the event states for each task entity. Via write events to these
arrays and the previous event state for a task instance correct \gls{btf}
events can be generated.}
\label{tab:os_event_mapping}
\end{table}
Erika Enterprise uses two arrays to keep track of the event related state of a
task: In \lstinline{EE_th_event_active} the events currently set for a
specific task instance are stored and \lstinline{EE_th_event_waitmask} includes
the information about which events a task entity is waiting for. Each field in
the array corresponds to one task and each bit of a field is related to a
certain event. Whenever a task is terminated both event masks are cleared.
Using these arrays it is possible to create correct events as shown in
\autoref{tab:os_event_mapping}. Whenever an \gls{os} event related service
routine is executed the corresponding event mask is updated. For example, if
an event is set for a specific task, the event mask is updated based on the new
event. This means the events which are currently set for a task and the new
event are connected via the bitwise \emph{or} operation as shown in
\autoref{listing:set_event}.
Hence, a data write event to one of those arrays is created whenever an event
service routine is executed. However, only the new state of the bitmask
becomes available. To determine the event \gls{id} it is necessary to remember
the previous state of the mask. By executing a bitwise \emph{exclusive-or}
operation on previous and current mask, the bit of the current event is
computed.
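The derivation above can be sketched in C; a minimal illustration assuming the
previous and current values of \lstinline{EE_th_event_active} are available
from the trace (the helper name is hypothetical):

```c
#include <stdint.h>

/* XOR of the previous and the current event mask leaves exactly the
 * bits that changed with the traced write, i.e. the bit of the event
 * that was just set or cleared. */
static uint32_t changed_event_bits(uint32_t prev_mask, uint32_t curr_mask)
{
    return prev_mask ^ curr_mask;
}
```

For example, if the mask changes from \lstinline{0x5} to \lstinline{0x7}, the
exclusive-or yields \lstinline{0x2}, the bit of the event that was just set.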
Unfortunately, this information is still not enough to create a valid \gls{btf}
event. For each bit it is necessary to know the corresponding entity name.
\glsdesc{ee} defines the bitmask for each \gls{os} event in the \emph{eecfg.h}
file which is created during the code generation process. By parsing the event
defines the mapping between bit and event name is retrieved.
\begin{code}
\begin{lstlisting}[caption={[Spin in for global resource request] In case a
global resource (a resource used on multiple cores) is requested, Erika
Enterprise uses a spinlock mechanism to lock the CPU until the resource
becomes available.},
label={listing:get_resource_spin}]
/* if this is a global resource, lock the others CPUs */
if (isGlobal) {
EE_hal_spin_in((EE_TYPESPIN)ResID);
}
\end{lstlisting}
\end{code}
\textbf{Resource} events, or in \gls{btf} terms semaphore events, can be
created based on the information provided by \gls{orti} as shown in
\autoref{tab:resource_mapping}. However, certain semaphore events like
waiting can only occur in multi-core systems. In a single-core system it is
not possible that one task polls a resource that is already occupied because
of the priority ceiling protocol.
Erika Enterprise implements inter-core resource requests via spinlocks. If a
task requests a resource that is locked by a task on another core, the service
routine does not return an error code but starts spinning as shown in
\autoref{listing:get_resource_spin}. As a consequence, the mapping for full,
overfull, and waiting actions introduced in the previous section does not
work.
To work around this problem, it is necessary to understand how spinlocks are
implemented in Erika Enterprise. The state of each spinlock is stored in the
\lstinline{EE_hal_spin_status} array where each field corresponds to a separate
spinlock. A value of one indicates that the spinlock is locked; otherwise the
value is zero. The \lstinline{EE_hal_spin_in} method is implemented via the
atomic compare-and-swap operation. This method is used to write a one into a
certain spinlock field, but only if the spinlock is currently free.
Compare-and-swap returns a value that indicates whether the operation was
successful or not. In the latter case the operation is executed again until it
succeeds.
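The mechanism can be modeled with C11 atomics; a minimal sketch of the
spin-in/spin-out behavior, not Erika Enterprise's actual implementation:

```c
#include <stdatomic.h>

#define MAX_SPINLOCKS 8  /* illustrative size */

/* One field per spinlock: 1 = locked, 0 = free
 * (modeled after EE_hal_spin_status). */
static atomic_uint spin_status[MAX_SPINLOCKS];

/* Spin until the compare-and-swap succeeds, i.e. until a 1 was
 * written into a field that was 0. Every failed attempt is a data
 * access to spin_status and thus visible in a hardware trace. */
static void spin_in(unsigned int id)
{
    unsigned int expected;
    do {
        expected = 0;
    } while (!atomic_compare_exchange_weak(&spin_status[id], &expected, 1));
}

static void spin_out(unsigned int id)
{
    atomic_store(&spin_status[id], 0);
}
```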
Compare-and-swap operations result in a data access to the variable for which
the operation is executed. Therefore, it is possible to detect when a spinlock
is polled based on data access events to \lstinline{EE_hal_spin_status}. This
information can then be used to create correct semaphore events as shown in
\autoref{tab:os_semaphore_process_mapping}.
Whenever the \emph{resource locker} attribute is read within the context of the
\lstinline{GetResource} service routine, the corresponding resource entity must
be stored in the system state. If the resource is free, a write event to the
\emph{resource locker} attribute follows and the corresponding \gls{btf} events
can be created as described above.
If there is no write event to the \emph{resource locker} attribute the
resource is currently locked and the \gls{os} starts spinning which is
detectable by continuous data access events to the field of
\lstinline{EE_hal_spin_status} relating to the requested semaphore.
Consequently, the running process is assigned to the semaphore via the waiting
action and an overfull action must be created. The process is now in polling
mode. Once there are no further accesses to \lstinline{EE_hal_spin_status}
the request was successful, the task state changes to running and the resource
state to full.
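The reconstruction described above can be sketched as a small state machine
over the traced data accesses. The type and event names below are hypothetical
and the sketch is simplified: in particular, the end of polling is modeled here
by the write to the \emph{resource locker} attribute rather than by the absence
of further \lstinline{EE_hal_spin_status} accesses.

```c
/* Traced access kinds relevant for semaphore reconstruction. */
typedef enum {
    ACC_LOCKER_READ,   /* resource locker read in GetResource context */
    ACC_LOCKER_WRITE,  /* resource locker written: lock acquired      */
    ACC_SPIN_STATUS    /* access to the spinlock status field         */
} access_t;

typedef enum { SEM_IDLE, SEM_REQUESTED, SEM_POLLING, SEM_LOCKED } sem_state_t;

/* Advance the reconstructed semaphore state for one traced access. */
static sem_state_t sem_step(sem_state_t s, access_t a)
{
    switch (s) {
    case SEM_IDLE:
        /* GetResource entered: remember the requested resource. */
        return (a == ACC_LOCKER_READ) ? SEM_REQUESTED : s;
    case SEM_REQUESTED:
        if (a == ACC_LOCKER_WRITE)
            return SEM_LOCKED;   /* resource was free: full action      */
        if (a == ACC_SPIN_STATUS)
            return SEM_POLLING;  /* occupied: waiting + overfull action */
        return s;
    case SEM_POLLING:
        /* Further spin accesses keep polling; acquiring ends it. */
        return (a == ACC_LOCKER_WRITE) ? SEM_LOCKED : s;
    default:
        return s;
    }
}
```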
\begin{table}[]
\centering
\begin{tabular}{r l l}
Action & Variable & Additional Information \\
\hline
waiting & \lstinline!EE_hal_spin_status! & running task, requested resource \\
full & \lstinline!EE_hal_spin_status! & requested resource \\
overfull & \lstinline!EE_hal_spin_status! & requested resource \\
\end{tabular}
\caption[OS specific semaphore event mapping]{Not all \gls{btf} semaphore
actions can be created based on \gls{orti} alone for an Erika Enterprise
multi-core application. This is because inter-core resource requests are
implemented via spinlocks. Spinlock operations can be detected via the
\lstinline{EE_hal_spin_status} array.}
\label{tab:os_semaphore_process_mapping}
\end{table}
% }}}
\section{OSEK/VDX OS}
\label{section:osekvdxos}
\Gls{osek} (\glsdesc*{osek}) \cite{osek} is an effort of the German and French
automotive industry to establish common standards for the software architecture
of distributed control units in vehicles. Defining a common architecture for
communication, operating systems, and network management avoids problems that
arise otherwise by using different interfaces and protocols. An abstraction
layer between hardware and software allows \Gls{osek} compliant applications to
be reused on different hardware platforms with minor modifications.
\gls{osekos} specifies the architecture of a real-time operating system for
single processors. Based on the services offered by the \gls{os}, integration
of modules from different manufacturers is possible. The \gls{os} meets the
hard real-time requirements demanded by automotive applications. \gls{osekos}
can also be used in multi-core environments. In such cases a separate kernel is
executed on each core. Service routines can be used to interact between
multiple \gls{os} instances.
A high level of flexibility is required for an \gls{os} to support real-time
systems on various target platforms. In order to support low-end and high-end
microcontrollers alike \gls{osek} conformance classes (\glspl{osekcc}) are
specified. Depending on the \gls{osekcc} certain features, e.g.\ multiple task
activations, multiple tasks per priority, and extended tasks are available or
not.
Dynamic creation of system objects like tasks, alarms or events is not
supported by \gls{osekos}. All objects are defined statically and created
during the system generation phase \cite{osekos}. Consequently, all \gls{os}
entities are known before the system execution.
\autoref{fig:os_module_abstraction} illustrates the abstraction of
application modules from hardware resources. Standardized system services
offer functionality that can be used by all application modules. Well-defined
service calls, type definitions, and constants are specified and ensure the
portability of an application to different architectures.
An \glsdesc{io} (\gls{io}) module parallel to the \gls{os} gives access to
microcontroller specific functionality like serial interfaces or
analog-to-digital converters. \gls{io} interfaces are not specified by
\gls{osekos}, which opposes the idea of easy portability. \gls{osek}'s
follow-up standard \gls{autosar} (\glsdesc{autosar}) \cite{autosar} solves this
problem by adding a \gls{mcal} (\glsdesc{mcal}) to the \gls{autosaros}
specification \cite{autosarbsw}.
In 2003 \gls{autosar} was established by automobile \glspl{oem}, suppliers, and
tool developers pursuing the same goals as \gls{osek}. Different parts of
the \gls{autosar} standard are based on \gls{osek} and \gls{autosaros}
constitutes a superset of \gls{osekos}. Consequently, all features discussed
here are also relevant for \gls{autosaros}. Differences that are important in
the context of this thesis are mentioned explicitly.
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/osek/os_module_abstraction.pdf}
\caption[\gls{osekos} architecture]{\gls{osek} compliant \glspl{os} abstract
application modules and hardware via an \gls{os} layer. A non standardized
\gls{io} module still results in hardware dependencies.}
\label{fig:os_module_abstraction}
\end{figure}
\subsection{OSEK Architecture}
\label{subsection:osek_architecture}
\gls{osek} provides a specification for the architecture of an embedded
real-time \gls{os}. One of the main purposes of the \gls{os} is to manage the
available computational resources of the CPU\@. Based on different factors
such as priority, task group, and scheduling policy, executable entities,
so-called processes, are given access to the processor core. The procedure of
deciding which entity is executed next is called scheduling.
There are two types of process entities available: tasks and Interrupt Service
Routines (\glspl{isr}). The former are scheduled on task level, the latter on
interrupt level. Entities on interrupt level always have
precedence over entities on task level. Scheduling on interrupt level depends
solely on the priority of an entity and is done by hardware. For task entities
scheduling is done by the \gls{os} and depends on priority, scheduling policy,
and task group.
\textbf{Tasks} are categorized into two types by \gls{osekos}. A basic task
has three states: ready, running, and suspended. An extended task is a basic
task with the additional waiting state. Suspended tasks are passive and can be
activated. A task in the ready state can be allocated to the CPU for
execution which is then indicated by the running state. Only one task per
core can be in the running state at a given point in time. Extended tasks can
wait passively for an event. In that case they reside in waiting state.
Waiting tasks are not allocated to the CPU.
\begin{figure}[]
\centering
\includegraphics[width=0.7\textwidth]{./media/osek/extended_task_state_model.pdf}
\caption[\gls{osekos} task state model]{Task state model of an extended
\gls{osekos} task. A basic task cannot enter the waiting state.}
\label{fig:extended_task_state_model}
\end{figure}
Different task state transitions are possible as shown in
\autoref{fig:extended_task_state_model}. At system initialization all tasks
are suspended. If a task has to be executed it must be activated by a system
service. A task can be started by the \gls{os} in order to be executed. A
task is preempted if a task of higher priority is scheduled. Once a task has
finished execution it terminates and switches to the suspended state. Extended
tasks can wait for system events and are released and switched to ready once
the expected event is set. The previous state of a ready task is not
implicitly known.
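These transitions can be sketched as a small validity check; the enum and
function names below are illustrative and not part of any OSEK header:

```c
#include <stdbool.h>

typedef enum { SUSPENDED, READY, RUNNING, WAITING } task_state_t;

/* Legal transitions of the extended task state model:
 * activate:  suspended -> ready      start:     ready   -> running
 * preempt:   running   -> ready      wait:      running -> waiting
 * release:   waiting   -> ready      terminate: running -> suspended */
static bool transition_valid(task_state_t from, task_state_t to)
{
    switch (from) {
    case SUSPENDED: return to == READY;
    case READY:     return to == RUNNING;
    case RUNNING:   return to == READY || to == WAITING || to == SUSPENDED;
    case WAITING:   return to == READY;
    }
    return false;
}
```

A basic task never enters the waiting state, so for basic tasks the
\lstinline{RUNNING}~$\rightarrow$~\lstinline{WAITING} transition does not occur.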
\begin{figure}[]
\centering
\includegraphics[width=0.9\textwidth]{./media/osek/isr_example.pdf}
\caption[\gls{isr} scheduling behavior]{\gls{isr} scheduling is done by
hardware and depends solely on the interrupt priority. \glspl{isr} do not
have a ready state because they are started by hardware.}
\label{fig:isr_example}
\end{figure}
Priorities are assigned to tasks and \glspl{isr} statically. The lowest
priority is zero and greater integers mean a higher priority. If an \gls{isr}
of priority zero is running and another \gls{isr} of priority one is activated,
the first \gls{isr} is preempted and restarts once the second \gls{isr} is
terminated as shown in \autoref{fig:isr_example}. For tasks the same scenario
is dependent on scheduling policy and task group.
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/osek/non_vs_full_preemptive_scheduling.pdf}
\caption[Non vs full preemptive scheduling]{Scheduling behavior of a non (top)
vs a full (bottom) preemptive task. A non preemptive task finishes execution
even though a task with higher priority is in ready state. Only for certain
system services, for example, an inter-process activation, the other task may
be scheduled. A full preemptive task is preempted if a task with higher
priority is activated. Once this task has terminated, the task with lower
priority can continue running.}
\label{fig:non_vs_full_preemptive_scheduling}
\end{figure}
\gls{osekos} specifies three \textbf{scheduling policies}: non, full, and mixed
preemptive scheduling. For a non preemptive task, rescheduling is only
possible if a system routine that causes rescheduling, e.g.\ an inter-process
activation or an explicit scheduler call is executed. A full preemptive task
can be rescheduled at any point in time during its execution if another task of
higher precedence is activated as shown in
\autoref{fig:non_vs_full_preemptive_scheduling}. A mixed preemptive system
contains tasks with both non and full preemptive scheduling policies.
Otherwise the system is either non or full preemptive.
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/osek/task_group_example.pdf}
\caption[Scheduling of tasks in task groups]{Scheduling of task entities is
not only dependent on priority and scheduling policy. \gls{osek} specifies
task groups, which change the priority of tasks inside in relation to tasks
outside a specific group. In this example Task 3 has a greater priority than
Task 2. However, because they are in the same group, Task 2 inherits the
priority of Task 1. Thus, Task 2 is not preempted by Task 3.}
\label{fig:task_group_example}
\end{figure}
The precedence of a task is not necessarily due to its priority. \gls{osekos}
introduces the concept of task groups, which allows combining multiple tasks
into a group. A task which is not within a group has precedence over a task
within a group only, if its priority is higher than the priority of the task
with the highest priority within this group. This means a task acts non
preemptively towards another task if the task with the highest priority within
the group has a greater priority than the other task as shown in
\autoref{fig:task_group_example}.
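The rule can be expressed compactly; an illustrative helper mirroring the
example in \autoref{fig:task_group_example}, not an \gls{os} interface:

```c
#include <stdbool.h>

/* Highest priority of all tasks within a group (its "ceiling"). */
static int group_ceiling(const int *prios, int n)
{
    int max = prios[0];
    for (int i = 1; i < n; i++)
        if (prios[i] > max)
            max = prios[i];
    return max;
}

/* A task outside a group preempts a task inside the group only if its
 * priority exceeds the highest priority found within that group. */
static bool preempts_group_member(int outsider_prio,
                                  const int *group_prios, int n)
{
    return outsider_prio > group_ceiling(group_prios, n);
}
```

With the figure's constellation, a group containing priorities 5 (Task 1) and
2 (Task 2) is not preempted by an outside task of priority 3 (Task 3), even
though 3 exceeds Task 2's own priority.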
\textbf{\gls{osek} Conformance Classes} are used to adapt applications to
different hardware capacities such as available memory and CPU speed. Only one
\gls{osekcc} can be active at a time and cannot be changed during runtime.
Basic \glspl{osekcc} (BCC1 and BCC2) allow basic tasks only, while extended
\glspl{osekcc} (ECC1 and ECC2) allow basic and extended tasks. Level one
\glspl{osekcc} (BCC1 and ECC1) allow multiple tasks per priority and multiple
activation requests per task. For level two \glspl{osekcc} (BCC2 and ECC2)
multiple tasks can share the same priority and the same task can be activated
multiple times as shown in \autoref{tab:conformance_class}. This means BCC2
and ECC2 allow \glsdesc{mta} (\glspl{mta}). An active task with pending
activations becomes ready again immediately after termination.
\begin{table}[]
\centering
\begin{tabular}{r|c c c c}
& BCC1 & BCC2 & ECC1 & ECC2 \\
\hline
\gls{mta} & no & yes & no & yes \\
Multiple tasks per priority & no & yes & no & yes \\
Extended tasks & no & no & yes & yes \\
\end{tabular}
\caption[\gls{osek} conformance classes]{\gls{osekos} specifies multiple
\glspl{osekcc} to respect the computational capacities of different
platforms. Depending on the \gls{osekcc} different features are supported or
not.} \label{tab:conformance_class}
\end{table}
Task scheduling is done by the \gls{os} while \gls{isr} scheduling is done by
hardware. \glspl{isr} can be divided into category one and category two.
Category one \glspl{isr} do not run under \gls{os} control and are thus not
allowed to call \gls{os} services. Category two \glspl{isr} are monitored by
the \gls{os} and are allowed to execute a subset of the available \gls{os}
services. Tasks are always preempted by \glspl{isr} and can only continue
running when all \glspl{isr} have terminated.
Tasks and \glspl{isr} serve as containers for application specific functions.
These functions are not managed by the \gls{os} and must be added to the
process code by the user. \gls{autosar} introduced the concept of runnables to
solve problems related to the \gls{vfb} (\glsdesc{vfb}) introduced by the
\gls{autosar} architecture \cite{naumann2009autosar}. A runnable is
essentially the same as a function.
\textbf{Events} are system objects that can be set or not. Each event is owned
by at least one extended task. Only a task that owns an event is allowed to
clear and to wait for it. When waiting for an event a task switches into the
waiting state. It is switched back to ready when the corresponding event
is set.
All tasks and category two \glspl{isr} are allowed to set an event. Events are
used as a binary communication technique. One task can signal another one,
for example, if a certain resource has been released. Events are defined and
assigned to tasks before runtime. All events assigned to a task are cleared
when this task is activated.
\textbf{Resource} management is used to manage access to shared objects. An
\gls{osek} resource is basically a mutex. Each resource gets a ceiling
priority that is at least as high as the highest priority of all tasks that
access this resource. When a task accesses a resource and its priority is
lower than the ceiling priority of this resource its priority is raised to the
ceiling priority. The priority is reset to the original value once the task
releases the resource.
This technique ensures that, while a resource is occupied, no other task that
potentially accesses this resource can switch into the running state. This prevents priority
inversion and deadlocks. On the downside, tasks with a priority lower than the
ceiling priority may be delayed by a lower priority task.
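The ceiling mechanism can be sketched as follows; a minimal model with
illustrative structures that ignores nested resource requests:

```c
/* Minimal sketch of the priority ceiling protocol;
 * illustrative structures, not a vendor implementation. */
typedef struct {
    int prio;       /* current (dynamic) priority       */
    int base_prio;  /* statically assigned priority     */
} task_t;

typedef struct {
    int ceiling;    /* >= highest priority of all tasks
                     * that access this resource        */
} resource_t;

static void get_resource(task_t *t, const resource_t *r)
{
    if (t->prio < r->ceiling)
        t->prio = r->ceiling;  /* raise to ceiling while holding */
}

static void release_resource(task_t *t)
{
    t->prio = t->base_prio;    /* restore the original priority */
}
```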
\textbf{Alarms} are used to activate a task, set an event or execute an
alarm-callback routine. Each alarm has an alarmtime and a cycletime that are
statically defined and measured in ticks. An alarm expires the first time
after alarmtime ticks and afterwards every cycletime ticks. Thus, an alarm can
be used to activate a task or set an event periodically.
Each alarm is assigned to a counter object but each counter can be used by
multiple alarms. Counters are responsible for triggering an alarm after the
specified number of ticks have passed. Each \gls{osekos} offers at least one
counter that is based on a hard- or software timer.
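The expiry arithmetic can be sketched as follows; a hypothetical structure
mirroring the alarmtime/cycletime semantics described above:

```c
#include <stdint.h>

typedef struct {
    uint32_t alarmtime;  /* ticks until the next expiry     */
    uint32_t cycletime;  /* 0 for single-shot alarms        */
    int      running;    /* alarm state: running or stopped */
} alarm_t;

/* Advance an alarm by one counter tick; returns 1 if it expired. */
static int alarm_tick(alarm_t *a)
{
    if (!a->running)
        return 0;
    if (--a->alarmtime != 0)
        return 0;
    if (a->cycletime != 0)
        a->alarmtime = a->cycletime;  /* periodic: rearm with cycletime */
    else
        a->running = 0;               /* single-shot alarm stops        */
    return 1;                          /* expired: trigger the action    */
}
```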
\textbf{Hook routines} can be used to allow user-defined code within OS
internal processing. They cannot be preempted by tasks and \glspl{isr} and
only a subset of the available \gls{os} services is available from their
context.
The StartupHook and ShutdownHook can be used to execute user specified code at
system start and shutdown. \gls{os} errors result in a call to the ErrorHook.
It can be used to execute application specific error handling. Finally,
PreTaskHook and PostTaskHook are called at task start and termination.
\subsection{OSEK OS Services}
\gls{osekos} specifies system services that can be used to interact with
internal \gls{os} mechanisms and objects like tasks or resources. The internal
presentation of system objects is implementation specific. Only specified
system services allow well-defined interaction with \gls{os} objects. A system
service may take zero or more input parameters and may return zero or more
output parameters via call by reference. The return value of an \gls{os}
service is of type \lstinline{StatusType}. \autoref{tab:status_types} shows
defined status types.
\begin{table}[]
\centering
\begin{tabular}{r|l}
Define & Meaning\\
\hline
\lstinline!E_OK! & Service finished correctly. \\
\lstinline!E_OS_ACCESS! & Calling task is not an extended task. \\
\lstinline!E_OS_CALLLEVEL! & Service called from invalid level. \\
\lstinline!E_OS_ID! & Invalid \gls{os} \gls{id}. \\
\lstinline!E_OS_LIMIT! & Number of activations is exceeded. \\
\lstinline!E_OS_NOFUNC! & Alarm or resource is not in use. \\
\lstinline!E_OS_RESOURCE! & A resource is still occupied. \\
\lstinline!E_OS_STATE! & Object is in invalid state. \\
\lstinline!E_OS_VALUE! & Value is not allowed. \\
\end{tabular}
\caption[\gls{osekos} error codes]{\gls{osekos} defines a
\lstinline{StatusType} type that can be used to return an error code from
service routines. This table shows the status types that are defined by
\gls{osek} and their meaning. Users are free to define additional codes.}
\label{tab:status_types}
\end{table}
A task can be activated via alarm or \lstinline{ActivateTask} service routine.
The latter is callable from interrupt and task level. The task to be activated
must be provided as an input parameter. If this task is suspended its state
will be changed to ready. If it is not suspended the pending activations
counter is incremented or \lstinline{E_OS_LIMIT} is returned if the \gls{mta}
limit is exceeded.
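The described activation logic can be sketched as follows; a simplified model
with hypothetical names, where the error value is illustrative and real
implementations track considerably more state:

```c
#define E_OK       0
#define E_OS_LIMIT 4  /* illustrative numeric value */

typedef enum { TS_SUSPENDED, TS_READY, TS_RUNNING, TS_WAITING } tcb_state_t;

typedef struct {
    tcb_state_t state;
    int pending;      /* pending activation requests          */
    int max_pending;  /* MTA limit from the conformance class */
} tcb_t;

static int activate_task(tcb_t *t)
{
    if (t->state == TS_SUSPENDED) {
        t->state = TS_READY;       /* suspended -> ready */
        return E_OK;
    }
    if (t->pending >= t->max_pending)
        return E_OS_LIMIT;         /* MTA limit exceeded */
    t->pending++;                  /* record a pending activation */
    return E_OK;
}
```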
\lstinline{TerminateTask} is used to switch a task from running to suspended.
All internal task resources are released and the service will not return if the
call was successful. \lstinline{TerminateTask} will fail with
\lstinline{E_OS_RESOURCE} if resources are still occupied by a task.
\lstinline{ChainTask} is a combination of \lstinline{ActivateTask} and
\lstinline{TerminateTask}. It terminates the current task and activates
another task which is provided via input parameter.
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/osek/non_vs_full_schedule.pdf}
\caption[Explicit \gls{osekos} schedule call]{An explicit call to the
scheduler can solve the problem of a delayed higher priority task.}
\label{fig:non_vs_full_schedule}
\end{figure}
\lstinline{Schedule} can be called to explicitly trigger a scheduling decision.
This makes sense for non preemptive tasks if a task with higher priority is
ready. Normally the task with higher priority is delayed until the task with
low priority has finished execution as shown in
\autoref{fig:non_vs_full_preemptive_scheduling}. By calling
\lstinline{Schedule} the non preemptive task is preempted and the task with
higher priority is executed as illustrated in
\autoref{fig:non_vs_full_schedule}.
The routines \lstinline{GetResource} and \lstinline{ReleaseResource} can be
used to request and release resources. Nested resource requests are only
allowed in last-in-first-out order, i.e.\ the resource that has been requested
first must be released last. Within a critical section that is protected via a
resource no calls to services that cause rescheduling are allowed. Both
methods can be called from task and \gls{isr} level. If a requested resource
is already occupied \lstinline{E_OS_ACCESS} is returned.
Interaction with event objects is done via \lstinline{SetEvent},
\lstinline{ClearEvent}, \lstinline{GetEvent}, and \lstinline{WaitEvent} service
routines. \lstinline{SetEvent} takes a mask of events that should be set for a
specific task. Events can be deleted from the context of a process owning this
event via \lstinline{ClearEvent}. \lstinline{GetEvent} returns the current
status of all events related to a specified task. A task can wait for one or
more events using the \lstinline{WaitEvent} service routine. Waiting lasts
until at least one of the specified events is set.
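The event semantics can be sketched as follows; a simplified single-task model
with hypothetical names that omits scheduling and error handling:

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t active;    /* events currently set                   */
    uint32_t waitmask;  /* events the task is waiting for         */
    bool     waiting;   /* task resides in the waiting state      */
} etask_t;

/* Set events for a task; release it if an awaited event matches. */
static void set_event(etask_t *t, uint32_t mask)
{
    t->active |= mask;
    if (t->waiting && (t->active & t->waitmask) != 0)
        t->waiting = false;  /* waiting -> ready */
}

/* Wait for one or more events; block only if none is set yet. */
static void wait_event(etask_t *t, uint32_t mask)
{
    t->waitmask = mask;
    if ((t->active & mask) == 0)
        t->waiting = true;   /* enter the waiting state */
}
```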
The service routine \lstinline{GetAlarmBase} returns the basic configuration of
an alarm. The remaining ticks until an alarm expires can be retrieved with
\lstinline{GetAlarm}. \lstinline{SetRelAlarm} sets the expiry time relative
to the current counter value, while \lstinline{SetAbsAlarm} sets it to an
absolute counter value. An alarm can be deactivated with \lstinline{CancelAlarm}.
\subsection{OSEK OIL and ORTI}
\label{subsection:osek_oil_and_orti}
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/osek/osek_code_generation.pdf}
\caption[\gls{osekos} build process]{An \gls{osek} application is compiled
from three sources. The \gls{os} kernel, user created code and \gls{osekos}
object definition files which are created via code generation based on one or
more \gls{oil} files.} \label{fig:osek_code_generation}
\end{figure}
The implementation of system objects is not specified by \gls{osek}.
Therefore, users cannot know how to create system objects because the correct
definition depends on the \gls{os}. \glsdesc{oil} (\gls{oil}) solves this
problem by providing a meta language for defining system objects
\cite{osekoil}. Based on \gls{oil} configuration files code generators
provided by the \gls{os} vendor can produce \gls{os} specific source code. In
combination with kernel and user code an application can be built as shown in
\autoref{fig:osek_code_generation}.
\gls{osek} specifies data types for all system object types. However, the
implementation of the data types is \gls{os} specific. For example, a task is
identified by \lstinline{TaskType}. \lstinline{TaskType} could be implemented
as an integer indexing a global list of task objects or as a pointer to the
task object itself.
Only the minimum set of data types necessary to interact with service
routines is specified. Consequently, a lot of information is kept internally
by the \gls{os} and is not available for the user. For example, there is no
common interface to get data of the pending activations of a task, the
current state of a resource, or the state of an event.
Application code that needs this information would need to access the \gls{os}
internals directly which results in portability and security issues. Moreover,
external tools like debuggers that want to provide \gls{os} aware debug
information have no standardized interface to relevant internal data.
\glsdesc{orti} (\gls{orti}) was specified to solve this problem. Via
\gls{orti} tool vendors have a standardized interface to \gls{os} internal data
and properties of relevant system objects. The \glsdesc{koil} (\gls{koil})
format is used to exchange relevant information via the \gls{orti} file. This
file contains mappings from \gls{os} object properties to variables that
hold the respective information.
\gls{orti} specifies a set of system properties that must be available for
every \gls{osek} compliant \gls{os}. Operating system vendors are free to add
additional information. Each \gls{os} object is described in a separate
section of the \gls{orti} file. The specified sections that are relevant for
this thesis are \emph{os}, \emph{task}, \emph{alarm}, and \emph{resource}.
Information about the currently running process, the system error state, and
the active service routine can be found in the \gls{os} section shown in
\autoref{tab:os_attributes}. The \emph{servicetrace} attribute is written
whenever a service routine is started or finished along with the \gls{id} of
the corresponding routine. Task and \gls{isr} processes that are currently
running in a system can be retrieved via \emph{runningtask} and
\emph{runningisr2}. The attribute \emph{lasterror} provides information about
the last failure condition.
As shown in \autoref{tab:os_task} the \gls{orti} task section makes the current
\emph{priority}, \emph{state}, and number of open activations
(\emph{currentactivations}) for each task available. \autoref{listing:os_task}
shows the textual representation of the \gls{orti} attributes for a single
task.
\begin{code}
\begin{lstlisting}[caption={[\gls{orti} task example]Textual representation of
the \gls{orti} attributes for a task entity.},
label={listing:os_task}]
TASK T_CylinderResponser {
priority = "osTcbActualPrio[30]";
state = "osTcbTaskState[30]";
currentactivations = "osTcbActivationCount[30]";
};
\end{lstlisting}
\end{code}
\autoref{tab:os_alarm} shows that alarms have an \emph{alarmtime} attribute
that contains the ticks to the next expiry time. A \emph{cycletime} is used
for periodic alarms. If a cyclic alarm expires, \emph{alarmtime} is reset to
this value. An alarm can be running or stopped which is indicated by the
\emph{state} attribute and executes a certain \emph{action} if \emph{alarmtime}
becomes zero.
A resource can be locked or free which is indicated by the \emph{state}
attribute. In the former case the \emph{locker} attribute indicates the
corresponding process as shown in \autoref{tab:os_resource}. The resource
\emph{priority} is also accessible.
\begin{table}[]
\centering
\begin{tabular}{r|l}
Attribute & Content \\
\hline
runningtask & currently running task \\
runningisr2 & currently running category 2 \gls{isr} \\
servicetrace & indicates entry and exit to service routines \\
lasterror & contains the last error code set by the system \\
\end{tabular}
\caption[\gls{orti} \gls{os} section]{The \gls{orti} \gls{os} section
provides information about the running task and category 2 \gls{isr}, entry
and exit to service routines and the last system error.}
\label{tab:os_attributes}
\end{table}
\begin{table}[]
\centering
\begin{tabular}{r|l}
Attribute & Content \\
\hline
priority & task priority \\
state & task state (\autoref{fig:extended_task_state_model})\\
currentactivations & number of task activations \\
\end{tabular}
\caption[\gls{orti} task section]{The \gls{orti} task section provides
information about the current task priority, task state and number of
activations. The task priority can differ from the statically defined
value because of the priority ceiling protocol.}
\label{tab:os_task}
\end{table}
\begin{table}[]
\centering
\begin{tabular}{r|l}
Attribute & Content \\
\hline
alarmtime & time till alarm expires \\
cycletime & alarm cycle time of periodic alarms \\
state & alarm state (running or stopped) \\
action & action at alarm expiry time \\
\end{tabular}
\caption[\gls{orti} alarm section]{The \gls{orti} alarm section provides
information about the time that is left until an alarm expires, its cycle
time, the current state and the action that is executed once the alarm
expires.}
\label{tab:os_alarm}
\end{table}
\begin{table}[]
\centering
\begin{tabular}{r|l}
Attribute & Content \\
\hline
state & resource state (locked or unlocked) \\
locker & the task that has locked a resource \\
priority & resource priority \\
\end{tabular}
\caption[\gls{orti} resource section]{The \gls{orti} resource section
provides information about the state of a resource. A resource can be locked
or not. For a locked resource the corresponding task is made available.}
\label{tab:os_resource}
\end{table}
Additional sections and attributes can be found in the \gls{osek} \gls{orti}
specification \cite{osekortib}. Even via \gls{orti} not all \gls{os}
internals become available. Via \emph{servicetrace} it can be detected that a
certain event is set or cleared but no information about the event itself is
available. Consequently, for certain use cases it may still be necessary to
access \gls{os} specific data structures manually.
\chapter{System Trace}
\label{chapter:btf}
In \autoref{section:trace_measurement} a trace has been defined as a
sequence of chronological ordered events. An event is a system state change an
evaluator is interested in. Various trace tools exist to record events. They
can be classified into hardware, hybrid and software tools.
The same event can be represented on different levels as shown in
\autoref{fig:trace_event_levels}. An event within the respective level
shall be called hardware, software or system event. An event can be moved
from one level into another via transformation. For example, a voltage level
change in memory is a hardware event; the corresponding software event could
be a value change of a certain variable.
The representation levels do not correspond to the measurement tools of the
same name. A hardware event can be measured directly via hardware tracing,
but it is also possible to detect the change of a variable via software
tracing. Based on the software event the occurrence of the respective hardware
event can be deduced. This means hardware event detection is not limited to
hardware tracing, and the same holds for software events.
A well-defined format for events is required for further processing of recorded
traces. Tools that analyze or visualize a trace must be able to interpret the
recorded data. For example, the hardware trace host software must be able to
understand the hardware events that are generated by the on-chip trace device.
Otherwise it is not possible to transform the hardware events into higher level
software events.
Depending on the measurement goal a different event level may be required: A
hardware designer is not interested in the timing behavior of an engine control
unit, rather the correct functionality of a certain hardware register is of
interest. On the other hand, an application architect relies on the correct
functionality of the hardware, but the timing behavior of an application on
architecture level must be analyzable.
\section{BTF}
A system level trace can be used to analyze timing, performance and reliability
of an embedded system. \gls{btf} (\glsdesc{btf}) \cite{btf} was specified to
support these use cases. It assumes a signal processing system, where one
entity influences another entity in the system. This means an event does not
only contain the information about which system state changed, but also the
source for the change. For example, a regular system event could be the
activation of a task with the corresponding timestamp. A \gls{btf} event
additionally contains the information that the task activation was triggered by
a certain alarm.
A \gls{btf} event is defined as an octuple
\begin{equation}
\label{eq:btf_trace}
b_{k} = (t_k,\, \Psi_k,\, \psi_k,\, \iota_k,\, T_k,\, \tau_k,\, \alpha_k,\,
\nu_k),\, k \in \mathbb{N},
\end{equation}
where each element represents a certain \gls{btf} field: $t_k$ is the
\emph{timestamp}, $\Psi_k$ is the \emph{source}, $\psi_k$ is the \emph{source
instance}, $\iota_k$ is the \emph{target type}, $T_k$ is the \emph{target},
$\tau_k$ is the \emph{target instance}, $\alpha_k$ is the event \emph{action}
and $\nu_k$ is an optional \emph{note}. A \gls{btf} trace can now be defined as
\begin{equation}
B = \{b_k | t_{k} \leq t_{k+1} \wedge k \leq n\},\, n \in \mathbb{N},
\end{equation}
where $k$ is the index of a certain event and $n$ is the number of elements in
the trace.
The timestamp field is an integer value $t_k \in \mathbb{N}_{0}$. All
timestamps within the same trace must be specified relative to a certain point
in time, which is usually the start of trace measurement. System and trace
start can occur at different points in time. Consequently, neither the trace
nor the system start necessarily occurs at $t = 0$. The time period between two events $b_{k}$
and $b_{k+1}$ can be calculated as $\Delta t = t_{k+1} - t_{k}$. If not
specified otherwise, the time unit for $t_k$ is nanoseconds.
A \gls{btf} event represents the notification of one entity by another. There
exist different entity types corresponding to the software and hardware objects
of an application. Each entity of a certain type has a unique name that must
not be shared by entities of other types. Certain entity types have a life
cycle. This means multiple instances of the same entity can occur within the
same trace. Instance counters are required to distinguish between different
instances of the same entity. This is important for multicore systems where an
entity can run on two processor cores in parallel.
\autoref{fig:entity_inheritance} depicts the relationship between entity type,
entity and entity instance.
\begin{figure}[]
\centering
\includegraphics[width=0.7\textwidth]{./media/btf/entity_inheritance.pdf}
\caption[\gls{btf} entity inheritance]{A \gls{btf} event represents the
influence of one entity on another. Entities have different types and multiple
entities can exist for one type, identified by a unique name. Entities that
have a life cycle, such as runnables, can be instantiated multiple times. An
instance counter is required to distinguish multiple instantiations.}
\label{fig:entity_inheritance}
\end{figure}
A basic entity type is a runnable, which is essentially a simple function. A
system may contain multiple runnable entities called \emph{Runnable\_1},
\emph{Runnable\_2} and \emph{Runnable\_3}. The life cycle of a runnable starts
with the execution of this runnable and ends when it terminates. A runnable
can execute different actions such as calling another runnable or writing a
variable. In a multicore system the same runnable entity \emph{Runnable\_2}
may be executed by two other runnables that are running in parallel on
different cores. If \emph{Runnable\_2} writes to a variable, it is not known
from which core that write occurred. The instance counter identifies which
instance executed the write and thus resolves this ambiguity.
The \emph{source} and \emph{target} fields are strings that represent entities
which are part of the system. The target entity is influenced by the source
entity. \emph{Source instance} and \emph{target instance} are positive integer
values that identify the instance of the respective entity. \emph{Target type}
is the type of the target entity. Types are represented by their corresponding
type \gls{id}. A source type field is not part of a \gls{btf} event even
though it would make sense in certain cases.
The \emph{action} field indicates the way in which one entity is influenced
by another. Depending on the source and target entity types, different actions
are possible and allowed by the specification. The last field, \emph{note}, is
optional and can be used to carry additional information for certain
events. \autoref{tab:btf_fields} summarizes the meaning of the different
\gls{btf} fields.
\begin{table}[]
\centering
\begin{tabular}{r|l}
Field & Meaning \\
\hline
time & Timestamp relative to a certain point in time. \\
source & Name of the entity that caused an event. \\
source instance & Instance number of the entity that caused an event. \\
target type & Type of the entity that is influenced by an event. \\
target & Name of the entity that is influenced by an event. \\
target instance & Instance of the entity that is influenced by an event. \\
action & The way in which target is influenced by source. \\
note & An optional field that is used for certain events. \\
\end{tabular}
\caption[\gls{btf} event fields]{A \gls{btf} event consists of eight fields.
An event describes the way in which one system entity is influenced by
another one.}
\label{tab:btf_fields}
\end{table}
A \gls{btf} trace can be persisted in a \gls{btf} trace file. This file
consists of two sections: a meta and a data section. The meta section stands
at the beginning of the file. It contains information such as the \gls{btf}
version, the creator of the trace file, the creation date and the time unit
used by the time field. Each meta attribute stands in a separate line in the
form \lstinline{#<attribute name> <attribute definition>}. The data section
contains one \gls{btf} event per line in chronological order: the first event
stands at the beginning of the data section and the last event at the end
of the file. Comments are denoted by a \lstinline{#} followed by a space.
\autoref{listing:btf_example} shows an example trace file.
\begin{code}
\begin{lstlisting}[caption={[An example \gls{btf} trace file]A \gls{btf} trace
file consists of two sections. A meta section at the beginning of a file
includes information such as creator, creation date and time unit. It is
followed by a data section that contains one event per line. Comments are
denoted by a number sign followed by a space.},
label={listing:btf_example}]
#version 2.1.4
#creator BTF-Writer (15.01.0.537)
#creationDate 2015-02-18T14:18:20Z
#timeScale ns
0, Sim, 0, STI, S_1MS, 0, trigger
0, S_1MS, 0, T, T_1MS_0, 0, activate
100, Core_0, 0, T, T_1MS_0, 0, start
100, T_1MS_1, 0, R, Runnable_0, 0, start
25000, T_1MS_1, 0, R, Runnable_0, 0, terminate
25100, Core_1, 0, T, T_1MS_0, 0, terminate
# | | | | | | |
# time source | | target | action
# source instance | target instance
# target type
#
# Note that a number sign followed by a space denotes
# a comment. Whitespaces in the data section are ignored.
\end{lstlisting}
\end{code}
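The line-oriented format above lends itself to simple tooling. The following Python sketch (illustrative only; \lstinline{BtfEvent} and \lstinline{parse_btf} are assumed names, not part of any BTF tool) parses data lines into the event octuple while skipping meta attributes and comments:

```python
from typing import List, NamedTuple, Optional

class BtfEvent(NamedTuple):
    """One BTF event: (time, source, source instance, target type,
    target, target instance, action, optional note)."""
    time: int
    source: str
    source_instance: int
    target_type: str
    target: str
    target_instance: int
    action: str
    note: Optional[str] = None

def parse_btf(lines: List[str]) -> List[BtfEvent]:
    """Parse BTF data lines; lines starting with '#' are meta
    attributes or comments and are skipped."""
    events = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = [f.strip() for f in line.split(",")]
        events.append(BtfEvent(
            time=int(fields[0]), source=fields[1],
            source_instance=int(fields[2]), target_type=fields[3],
            target=fields[4], target_instance=int(fields[5]),
            action=fields[6],
            note=fields[7] if len(fields) > 7 else None))
    return events

trace = parse_btf([
    "#timeScale ns",
    "0, Sim, 0, STI, S_1MS, 0, trigger",
    "100, Core_0, 0, T, T_1MS_0, 0, start",
])
```

Note that the optional note field is simply absent for events with only seven fields, as in the example trace file above.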
\section{BTF Entity Types}
\begin{table}[]
\centering
\begin{tabular}{c|c c }
Category & Entity Type & Type \gls{id} \\
\hline
& Task & T \\
Software & \gls{isr} & I \\
& Runnable & R \\
\hline
& Signal & SIG \\
\gls{os} & Semaphore & SEM \\
& Event & EVENT \\
\hline
& Simulation & SIM \\
Other & Core & Core \\
& Stimulus & STI \\
\hline
& Instruction Block & IB \\
& Electronic Control Unit & ECU \\
Not discussed & Processor & Processor \\
& Memory Module & M \\
& Scheduler & SCHED \\
\end{tabular}
\caption[\gls{btf} entity types]{\gls{btf} entity types can be divided into
three categories: software, \gls{os} and other types. Entity types are
represented by their type \gls{id}. Some types are not relevant for this
thesis and are therefore not discussed.}
\label{tab:entity_overview}
\end{table}
\gls{btf} specifies the entity types that can be used for \gls{btf} events.
Each entity type can be influenced by certain other types and vice versa. The
actions, i.e.\ the ways in which one entity can influence another, are also
defined. Different actions are possible for different entity types. Entity
types can be categorized into software, \gls{os} and other entity types.
Not all entity types specified by \gls{btf} are discussed in detail as shown in
\autoref{tab:entity_overview}. The entity type instruction block (\emph{IB})
represents a fragment of a runnable. This concept is used by simulation
but does not translate to a concept used by a real application.
An electronic control unit (\emph{ECU}) consists of at least one processor
(\emph{Processor}). This concept makes it possible to represent a system that
contains multiple processors which communicate with each other. The recording
of a multi system aware hardware trace would require a measurement
configuration with multiple trace tools that are synchronized to each other.
The design of such a setup was not in the scope of this thesis, which is why
ECU and processor entities are not discussed.
Memory modules (\emph{M}) can be used to represent different memory sections of
a CPU\@. The \gls{btf} specification does not provide further information
about memory modules. Via hardware tracing the information about which memory
sections are accessed by certain data events becomes available. Since the
specification does not provide further details about how to use memory modules,
no further discussion is possible.
The scheduler (\emph{SCHED}) entity type is used to represent actions executed
by the \gls{os} that relate to the scheduling of task and process instances.
Scheduler events become available implicitly via the respective process
actions.
\subsection{Software Entity Types}
\gls{btf} distinguishes three kinds of software entity types: tasks,
\glspl{isr} and runnables, with the respective type \glspl{id} \emph{T},
\emph{I} and \emph{R}. Tasks and \glspl{isr} are collected under the umbrella
term process. Accordingly, they share the same state and transition model as
shown in \autoref{fig:process_state_chart}.
\begin{figure}[]
\centering
\centerline{\includegraphics[width=1.3\textwidth]{./media/btf/process_state_chart.png}}
\caption[Process state figure]{\gls{btf} specifies more process states than
\gls{osek} (see \autoref{fig:extended_task_state_model}). The additional
states polling and parking are required to represent active waiting. Not
initialized and terminated indicate the beginning and end of a process
lifecycle. The green boxes between the states show the name of the \gls{btf}
action for the respective transition.}
\label{fig:process_state_chart}
\end{figure}
\textbf{Process} instances start in the \emph{not initialized} state. From
there they can be \emph{activated} in order to switch into the \emph{active}
state by a stimulus (\emph{STI}) entity. All state transitions except
\emph{activate} are executed by core (\emph{C}) entities. An active process
can be changed into the \emph{running} state by the core on which the process
is scheduled.
A running process can \emph{preempt}, \emph{terminate}, \emph{poll} and
\emph{wait}. Preemption occurs if another process is scheduled to be executed
on the core. In this case, the process can no longer be executed and changes
into the \emph{ready} state. A ready process \emph{resumes} running once the
core becomes available again. If a process has finished execution it
terminates and switches into the \emph{terminated} state. This finishes the
lifecycle of a process instance.
A process that \emph{polls} a resource switches into the active waiting state
\emph{polling}. A process that \emph{waits} for an event switches into the
passive waiting state \emph{waiting}. A \emph{waiting} process is
\emph{released} into the ready state if one of the requested events becomes
available. If a polled resource becomes available, the task continues running,
which is indicated by the \emph{run} action.
A polling process that is removed from the core is \emph{parked} and switched
into the \emph{parking} state. If the polled resource becomes available while
the process is parking it is switched into the ready state. This transition is
called \emph{release\_parking}. Otherwise the process continues polling, once
it is reallocated to the core, which is called \emph{poll\_parking}.
\autoref{tab:process_overview} summarizes process state transitions.
\begin{table}[]
\centering
\begin{tabular}{c c c c}
Current state & Next state & Action & Source Entity Types\\
\hline
not initialized & active & activate & STI \\
active & running & start & C \\
ready & running & resume & C \\
running & ready & preempt & C \\
running & terminated & terminate & C \\
running & polling & poll & C \\
running & waiting & wait & C \\
waiting & ready & release & C \\
polling & running & run & C \\
polling & parking & park & C \\
parking & ready & release\_parking & C \\
parking & polling & poll\_parking & C \\
\end{tabular}
\caption[Process state table]{Process entities can be in different states. A
process instance starts in the not initialized state and finishes in the
terminated state. Each state transition has a unique action name. The
activate action can only be triggered by a stimulus entity. All other
actions can only be triggered by a core entity.}
\label{tab:process_overview}
\end{table}
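The transition table can be encoded as a lookup from (current state, action) to the next state, which is useful for example to validate a recorded event stream. A hypothetical Python sketch (the function name \lstinline{replay} and the string encodings are assumptions, not prescribed by BTF):

```python
# Valid BTF process state transitions, keyed by (current state, action).
# Derived from the process state table; action names follow BTF.
PROCESS_TRANSITIONS = {
    ("not initialized", "activate"): "active",
    ("active", "start"): "running",
    ("ready", "resume"): "running",
    ("running", "preempt"): "ready",
    ("running", "terminate"): "terminated",
    ("running", "poll"): "polling",
    ("running", "wait"): "waiting",
    ("waiting", "release"): "ready",
    ("polling", "run"): "running",
    ("polling", "park"): "parking",
    ("parking", "release_parking"): "ready",
    ("parking", "poll_parking"): "polling",
}

def replay(actions):
    """Replay a sequence of actions for one process instance and
    return the final state; raise ValueError on an invalid transition."""
    state = "not initialized"
    for action in actions:
        key = (state, action)
        if key not in PROCESS_TRANSITIONS:
            raise ValueError(f"invalid transition {action!r} in state {state!r}")
        state = PROCESS_TRANSITIONS[key]
    return state
```

For example, replaying \lstinline{activate, start, preempt, resume, terminate} ends in the terminated state, whereas a \lstinline{start} without prior activation is rejected.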
In addition to state transition actions, \gls{btf} specifies process
notification actions. These actions do not trigger a process state change, but
indicate other events related to a process entity. The \emph{mtalimitexceeded}
action is triggered if more task instances than the allowed maximum are
activated. In this case, no new task instance is created. Therefore, a
notification event is necessary to make the occurrence available in the trace.
All other process notification actions are related to migration, the
reallocation of a process from one core to another. \gls{osekos} does not
support process migration since a separate kernel is executed on each core.
Thus migration notifications are not relevant for an \gls{osek} compliant
\gls{os}. Additionally, migration actions become available implicitly via the
respective process transition actions. If a process instance is preempted on
one core and resumed on another, the resume event will have a different source
core than the preempt event. Consequently, the related migration event is
known.
\textbf{Runnable} instances start in the not initialized state.
Runnables can be \emph{started} by \glspl{isr} and tasks in order to switch
into the \emph{running} state. A runnable that \emph{terminates} switches into
the \emph{terminated} stated and therefore finishes its lifecycle according to
\autoref{tab:runnable_overview}.
Since a runnable can only be executed from a process context it cannot
continue running if the respective process is preempted. In this case the
runnable is \emph{suspended} and switches into the \emph{suspended} state.
Once the process resumes execution the runnable can also \emph{resume}.
\begin{table}[]
\centering
\begin{tabular}{c c c c}
Current state & Next state & Action & Source Entity Types\\
\hline
not initialized & running & start & T, I \\
running & terminated & terminate & T, I \\
running & suspended & suspend & T, I \\
suspended & running & resume & T, I \\
\end{tabular}
\caption[Runnable state table]{All runnable actions can be triggered by task
and \gls{isr} entity types. A runnable lifecycle starts when the runnable
first starts execution and ends when the runnable is terminated. A runnable
is suspended and resumed depending on the process context in which it is
executed.}
\label{tab:runnable_overview}
\end{table}
\subsection{OS Entity Types}
\begin{table}[]
\centering
\begin{tabular}{c c}
Action & Source Entity Types \\
\hline
read & P \\
write & P, STI \\
\end{tabular}
\caption[Signal actions]{Signals can be read or written. For a write event
the new value is provided via the note field.}
\label{tab:signal_overview}
\end{table}
\gls{os} event types are categorized into signal, semaphore and event types.
Signals are identified by \emph{SIG}, semaphores by \emph{SEM} and events by
\emph{EVENT}.
\textbf{Signal} entities represent variables that are relevant for the analysis
of an application. There are only two signal actions: \emph{read} and
\emph{write} as shown in \autoref{tab:signal_overview}. A signal can be read
by a process entity. This means that the value of a variable is retrieved from
memory. A signal entity does not have a lifecycle, thus the instance counter
value for signals can remain constant.
Write actions can be executed by process and stimulus entities. A write action
means that a new value is assigned to a variable. If this assignment is done
from process context, the respective process entity is the source for the write
event. Otherwise a stimulus entity can be used to represent the source, for
example if a signal is changed by the \gls{os} or a hardware module.
For signal writes, the \gls{btf} note field must be used to denote the value
that was assigned to a variable, usually represented by an integer value in
decimal representation. However, \gls{btf} does not specify in which form the
value must be provided. For read events the note field can optionally be used
to indicate the value of the variable that was accessed.
\textbf{Semaphores} can be used to control access to a common resource in
parallel systems. The basic idea is that a process can request a semaphore,
before it enters a critical section, for example a section that contains access
to shared variables. If the semaphore is free, the request is accepted and the
semaphore will be locked. All requests to a locked semaphore fail, thus no
other process can access the shared variables. When the process leaves the
critical section, it releases the semaphore, which then becomes free for other
processes.
There exist different types of semaphores. A counting semaphore may be
requested multiple times. Every time a counting semaphore is requested, a
counter is incremented and every time a counting semaphore is released, the
same counter is decremented. The initial counter value is zero, and the
semaphore is locked once the counter has reached a predefined maximum value.
\begin{table}[]
\centering
\begin{tabular}{r l}
Action & Meaning \\
\hline
requestsemaphore & Process requests a semaphore \\
exclusivesemaphore & Process requests a semaphore exclusively \\
assigned & Process is assigned as the owner of a semaphore \\
waiting & Process is assigned as waiting to a locked semaphore\\
released & Assignment from process to semaphore is removed \\
increment & Semaphore counter is incremented \\
decrement & Semaphore counter is decremented \\
\end{tabular}
\caption[Semaphore process actions]{Processes can interact with semaphores
in different ways. If a process requests a semaphore successfully, it is
assigned to the semaphore and the counter is incremented, otherwise a waiting
event is triggered. Once a semaphore is released, the assignment is removed
and the counter is decremented.}
\label{tab:semaphore_process}
\end{table}
A binary semaphore is a specialization of a counting semaphore for which the
maximum counter value is one. A mutex is a binary semaphore that supports an
ownership concept. This means a mutex knows all processes that may request it.
This information allows the implementation of priority ceiling protocols in
order to avoid deadlocks and priority inversion. The \gls{osek} term for mutex
is \emph{resource} as described in \autoref{subsection:osek_architecture}.
\gls{btf} semaphore events can be used to represent the different semaphore
types mentioned above. Semaphore actions can be divided into two categories:
Actions triggered by process instances as shown in
\autoref{tab:semaphore_process} and actions executed by a semaphore entity
itself.
A process request to a semaphore is indicated by \emph{requestsemaphore}. If a
request is successful (the semaphore is not locked), the semaphore counter is
\emph{incremented} and the process is \emph{assigned} to the semaphore. The
\emph{exclusivesemaphore} action represents a semaphore request that only
succeeds if the semaphore is currently not requested by any other process,
i.e.\ the counter value is zero. If a process fails to request a semaphore and
switches into polling mode in order to wait for this semaphore, this is
indicated by the \emph{waiting} action. A process that releases a semaphore
\emph{decrements} the semaphore counter and the respective semaphore is
\emph{released}, i.e.\ the process is no longer assigned to it.
\begin{figure}[]
\centering
\centerline{\includegraphics[width=1.2\textwidth]{./media/btf/semaphore_state_chart.png}}
\caption[Semaphore states and actions]{Semaphore entities do not have a
lifecycle. Nevertheless, they must be initialized before they are ready for
the first time. A semaphore can be unlocked or locked. A counting semaphore
can be requested multiple times, in which case it changes into the used state.
If there are no requests the semaphore is free. A semaphore that has received
at least as many requests as allowed is full and changes into the locked
state. Further requests in the locked state result in an overfull action.}
\label{fig:semaphore_state_chart}
\end{figure}
Semaphores do not have a lifecycle, which is why their instance counter remains
constant. Nevertheless, a semaphore must be moved from the \emph{not
initialized} to the \emph{free} state by the \emph{ready} action before it is
requested for the first time as shown in
\autoref{fig:semaphore_state_chart}.
A free semaphore is not requested by any processes. Once it is requested for
the first time, the behavior is dependent on the semaphore type. A mutex or
binary semaphore is \emph{locked} and moved into the \emph{full} state. A
counting semaphore is changed into the \emph{used} state, which is indicated by
the \emph{used} action. The used action is repeated for a counting semaphore
for each further request and release of the semaphore, as long as the counter
value stays greater than zero and smaller than the maximum value. If the
counter value of a used semaphore becomes zero this semaphore is \emph{freed}.
If the maximum counter value is reached the semaphore state becomes \emph{full}
which is indicated by the \emph{lock\_used} action.
When a full binary semaphore or mutex is released, it is \emph{unlocked} and
becomes free again, while a counting semaphore is changed back to the used
state, indicated by the \emph{unlock\_full} action. A request to a full
semaphore entity results in an \emph{overfull} action and the state is changed
to \emph{overfull}. The overfull state indicates that there is at least one
process polling a semaphore. Each additional request also results in an
overfull action. Once there are no more processes waiting for a semaphore,
this semaphore becomes full again, which is indicated by the \emph{full} action.
\autoref{tab:semaphore_semaphore} summarizes semaphore states and their
meaning.
\begin{table}[]
\centering
\begin{tabular}{r|l}
State & Meaning \\
\hline
not initialized & Semaphore is not ready \\
free & No process is assigned to the semaphore \\
used & At least one process is assigned \\
full & Maximum number of processes are assigned \\
overfull & Semaphore is full and there are further requests \\
\end{tabular}
\caption[Semaphore state overview]{Semaphores can be in five different
states. Before a semaphore entity can be used, it must be moved into the
free state. No process is assigned to a free semaphore. A counting
semaphore that has not yet reached its maximum request value, is in used
state. Once no further requests are accepted, a semaphore is full. A full
semaphore that is requested by another process is said to be overfull.}
\label{tab:semaphore_semaphore}
\end{table}
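The state overview can be summarized as a function of the current counter value, the maximum, and the number of polling processes. The following sketch is a hypothetical illustration (the not initialized state is omitted, since it only occurs before the ready action):

```python
def semaphore_state(counter, maximum, waiting):
    """Map a semaphore's counter and the number of processes polling
    it to the BTF semaphore state described in the table above."""
    if waiting > 0:
        return "overfull"  # full, and further requests are pending
    if counter == 0:
        return "free"      # no process is assigned
    if counter < maximum:
        return "used"      # counting semaphore, not yet at maximum
    return "full"          # maximum number of processes assigned
```

For a binary semaphore or mutex (\lstinline{maximum} of one) the used state is never reached, matching the state chart.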
\textbf{Events} are objects used for inter process communication, provided by
the \gls{os}. One process can use an event to notify another one, for example
when a computation finishes or a resource becomes available. Consequently, the
source entity for an event action must be a task or \gls{isr}. Event entities
do not have a lifecycle, therefore, no instance counter value is required.
There exist three event actions: \emph{wait\_event}, \emph{clear\_event} and
\emph{set\_event}. A process that waits for an event changes into passive
waiting mode until the respective event is set. An event can be set by
another process. For the \emph{set\_event} action it is necessary to specify
for which process entity the respective event is set. This information is
provided via the \gls{btf} note field. An event can be cleared by the process
for which the event was set.
\subsection{Other Entity Types}
There are three other entity types: simulation, core and stimulus entities.
The type \gls{id} for simulation is \emph{SIM}, for core \emph{C} and for
stimulus \emph{STI}.
\textbf{Stimuli} are used to depict application behavior that cannot be
represented by other entity types. The only stimulus action is \emph{trigger}.
A stimulus can be triggered by process and simulation entities. Once a
stimulus is triggered, it can be used for the actual event, e.g.\ the
activation of a task instance. Multiple stimulus instances can exist in a
system at a certain point in time. Thus the instance counter field must be
used for stimulus events.
\begin{table}[]
\centering
\begin{tabular}{r|l}
Action & Meaning \\
\hline
finalize & Initialization of system environment completed \\
error & An error occurred during trace recording \\
tag & Transmit meta information about the source entity \\
description & Provide a description for the source entity \\
\end{tabular}
\caption[Simulation actions]{The simulation entity can be used to provide
meta information about the trace environment. The name simulation is
misleading since the entity must also be used in a \gls{btf} trace recorded on
hardware. That is why the term \emph{system} is more appropriate. The system
entity can be used to trigger stimulus instances. For tag and description
events the note field is used to provide meta information.}
\label{tab:simulation_actions}
\end{table}
\textbf{Simulation} entities are used to provide meta information in a
simulated \gls{btf} trace as shown in \autoref{tab:simulation_actions}.
Nevertheless, this entity type can, and must, also be used in a hardware
trace. For example, a stimulus entity that activates a task can be triggered
by a process or simulation entity. In the first case, the resulting \gls{btf}
events represent an inter-process activation. However, in the case that a
process is activated by an alarm, process is not the correct source type and
simulation must be used instead. Since a simulation entity does not make much
sense in a hardware trace, \emph{system} is a more appropriate term to denote
the concept represented by the simulation entity type.
\textbf{Core} entities are used to provide an execution context for process
entities. Only one process can be allocated to a core at the same time. Core
entities do not have a lifecycle.
495
content/system_trace.tex Normal file
@@ -0,0 +1,495 @@
\section{System Trace}
\label{chapter:btf}
A trace is defined as a sequence of events. Events depict a change in the
state of a system and can be represented on different levels of abstraction.
These are discussed in more detail in \autoref{section:trace_measurement}.
For the timing analysis of embedded multi-core real-time systems a trace on
system level is required.
Tools that analyze or visualize traces must be able to interpret the recorded
events. For example, the software that interacts with hardware trace devices
must be able to understand the hardware events that are generated on-chip.
Otherwise it is not possible to transform the hardware events into higher level
software events. For that reason a well-defined format for events is required
for further processing of recorded traces.
Depending on the goal pursued with a trace measurement, one level of
abstraction can be more appropriate than another. On the one hand, a software
engineer who implements a feedback control system is mainly interested in the
functions and variables that correspond to that particular task. A system
engineer on the other hand, who integrates a variety of different modules into
a single application, is not interested in the details of each individual
module. Instead the functionality of the system as a whole is of interest.
\subsection{BTF Specification}
A trace on system level can be used to analyze timing, performance, and
reliability of an embedded system. \glsdesc{btf} (\gls{btf}) \cite{btf} was
specified to support these use cases. It assumes a signal processing system
where one entity influences another entity in the system. This means an event
does not only contain which system state changes but also the source of that
change. For example, an observed event on system level could be the activation
of a task with the corresponding timestamp. Then a \gls{btf} event
additionally contains the information that the task activation was triggered by
a certain alarm.
Let $k$ be an index in $\mathbb{N}_{0}$ denoting an individual event
occurrence; then a \gls{btf} event can be defined as an octuple
\begin{equation}
\label{eq:btf_trace}
b_{k} = (t_k,\, \Psi_k,\, \psi_k,\, \iota_k,\, T_k,\, \tau_k,\, \alpha_k,\, \nu_k)
\end{equation}
where each element maps to a \gls{btf} field: $t_k$ is the \emph{timestamp},
$\Psi_k$ is the \emph{source}, $\psi_k$ is the \emph{source instance},
$\iota_k$ is the \emph{target type}, $T_k$ is the \emph{target}, $\tau_k$ is
the \emph{target instance}, $\alpha_k$ is the event \emph{action} and $\nu_k$
is an optional \emph{note}.
A \gls{btf} trace can then be defined as a sequence of \gls{btf} events where
$n \in \mathbb{N}_{0}$ is the number of events in the trace:
\begin{equation}
B = (b_1, b_2, \dots, b_n)
\end{equation}
\begin{table}[]
\centering
\begin{tabular}{r|l}
Field & Meaning \\
\hline
time $(t)$ & Timestamp relative to a certain point in time. \\
source $(\Psi)$ & Entity that caused an event. \\
source instance $(\psi)$ & Entity instance that caused an event. \\
target type $(\iota)$ & Type of the entity that is influenced by an event. \\
target $(T)$ & Entity that is influenced by an event. \\
target instance $(\tau)$ & Entity instance that is influenced by an event. \\
action $(\alpha)$ & The way in which target is influenced by source. \\
note $(\nu)$ & An optional field that is used for certain events. \\
\end{tabular}
\caption[\gls{btf} event fields]{A \gls{btf} event consists of eight fields.
An event describes the way in which one system entity is influenced by
another one.}
\label{tab:btf_fields}
\end{table}
A \gls{btf} event can be represented textually as a comma-separated list where
each field maps to an element as shown in the following listing.
\vspace{1cm}
\begin{lstlisting}
12891, TASK_200MS, 3, SIG, EngineSpeed, 0, write, 42
\end{lstlisting}
\vspace{1cm}
The first field (\lstinline{12891}) represents the timestamp of the event. A
\gls{btf} trace contains the chronological order of events that occurred in a
system. Therefore, for each timestamp $t_k \in \mathbb{N}_{0}$ in a trace it
holds that $t_{k} \leq t_{k+1}$. All timestamps within the same trace must be
specified relative to a certain point in time, which can be chosen arbitrarily.
Hence, neither trace nor system start needs to occur at $t_0 = 0$. The time period
between two events $b_{k}$ and $b_{k+1}$ can be calculated as $\Delta t =
t_{k+1} - t_{k}$. If not specified otherwise, the unit for time is
nanoseconds.
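The ordering constraint and the time-period computation can be sketched as follows; this is a minimal illustration, assuming timestamps are given as 64-bit nanosecond values, and the function names are chosen for this sketch.

```c
#include <stdint.h>
#include <stddef.h>

/* Returns 1 if the timestamps are monotonically non-decreasing
 * (t_k <= t_{k+1}), as required for a valid BTF trace. */
static int btf_timestamps_ordered(const uint64_t *t, size_t n) {
    for (size_t k = 0; k + 1 < n; k++)
        if (t[k] > t[k + 1])
            return 0;
    return 1;
}

/* Time period between two consecutive events b_k and b_{k+1}. */
static uint64_t btf_delta(const uint64_t *t, size_t k) {
    return t[k + 1] - t[k];
}
```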
A \gls{btf} event represents the notification of one entity by another. Each
entity has a unique name. In the previous example, the source entity $\Psi$
has the name \lstinline{TASK_200MS} and the target entity $T$ is called
\lstinline{EngineSpeed}.
The fourth field \lstinline{SIG} is the short representation of the target
entity type $\iota$. \autoref{tab:entity_overview} gives an overview of all
entity types and their corresponding short \glspl{id}. Entity types are
discussed in more detail in \autoref{subsection:btf_entity_types}. In this
example, the target entity \lstinline{EngineSpeed} is a signal. The source
entity type is not part of a \gls{btf} event.
Some entities, namely tasks, \glspl{isr}, runnables, and stimuli, have a lifecycle.
This means at a certain point in time an entity becomes active in the system
and eventually it leaves the system. For example, the lifecycle of a task
starts with its activation and ends when it terminates. If \glspl{mta} are
allowed for an application, it is possible that multiple \emph{instances} of a
task are active at the same time. For those cases where multiple instances
of an entity are currently active, it is consequently not clear to which
instance of the entity the event refers.
Instance counter fields $\psi$ and $\tau$ are used to distinguish between
multiple instances of the same entity. The counters are integer values $\psi,
\tau \in \mathbb{N}_{0}$ that are incremented for each new entity becoming
active in the system. The first instance of an entity gets the counter value
$0$. \lstinline{TASK_200MS} has an instance counter value of \lstinline{3}
which means the event refers to the fourth instance of this entity. For
entities that do not have a lifecycle like signals, the counter field is not
relevant and $0$ can be used as a placeholder value.
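The counter assignment described above can be sketched as follows; the bookkeeping is illustrative, assuming entity names are unique strings and that fewer than a fixed bound of distinct lifecycle entities occur.

```c
#include <string.h>

/* Sketch of per-entity instance counters; all names are illustrative.
 * Assumes fewer than MAX_ENTITIES distinct lifecycle entities. */
#define MAX_ENTITIES 16

static struct {
    const char *name;
    unsigned    next_instance;
} counters[MAX_ENTITIES];
static int counter_count = 0;

/* Returns the instance number for a newly activated instance of the
 * given entity; the first instance of an entity gets the value 0. */
static unsigned new_instance(const char *entity) {
    for (int i = 0; i < counter_count; i++)
        if (strcmp(counters[i].name, entity) == 0)
            return counters[i].next_instance++;
    counters[counter_count].name = entity;
    counters[counter_count].next_instance = 1;
    counter_count++;
    return 0;
}
```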
The seventh field $\alpha$ represents the way in which the target entity is
influenced by the source entity. In this example \lstinline{TASK_200MS}
writes a new value to the signal entity \lstinline{EngineSpeed}. Depending on
source and target entity type, different actions are allowed by the
specification as discussed in \autoref{subsection:btf_actions}.
For signal write events the note field $\nu$ is used to denote the value that
is written to the signal, in this case \lstinline{42}. The note field is only
required for certain events. \autoref{tab:btf_fields} summarizes the meaning
of the different \gls{btf} fields.
A \gls{btf} trace can be persisted in a \gls{btf} trace file. This file
contains two parts: a meta and a data section. The meta section is written at
the beginning of the file. It contains general information on the trace such
as \gls{btf} version, creator of the trace file, creation date, and time unit
used by the time field. Each meta attribute uses a separate line, starting
with a \lstinline{#}, followed by the attribute name, a space, and the
attribute definition.
\begin{code}
\begin{lstlisting}[caption={[An example \gls{btf} trace file]A \gls{btf} trace
file consists of two sections. A meta section at the beginning of a file
includes information such as creator, creation date and time unit. It is
followed by a data section that contains one event per line. Comments are
denoted by a number sign followed by a space.},
label={listing:btf_example}]
#version 2.1.4
#creator BTF-Writer (15.01.0.537)
#creationDate 2015-02-18T14:18:20Z
#timeScale ns
0, Sim, 0, STI, S_1MS, 0, trigger
0, S_1MS, 0, T, T_1MS_0, 0, activate
100, Core_0, 0, T, T_1MS_0, 0, start
100, T_1MS_1, 0, R, Runnable_0, 0, start
25000, T_1MS_1, 0, R, Runnable_0, 0, terminate
25100, Core_1, 0, T, T_1MS_0, 0, terminate
\end{lstlisting}
\end{code}
In the data section one \gls{btf} event is written per line in chronological
order. The first event of a trace is located directly after the meta section
and the last event at the end of the file. Comments are denoted by a
\lstinline{#} followed by a space. \autoref{listing:btf_example} shows an
example trace file.
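A minimal writer for this file layout might look as follows; the function and buffer handling are illustrative, and the meta attributes and events are copied from \autoref{listing:btf_example}.

```c
#include <stdio.h>
#include <string.h>

/* Writes a minimal BTF file into buf: meta attributes first, then one
 * event per line. Values are copied from the example trace listing. */
static int write_btf(char *buf, size_t size) {
    int n = 0;
    n += snprintf(buf + n, size - n, "#version 2.1.4\n");
    n += snprintf(buf + n, size - n, "#timeScale ns\n");
    n += snprintf(buf + n, size - n, "%u, %s, %u, %s, %s, %u, %s\n",
                  0u, "Sim", 0u, "STI", "S_1MS", 0u, "trigger");
    n += snprintf(buf + n, size - n, "%u, %s, %u, %s, %s, %u, %s\n",
                  0u, "S_1MS", 0u, "T", "T_1MS_0", 0u, "activate");
    return n; /* total number of characters written */
}
```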
\subsection{BTF Entity Types}
\label{subsection:btf_entity_types}
As shown in \autoref{tab:entity_overview} \gls{btf} specifies fourteen entity
types that can be classified into five categories: environment, software,
hardware, operating system, and information. Some entity types are not
relevant for this thesis and therefore only discussed briefly. The actions,
in other words the ways in which one entity can be influenced by another,
are defined for each entity type as discussed in
\autoref{subsection:btf_actions}. Actions for types that are classified as not
relevant are not considered.
\begin{table}[]
\centering
\begin{tabular}{c|c c c}
Category & Entity Type & Type \gls{id} & Relevant \\
\hline
Environment & Stimulus & STI & X \\
\hline
& Task & T & X \\
Software & \gls{isr} & I & X \\
& Runnable & R & X \\
& Instruction Block & IB & \\
\hline
& Electronic Control Unit & ECU & \\
Hardware & Processor & Processor & \\
& Core & C & X \\
& Memory Module & M & \\
\hline
& Scheduler & SCHED & \\
Operating System & Signal & SIG & X \\
& Semaphore & SEM & X \\
& Event & EVENT & X \\
\hline
Information & Simulation & SIM & X \\
\end{tabular}
\caption[\gls{btf} entity types]{\gls{btf} entity types can be divided into
five categories. Types that are relevant in the context of this thesis are
marked by an X.}
\label{tab:entity_overview}
\end{table}
\textbf{Environment} contains only the stimulus entity type. Stimuli are used
to depict application behavior that cannot be represented by other entity
types. A stimulus can be used to activate a task or \gls{isr} and to set a
signal value. Multiple stimulus instances can exist in a system at a certain
point in time. Thus, the instance counter field is required for stimulus
entities.
\textbf{Software} contains the task, \gls{isr}, runnable, and instruction block
types. Tasks and \glspl{isr} summarized by the term process are containers
for application software and discussed in \autoref{section:osekvdxos}.
Runnable is a term established by \gls{autosar} and relates to the concept of C
type functions. A runnable can be executed from the context of processes and
contains application specific functionality. Multiple runnables can be active
in a system at the same time, for example, if the same runnable is executed by
two different tasks allocated to distinct cores. Hence, an instance counter is
required for runnable entities.
Instruction blocks are used to represent execution time within the context of
runnables. Since these execution times become available implicitly via the
corresponding runnable events, the addition of instruction blocks to a
\gls{btf} trace is optional and does not provide any immediate benefits.
\textbf{Hardware} contains the electronic control unit (ECU), processor, core,
and memory module types. An ECU consists of one or more processors. This
makes it possible to represent a multi-processor system. Generally, tracing only
supports the recording of a single processor. Multi-processor setups require a
way to synchronize the measurement between multiple trace measurement tools.
The design of such a setup is not in the scope of this thesis.
A processor is composed of one or more cores and recording multiple cores on
the same chip is feasible via tracing. Cores are necessary to map software and
\gls{os} events to the corresponding hardware entities. Since this information
is important for the analysis of embedded systems, cores are relevant for this
thesis.
Memory modules model different memory sections on a chip. They make it possible
to represent memory-related processes on the CPU, such as access times to variables
or cache misses. According to Helm \cite{christianmaster}, direct measurement
of memory access times is not possible. Instead, dedicated code must be added
to the application in order to determine the execution times for different
memory access operations. Due to the intrusiveness of this approach it is not
feasible for real applications. Therefore, memory modules are not supported in
this thesis.
\textbf{Operating System} covers scheduler, signal, semaphore, and event
entity types. The scheduler entity type is used to represent actions executed
by the \gls{os} that relate to the scheduling of process instances. Scheduler
events become available implicitly via the respective process actions and are
thus not considered in this thesis.
Signals represent access to variables that are relevant for the analysis of an
application. Consequently, signal events must be added to a \gls{btf} trace
that is recorded from hardware.
Semaphore entities are used to control access to common resources in parallel
systems. A process can request a semaphore before it enters a critical
section, e.g.\ a section that contains an access to a memory region that is
vulnerable to race conditions. If the semaphore is free the request is
accepted, the semaphore is locked and all subsequent requests fail. Once the
process has left the critical section it releases the semaphore.
Events are objects for inter-process communication provided by the \gls{os}.
One process can use an event to notify another one, for example, when a
computation finishes or a resource becomes available. Event entities do not
have a lifecycle; therefore, no instance counter value is required.
\textbf{Information} contains only the simulation entity type. This entity
type has two purposes. Firstly, it can be used to provide information about
errors that occurred during trace recording. Secondly, it is required to
trigger stimulus events. Since stimulus events are mandatory to represent task
activations by non-process objects, the simulation entity must be considered in
the context of this thesis. Because \emph{simulation} does not make sense in a
trace recorded from hardware, \emph{system} can be used as a more appropriate
term.
\subsection{BTF Actions}
\label{subsection:btf_actions}
\gls{btf} specifies different actions. The available actions are dependent on
the source and target entity types of the respective event.
\textbf{Stimuli} only support the \emph{trigger} action. A stimulus can be
triggered by process and simulation entities. Once a stimulus is triggered it
can be used for the actual event: the activation of a task or \gls{isr} or to
set the value of a signal.
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/btf/process_state_chart.png}}
\caption[Process state figure]{\gls{btf} \cite{btf} specifies more process
states than \gls{osek} (compare \autoref{fig:extended_task_state_model}). The
additional states polling and parking are required to represent active
waiting. Not initialized and terminated indicate the beginning and end of a
process lifecycle. The green boxes between the states show the name of the
\gls{btf} action for the respective transition.}
\label{fig:process_state_chart}
\end{figure}
\textbf{Process} entities support the actions shown in
\autoref{fig:process_state_chart}. A process instance starts in the \emph{not
initialized} state. From there it can be \emph{activated} by a stimulus entity
in order to switch into the \emph{active} state. All state transitions
except \emph{activate} are executed by core entities. An active process is
changed into the \emph{running} state as soon as it is scheduled by the
\gls{os}.
A running process can \emph{preempt}, \emph{terminate}, \emph{poll}, and
\emph{wait}. Preemption occurs if another process is scheduled to be executed
on the core. In this case, the current process changes into the \emph{ready}
state. A ready process \emph{resumes} running once the core becomes available
again. If a process finishes execution it \emph{terminates} and switches into
the \emph{terminated} state. This finishes the lifecycle of a process
instance.
A process that \emph{polls} a resource switches into the active waiting state
\emph{polling}. If the resource becomes available, the process continues
running, which is indicated by the \emph{run} action. A process that
\emph{waits} for an event switches into the passive waiting state
\emph{waiting}. A \emph{waiting} process is \emph{released} into the ready
state if one of the requested events becomes available.
A polling process that is removed from the core is \emph{parked} and switched
into the \emph{parking} state. If the resource becomes available while the
process is parking it is switched into the ready state. This transition is
called \emph{release\_parking}. Otherwise, the process continues polling once
it is reallocated to the core, which is called \emph{poll\_parking}.
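The state transitions described above can be sketched as a transition table. The state machine below is reconstructed from this description and \autoref{fig:process_state_chart}; the enum, function, and action names are chosen for this sketch.

```c
#include <string.h>

/* Process states and the transitions described in the text. */
typedef enum { NOT_INITIALIZED, ACTIVE, RUNNING, READY,
               POLLING, WAITING, PARKING, TERMINATED } proc_state;

static const struct {
    proc_state  from;
    const char *action;
    proc_state  to;
} transitions[] = {
    { NOT_INITIALIZED, "activate",        ACTIVE     },
    { ACTIVE,          "start",           RUNNING    },
    { RUNNING,         "preempt",         READY      },
    { READY,           "resume",          RUNNING    },
    { RUNNING,         "terminate",       TERMINATED },
    { RUNNING,         "poll",            POLLING    },
    { POLLING,         "run",             RUNNING    },
    { RUNNING,         "wait",            WAITING    },
    { WAITING,         "release",         READY      },
    { POLLING,         "park",            PARKING    },
    { PARKING,         "release_parking", READY      },
    { PARKING,         "poll_parking",    POLLING    },
};

/* Applies a transition action; returns the new state, or -1 if the
 * action is not allowed in the current state. */
static int apply_action(proc_state state, const char *action) {
    for (size_t i = 0; i < sizeof transitions / sizeof transitions[0]; i++)
        if (transitions[i].from == state &&
            strcmp(transitions[i].action, action) == 0)
            return (int)transitions[i].to;
    return -1;
}
```

A lookup of this kind can also serve to validate that a recorded event sequence only contains legal transitions.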
In addition to state transition actions, \gls{btf} specifies process
notification actions. These actions do not trigger a process state change but
indicate other events related to a process entity. The \emph{mtalimitexceeded}
action is triggered if more process instances than allowed are activated in
parallel. If this happens, no new task instance is created. Therefore, a
notification event is necessary to make the event visible in the trace.
All other process notification actions are related to migration, the
reallocation of a process from one core to another. \gls{osekos} does not
support process migration since a separate kernel is executed on each core.
Thus, migration notifications are not relevant for an \gls{osek} compliant
\gls{os}. Additionally, migration actions become available implicitly via the
respective process transition actions. If a process instance is preempted on
one core and resumed on another, the resume event has a different source core
than the preempt event. Consequently, the related migration event is known.
\begin{figure}[]
\centering
\centerline{\includegraphics[width=0.8\textwidth]{./media/btf/runnable_state_chart.png}}
\caption[Runnable state figure]{\gls{btf} runnable states and state
transitions \cite{btf}.}
\label{fig:runnable_state_chart}
\end{figure}
\textbf{Runnable} instances start in the \emph{not initialized} state as shown
in \autoref{fig:runnable_state_chart}. Runnables can be \emph{started} by
\glspl{isr} and tasks in order to switch into the \emph{running} state. A
runnable instance that \emph{terminates} switches into the \emph{terminated}
state and thereby finishes its lifecycle.
Because a runnable can only be executed from process context, it cannot
continue running if the respective process is preempted. In this case the
runnable must be \emph{suspended}. Once the process resumes execution, the
runnable can also \emph{resume}.
\textbf{Core} entities are used to provide an execution context for process
entities and cannot be used as a target entity themselves. Consequently, no
\gls{btf} core actions are specified. Only one process can be allocated to a
core at the same time and core entities do not have a lifecycle.
\textbf{Signal} entities can be influenced by two actions: \emph{read} and
\emph{write}. A signal can be read within the context of a process entity.
This means that the value of a variable is retrieved from memory. A signal
entity does not have a lifecycle; thus, the instance counter value for signals
can remain constant.
Write actions can be executed by process and stimulus entities. They indicate
that a new value is assigned to a variable. If this assignment is done from
process context, the respective process entity is the source for the write
event. Otherwise, a stimulus entity can be used to represent the source, for
example, if a signal is changed by the \gls{os} or a hardware module.
For signal writes, the note field must denote the value that was assigned to a
variable. For read events the note field can optionally indicate the value of
the variable that was accessed.
\textbf{Semaphores} can be categorized into different types. Counting
se\-ma\-phores can be requested multiple times. They have an initial counter
value of zero. For every request, this counter is incremented and every time
it is released the value is decremented. A counting semaphore is locked once
the counter has reached a predefined value.
A binary semaphore is a specialization of a counting semaphore for which the
maximum counter value is one. A mutex is a binary semaphore that supports an
ownership concept. This means a mutex knows all processes that may request it.
This information allows the implementation of priority ceiling protocols in
order to avoid deadlocks and priority inversion. The \gls{osek} term for mutex
is \emph{resource}, resources are discussed in
\autoref{subsection:osek_architecture}.
\gls{btf} semaphore events can represent all mentioned semaphore types.
Semaphore actions can be divided into two categories: actions triggered by
process instances as shown in \autoref{tab:semaphore_process} and actions
executed by a semaphore entity itself as shown in
\autoref{fig:semaphore_state_chart}.
\begin{table}[]
\centering
\begin{tabular}{r l}
Action & Meaning \\
\hline
requestsemaphore & Process requests a semaphore. \\
exclusivesemaphore & Process requests a semaphore exclusively. \\
assigned & Process is assigned as the owner of a semaphore. \\
waiting & Process is assigned as waiting to a locked semaphore.\\
released & Assignment from process to semaphore is removed. \\
increment & Semaphore counter is incremented. \\
decrement & Semaphore counter is decremented. \\
\end{tabular}
\caption[Semaphore process actions]{Processes can interact with semaphores in
different ways. If a process requests a semaphore successfully, it is
\emph{assigned} to the semaphore and the counter is \emph{incremented},
otherwise a \emph{waiting} event is triggered. Once a semaphore is
\emph{released}, the assignment is removed and the counter is
\emph{decremented}.}
\label{tab:semaphore_process}
\end{table}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/btf/semaphore_state_chart.png}}
\caption[Semaphore states and actions]{\gls{btf} \cite{btf} semaphore entities
do not have a lifecycle. Nevertheless, they must be \emph{initialized} before
they are ready for the first time. A semaphore can be \emph{unlocked} or
\emph{locked}. A counting semaphore can be requested multiple times in which
case it changes into the \emph{used} state. If there are no requests the
semaphore is \emph{free}. A semaphore that has at least as many requests as
allowed is \emph{full} and changes into the \emph{locked} state. Further
requests in the locked stated result in an \emph{overfull} action.}
\label{fig:semaphore_state_chart}
\end{figure}
A process request to a semaphore is indicated by \emph{requestsemaphore}. If a
request is successful the semaphore counter is \emph{incremented} and the
process is \emph{assigned} to the semaphore. The \emph{exclusivesemaphore}
action represents a semaphore request that only succeeds if the semaphore is
currently not requested by any other process, i.e.\ the counter value is zero.
If a process fails to request a semaphore, it switches into polling mode, which
is indicated by the \emph{waiting} action. A process that releases a semaphore
\emph{decrements} the semaphore counter and the respective semaphore is
\emph{released}; the process is no longer assigned to it.
Semaphores do not have a lifecycle which is why their instance counter remains
constant. Nevertheless, a semaphore must be moved from the \emph{not
initialized} to the \emph{free} state by the \emph{ready} action before it is
requested for the first time.
A free semaphore is not requested by any process. At the first request the
behavior is dependent on the semaphore type. A mutex or binary semaphore is
\emph{locked} and moved into the \emph{full} state. A counting semaphore is
changed into the \emph{used} state which is indicated by the \emph{used}
action. The used action is repeated for a counting semaphore for each further
request or release as long as the counter value stays greater than zero and
smaller than the maximum value. If the counter value of a used semaphore
becomes zero, this semaphore is \emph{freed}. If the maximum counter value is
reached the semaphore state becomes \emph{full} which is indicated by the
\emph{lock\_used} action.
When a full binary semaphore or mutex is released, it is \emph{unlocked} and
becomes free again, while a counting semaphore is changed back to the used
state, indicated by the \emph{unlock\_full} action. A request to a full
semaphore entity results in an \emph{overfull} action and the state is changed
to \emph{overfull}. The overfull state indicates that there is at least one
process polling a semaphore. Each additional request also results in an
overfull action. Once there are no more processes waiting for a semaphore,
this semaphore becomes \emph{full} again.
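The counter and state handling for a counting semaphore can be sketched as follows; this is a simplified model derived from the description above, not \gls{btf} reference code, and all identifiers are chosen for this sketch.

```c
/* Simplified counting semaphore: `count` tracks granted requests,
 * `waiting` tracks processes polling a locked semaphore. */
typedef enum { SEM_FREE, SEM_USED, SEM_FULL, SEM_OVERFULL } sem_state;

typedef struct {
    int count;    /* granted requests                  */
    int waiting;  /* processes polling the semaphore   */
    int max;      /* counter value at which it locks   */
    sem_state state;
} counting_sem;

/* Derives the BTF semaphore state from the counters. */
static void sem_update(counting_sem *s) {
    if (s->count >= s->max)
        s->state = s->waiting > 0 ? SEM_OVERFULL : SEM_FULL;
    else
        s->state = s->count == 0 ? SEM_FREE : SEM_USED;
}

static void sem_request(counting_sem *s) {
    if (s->count < s->max) s->count++;   /* incremented and assigned  */
    else                   s->waiting++; /* full: requester polls     */
    sem_update(s);
}

static void sem_release(counting_sem *s) {
    if (s->waiting > 0)    s->waiting--; /* polling process takes over */
    else if (s->count > 0) s->count--;
    sem_update(s);
}
```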
\textbf{Events} can be influenced by three different actions. If a process
starts waiting for an event, this is indicated by the \emph{wait\_event}
action. Another process can set an event via the \emph{set\_event} action.
For this action it is necessary to provide the entity for which the event is
set via the \gls{btf} note field. An event can be cleared by the process for
which the event was set which is indicated by \emph{clear\_event}.
\section{Evaluation Test Bench}
\label{section:evaluation_test_bench}
To make the results of the validation process comprehensible and reproducible
for others it is important to document the hardware and software setup, the
configuration of all tools in use, as well as the ways in which the traces are
compared.
\subsection{Software Setup}
\textbf{Simulation} is used to validate the \gls{btf} traces obtained from
hardware via tracing and transformation. It allows the analysis of embedded
real-time systems by generating an event trace. A simulation is easy to
configure and executable without hardware. This is an advantage in
the early design stages of an application when the final target platform is
not yet defined.
Advanced simulation tools make it possible to take platform-dependent timing
behavior into account. It is possible to select the \gls{os} and processor platform in
use. Therefore, more accurate simulation results can be achieved. For
example, memory access times \cite{christianmaster} and timing overheads caused
by \gls{os} service routines \cite{maxmaster} can be taken into consideration.
\glsdesc{ta} provides the simulation software used for validation
\cite{tasimulator}. The \gls{ta} Simulator is based on a discrete-event system
simulation approach \cite{cassandras2008introduction, banks2000dm}. It has
already been used successfully in research projects to evaluate scheduling
algorithms in multi-core systems \cite{deubzer2011robust}, to examine
synchronization protocols \cite{alfranseder2013modified}, and to validate
optimization algorithms for embedded applications \cite{stefanmaster}. In this
thesis version {15.02.1} of the \gls{ta} Simulator is in use.
\begin{figure}[]
\centering
\begin{tikzpicture}[]
\tikzstyle{level 1}=[sibling distance=30mm]
\tikzstyle{level 2}=[sibling distance=40mm]
\node{RTE Model}
child {node {Hardware Model}}
child {node {OS Model}}
child {node {Software Model}}
;
\end{tikzpicture}
\caption[RTE model parts]{A \gls{rte} model consists of a hardware, an
\gls{os} and a software part.}
\label{fig:rte_model}
\end{figure}
\begin{figure}[]
\centering
\begin{tikzpicture}[]
\tikzstyle{level 1}=[sibling distance=20mm]
\tikzstyle{level 2}=[sibling distance=20mm]
\node{Software Model}
child {node {Processes}}
child {node {Runnables}}
child {node {Signals}}
child {node {OS Events}}
child {node {Stimuli}}
;
\end{tikzpicture}
\caption[Software model parts]{The software model represents the entities of
an application that are executed by the \gls{os} and the hardware.}
\label{fig:software_model}
\end{figure}
\textbf{Timing Models} describe the architecture and timing of an embedded
system. Model based development is a software development paradigm where the
design of an application is created in form of a timing model. This can be
done before the actual application software is implemented. Based on the
timing model requirements and constraints can be specified and validated via
simulation.
Timing models can provide different levels of granularity depending on the use
case. \gls{ta} uses the \glsdesc{rte} (\gls{rte}) model format which consists
of three parts as shown in \autoref{fig:rte_model}.
The hardware model includes the processor with all cores, quartzes, and memory
modules. Quartzes are used as a clock source for cores and memory modules.
Memory modules can be connected with each other and to the processor cores to
represent the architecture of the real chip. Vendor specific hardware models
are available for certain processor families, for example, the Infineon Aurix
and the Freescale Matterhorn.
The \gls{os} model defines the scheduling policy for an application as well as
\gls{os} related timing overheads. Implementation of service routines varies
depending on the \gls{os} vendor. Consequently, the timing overhead resulting
from these routines is also different, which makes it necessary to take their
runtime into account in order to get more accurate simulation results. Vendor
specific \gls{os} models are available for certain \glspl{os}, for example,
Elektrobit Autocore OS \cite{autocore}.
\begin{figure}[]
\centering
\begin{tikzpicture}[]
\tikzstyle{level 1}=[sibling distance=20mm]
\tikzstyle{level 2}=[sibling distance=20mm]
\tikzstyle{level 3}=[sibling distance=30mm]
\node{Application}
child {node {Task}}
child {node {Task}
child {node {Runnable}}
child {node {Runnable}
child {node {Instructionblock}}
child {node {Signal Read}}
child {node {Instructionblock}}
child {node {Signal Write}}
}
child {node {\lstinline{ActivateTask}}}
child {node {\lstinline{TerminateTask}}}
}
child {node {Task}}
;
\end{tikzpicture}
\caption[Software model hierarchy]{The software model makes it possible to
represent the runtime behavior of an application. All relevant software entities are
part of the system model and stand in relation to each other. For example, a
task can call a runnable which itself writes a signal value and runs for a
certain amount of processor instructions which is represented by an
instruction block.}
\label{fig:software_hierarchy}
\end{figure}
The software model represents how hardware and \gls{os} are used by an
application. Hardware and \gls{os} model remain the same for all tests and
only the software part is changed depending on the different test scenarios.
\autoref{fig:software_model} depicts the system entities that are part of the
software model.
Processes and runnables are ordered in a hierarchical structure as shown in
\autoref{fig:software_hierarchy}. Processes can call system routines and
runnables, while runnables can access signals, request, and release semaphores
and execute instruction blocks. The latter represent a certain number of clock
cycles required to execute a code section and are used to mimic the runtime
behavior of a real application. The number of instructions taken by an
instruction block can be configured to be static or to vary depending on a
specific distribution, e.g., Weibull distribution.
Stimuli are used to activate process entities. Similar to alarms they can
activate processes periodically or only once. Additionally, it is possible to
trigger stimuli to represent more complex activation patterns, for example,
arrival curves. Since runtime and activation patterns based on random
distributions are difficult to represent in C code, instruction blocks and
stimuli with constant values are used for the test models.
\textbf{Code Generation} is used to create C code based on the timing model of
an application. A template based model export was specified and implemented in
the context of this thesis. The solution is already in production and makes it
possible to create C code and the corresponding \gls{oil} files automatically.
The idea is to iterate over all software entities and create the appropriate
code dependent on the entity type. Transformation of most model entities is
straightforward. Runnable calls map to function calls in C. A signal read
access occurs if one signal is assigned to another variable. Accordingly, a
write access is represented by assigning a value to a signal. Task, event, and
semaphore actions are created based on the respective \gls{osek} service
routines discussed in \autoref{section:osekvdxos}.
An instruction block is the only software model entity that cannot be mapped to
C code straightforwardly. As discussed before, an instruction block represents
a certain amount of clock cycles required to execute a code section. Normally,
this value is set based on measurement results or empirical values from other
applications. For code generation it is necessary to create code whose
execution takes the same amount of clock cycles as specified in the model.
\begin{code}
\begin{lstlisting}[caption={[Instructionblock]
The function takes the specified amount of clock cycles to be executed.
This code is dependent on hardware and compiler in use and must
therefore be adapted to other platforms.},
label={listing:instructionblock}]
void executeInstructionsConst(int clockCycles) {
int i;
clockCycles /= 2;
for (i = 0; i < clockCycles; i++) {
__asm("nop");
__asm("nop");
}
}
\end{lstlisting}
\end{code}
The obvious way to do so is a for loop; however, the exact code is dependent on
compiler and hardware. \autoref{listing:instructionblock} shows the code
necessary to get the desired behavior for the hardware used in this thesis. It
works because the Infineon Aurix processor family features zero overhead loops.
This means a for loop with one \lstinline{nop} instruction takes exactly one
clock cycle because loop condition check, loop incrementation, and loop content
are executed in parallel.
It is important to add multiple \lstinline{nop} instructions per loop cycle.
The Aurix trace device implements a compressed program flow trace. This means
trace messages are only created for certain function events. Since the
\lstinline{loop} assembly instruction is one of the commands that cause the
creation of a trace message, a loop with a single \lstinline{nop} would cause
the trace buffer to overflow if the value of \lstinline{clockCycles} exceeds a
certain value. By adding additional \lstinline{nop} commands, fewer trace
messages are created per time unit and the function events can be transmitted
off-chip without overflowing.
\textbf{\glsdesc{ee}} is an \gls{osek} compliant real-time operating system.
It is free of charge and open-source, which makes it an excellent choice for
this thesis. Without access to the \gls{os} internal code, the creation of many
\gls{btf} events would not have been feasible. The \gls{ee} software package
contains the complete \gls{os} source code as well as RT-Druid, the code
generation tool to create \gls{os} specific source code from the \gls{oil}
file. In this thesis the \glsdesc{ee} and RT-Druid 2.4.0 release is used.
\begin{code}
\begin{lstlisting}[caption={[\gls{ee} \gls{oil} config] Subset of the \gls{ee}
\gls{oil} \gls{os} attributes used for validation. Attributes that are not
mentioned are set to the default value described in the \gls{ee} RT-Druid
reference manual.},
label={listing:oilconfig}]
EE_OPT = "EE_EXECUTE_FROM_RAM";
EE_OPT = "EE_ICACHE_ENABLED";
EE_OPT = "EE_DCACHE_ENABLED";
REMOTENOTIFICATION = USE_RPC;
CFLAGS = "-O2";
STATUS = EXTENDED;
ORTI_SECTIONS = ALL;
KERNEL_TYPE = ECC2;
COMPILER_TYPE = GNU;
\end{lstlisting}
\end{code}
\autoref{listing:oilconfig} shows the \gls{oil} attributes set for validation.
All attributes that are not mentioned take their default value as documented by
the RT-Druid reference manual \cite{rtdruidref}. The test applications are
executed from RAM, instruction and data caching is enabled, and the
\lstinline{O2} optimization level is configured. Inter-core communication is
implemented via remote procedure calls. All \gls{orti} attributes and extended
error codes are logged by the \gls{os}. The configuration is created in a way
that allows maximum traceability combined with decent performance.
Consequently, a similar configuration could also be used in a production
system.
The \textbf{Hightec Compiler} \cite{hightec} is used to compile the C code
produced by code generation and RT-Druid. It is based on GCC, and \gls{ee}
generates appropriate makefiles automatically if \lstinline{GNU} is set as
compiler. For the tests, Hightec Compiler v4.6.5.0 is used.
\textbf{TRACE32} \cite{trace32} is used as the hardware trace host software.
Configuration of this part of the test setup is the most complex. Different
vendor specific properties, like the number of processor observation blocks,
must be taken into consideration to create a setup that produces optimal
results. The hardware used and the corresponding configuration are discussed in
the next section.
\subsection{Hardware Setup}
\label{subsection:hardware_setup}
An \textbf{Infineon TriBoard TC298} evaluation board assembled with the
Infineon \textbf{SAK-TC298TF-128} microcontroller is used for evaluation. This
board provides an \glsdesc{imds} together with an \glsdesc{agbt}. According to
\autoref{tab:trace_tool_overview} and \autoref{tab:interfaces} this setup
allows for optimal trace performance.
\begin{code}
\begin{lstlisting}[caption={[\gls{ee} ECU config] \gls{ee} ECU config to
support the Infineon TC27x microcontroller family and the TC2X5 evaluation
board. Source code changes are necessary to support the hardware used in this
thesis.},
label={listing:ecu_config}]
MCU_DATA = TRICORE {
MODEL = TC27x;
};
BOARD_DATA = TRIBOARD_TC2X5;
\end{lstlisting}
\end{code}
\gls{ee} provides support for the Infineon TC27x processor family which can be
activated in the \gls{oil} file as shown in \autoref{listing:ecu_config}.
TC27x and TC29x are quite similar. Nevertheless, it is important to adapt the
configuration to the TC298TF processor. This is done by changing the includes
in \lstinline{./cpu/tricore/inc/ee_tc_cpu.h} from
\lstinline{<tc27xa/Ifx_reg.h>} to \lstinline{<tc29xa/Ifx_reg.h>}. The layout
of the evaluation board is the same.
Based on \lstinline{MCU_DATA} \gls{ee} configures the controller in the correct
way during system initialization. The \gls{oil} \lstinline{CPU_CLOCK}
attribute can be used to set the desired CPU frequency. The configuration done
by \gls{ee} is sufficient to put the controller into a usable state. However,
there are problems regarding the frequency of the Multi-Core Debug System
($f_{mcds}$). The TC298TF can run at a frequency up to \unit[300]{MHz}.
\gls{ee} does not configure the MCDS clock divisor at all and consequently
$f_{mcds}$ is equal to the system frequency. However, the TC29xA user manual
states that the maximum allowed value for $f_{mcds}$ is \unit[160]{MHz}
\cite{tc29xa}.
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/eval/clocks.png}
\caption[Evaluation clock configuration]{Correct clock settings are essential
to record valid hardware traces for the Infineon TC298TF microcontroller. The
multi-core debug system frequency must be lower or equal to \unit[160]{MHz}
and the ratio between CPU and MCDS frequency must be $1:1$.}
\label{fig:clocks}
\end{figure}
Incorrect clock configuration may result in data and function events being
dropped randomly. According to the manual, it is necessary to set the
$f_{system}$ to $f_{mcds}$ ratio to $2:1$ to avoid this problem. Despite using
the documented configuration, event dropping still occurred during the
validation. After consultation with the hardware experts from Lauterbach GmbH,
it turned out that a ratio of $1:1$ between system and MCDS clock is the only
way to guarantee the reception of all trace events. Thus, the \gls{ee} clock
configuration must not be changed, but the system frequency must be less than
or equal to \unit[160]{MHz}. \autoref{fig:clocks} shows a configuration with a
system frequency of \unit[100]{MHz} as used in this thesis.
The \textbf{PowerTrace II} by Lauterbach is used for trace recording. During
the compilation process, \gls{ee} creates so-called Lauterbach PRACTICE
scripts \cite{cmmref}, also known as cmm scripts. These scripts can be used to
operate the TRACE32 software automatically. The scripts generated by \gls{ee}
are inadequate for the requirements of this thesis. Thus, it is necessary to
improve the scripts in a way that allows continuous data and function trace as
shown in \autoref{listing:trace32_config}.
\begin{code}
\begin{lstlisting}[caption={[TRACE32 config]
Script to configure TRACE32 and the on-chip trace device. The setup allows for
continuous function and data trace.},
label={listing:trace32_config}]
SYStem.CPU TC298TF
trace.method.analyzer
trace.mode.stream
mcds.source.cpumux0.program on
mcds.source.cpumux0.readaddr on
mcds.source.cpumux0.writeaddr on
mcds.source.cpumux0.writedata on
break.set symbol.begin(foo)--symbol.end(bar) /r /w /tracedata
Go
wait 1.s
break
printer.filetype csv
printer.file data.csv
winprint.trace.findall , cycle readwrite /list %timefixed \
ti.zero varsymbol cycle data
trace.export.csvfunc func.csv
\end{lstlisting}
\end{code}
Firstly, it is necessary to select the correct CPU (line 1). This is
important because otherwise the TRACE32 trace decoder is not able to interpret
the hardware trace events in the correct way. The trace method
\lstinline{analyzer} is required for real-time tracing and trace mode
\lstinline{stream} means that the trace data is sent to the host computer in a
continuous way (lines 2 and 3).
Next, the processor and bus observation blocks are configured to detect all
function and data events (lines 5-9). This is done via the multi-core debug
system. Setting the \lstinline{program} attribute to \lstinline{on} activates
the function trace. The other three attributes are necessary to record all
data events.
A complete data trace may still exceed the bandwidth of the setup. Via
\lstinline{break.set}, filters as described in
\autoref{section:trace_measurement} can be created (line 10). The trace device
is configured to record data read and write events for all variables in the
memory range defined by \lstinline{symbol.begin(foo)--symbol.end(bar)}. Here
\lstinline{foo} is a variable that has a lower address than the variable
\lstinline{bar}. Using the configuration described in this section, the
compiler allocates the array \lstinline{EE_as_rpc_services_table} at the
beginning of the \gls{os} memory section and \lstinline{EE_th_status} at the
end. So those two variables provide a convenient boundary to detect all
\gls{os} data events of interest.
Trace recording is started via the \lstinline{Go} command (lines 12-14). The
\lstinline{wait} command waits for a suitable amount of time, and recording is
stopped by the \lstinline{break} command.
Now the data and function traces can be exported (lines 16-21). For the data
export it is first necessary to configure the desired output file type
(\lstinline{csv}) and output filename (\lstinline{data.csv}). Via the
\lstinline{winprint} command the data export process is started, and
\lstinline{trace.export.csvfunc} exports the function trace.
TRACE32 creates multiple graphical user interfaces, one for each core of the
target platform. Accordingly, the export commands must be executed for each
core, in other words for each GUI\@. The resulting files
\lstinline{data.csv} and \lstinline{func.csv} contain one event per line. The
following listing shows a data event.
\begin{lstlisting}
-0083448136,0.0004372600,"EE_ORTI_servicetrace","wr-data",43
\end{lstlisting}
A Lauterbach data event consists of five comma separated fields. In
\autoref{eq:data_event} the elements of a data event are defined. The second
field is the timestamp $t_i$ in seconds, the third field is the name of the
accessed variable $\pi_i$, the fourth field specifies in which way $a_i$ the
variable is accessed (a data write in this case), and the fifth field contains
the value of the data access event $v_i$. Since one data trace file is
exported per core, the core name $c_i$ is the same for all events from one
file. The next listing shows a Lauterbach function event
consisting of three fields.
\begin{lstlisting}
+437050; EE_as_StartCore; fentry
\end{lstlisting}
In \autoref{eq:function_event} the elements of a function event are defined.
Analogous to data events, the core name $c_j$ is the same for all events within
a file. The first field maps to the timestamp $t_j$, the second field is the
name of the function $\pi_j$ that is affected by the event, and the third field
indicates the way $\theta_j$ in which the function is affected.
\subsection{Validation Techniques}
\label{subsection:validation_techniques}
Traces can differ in two ways. A temporal difference exists for two traces
$B^1$ and $B^2$ with the same length $n$ if there is at least one event pair
with the index $i \in (1,2,\dots,n)$ for which $t^1_i \neq t^2_i$. As
discussed before, the \gls{ta} Simulator is capable of taking hardware and
\gls{os} specific behavior into account. Nevertheless, simulating a trace for
which all timestamps are equal to the corresponding hardware trace is not
feasible by definition \cite{balci1995principles}.
This problem is bypassed in two steps. First, the general accuracy of the
trace setup is validated by tracing events whose timing characteristics are
precisely known in advance. Second, for the actual test models, a
plausibility test based on certain metrics such as task activate-to-activate
and task response time is conducted.
The second way in which two traces can differ is called semantic difference.
It exists for two traces $B^1$ and $B^2$ with the same length $n$ if there is
an event pair with the index $i \in (1,2,\dots,n)$ for which at least one of
the following cases is true: source or target entity differ ($\Psi^1_i \neq
\Psi^2_i \vee T^1_i \neq T^2_i$), source or target instance differ ($\psi^1_i
\neq \psi^2_i \vee \tau^1_i \neq \tau^2_i$), target type differs ($\iota^1_i
\neq \iota^2_i$), event action differs ($\alpha^1_i \neq \alpha^2_i$), or note
differs ($\nu^1_i \neq \nu^2_i$).
If two traces $B^1$ and $B^2$ have different lengths $|B^1| \neq |B^2|$, they
also differ semantically. Assuming the trace and simulation setup is correct,
a difference in length can have two reasons: either the trace durations differ
or one trace includes entities that do not occur in the other trace. In the
former case, the disparity can be fixed by removing events at the end of
the longer trace until both traces have the same length. In the latter case,
events for entities that are not contained in both traces may be removed in
order to achieve semantic equality.

\section{Test Cases}
As discussed in the previous section, traces can differ in a temporal and in a
semantic way. To rule out temporal discrepancies caused by an incorrect trace
setup, the timing accuracy is tested based on code with known event-to-event
durations. Next, the semantic correctness of the trace mapping is validated
based on manually created test models. Finally, randomized models are
generated in order to detect semantic errors that may not be detected by the
manually created models due to selection bias \cite{geddes1990cases}.
\subsection{Timing Precision}
In \autoref{listing:instructionblock}, code to execute a fixed number of
instructions was introduced. This code is now used to evaluate the timing
precision of the trace setup. According to
\autoref{subsection:hardware_tracing}, the setup should allow for cycle
accurate trace measurement.
The Infineon Aurix processor family provides performance counters
\cite{tc29xa}. Once started, these counters are incremented based on the CPU
core frequency. A frequency of \unit[100]{MHz} is used for the validation;
consequently, an increment occurs every \unit[10]{ns}. The counter can be
started at an arbitrary point in time, for example at program start. By
reading the counter value at the beginning and at the end of a critical
section, the number of clock cycles that expired between these two points can
be determined.
\begin{code}
\begin{lstlisting}[caption={[Trace setup accuracy validation]
Code to validate the timing precision of the trace setup.},
label={listing:accuracy_validation}]
EE_UINT32 i;
EE_UINT32 ccntStart;
EE_UINT32 ccntEnd;
EE_UINT32 n = N / 4;
__asm("nop");
ccntStart = EE_tc_get_CCNT();
__asm("nop");
for (i = 0; i < n; i++) {
__asm("nop");
__asm("nop");
__asm("nop");
__asm("nop");
}
__asm("nop");
ccntEnd = EE_tc_get_CCNT();
\end{lstlisting}
\end{code}
\autoref{listing:accuracy_validation} shows the code that is used to check the
timing precision. \gls{ee} provides the API function
\lstinline{EE_tc_get_CCNT} to read out the performance counter register. As
described above, the performance counters are read out before and after the
critical section.
The critical section is guarded with two additional \lstinline{nop} assembly
instructions to avoid compiler optimization. Additionally, the generated
assembly code was examined manually to verify that no unwanted instructions
were added by the compiler. A for loop is used to execute a predefined number
of instructions. The number of repetitions depends on the define
\lstinline{N}, which should be a multiple of four.
The code is now executed for different values of \lstinline{N}. For each run,
the expected number of clock cycles $c_e$, the actual number of clock cycles
$c_a$, the expected time difference $t_e$ in microseconds, and the actual time
difference $t_a$ in microseconds between the writes to \lstinline{ccntStart}
and \lstinline{ccntEnd} are listed in \autoref{tab:precision_validation}.
The expected number of clock cycles is calculated by $c_e = N + 2$. The value
two is added because of the two additional \lstinline{nop} instructions. The
expected time is calculated by $t_e = c_e \cdot \frac{1}{f}$, where $f$ is the
processor frequency.
The actual number of clock cycles is calculated by $c_a = ccntEnd - ccntStart$.
The actual time is calculated by $t_a = t_j - t_i$ where $j$ is the index of
the write event to \lstinline{ccntEnd} and $i$ is the index of the write event
to \lstinline{ccntStart}.
Four different values for \lstinline{N}, $128$, $1024$, $4096$, and $65536$
are chosen and for each value $101$ measurement samples are taken. The
results for all samples with the same value of \lstinline{N} are equal. It can
be observed that for all values of \lstinline{N} the execution of the critical
section takes four ticks more than the expected value $c_e$. This is because
the additional instructions executed by the second call to
\lstinline{EE_tc_get_CCNT} are not taken into consideration.
Consequently, the expected and the actual execution time differ by
\unit[40]{ns}. Apart from this difference, the result is as expected, and the
conclusion can be drawn that the setup is in fact able to measure hardware
events on a cycle accurate basis.
\begin{table}[]
\centering
\begin{tabular}{c|c c c c}
N & 128 & 1024 & 4096 & 65536 \\
\hline
$c_e\, [1]$ & 130 & 1026 & 4098 & 65538 \\
$c_a\, [1]$ & 134 & 1030 & 4102 & 65542 \\
$t_e\, [us]$ & 1.300 & 10.260 & 40.980 & 655.380 \\
$t_a\, [us]$ & 1.340 & 10.300 & 41.020 & 655.420 \\
samples & 101 & 101 & 101 & 101 \\
\end{tabular}
\caption[Trace setup measurement precision]{Experiment to validate the
accuracy of the trace setup. A code snippet that takes a known number of
instructions $c_e$ is executed. Based on the number of instructions the
expected execution time $t_e$ can be calculated. If cycle accurate
measurement is supported, the actual execution time $t_a$ should be equal to
$t_e$. The execution times differ by \unit[40]{ns} because the expected
number of instructions is off by four cycles. If this deviation is taken
into consideration $t_e$ and $t_a$ coincide.}
\label{tab:precision_validation}
\end{table}
\subsection{Systematic Tests}
\label{subsection:systematic_tests}
In this section test models are created systematically to validate the complete
software to \gls{btf} event mapping discussed in \autoref{chapter:mapping}.
For each test application a simulated and a hardware based \gls{btf} trace is
generated as shown in \autoref{fig:eval_idea}. The traces are then compared in
three steps.
\begin{itemize}
\item A basic plausibility test based on the Gantt chart of the TA Tool Suite
is conducted.
\item The semantic equality is validated.
\item Different real-time metrics are compared and discussed.
\end{itemize}
Five test models as shown in the following list are required to cover all
\gls{btf} actions for which a mapping has been provided.
\begin{itemize}
\item task-runnable-signal
\item task-event
\item task-resource-release-parking
\item task-resource-poll-parking
\item task-MTA
\end{itemize}
Each model represents a periodic system where a defined sequence of events is
executed every \unit[10]{ms}. UML sequence diagrams \cite{fowler2004uml} are
used to illustrate the behavior of the test applications during one period.
\subsubsection{Task-Runnable-Signal Test}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_runnable_signal.pdf}}
\caption[Task-runnable-signal test sequence]{Test application to validate
basic task and signal read and write events.}
\label{fig:task_runnable_signal}
\end{figure}
The task-runnable-signal application is depicted in
\autoref{fig:task_runnable_signal}. Task \lstinline{T_1} is activated
periodically by the stimulus \lstinline{STI_T_1} every \unit[10]{ms}.
\lstinline{T_1} activates \lstinline{T_2} on another core via \gls{ipa} and
then executes runnable \lstinline{R_1}. \lstinline{T_2} executes a runnable
\lstinline{R_2_1} which executes another runnable \lstinline{R_2_2}. Once
execution of \lstinline{R_1} has finished, \lstinline{T_1} activates another
task \lstinline{T_3} on the second core, which has a higher priority than
\lstinline{T_2}. Consequently, \lstinline{T_2}, \lstinline{R_2_1}, and
\lstinline{R_2_2} are preempted as indicated by the light green and light blue
colors. \lstinline{T_3} calls a runnable \lstinline{R_3}. The runnables
\lstinline{R_1} and \lstinline{R_3} both read and write the signal
\lstinline{SIG_1}. Once \lstinline{T_3} has terminated, \lstinline{T_2} and the
corresponding runnables resume execution. The purpose of this test application
is to cover the following \gls{btf} actions:
\begin{itemize}
\item Stimulus: trigger by alarm and \gls{ipa}
\item Task: activate, start, preempt, resume, terminate
\item ISR: activate, start, terminate
\item Runnable: start, resume, suspend, terminate
\item Signal: read, write
\end{itemize}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_runnable_signal.png}}
\caption[Task-runnable-signal test gantt chart]{Hardware and software trace
for the task-runnable-signal test model. Attention must be directed to the
signal read and write accesses to \lstinline{SIG_1}. Additionally, the nested
runnables must be suspended when the respective task \lstinline{T_2} is
preempted.}
\label{fig:task_runnable_signal_gantt}
\end{figure}
Based on the Gantt chart of the TA Tool Suite, the \gls{btf} traces can be
compared visually. The hardware trace is shown in the upper part and the
simulated trace in the lower part of each picture. Both traces use the same
time scale so that semantic and temporal comparison is feasible.
\autoref{fig:task_runnable_signal_gantt} shows one period of the
task-runnable-signal test application in the Gantt chart of the \gls{ta} Tool
Suite. The figure depicts that \lstinline{R_2_2} is called from the context of
\lstinline{R_2_1}. When \lstinline{T_2} is preempted, both runnables must be
suspended too, indicated by the light blue color in contrast to the stronger
blue when a runnable is running. Runnable entities are not shown in the traces
for the other test models for clarity. A running task is colored in dark
green, while preempted tasks are shown in light green.
A separate row in the Gantt chart is used to depict signal accesses from the
context of tasks. Whenever a horizontal line is drawn the corresponding signal
is read or written. The former is indicated by an arrow pointing up at the
bottom of the row. The latter is indicated by an arrow pointing down at the
top of the row. It can be seen that the signal accesses are recorded on
hardware as expected.
The hardware trace shows two additional \glspl{isr} that are not part of the
simulation trace. \lstinline{EE_tc_system_timer_handler} is a timer interrupt
which is executed every \unit[1]{ms} and serves as clock source for the system
counter. \lstinline{EE_TC_iirq_handler} is used for remote procedure calls.
Two traces cannot be semantically identical if entities exist in one trace
that are not part of the other trace. There are two ways to solve this
problem. Either the \glspl{isr} are added to the system model and therefore
considered during simulation, or all \gls{btf} events related to the
\glspl{isr} are removed from the hardware trace.
A script that checks the semantic equality of two traces based on the criteria
established in \autoref{subsection:validation_techniques} is used for the
second validation step. However, semantic equality could not be shown for the
test cases in this and the next section. The reason for this is discussed in
\autoref{subsection:randomized_tests}.
The TA Inspector is capable of calculating a variety of real-time metrics based
on \gls{btf} traces. Selected metrics are shown to discuss the similarities
and discrepancies between hardware and simulation trace. Common metric types
are activate-to-activate (A2A), response time (RT), net execution time (NET),
and CPU core load. The upper part of each metric table shows the hardware
trace metrics abbreviated by \emph{HW} and the lower part shows the
simulation trace metrics abbreviated by \emph{Sim}.
\begin{table}[]
\centering
\begin{tabular}{c c|c c c c}
& & A2A $[ms]$ & RT $[ms]$ & Load Core\_1 $[\%]$ & Load Core\_2 $[\%]$ \\
\hline
& T\_1 & 10.005998 & 3.025510 & 30.124423 & 0.000000 \\
HW & T\_2 & 10.005990 & 6.516440 & 0.000000 & 49.950032 \\
& T\_3 & 10.005987 & 1.506300 & 0.000000 & 15.000495 \\
\hline
& Sum & - & - & 30.12 & 64.95 \\
&&&&& \\
& T\_1 & 10.000000 & 3.000100 & 30.000000 & 0.000000 \\
Sim & T\_2 & 10.000000 & 6.500200 & 0.000000 & 50.000000 \\
& T\_3 & 10.000000 & 1.500100 & 0.000000 & 15.000000 \\
\hline
& Sum & - & - & 30.00 & 65.00 \\
\end{tabular}
\caption[Task-runnable-signal metrics table]{Metrics of the
task-runnable-signal test application. Activation-to-activation (A2A) and
response time (RT) are average values calculated over all instances of the
respective entity.}
\label{tab:task_runnable_signal}
\end{table}
\autoref{tab:task_runnable_signal} shows selected real-time metrics for the
task-runnable-signal application. At first glance all values seem identical,
so the basic configuration of the complete setup is likely to be correct.
Nevertheless, the activate-to-activate times of hardware and simulation differ
by almost \unit[6]{us}, which is non-negligible.
The reason for this deviation can be found by examining the
activate-to-activate times of the timer \gls{isr}
\lstinline{EE_tc_system_timer_handler}. The average A2A time for the \gls{isr}
is \unit[600]{ns} greater than expected. Since \lstinline{T_1} is activated
every \unit[10]{ms} or in other words for every tenth instance of the timer
\gls{isr}, the expected deviation can be calculated as $d_{A2A} = 10 \cdot
600\,ns = 6\,us$.
To determine why the A2A times of the timer \gls{isr} diverge, it is necessary
to read the corresponding source code. Whenever the timer \gls{isr} is
executed, the time delta to the next instance is calculated based on the
current number of counter ticks in the timer register. There is a time delta
between the point where the last counter ticks value is read and the point
where the newly calculated value is written. This delta causes the delay of
\unit[600]{ns}. Doubling the frequency reduces the delta to \unit[300]{ns};
halving the frequency increases it to \unit[1200]{ns}, as expected.
\subsubsection{Task-Event Test}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_event.pdf}}
\caption[Task-event test sequence]{Test application to validate \gls{btf}
event actions.}
\label{fig:task_event}
\end{figure}
\autoref{fig:task_event} shows the task-event test case. \lstinline{T_1} is
activated in the same way as in the first test case. Again, it activates
\lstinline{T_2} on a second core via \gls{ipa}. \lstinline{T_2} executes a
runnable \lstinline{R_2}. After execution of the runnable, \lstinline{T_2}
waits for the event \lstinline{EVENT_1}. Since the event is not set, it
changes into the waiting state, indicated by the orange color. After
activating \lstinline{T_2}, \lstinline{T_1} executes a runnable \lstinline{R_1}
and sets the event \lstinline{EVENT_1}. \lstinline{T_2} returns from the
waiting state, calls \lstinline{R_2} again, and clears the event
\lstinline{EVENT_1}. The purpose of this test application is to cover the
following \gls{btf} actions:
\begin{itemize}
\item Process: wait, release
\item Event: wait\_event, set\_event, clear\_event
\end{itemize}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_event.png}}
\caption[Task-event test gantt chart]{Comparison of hardware (top) and
simulated (bottom) trace of the task event test application.}
\label{fig:task_event_gantt}
\end{figure}
\autoref{fig:task_event_gantt} shows the Gantt chart for the task-event test
case. As before \lstinline{T_1} is interrupted by the timer \gls{isr} multiple
times. A separate row in the Gantt chart is used to indicate the current state
of the event entity. An upward pointing arrow indicates that a process starts
waiting for an event. The waiting period is colored in orange. A downward
pointing arrow indicates that a process sets an event. Finally, the event is
cleared, which is indicated by a downward pointing arrow in red.
\begin{table}[]
\centering
\begin{tabular}{c c|c c c}
 & & A2A $[ms]$ & RT $[ms]$ & CPU Waiting Core\_2 $[\%]$ \\
\hline
HW & T\_1 & 10.006198 & 2.023460 & 0.000000 \\
 & T\_2 & 10.006189 & 3.018570 & 10.046955 \\
\hline
Sim & T\_1 & 10.000000 & 2.000100 & 0.000000 \\
 & T\_2 & 10.000000 & 3.000100 & 9.999000 \\
\end{tabular}
\caption[Task-event metrics table]{Metrics of the task-event test application.}
\label{tab:task_event}
\end{table}
\autoref{tab:task_event} shows the resulting metrics for the task-event test
case. The activate-to-activate times show the same behavior as in the
previous test application, as expected. The relative waiting time on hardware
is greater than it is for the simulated trace.
A possible reason might be the longer runtime of the \lstinline{set_event}
routine on-target. The task on core \lstinline{Core_1} sets the event for the
task on the second core. Therefore, a \glsdesc{rpc} is necessary to
set the event. Since the \gls{rpc} via \lstinline{EE_TC_iirq_handler} is not
taken into consideration in the simulation, the time in the waiting state is
longer on hardware.
Response times are also significantly longer on real hardware compared to
the simulated trace. The response time measures the period from activation to
termination of a task instance. The difference in response time results from
several factors.
Firstly, the initial ready time, i.e.\ the period from task activation to
start, is longer on hardware; it takes about \unit[2]{us}. Secondly,
\lstinline{T_1} is preempted by the timer \gls{isr} two times. Category two
\glspl{isr} require a context switch which costs additional task execution
time. Finally, the \gls{ipa} and \lstinline{TaskTerminate} routines take
longer on real hardware. By measuring the execution times of the respective
system services, it could be shown that the response times are equal if the
measured overhead is taken into consideration. As mentioned before, these
effects could be accounted for in the simulation by adding the execution times
of the routines to the \gls{os} part of the timing model.
\subsubsection{Task-Resource Tests}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_resource_release_polling.pdf}}
\caption[Task-resource-poll-parking test sequence]{Test application to validate
semaphore events, especially the poll\_parking action.}
\label{fig:task_resource_poll_parking}
\end{figure}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_resource_release_parking.pdf}}
\caption[Task-resource-release-parking test sequence]{Test application to validate
semaphore events, especially the release\_parking action.}
\label{fig:task_resource_release_parking}
\end{figure}
The third and fourth test cases are similar except for one difference, as
shown in \autoref{fig:task_resource_poll_parking} and
\autoref{fig:task_resource_release_parking}. As before, \lstinline{T_1} is
activated by a periodic stimulus and activates \lstinline{T_2} on another core
via \gls{ipa}. \lstinline{T_1} executes the runnable \lstinline{R_1_1}, which
requests the semaphore \lstinline{SEM_1}. \lstinline{T_2} tries to request the
same semaphore, which is now locked, and changes into the active polling state
indicated by the red color. As soon as \lstinline{R_1_1} finishes,
\lstinline{T_1} activates the task \lstinline{T_3}, which has a higher priority
than \lstinline{T_2}, on the second core. Consequently, \lstinline{T_2} is
deallocated and changes into the parking state.
At this point the two models differ. In the first model,
\emph{task-resource-poll-parking}, \lstinline{T_3} has a shorter execution
time than in the model \emph{task-resource-release-parking}. Consequently, in
the former model \lstinline{T_2} is resumed while \lstinline{SEM_1} is still
locked and a poll\_parking action takes place.
In the latter case, when \lstinline{T_3} has a longer execution time,
\lstinline{SEM_1} becomes free while \lstinline{T_2} is still preempted. This
results in a release\_parking action and \lstinline{T_2} changes into the
ready state. Once \lstinline{T_3} has terminated, \lstinline{T_2} continues
running immediately. The purpose of these applications is to test the
following actions.
\begin{itemize}
\item Process: park, poll\_parking, release\_parking, poll, run
\item Semaphore: ready, lock, unlock, full, overfull
\item Process-Semaphore: requestsemaphore, assigned, waiting, released
\end{itemize}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_resource_release_polling.png}}
\caption[Task-resource-poll-parking test gantt chart]{Comparison of hardware
(top) and simulated (bottom) trace of the task-resource-poll-parking test
application.}
\label{fig:task_resource_poll_parking_gantt}
\end{figure}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_resource_release_parking.png}}
\caption[Task-resource-release-parking test gantt chart]{Comparison of hardware (top) and
simulated (bottom) trace of the task-resource-release-parking test application.}
\label{fig:task_resource_release_parking_gantt}
\end{figure}
\begin{table}[]
\centering
\begin{tabular}{c c|c c c}
& & RT $[ms]$ & Polling Time $[ms]$ & Parking Time $[ms]$ \\
\hline
& T\_1 & 2.524897 & 0.000000 & 0.000000 \\
HW & T\_2 & 3.269190 & 0.751730 & 0.508011 \\
& T\_3 & 0.506321 & 0.000000 & 0.000000 \\
\hline
& T\_1 & 2.500140 & 0.000000 & 0.000000 \\
Sim & T\_2 & 3.250040 & 0.749800 & 0.500100 \\
& T\_3 & 0.500100 & 0.000000 & 0.000000 \\
\end{tabular}
\caption[Task-resource-poll-parking metrics table]{Metrics of the
task-resource-poll-parking test application.}
\label{tab:task_resource_poll_parking}
\end{table}
\begin{table}[]
\centering
\begin{tabular}{c c|c c c}
& & A2A $[ms]$ & RT $[ms]$ & CPU Parking Core\_2 $[\%]$ \\
\hline
& T\_1 & 10.005997 & 2.026420 & 0.000000 \\
HW & T\_2 & 10.005989 & 2.772670 & 4.984965 \\
& T\_3 & 10.005984 & 0.756450 & 0.000000 \\
\hline
& T\_1 & 10.000000 & 2.000140 & 0.000000 \\
Sim & T\_2 & 10.000000 & 2.750240 & 4.949010 \\
& T\_3 & 10.000000 & 0.750100 & 0.000000 \\
\end{tabular}
\caption[Task-resource-release-parking metrics table]{Metrics of the
task-resource-release-parking test application.}
\label{tab:task_resource_release_parking}
\end{table}
\autoref{fig:task_resource_poll_parking_gantt} and
\autoref{fig:task_resource_release_parking_gantt} show the comparison of the
traces for the two resource test applications. For both test cases
\lstinline{T_1} requests \lstinline{SEM_1} as indicated by an upward pointing
arrow. The semaphore is now locked and \lstinline{T_2} changes into the
polling mode when requesting it. This is indicated by the yellow color. Once
\lstinline{T_3} is activated \lstinline{T_2} changes into the parking mode
indicated by the orange color.
In \autoref{fig:task_resource_poll_parking_gantt} \lstinline{T_3} has a runtime
of \unit[500]{us} and resumes running before the semaphore is released. Thus,
it returns into the polling state until the semaphore is released. The release
event is depicted by a downward pointing arrow.
In \autoref{fig:task_resource_release_parking_gantt} the execution time of
\lstinline{T_3} is longer and \lstinline{T_1} releases the semaphore earlier.
Consequently,
\lstinline{SEM_1} becomes free while \lstinline{T_2} is still deallocated from
the core and changes into the ready state.
For both resource test applications the \gls{btf} traces recorded from hardware
match the simulated traces as shown in the previous figures. The metrics in
\autoref{tab:task_resource_poll_parking} and
\autoref{tab:task_resource_release_parking} show similar results compared to the
previous tables and are therefore not discussed again.
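Metrics such as the response times (RT) listed in the tables above can be
derived directly from trace events. A minimal sketch, assuming illustrative
(timestamp, task, event) tuples and event names rather than the full BTF
format, matches each activation with the corresponding termination:

```python
# Sketch: per-instance response times from (timestamp, task, event) tuples.
# Event names and the tuple layout are illustrative assumptions.
from collections import defaultdict

def response_times(events):
    pending = defaultdict(list)  # task -> pending activation timestamps (FIFO)
    rts = defaultdict(list)      # task -> response time per instance
    for ts, task, event in events:
        if event == "activate":
            pending[task].append(ts)
        elif event == "terminate" and pending[task]:
            # RT of the oldest pending instance: activation to termination.
            rts[task].append(ts - pending[task].pop(0))
    return rts

events = [
    (0, "T_1", "activate"),
    (2525, "T_1", "terminate"),
    (10000, "T_1", "activate"),
    (12525, "T_1", "terminate"),
]
rts = response_times(events)
# rts["T_1"] -> [2525, 2525]
```

The FIFO queue per task also covers the multiple-activation case, where a
second activation arrives before the first instance terminates.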
\subsubsection{Task-MTA Test}
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_mta.pdf}}
\caption[Task-MTA test sequence]{Test application to validate mtalimitexceeded
events.}
\label{fig:task_mta}
\end{figure}
The purpose of the last specified test application is to validate the
correctness of \gls{mta} and mtalimitexceeded events. \autoref{fig:task_mta}
shows the sequence diagram of the respective test model. In this example
\lstinline{T_2} is allowed to have two activations. This means two instances
of the task may be active in the system at the same point in time.
Like in the previous tests \lstinline{T_1} is activated by \lstinline{STI_T_1}
periodically. \lstinline{T_1} then activates \lstinline{T_2} three consecutive
times via inter-core \gls{ipa}. The runnable \lstinline{R_1} is executed to
consume some time between the activations. After the first activation the task
starts running as expected. The second activation is stored by the \gls{os}.
Once \lstinline{T_2} terminates, it changes into the ready state and starts
running again. The third activation is not allowed by the \gls{os} as
indicated by the red box. An error message is created and a mtalimitexceeded
event must be added to the \gls{btf} trace.
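The activation bookkeeping described above can be sketched as follows; the
class and method names are illustrative and not taken from the actual
operating system implementation:

```python
# Sketch of multiple-task-activation (MTA) bookkeeping: up to `limit`
# instances of a task may be active; a further activation is rejected and
# recorded as a mtalimitexceeded event. Names are illustrative.
class TaskActivations:
    def __init__(self, limit):
        self.limit = limit
        self.active = 0   # running plus queued instances
        self.events = []

    def activate(self):
        if self.active >= self.limit:
            self.events.append("mtalimitexceeded")
            return False
        self.active += 1
        self.events.append("activate")
        return True

    def terminate(self):
        self.active -= 1
        self.events.append("terminate")

t2 = TaskActivations(limit=2)
results = [t2.activate() for _ in range(3)]  # third activation is rejected
# results -> [True, True, False]; t2.events[-1] -> "mtalimitexceeded"
```

Once an instance terminates, the counter drops and a subsequent activation
succeeds again, mirroring the behavior shown in the test sequence.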
\begin{figure}[]
\centering
\centerline{\includegraphics[width=\textwidth]{./media/eval/task_mta.png}}
\caption[Task-MTA test gantt chart]{Comparison of hardware (top) and
simulated (bottom) trace of the task-MTA test application.}
\label{fig:task_mta_gantt}
\end{figure}
\autoref{fig:task_mta_gantt} shows the comparison of the \gls{btf} traces
created by simulation and from hardware for the task-MTA test model. The
hardware trace clearly illustrates the procedure of an inter-core process
activation. At first the activation is triggered on \lstinline{Core_1} as
shown in the row \lstinline{IPA_T_1}. This results in the execution of the
inter-core communication \gls{isr} \lstinline{EE_TC_iirq_handler}.
The \gls{isr} then activates \lstinline{T_2} which changes into the ready state
indicated by the gray color. During the second activation \lstinline{T_2} is
already in the running state. Consequently, the activation is only illustrated
by a downward pointing arrow. In the simulated trace the task keeps running
during the activation process. In the hardware trace the task is preempted by
the inter-core \gls{isr} and the activation takes place while the task is in
the ready state.
During the third activation two instances of \lstinline{T_2} are already active
in the system. Thus, no further activations are allowed and a mtalimitexceeded
event is created. This is indicated by a downward pointing red arrow. At
around \unit[81925]{us} the first instance of \lstinline{T_2} terminates and
the next instance becomes ready immediately. Shortly afterwards this instance
starts running.
\subsection{Randomized Tests}
\label{subsection:randomized_tests}
Randomized tests are used to avoid insufficient test coverage due to selection
bias in the creation of the test applications. A tool for generating random
models automatically with respect to predefined constraints has been developed
in previous research projects \cite{sailer2014reconstruction}. It allows the
creation of an arbitrary number of test models based on user-defined
distributions, for example for the number of cores, tasks, and runnables.
\begin{table}[]
\centering
\begin{tabular}{c|c c c c}
Entities & min & max & average & distribution \\
\hline
Cores $[1]$ & 2 & - & - & const \\
Tasks $[1]$ & 9 & 22 & 15 & weibull \\
Runnables/Task $[1]$ & 6 & 13 & - & uniform \\
Instructions/Runnable $[10^3]$& 10 & 50 & 30 & weibull \\
Activation $[ms]$ & 1 & 20 & 1000 & weibull \\
Signals $[1]$ & 3 & 11 & 17 & weibull \\
Signals/Runnable $[1]$ & 3 & 7 & - & uniform \\
\end{tabular}
\caption[Randomized model configuration]{The configuration used for creating
test models randomly.}
\label{tab:rand_config}
\end{table}
\autoref{tab:rand_config} shows the distributions for the number of entities
that should be created for each entity type. This configuration is used for
each of the ten models that are tested in this section. The distributions for
\emph{cores} and \emph{tasks} represent the number of entities of the
respective type in the system. The metric \emph{runnables per task} determines
how many runnables are called from the context of each task. Each task is
activated by a periodic stimulus with a period depending on the
\emph{activation} value. \emph{Signals} specifies the number of signal
entities in the system and \emph{signals per runnable} the accesses to these
signals within the context of each runnable. Event and resource entities
cannot be generated by the random model generator and are therefore not covered
by randomized tests.
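The sampling implied by this configuration could look as follows. This is an
illustrative sketch only: the actual generator from the cited work may differ,
and the Weibull shape parameter $k$ is an assumption.

```python
# Sketch: drawing model parameters per the configuration table.
# Weibull samples are rescaled to the configured average and clamped to
# [min, max]; the shape parameter k is an assumed value.
import math
import random

def weibull_count(avg, lo, hi, k=1.5):
    # random.weibullvariate(scale, k) has mean scale * Gamma(1 + 1/k),
    # so rescale such that the sample mean matches the configured average.
    scale = avg / math.gamma(1.0 + 1.0 / k)
    return min(hi, max(lo, round(random.weibullvariate(scale, k))))

def random_model():
    return {
        "cores": 2,                                    # const
        "tasks": [
            {"runnables": random.randint(6, 13)}       # uniform per task
            for _ in range(weibull_count(15, 9, 22))   # weibull task count
        ],
    }

model = random_model()
# 9 <= len(model["tasks"]) <= 22, each task with 6..13 runnables
```

Repeating `random_model()` yields an arbitrary number of configurations that
all respect the bounds of the table above.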
Validating these models manually is not feasible. Therefore, only semantic
equality is tested because this can be done without user interaction. In
previous work a closed-loop model-based development process was created to
conduct the procedure shown in \autoref{fig:eval_idea} automatically
\cite{felixproject2}. This process was extended to support the model generator
and the semantic comparison of two traces.
\begin{figure}[]
\centering
\centerline{\includegraphics[width=0.55\textwidth]{./media/eval/semantic_impossible.pdf}}
\caption[Semantic comparison problem]{Semantic comparison of multi-core
systems is not feasible if the execution time of service routines varies
between hardware and simulation.}
\label{fig:task_runnable_signal}
\end{figure}
As mentioned before semantic equality could not be shown for any of the test
applications. The reason for this is depicted in
\autoref{fig:task_runnable_signal}. Assume that one task activates another
task on a different core and then executes multiple other actions. The
position at which the start event of the second task is inserted into the
trace depends on the time that elapses between activation and start. This
means two traces may be
semantically different even though they show the same behavior. Consequently,
the definition of semantic equality used in this thesis is not sufficient for
the comparison of multi-core systems. Nevertheless, by randomized comparison
of the traces the correctness of the mappings could be validated manually.
However, this fallback solution is not sufficient for validating a wide range
of test cases.
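The limitation described above can be reproduced with a minimal sketch of this
notion of semantic equality: the global event order is compared after
stripping timestamps, so a differing activation-to-start delay alone makes the
comparison fail. The (timestamp, target, event) tuples are illustrative.

```python
# Sketch: semantic equality as comparison of the global, timestamp-stripped
# event order. Tuple layout and event names are illustrative assumptions.
def semantically_equal(trace_a, trace_b):
    def strip(trace):
        return [(target, event) for _ts, target, event in trace]
    return strip(trace_a) == strip(trace_b)

# Same behavior on both cores, but the start of T_2 slips past the start of
# R_1 because the activation-to-start delay differs between hardware and
# simulation: the global-order check reports a deviation.
hw  = [(0, "T_2", "activate"), (10, "R_1", "start"), (12, "T_2", "start")]
sim = [(0, "T_2", "activate"), (8, "T_2", "start"), (10, "R_1", "start")]
# semantically_equal(hw, sim) -> False, despite identical per-core behavior
```

A per-entity comparison, which orders events separately for each task, would
tolerate such cross-core reorderings; this is one possible refinement of the
equality definition used here.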
\begin{figure}[]
\centering
\includegraphics[width=\textwidth]{./media/eval/eval_idea.pdf}
\caption[Mapping validation concept]{The general idea for the validation of
the software event to \gls{btf} event mapping. A model that represents a
certain system is created. Based on the model, a simulation, and a hardware
trace are generated. By comparing those traces errors in the transformation
process can be detected.}
\label{fig:eval_idea}
\end{figure}
In this chapter the software to system mappings are validated as depicted in
\autoref{fig:eval_idea}. A timing model of an application is created and a
\gls{btf} trace is generated from this model via discrete event simulation.
The simulated trace represents the expected result for the trace
recorded from hardware.
Next, C code is generated from the model. The code is compiled, executed on
hardware, and the runtime behavior is recorded via hardware tracing. The
resulting software level trace is transformed to system level according to
the respective mappings. The \gls{btf} trace recorded from hardware is then
compared to the simulated trace. Since both traces result from the same timing
model they are expected to represent the same system behavior.
Nevertheless, two kinds of deviations are expected. Firstly, timestamps of
otherwise identical events might differ. This is unavoidable because
simulation is an abstraction of reality and is not capable of taking all subtle
effects influencing the timing on real hardware into consideration. Secondly,
events may indicate a different software behavior. For example, a task starts
a runnable in one trace but not in the other. In this case, the deviation
must be examined because it might point to a mapping error.