

Ausführliche Beschreibung der Aufgaben

Please contact the supervisors for further details.


Size of hot memory region

The size of the hot memory region is an intuitive concept, but hard to measure. If it is smaller than the available cache size, there are techniques to determine it in offline measurements [1], but for larger sizes the usual mechanisms are fundamentally lacking. Usually, a direct solution of the problem is avoided [2, 3]. One idea to obtain this information is to track all unique memory accesses in a given time window. A data structure called a Bloom filter might help with that.
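
To make the idea concrete, here is a minimal, illustrative Python sketch (not the proposed hardware extension) of using a Bloom filter to count unique cache-line accesses in a time window. The filter size, hash count, and the example access trace are arbitrary assumptions:

```python
import hashlib

class BloomFilter:
    """Set-membership sketch: bits set by several hashes of each element.
    False positives are possible, false negatives are not."""
    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, addr):
        digest = hashlib.sha256(addr.to_bytes(8, "little")).digest()
        for i in range(self.num_hashes):
            chunk = int.from_bytes(digest[4 * i: 4 * i + 4], "little")
            yield chunk % self.num_bits

    def add(self, addr):
        for pos in self._positions(addr):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, addr):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(addr))

# Count cache-line-granular unique accesses in one time window:
bf = BloomFilter()
hot_lines = 0
for addr in [0x1000, 0x1040, 0x1000, 0x2000]:  # made-up access trace
    line = addr >> 6   # assume 64-byte cache lines
    if line not in bf:
        hot_lines += 1
        bf.add(line)
print(hot_lines)  # 3 unique cache lines (modulo false positives)
```

In a real measurement the counter would be reset at window boundaries, and the filter size traded off against the false-positive rate — these are among the "variables involved" mentioned below.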

The goal of this task is to implement and evaluate the proposed hardware extension on top of Cachegrind (a Valgrind tool), or using a different technology if agreed with the supervisors, and to find values for the variables involved.

[1] Cache contention and application performance prediction for multi-core systems

[2] Region Scheduling: Efficiently Using the Cache Architectures via Page-level Affinity

[3] Featherlight Reuse-Distance Measurement

Supervisor: Michael Roitzsch, Martin Küttler


Reproducing Measurements

An important principle of scientific work is to check results by reproducing experiments. A publication should contain all information needed to achieve that. While common in other scientific disciplines, experiment reproduction is rarely done in computer science, especially in "systems". In this work, an attempt will be made to

- check assumptions commonly made in published results

- reproduce measurements.

Together with the student, we will select some prominent examples for reproduction. We suggest starting with measurements of the cost of preemption and migration on modern computer architectures with large caches and deep cache hierarchies, as well as measurements of interrupt latencies. An interesting side result of this work will (hopefully) be an understanding of the impact of architectural changes on systems.

Supervisor: Prof. Hermann Härtig


Using ATLAS to Predict Measures Other Than Execution Time

The ATLAS [1] scheduler uses metrics to predict task execution time. However, it is conceivable to use the same or a similar predictor to derive other application- and workload-related measures such as cache behavior, memory accesses, or energy consumption.

The thesis should experiment with the ATLAS predictor to analyse its ability and accuracy in predicting these other measures.
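
The actual ATLAS predictor is described in [1]; as a generic illustration of the principle, a workload metric can be fitted against any measured quantity, not just execution time. The sketch below fits a simple linear model from a single hypothetical metric to a made-up energy measurement:

```python
def fit_linear(metrics, measured):
    """Least-squares fit of measured = slope * metric + intercept
    (single-metric case; ATLAS itself uses a vector of metrics)."""
    n = len(metrics)
    mx = sum(metrics) / n
    my = sum(measured) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(metrics, measured)) \
            / sum((x - mx) ** 2 for x in metrics)
    return slope, my - slope * mx

# Training data: (metric, energy in mJ) pairs -- entirely made-up numbers.
slope, intercept = fit_linear([100, 200, 300], [250, 450, 650])
predict = lambda metric: slope * metric + intercept
print(predict(400))  # 850.0
```

The thesis would replace the made-up data with real measurements (cache misses, memory accesses, energy counters) and evaluate how well such predictions track the chosen measure.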

[1] Michael Roitzsch, Stefan Wächtler, Hermann Härtig: ATLAS – Look-Ahead Scheduling Using Workload Metrics. RTAS 2013, https://os.inf.tu-dresden.de/papers_ps/rtas2013-mroi-atlas.pdf

Supervisor: Michael Roitzsch, Hannes Weisbach

ATLAS Integration for Recent FFmpeg

Execution time prediction within ATLAS [1] is based on metrics. A metric is a value from the application domain that is required to correlate with the amount of work to be performed. Roitzsch [2] proposed a set of 13 metrics for predicting the decoding times of H.264 frames. These metrics were embedded as metadata in H.264 streams. An ATLAS-enabled media player would extract this metadata and pass the metrics to the ATLAS runtime. Since the original implementation, FFmpeg has undergone a number of changes, so that the original implementation cannot easily be reused.

This programming-oriented thesis should implement the H.264 metrics handling in a recent version of FFmpeg and export the metrics to an ATLAS-enabled media player (for example, FFplay). The written part should discuss tooling to analyse the FFmpeg code and how to extract the metrics.

[1] Michael Roitzsch, Stefan Wächtler, Hermann Härtig: ATLAS – Look-Ahead Scheduling Using Workload Metrics. RTAS 2013, https://os.inf.tu-dresden.de/papers_ps/rtas2013-mroi-atlas.pdf

Supervisor: Michael Roitzsch, Hannes Weisbach

Cache Coloring for SMP Systems

OS-controlled cache partitioning has been proposed to limit cache-induced interference between processes on one CPU, especially to make caches more usable in real-time systems. In this work, the technique will be extended to multiprocessors of various architectures. What seems simple at first sight becomes much more difficult once modern cache architectures are included in the investigation. One example is smart caches, which attempt to partition caches dynamically without software influence.
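
The basic mechanism behind page coloring on a physically indexed, set-associative cache can be illustrated as follows (a generic sketch with example parameters, not tied to any specific architecture):

```python
def num_colors(cache_size, associativity, page_size):
    """Number of page colors: how many disjoint groups of cache sets
    a physical page can map to. colors = cache_size / (ways * page_size)."""
    return cache_size // (associativity * page_size)

def page_color(phys_addr, page_size, colors):
    """Color of the page containing phys_addr: low bits of the
    physical page number select the cache-set group."""
    return (phys_addr // page_size) % colors

# Example: a 2 MiB, 16-way cache with 4 KiB pages yields 32 colors.
colors = num_colors(2 * 1024 * 1024, 16, 4096)
print(colors)  # 32

# The OS partitions the cache by assigning disjoint color sets to processes,
# i.e. only handing out physical pages of the allowed colors.
print(page_color(0x23000, 4096, colors))  # page 0x23 -> color 3
```

Smart caches undermine exactly this: if the hardware remaps or dynamically partitions sets, the page-number-to-set mapping the OS relies on no longer holds.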

Literature: OS-Controlled Cache Partitioning

Supervisor: Prof. Hermann Härtig

Time Partitions

Provide means to schedule a compartment of L4 tasks.

A set of L4 tasks, also called a compartment, needs to be scheduled in a uniform fashion. This work will build a mechanism that allows defining a CPU share for a set of tasks, which will then be scheduled as one group.
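
As a toy illustration of the underlying accounting (not the mechanism to be built, which must work inside the kernel/runtime), a fixed scheduling period can be split into per-compartment time budgets proportional to their shares:

```python
def budgets(period_us, shares):
    """Split one scheduling period into per-compartment CPU budgets,
    proportional to the given (hypothetical) share weights."""
    total = sum(shares.values())
    return {name: period_us * s // total for name, s in shares.items()}

# Example: a real-time compartment with weight 3 vs. best-effort with weight 1,
# over a 10 ms period.
print(budgets(10_000, {"rt": 3, "best_effort": 1}))
# {'rt': 7500, 'best_effort': 2500}
```

The actual work lies in enforcing such budgets across all threads of a compartment, replenishing them each period, and deciding what happens when a budget is exhausted.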

Supervisor: Alexander Warg, Adam Lackorzynski


Access rights in capability systems

To support starting processes in a capability system, it is necessary to figure out what kinds of accesses the process is going to perform. Just granting access to (mostly) everything defeats the purpose of fine-grained access control. Luckily, most of the information needed is contained in the arguments that a program accepts. These are usually parsed by a library, which could be instrumented to extract this information and, potentially, to accept additional information as needed.

This task consists of two parts. The first is to gain an overview of libraries for command-line argument parsing and to instrument one such library to extract the usage information (including the files needed and the respective access rights) for programs that use it. The form this extraction should take is not yet fixed; a sufficiently flexible format should be chosen, together with some intuition about which use cases cannot be represented with it. Of course, access rights not represented in the arguments should be included as well.

The second part is to put this information (or some dummy information, if this part is done separately) to use. There are multiple options here, and the student is free to choose the one they prefer. One possibility is to adapt a shell to grant only the necessary rights to the processes it spawns. One could also parse the Lua config files used to start processes in L4Re to validate that the necessary capabilities are provided. Or one could build a graphical tool to create configurations that can be used for starting processes.

The two parts can be handed out separately. While it is clearly preferable for the same person to design both the creation and the usage of the information, addressing only one of them is possible. Details can be discussed with the supervisors.
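
To illustrate the first part, Python's argparse already carries much of the needed information in its argument declarations. The sketch below derives a capability manifest from a parser; note that it pokes at argparse internals (`_actions`, `_mode` are private attributes), so this is only a proof of concept, not how a production instrumentation should look:

```python
import argparse

def required_accesses(parser):
    """Derive a (hypothetical) capability manifest from a parser's
    declared file arguments: which args name files, and with what access."""
    manifest = []
    for action in parser._actions:           # private API, illustration only
        if isinstance(action.type, argparse.FileType):
            mode = action.type._mode         # private API, illustration only
            manifest.append({
                "arg": action.dest,
                "access": "read" if "r" in mode else "write",
            })
    return manifest

parser = argparse.ArgumentParser()
parser.add_argument("input", type=argparse.FileType("r"))
parser.add_argument("-o", "--output", type=argparse.FileType("w"))
print(required_accesses(parser))
# [{'arg': 'input', 'access': 'read'}, {'arg': 'output', 'access': 'write'}]
```

A real instrumentation would also cover arguments that are paths without being declared as `FileType`, directory accesses, and rights implied by the program itself rather than its arguments.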

Supervisor: Martin Küttler, Adam Lackorzynski

User-level components

Asynchronous API for M³

M³ [1] is a new hardware/operating system co-design that targets very heterogeneous platforms, containing mixed ISAs and accelerators. M³ uses a microkernel design that places components onto different cores and uses a dedicated hardware component for message-based communication between these components.

Currently, most APIs in M³ are synchronous. That is, after component A has sent a message to component B, component A idles until the reply from component B arrives. Since M³ prefers to place components onto different cores (which has both performance and security advantages), this potentially wastes time, because component A's core cannot be used for other work in the meantime.

The goal of this thesis is to introduce asynchronous APIs that use the idle times for other work. The API could be similar to that of Node.js, which expects a callback for every asynchronous operation and invokes the callback on completion of that operation. The implementation can be done for either C++ or Rust. The thesis should evaluate the benefits of the asynchronous API for different use cases.
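
The callback style can be sketched in a few lines of Python (the real implementation would be C++ or Rust, and replies would arrive via the hardware messaging component rather than a local queue; all names here are made up):

```python
import collections

class EventLoop:
    """Toy event loop: completions are queued and their callbacks run later."""
    def __init__(self):
        self.pending = collections.deque()
    def defer(self, fn, *args):
        self.pending.append((fn, args))
    def run(self):
        while self.pending:
            fn, args = self.pending.popleft()
            fn(*args)

loop = EventLoop()
log = []

def send_async(msg, on_reply):
    # In M3 this would enqueue a message to another core's component;
    # here we merely defer the reply callback to simulate its later arrival.
    loop.defer(on_reply, f"reply-to:{msg}")

send_async("ping", lambda reply: log.append(reply))
log.append("doing other work")   # the core stays busy instead of idling
loop.run()
print(log)  # ['doing other work', 'reply-to:ping']
```

The point of the sketch is the ordering: the caller continues immediately after `send_async`, and the reply is handled only when the event loop runs the callback.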

[1] Nils Asmussen, Marcus Völp, Benedikt Nöthen, Hermann Härtig, Gerhard Fettweis: M³ – A Hardware/Operating-System Co-Design to Tame Heterogeneous Manycores. ASPLOS 2016, http://os.inf.tu-dresden.de/papers_ps/asmussen-m3-asplos16.pdf

Supervisor: Nils Asmussen

Rewriting M³FS in Rust

M³FS is a file system for M³ [1] that has been designed with the demands of accelerators in mind. To this end, M³FS stores a file's data as extents, and the message-passing-based interface allows clients to get direct access to these extents. M³FS can be used with both a disk and a memory backend and uses a buffer cache to speed up access to frequently used blocks.

Rust is a new systems programming language that is memory-safe and data-race free while providing C-like performance. M³ is already partially written in Rust because of these benefits, but M³FS, an important system service, is still written in C++.

The goal of this thesis is to rewrite M³FS in Rust and to evaluate the performance impact. The particular challenge is to find a concept for objects living in the buffer cache, such as inodes or directory entries. These objects are only valid until they are evicted from the buffer cache. The question is whether and how Rust's strong memory-safety guarantees can be upheld in this case.
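
One candidate design (an assumption for illustration, not necessarily what the thesis will arrive at) is to hand out generation-checked handles instead of direct references: eviction bumps a per-slot generation counter, so stale handles are detected on access. Sketched here in Python; in Rust the check could be encapsulated in a handle type so that safe code can never dereference an evicted object:

```python
class BufferCache:
    """Toy buffer cache: objects live in slots; handles carry the slot's
    generation and become invalid once the slot is reused."""
    def __init__(self, slots):
        self.data = [None] * slots
        self.gen = [0] * slots

    def insert(self, slot, obj):
        self.gen[slot] += 1          # invalidates all older handles for slot
        self.data[slot] = obj
        return (slot, self.gen[slot])  # handle

    def get(self, handle):
        slot, gen = handle
        if self.gen[slot] != gen:
            raise LookupError("stale handle: object was evicted")
        return self.data[slot]

cache = BufferCache(4)
h = cache.insert(0, "inode-42")
print(cache.get(h))           # inode-42
cache.insert(0, "inode-99")   # evicts inode-42, reusing slot 0
# cache.get(h) would now raise LookupError
```

The open question for the thesis is whether such runtime checks can be avoided or minimized by expressing the validity of cached objects in Rust's type system instead.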

[1] Nils Asmussen, Marcus Völp, Benedikt Nöthen, Hermann Härtig, Gerhard Fettweis: M³ – A Hardware/Operating-System Co-Design to Tame Heterogeneous Manycores. ASPLOS 2016, http://os.inf.tu-dresden.de/papers_ps/asmussen-m3-asplos16.pdf

Supervisor: Nils Asmussen

The Price of Reliability

Protocols like TCP provide reliable, connection-oriented data transfer over unreliable network connections. This feature is highly convenient for users, who do not need to take care of error checking, potential retransmissions, and duplicate message handling. However, there is a price to be paid in the form of additional overhead. As not all applications can tolerate these overheads, or do not require a reliable connection in the first place, unreliable protocols such as UDP exist. In High-Performance Computing (HPC), high-bandwidth, low-latency networks such as InfiniBand are prevalent. These networks also offer different modes of operation, and reliable connections incur additional overhead here as well.

The goal of this task is to add support for unreliable InfiniBand transfers to UCX [1] and to evaluate the gains (latency and bandwidth) and drawbacks (e.g. the rate of lost messages) in a real system. Depending on the type of thesis (Bachelor/Beleg or Master/Diplom), this task can be expanded to include implementing or adapting existing fault-tolerant algorithms, such as Corrected Gossip [2], to profit from both the low overhead of unreliable data transfers and the robustness of reliable connections.

[1] http://www.openucx.org/

[2] https://htor.inf.ethz.ch/publications/img/corrected_gossip.pdf

Supervisor: Jan Bierbaum


Multiplexer support for a RISC-V-based accelerator

M³ [1] is a new hardware/operating system co-design that targets heterogeneous platforms and provides strong isolation between hardware tiles and software components. On the hardware side, M³ introduces a new hardware component into each tile called trusted communication unit (TCU) to isolate tiles from each other, while supporting cross-tile communication channels. On the software side, M³ uses a microkernel design that places components on different tiles by default.

To share a core among multiple applications, M³ uses a core-local multiplexer, called PEMux, that builds upon lightweight multiplexing features in the TCU. However, PEMux is built for general-purpose cores with virtual-memory support. This thesis should investigate how a multiplexer can be built for a RISC-V-based accelerator with a local scratchpad memory and without virtual-memory support to allow multiple applications to use such an accelerator simultaneously.

[1] Nils Asmussen, Marcus Völp, Benedikt Nöthen, Hermann Härtig, Gerhard Fettweis: M³ – A Hardware/Operating-System Co-Design to Tame Heterogeneous Manycores. ASPLOS 2016

Supervisor: Nils Asmussen


Does I/O scheduling have a place in Cloud and HPC centres in the age of fast NVMe storage?

The recently published K2 I/O scheduler [1] was developed at our group to reduce access latency on NVMe storage devices for critical workloads. While the original work was published in a real-time context, we believe the benefits of K2 are also applicable to other areas of computing and processing where a critical but relatively low-bandwidth application has to contend with a non-critical, high-bandwidth application for data rate and access latency of local storage.

We imagine Burst Buffer access in HPC systems or workload co-location in Cloud settings to be such contexts. One use of Burst Buffers is the local caching of an application checkpoint before it is written to a parallel file system, while the application continues to run. Co-location of I/O-bound and CPU-intensive workloads on the same machine can be used to boost utilisation and hence increase cost-effectiveness.

Depending on the type of thesis (Bachelor, Master, Diploma), one or more scenarios should be selected and typical workloads chosen, implemented, and evaluated.

[1] Till Miemietz, Hannes Weisbach, Michael Roitzsch, Hermann Härtig: K2: Work-Constraining Scheduling of NVMe-Attached Storage. 40th IEEE Real-Time Systems Symposium (RTSS 2019), December 2019

Supervisor: Michael Roitzsch, Hannes Weisbach

Last updated: 8 June 2020, 15:45
Author: Webmaster